Science of Computer Programming, Volume 230, August 2023, Article number 102999

Towards a systematic approach to manual annotation of code smells (Article) (Open Access)

  • Department of Computing and Control Engineering, Faculty of Technical Sciences, University of Novi Sad, Novi Sad, Serbia

Abstract

Code smells are structures in code that may indicate maintainability issues. They are challenging to define, and software engineers detect them differently. An AI-based code smell detector could mitigate this problem, but developing one requires a standardized benchmark dataset. Existing datasets suffer from (1) annotation subjectivity, (2) lack of ground-truth consensus among annotators, and (3) reproducibility issues. This paper aims to develop a systematic manual code smell annotation procedure that addresses these issues. We tailored the prescriptive natural language processing annotation methodology to code smell detection: (1) we cross-validate annotations to mitigate subjectivity, (2) we develop clear annotation guidelines to reach ground-truth consensus, and (3) we follow literature recommendations for reproducibility and open-source our tools and dataset. We extracted the annotation guidelines from existing empirical code smell research. The annotators refined the guidelines and their understanding of the task through a proof-of-concept annotation round, which included retrospective discussion and disagreement resolution, and then performed the full annotation. We confirmed that ground-truth consensus was reached by measuring annotation consistency. Our contributions are the proposed annotation procedure, a novel code smell dataset of open-source C# projects, the annotators' experience report, and the open-sourced supporting tool. © 2023 Elsevier B.V.
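As a rough illustration of what "measuring annotation consistency" can look like, the sketch below computes Cohen's kappa for two annotators. The abstract does not say which consistency measure the authors used, so the choice of kappa, the function name `cohen_kappa`, and the 0/1 "smell present" labels are all assumptions made for this example only.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items.

    Illustrative only: the metric and label scheme are assumptions,
    not taken from the article.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")

    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: probability of agreeing if each annotator
    # labeled at random according to their own label frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)

    if expected == 1.0:
        # Both annotators used one identical label everywhere.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels: 1 = smell present, 0 = no smell.
annotator_1 = [1, 1, 0, 0]
annotator_2 = [1, 0, 0, 0]
kappa = cohen_kappa(annotator_1, annotator_2)
```

A kappa near 1 indicates agreement well above chance, while a value near 0 indicates agreement no better than chance. With more than two annotators, or with ordinal severity labels, a different statistic would likely be more appropriate.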

Author keywords

Code smells; Manual annotation; Novel dataset; Software quality

Indexed keywords

Engineering controlled terms: Natural language processing systems; Odors; Open source software; Open systems
Engineering uncontrolled terms: Benchmark datasets; Code smell; Ground truth; Manual annotation; Manual codes; Natural languages; Novel dataset; Open-source; Reproducibilities; Software Quality
Engineering main heading: Computer software selection and evaluation

Funding details

Funding sponsor | Funding number | Acronym
(not listed) | 451-03-47/2023-01/200156 |
Science Fund of the Republic of Serbia | 6521051 |

    This research was supported by the Science Fund of the Republic of Serbia, Grant No. 6521051, AI-Clean CaDET and the Ministry of Science, Technological Development and Innovation through project no. 451-03-47/2023-01/200156 “Innovative scientific and artistic research from the FTS (activity) domain.” Our funders had no involvement in the study design, collection, analysis, and interpretation of the data, writing of the report, or the decision to submit the article for publication.

  • ISSN: 01676423
  • CODEN: SCPGD
  • Source Type: Journal
  • Original language: English
  • DOI: 10.1016/j.scico.2023.102999
  • Document Type: Article
  • Publisher: Elsevier B.V.

  Slivka, J.; Department of Computing and Control Engineering, Faculty of Technical Sciences, University of Novi Sad, Novi Sad, Serbia
© Copyright 2023 Elsevier B.V., All rights reserved.
