Neural Computing and Applications, Volume 36, Issue 16, June 2024, Pages 9203-9220

Automatic detection of code smells using metrics and CodeT5 embeddings: a case study in C# (Article)

  • Department of Computing and Control Engineering, Faculty of Technical Sciences, University of Novi Sad, Trg Dositeja Obradovića 6, Novi Sad, 21000, Serbia

Abstract

Code smells are poorly designed code structures indicating that the code may need to be refactored. Recognizing code smells in practice is complex, and researchers strive to develop automatic code smell detectors. The limitations of existing datasets are an obstacle to developing these solutions. Manually labeled datasets were collected to investigate developers' perceptions of code smells. They are characterized by high label disagreement, which hurts the performance of Machine Learning (ML) models trained on them. Furthermore, all large, manually labeled datasets were developed for Java. We recently created a novel dataset for C# to alleviate these issues. This paper evaluates ML code smell detection approaches on our novel dataset. We consider two feature representations to train ML models: (1) code metrics and (2) CodeT5 embeddings. This study is the first to consider CodeT5, a state-of-the-art neural source code embedding, for code smell detection in C#. To demonstrate the effectiveness of ML, we also evaluate multiple metrics-based heuristics as alternatives. In our experiments, the best-performing approach was the ML classifier trained on code metrics (F-measure of 0.87 for Long Method and 0.91 for Large Class detection). However, its performance improvement over CodeT5 features is negligible when weighed against the advantage of inferring features automatically. Finally, our ML model surpassed less experienced annotators and nearly matched the most experienced annotator, suggesting it can assist less experienced developers under tight deadlines. To the best of our knowledge, this is the first study to compare the performance of automatic smell detectors against human performance. © The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2024.
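The abstract contrasts ML classifiers with metrics-based heuristics and reports results as F-measure. As a minimal sketch of that evaluation setup, the Python code below applies a threshold heuristic for Long Method to per-method metrics and scores its predictions with F1 (the harmonic mean of precision and recall for the smelly class). The `MethodMetrics` fields, the `is_long_method` rule, and its thresholds (more than 50 lines of code and cyclomatic complexity above 10) are hypothetical illustrations, not the heuristics or features used in the paper. The sketch also assumes the reported F-measure is F1 computed on the positive (smelly) class.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MethodMetrics:
    """Hypothetical per-method metrics; not the paper's actual feature set."""
    name: str
    loc: int          # lines of code in the method body
    cyclomatic: int   # cyclomatic complexity


# Hypothetical thresholds chosen only for illustration.
LOC_THRESHOLD = 50
CC_THRESHOLD = 10


def is_long_method(m: MethodMetrics,
                   loc_threshold: int = LOC_THRESHOLD,
                   cc_threshold: int = CC_THRESHOLD) -> bool:
    """Heuristic: flag a method only if it is both long and complex."""
    return m.loc > loc_threshold and m.cyclomatic > cc_threshold


def f_measure(predicted: list[bool], actual: list[bool]) -> float:
    """F1 for the positive (smelly) class: harmonic mean of precision and recall."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    if tp == 0:
        return 0.0  # precision or recall is zero (or undefined), so F1 is taken as 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    methods = [
        MethodMetrics("Parse", loc=120, cyclomatic=18),
        MethodMetrics("Validate", loc=80, cyclomatic=6),
        MethodMetrics("GetName", loc=3, cyclomatic=1),
        MethodMetrics("Render", loc=60, cyclomatic=12),
    ]
    # Hypothetical ground-truth labels from annotators (True = Long Method).
    labels = [True, True, False, False]
    predictions = [is_long_method(m) for m in methods]
    print(predictions)                                   # [True, False, False, True]
    print(f"F1 = {f_measure(predictions, labels):.2f}")  # F1 = 0.50
```

Swapping `is_long_method` for a classifier trained on the same metrics (or on CodeT5 embeddings) would leave `f_measure` unchanged, which is what makes heuristic and learned detectors directly comparable on one labeled dataset.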

Author keywords

Code embeddings; Code smell; Machine learning; Software quality; Source code metrics

Indexed keywords

Engineering controlled terms: Computer software selection and evaluation; Encoding (symbols); Large datasets; Machine learning; Odors
Engineering uncontrolled terms: Code embedding; Code metrics; Code smell; Embeddings; Labeled dataset; Machine learning models; Machine-learning; Performance; Software Quality; Source code metrics
Engineering main heading: Embeddings

Funding details

Funding sponsor | Funding number
Science Fund of the Republic of Serbia | 6521051
Science Fund of the Republic of Serbia | 451-03-47/2023-01/200156

    This research was supported by the Science Fund of the Republic of Serbia, Grant No. 6521051, AI-Clean CaDET, and the Ministry of Science, Technological Development and Innovation through project no. 451-03-47/2023-01/200156, “Innovative scientific and artistic research from the FTS (activity) domain”.

  • ISSN: 0941-0643
  • Source Type: Journal
  • Original language: English
  • DOI: 10.1007/s00521-024-09551-y
  • Document Type: Article
  • Publisher: Springer Science and Business Media Deutschland GmbH

  Slivka, J.; Department of Computing and Control Engineering, Faculty of Technical Sciences, University of Novi Sad, Trg Dositeja Obradovića 6, Novi Sad, Serbia;
© Copyright 2024 Elsevier B.V., All rights reserved.

Cited by 4 documents

Mao, F., Zhong, K., Cheng, L. (2025). Bmco-o: a smart code smell detection method based on co-occurrences. Automated Software Engineering.
Rashid, M.M., Osman, M.H., Sharif, K.Y. (2025). Interpretable Deep Learning for Efficient Code Smell Prioritization in Software Development. IEEE Access.
Nguyen, D., Le, B. (2024). Novel stochastic algorithms for privacy-preserving utility mining. Applied Intelligence.
SciVal Topic Prominence

Topic: Refactoring; Computer Software Selection and Evaluation; Open Source Software
Prominence percentile: 94.686