A Preliminary Study on Using Text- and Image-Based Machine Learning to Predict Software Maintainability
Authors: Markus Schnappinger, Simon Zachau, Arnaud Fietzke and Alexander Pretschner
Machine learning has emerged as a useful tool to aid software quality control. It can support identifying problematic code snippets or predicting maintenance efforts. Most existing approaches rely on code metrics as input. However, evidence suggests great potential for text- and image-based approaches to predict code quality as well. Using a manually labeled dataset, this preliminary study examines the use of five text-based and two image-based algorithms to predict the readability, understandability, and complexity of source code. While the overall performance can still be improved, we find that Support Vector Machines (SVMs) outperform sophisticated text transformer models and image-based neural networks. Furthermore, text-based SVMs tend to perform well at predicting the readability and understandability of code, while image-based SVMs predict code complexity more accurately. Our study both shows the potential of text- and image-based algorithms for software quality prediction and outlines their weaknesses as a starting point for further research.
Company: Technische Universität München
Talk language: English