Improving Quality of Code Review Datasets - Token-based Feature Extraction Method

Authors: Miroslaw Staron, Wilhelm Meding, Ola Söder, Miroslaw Ochodek

Machine learning is increasingly used in software engineering to automate tasks and to improve the speed and quality of software products. One area where it is starting to be applied is the analysis of software code. The goal of this paper is to evaluate a new method for creating machine learning feature vectors based on the content of a line of code. We designed a new feature extraction algorithm and evaluated it in an industrial case study. Our results show that the new feature extraction technique improves overall performance in terms of MCC (Matthews Correlation Coefficient) by 0.39, from 0.31 to 0.70, while reducing precision by 0.05. The implication is that we can significantly improve overall prediction accuracy for both true positives and true negatives. This increases practitioners' trust in the predictions and contributes to their deeper adoption in practice.
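Since the abstract reports its results as MCC, a minimal sketch of how that metric is computed from a binary confusion matrix may help. It uses the standard MCC definition; the function name `mcc` and the choice to return 0.0 when the denominator is zero are assumptions made for this illustration, and nothing here is taken from the authors' implementation or data.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews Correlation Coefficient for a binary confusion matrix.

    tp, tn, fp, fn are the counts of true positives, true negatives,
    false positives and false negatives. The result lies in [-1, 1].
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Assumption for this sketch: treat the undefined case
        # (an empty row or column in the matrix) as "no correlation".
        return 0.0
    return numerator / denominator
```

Unlike precision, which only looks at true and false positives, MCC uses all four cells of the confusion matrix. That is why a classifier can lose a little precision and still gain substantially in MCC when it handles the negative class better.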

Presented by: Miroslaw Staron
Organization: University of Gothenburg

Talk language: English
Level: Advanced
