Automated Code Review Comment Classification to Improve Modern Code Reviews

Authors: Mirosław Ochodek, Miroslaw Staron, Wilhelm Meding and Ola Söder

Modern Code Reviews (MCRs) are a widely used quality assurance mechanism in continuous integration and deployment. Unfortunately, in medium and large projects, the number of changes that need to be integrated, and consequently the number of comments triggered during MCRs, can be overwhelming. Therefore, there is a need to quickly recognize which comments concern issues that require prompt attention, in order to guide the focus of code authors, reviewers, and quality managers. The goal of this study is to design a method for automated classification of review comments to identify the needed change faster and with higher accuracy. We conduct a Design Science Research study on three open-source systems. We designed a method (CommentBERT) for automated classification of code-review comments based on the BERT (Bidirectional Encoder Representations from Transformers) language model and a new taxonomy of comments. When applied to 2,672 comments from the Wireshark, The Mono Framework, and Open Network Automation Platform (ONAP) projects, the method achieved accuracy, measured using the Matthews Correlation Coefficient, of 0.46–0.82 (Wireshark), 0.12–0.8 (ONAP), and 0.48–0.85 (Mono). Based on the results, we conclude that the proposed method seems promising and could potentially be used to build machine-learning-based tools to support MCRs, as long as there is a sufficient number of historical code-review comments to train the model.
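
For readers unfamiliar with the Matthews Correlation Coefficient (MCC) used above as the accuracy measure, the following is a minimal sketch of the standard binary formula, MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). It is not the authors' evaluation code. The function name `binary_mcc` and the example labels are hypothetical, and the sketch assumes each comment category is scored as a separate yes/no decision.

```python
import math


def binary_mcc(y_true, y_pred):
    """Matthews Correlation Coefficient for two equal-length lists of 0/1 labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four cells of the confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Common convention: MCC is reported as 0 when any marginal is empty.
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical data: 1 = comment belongs to the category, 0 = it does not.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
mcc = binary_mcc(y_true, y_pred)
print(f"MCC = {mcc:.2f}")
```

MCC ranges from −1 (total disagreement) through 0 (no better than chance) to +1 (perfect prediction). Unlike plain accuracy, it stays informative when a category is rare, which is why it is often used to evaluate classifiers on imbalanced data.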

Presented by: Miroslaw Staron
Company: University of Gothenburg

Talk language: English
Level: Advanced
Target group:
