Automated Code Review Comment Classification to Improve Modern Code Reviews
Authors: Mirosław Ochodek, Miroslaw Staron, Wilhelm Meding and Ola Söder
Modern Code Reviews (MCRs) are a widely-used quality assurance mechanism in continuous integration and deployment. Unfortunately, in medium and large projects, the number of changes that need to be integrated, and consequently the number of comments triggered during MCRs could be overwhelming. Therefore, there is a need for quickly recognizing which comments are concerning issues that need prompt attention to guide the focus of the code authors, reviewers, and quality managers. The goal of this study is to design a method for automated classiﬁcation of review comments to identify the needed change faster and with higher accuracy. We conduct a Design Science Research study on three open-source systems. We designed a method (CommentBERT) for automated classiﬁcation of the code-review comments based on the BERT (Bidirectional Encoder Representations from Transformers) language model and a new taxonomy of comments. When applied to 2,672 comments from Wireshark, The Mono Framework, and Open Network Automation Platform (ONAP) projects, the method achieved accuracy, measured using Matthews Correlation Coeﬃcient, of 0.46–0.82 (Wireshark), 0.12–0.8 (ONAP), and 0.48–0.85 (Mono). Based on the results, we conclude that the proposed method seems promising and could be potentially used to build machine-learning-based tools to support MCRs as long as there is a suﬃcient number of historical code-review comments to train the model.
Company: University of Gothenburg
Talk language: English