SCDetector: Software Functional Clone Detection Based on Semantic Tokens Analysis

Yueming Wu, Deqing Zou, Shihan Dou, Siru Yang, Wei Yang, Feng Cheng, Hong Liang, Hai Jin

Abstract

Code clone detection aims to find code fragments with similar functionality and has become increasingly important in software engineering. Many approaches have been proposed for detecting code clones. Among them, token-based methods are the most scalable but cannot handle semantic clones because they do not consider program semantics. To address this issue, researchers conduct program analysis to distill program semantics into a graph representation and detect clones by matching the graphs. However, such approaches suffer from low scalability since graph matching is typically time-consuming. In this paper, we propose SCDetector, which combines the scalability of token-based methods with the accuracy of graph-based methods for software functional clone detection. Given the source code of a function, we first extract its control flow graph by static analysis. Instead of traditional heavyweight graph matching, we treat the graph as a social network and apply social-network centrality analysis to compute the centrality of each basic block. We then assign that centrality to each token in the basic block and sum the centrality of the same token across different basic blocks. In this way, a graph is turned into a set of tokens carrying graph details (i.e., centrality), called semantic tokens. Finally, these semantic tokens are fed into a Siamese neural network to train a model, which is then used to detect code clones. We evaluate SCDetector on two large datasets of functionally similar code. Experimental results indicate that our system is superior to state-of-the-art methods and that the time cost of SCDetector is more than 14 times lower than that of the state-of-the-art approach in detecting semantic clones.
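The semantic-token step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the centrality measure, so degree centrality is assumed here, the toy control flow graph and the helper names `degree_centrality` and `semantic_tokens` are hypothetical, and each token occurrence is assumed to receive the centrality of its block.

```python
from collections import defaultdict


def degree_centrality(nodes, edges):
    """Degree centrality per node: (in-degree + out-degree) / (n - 1).

    Assumption: degree centrality stands in for whichever measure is used.
    """
    degree = {node: 0 for node in nodes}
    for src, dst in edges:
        degree[src] += 1
        degree[dst] += 1
    scale = max(len(nodes) - 1, 1)  # avoid division by zero for a single block
    return {node: degree[node] / scale for node in nodes}


def semantic_tokens(blocks, edges):
    """Map each token to the summed centrality of the blocks it occurs in."""
    centrality = degree_centrality(list(blocks), edges)
    weights = defaultdict(float)
    for block_id, tokens in blocks.items():
        for token in tokens:
            # Every occurrence of a token inherits its block's centrality.
            weights[token] += centrality[block_id]
    return dict(weights)


# Hypothetical CFG for: int i = 0; while (i < n) { i = i + 1; } return i;
blocks = {
    "B0": ["int", "i", "=", "0"],
    "B1": ["while", "i", "<", "n"],
    "B2": ["i", "=", "i", "+", "1"],
    "B3": ["return", "i"],
}
edges = [("B0", "B1"), ("B1", "B2"), ("B2", "B1"), ("B1", "B3")]

tokens = semantic_tokens(blocks, edges)
for token, weight in sorted(tokens.items(), key=lambda kv: -kv[1]):
    print(f"{token:>6}  {weight:.3f}")
```

In this toy graph the loop header `B1` has the highest centrality, so tokens that appear there (and the variable `i`, which appears in every block) end up with the largest weights. The resulting token-to-weight mapping is the kind of input that would then be passed to the Siamese network.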

Type

Conference paper

Publication

In the 35th IEEE/ACM International Conference on Automated Software Engineering.

Date

July 2020

Links

PDF Code