Towards Robust Similarity Detection of Smart Contracts with Masked Language Modelling
- Tian, ZZ; Ke, XQ
- 2023
【Author】 Tian, Zhenzhou; Ke, Xianqun
【Source】ADVANCES IN NATURAL COMPUTATION, FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY, ICNC-FSKD 2022
【Abstract】Smart contracts are programs that run on blockchains. The smart contract ecosystem tends to be highly homogeneous, owing to the immutability of contracts once deployed and to the copy-paste practice common in smart contract development. Similarity detection between smart contracts is therefore of great value: by providing a way to identify and track clones, it supports quality assurance across the ecosystem. To this end, this work presents SoliSim, which encodes smart contracts into informative semantic vectors for effective and efficient similarity detection. The encoding procedure relies on masked language modelling for the Solidity programming language: a BERT-like model is pre-trained on normalized token sequences extracted from the smart contracts' abstract syntax trees (ASTs). The similarity detection procedure then simply calculates a score on the encoded numerical vectors. The experimental results show that the pre-training strategy adopted by SoliSim captures the contextual and semantic information of smart contract code. The similarity scores calculated with SoliSim on pairs of real cloned contracts all exceed 96%, while the scores for non-clone pairs are all below 50%.
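The two steps named in the abstract, masking a normalized token sequence for pre-training and scoring a pair of encoded vectors, can be illustrated with a minimal sketch. This is not SoliSim's implementation: the abstract does not specify the masking ratio, the mask symbol, or the scoring function, so the 15% ratio, the `[MASK]` symbol, and cosine similarity used here are assumptions, and the function names `mask_tokens` and `cosine_similarity` are hypothetical.

```python
import math
import random

MASK = "[MASK]"  # assumed mask symbol, following BERT-style vocabularies


def mask_tokens(tokens, ratio=0.15, seed=0):
    """Hide a random subset of tokens; return (masked sequence, hidden tokens).

    The 15% default ratio is an assumption, not a value from the abstract.
    """
    if not tokens:
        return [], {}
    rng = random.Random(seed)
    # Mask at least one token, and never more than the sequence holds.
    n_mask = min(len(tokens), max(1, round(len(tokens) * ratio)))
    positions = set(rng.sample(range(len(tokens)), n_mask))
    masked = [MASK if i in positions else t for i, t in enumerate(tokens)]
    labels = {i: tokens[i] for i in positions}  # what the model must predict
    return masked, labels


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (assumed score)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)
```

In such a setup, a model would be trained to recover the entries of `labels` from `masked`, and two contracts would be compared by passing their encoded vectors to `cosine_similarity`.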
【Keywords】Smart contract; Similarity detection; Pre-trained model
【Publication Date】2023
【Indexed Date】2023-05-08
【Document Type】Theoretical model
【Subject Category】
Blockchain technology - Core technology - Smart contracts