【Author】 Yin, Hao Hua Sun; Langenheldt, Klaus; Harlev, Mikkel; Mukkamala, Raghava Rao; Vatrapu, Ravi
【Source】JOURNAL OF MANAGEMENT INFORMATION SYSTEMS
【Abstract】Bitcoin is a cryptocurrency whose transactions are recorded on a distributed, openly accessible ledger. On the Bitcoin Blockchain, an owning entity's real-world identity is hidden behind a pseudonym, a so-called address. Therefore, Bitcoin is widely assumed to provide a high degree of anonymity, which is a driver for its frequent use for illicit activities. This paper presents a novel approach for de-anonymizing the Bitcoin Blockchain by using Supervised Machine Learning to predict the type of yet-unidentified entities. We utilized a sample of 957 entities (with approximately 385 million transactions), whose identity and type had been revealed, as training set data and built classifiers differentiating among 12 categories. Our main finding is that we can indeed predict the type of a yet-unidentified entity. Using the Gradient Boosting algorithm with default parameters, we achieve a mean cross-validation accuracy of 80.42% and F1-score of approximately 79.64%. We show two examples, one where we predict on a set of 22 clusters that are suspected to be related to cybercriminal activities, and another where we classify 153,293 clusters to provide an estimation of the activity on the Bitcoin ecosystem. We discuss the potential applications of our method for organizational regulation and compliance, societal implications, outline study limitations, and propose future research directions. A prototype implementation of our method for organizational use is included in the appendix.
【Keywords】cryptocurrencies; Bitcoin; blockchain; cybersecurity; supervised machine learning; online anonymity; cybercrime
【Title】Regulating Cryptocurrencies: A Supervised Machine Learning Approach to De-Anonymizing the Bitcoin Blockchain
【Publication Year】2019
【Date Indexed】2022-04-23
【Document Type】Article
【Main Topic】On-chain data analysis
【Subtopic】Transaction entity identification
【Journal Rank】SCI Q2
【Impact Factor】7.582
【Translator】王佳鑫 (Wang Jiaxin)