【Author】
Doku, Ronald; Rawat, Danda B.; Liu, Chunmei
【Source】2019 IEEE 20th International Conference on Information Reuse and Integration for Data Science (IRI 2019)
【Abstract】In the past few years, data volumes have grown enormously, and big data has become the driving force behind many machine learning innovations. However, the incessant generation of data in the information age poses a needle-in-a-haystack problem: it has become challenging to separate useful data from a heap of irrelevant data. This has resulted in a quality-versus-quantity issue in data science, where a lot of data is being generated but the majority of it is irrelevant. Furthermore, most of the data and resources needed to effectively train machine learning models are owned by major tech companies, resulting in a centralization problem. Federated learning seeks to address this by changing how machine learning models are trained, adopting a distributed machine learning approach. Another promising technology is the blockchain, whose immutable nature ensures data integrity. By combining the blockchain's trust mechanism and federated learning's ability to disrupt data centralization, we propose an approach that determines relevant data and stores the data in a decentralized manner.
【Keywords】Federated Learning Approach; Data Relevance; Big Data Analytics; Machine Learning