Algorithmic Tools for Understanding the Motif Structure of Networks
【Author】 Chen, Tianyi; Matejek, Brian; Mitzenmacher, Michael; Tsourakakis, Charalampos E.
【Source】MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2022, PT II
【影响因子】
【Abstract】Motifs are small subgraph patterns that play a key role towards understanding the structure and the function of biological and social networks. The current de facto approach towards assessing the statistical significance of a motif M relies on counting its occurrences across the network, and comparing that count to its expected count under some null generative model. This approach can be misleading due to combinatorial artifacts. That is, there may be a large count for a motif due to multiple copies sharing many vertices and edges connected to a subgraph, such as a clique, that completes the multiple copies of the motif. In this work we introduce the novel concept of an (f, q)-spanning motif. A motif M is (f, q)-spanning if there exists a q-fraction of the nodes that induces an f-fraction of the occurrences of M in G. Intuitively, when f is close to 1, and q close to 0, most of the occurrences of M are localized in a small set of nodes, and thus its statistical significance is likely to be due to a combinatorial artifact. We propose efficient heuristics for finding the maximum f for a given q and minimum q for a given f for which a motif is (f, q)-spanning and evaluate them on real-world datasets. Our methods successfully identify combinatorial artifacts that otherwise go undetected using the standard approach for assessing statistical significance. Finally, we leverage the motif structure of a network to design MOTIF-SCOPE, an algorithm that takes as input a graph and two motifs M-1, M-2, and finds subgraphs of the graph where M-1, M-2 occur infrequently and frequently respectively. We show that a good selection of M-1, M-2 allows us to find anomalies in large networks, including bipartite cliques in social graphs, and subgraphs rated with distrust in Bitcoin markets.
【Keywords】Motifs; Graph mining; Statistical significance; Anomaly detection
【发表时间】2023
【收录时间】2023-06-28
【文献类型】
【主题类别】
--
评论