Partition-Merge: Distributed Inference and Modularity Optimization

Year

2018

Type(s)

Journal Article

Source

Submitted to JMLR (under review).

Url

https://arxiv.org/pdf/1309.6129.pdf

yes

Download File

This paper presents a novel meta algorithm, Partition-Merge (PM), which takes existing centralized algorithms for graph computation and makes them distributed and faster. In a nutshell, PM divides the graph into small subgraphs using our novel randomized partitioning scheme, runs the centralized algorithm on each partition separately, and then {\em stitches} the resulting solutions to produce a global solution. We demonstrate the efficiency of the PM algorithm on two popular problems: computation of Maximum A Posteriori (MAP) assignment in an arbitrary pairwise Markov Random Field (MRF), and modularity optimization for community detection. We show that the resulting distributed algorithms for these problems essentially run in time linear in the number of nodes in the graph, and perform as well — or even better — than the original centralized algorithm as long as the graph has geometric structures. More precisely, if the centralized algorithm is a C-factor approximation with constant C > 1, the resulting distributed algorithm is a (C+d)-factor approximation for any small d>0; and even if the centralized algorithm is a non-constant (e.g. logarithmic) factor approximation, then the resulting distributed algorithm becomes a constant factor approximation. For general graphs, we compute explicit bounds on the loss of performance of the resulting distributed algorithm with respect to the centralized algorithm. To show efficiency of our algorithm, we conducted extensive experiments both on real-world networks and on synthetic networks. For the centralized algorithms, we consider the sequential tree-reweighted max-product message passing for the MAP inference, and Girvan-Newman, Clauset-Newman-Moore, and Louvain-Method for the modularity optimization problem. The experiments demonstrate that PM provides a good trade-off between accuracy and running time. It particularly achieves better efficiency when it is applied to well-distributed regular networks and when the centralized algorithm has high complexity.