A Feasible Graph Partition Framework for Random Walks Implemented by Parallel Computing in Big Graph
摘要：Graph partition is a fundamental problem of parallel computing for big graph data. Many graph partition algorithms have been proposed to solve the problem in various applications, such as matrix computations and Page Rank, etc., but none has pay attention to random walks. Random walks is a widely used method to explore graph structure in lots of fields. The challenges of graph partition for random walks include the large number of times of communication between partitions, too many replications of the vertices, unbalanced partition, etc. In this paper, we propose a feasible graph partition framework for random walks implemented by parallel computing in big graph. The framework is based on two optimization functions to reduce the bandwidth, memory and storage cost in the condition that the load balance is guaranteed. In this framework, several greedy graph partition algorithms are proposed. We also use five metrics from different perspectives to evaluate the performance of these algorithms. By running the algorithms on the big graph data set of real world, the experimental results show that these algorithms in the framework are capable of solving the problem of graph partition for random walks for different needs, e.g. the best result is improved more than 70 times in reducing the times of communication.