Graph Mining Using Hadoop MapReduce Computing

Enyue Lu, Department of Mathematics and Computer Science, Salisbury University
Stephen Krucelyak, Department of Mathematics and Computer Science, Salisbury University
Corbin McNeill, Department of Mathematics and Computer Science, Salisbury University
Achuachua Tesoh-Snowsel, Department of Mathematics and Computer Science, Salisbury University
Chris Joseph, Stony Brook University
Matthias K. Gobbert, Department of Mathematics and Statistics, UMBC

Analyzing patterns in large-scale graphs with millions or even billions of edges, such as social and cyber networks (e.g., Facebook, LinkedIn, Twitter), has many important applications, including community detection, blog analysis, and intrusion and spam detection. Currently, it is infeasible to process real-world networks with millions or even billions of objects on a single processor. To overcome this limitation, a cluster of computers whose processing elements operate in parallel over a distributed network is used to solve large problems and reduce processing time.

In this project, students will enumerate and identify important graph patterns. The network is modeled as a graph: each person is represented as a vertex, and a mutual friendship between two people is represented as an edge. Finding a pattern in a real-world network is then equivalent to finding a subgraph in a large-scale graph. We will map graph decomposition operations into a series of MapReduce processes and implement the proposed MapReduce algorithms in Hadoop. We will also compare and analyze the performance of the proposed MapReduce algorithms and their simulation results.
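
As a rough illustration of how a subgraph search might be expressed as a series of MapReduce rounds, the sketch below enumerates triangles (three mutual friends) in a small friendship graph. The choice of triangles as the pattern, the two-round structure, and all function names (map_edge, shuffle, reduce_wedges, find_triangles) are assumptions made for this example, not the project's actual algorithms. It runs in plain Python and only simulates the map, shuffle, and reduce phases in memory; it does not use Hadoop.

```python
from collections import defaultdict
from itertools import combinations


def map_edge(edge):
    """Map (round 1): key each undirected edge by its smaller endpoint.

    Self-loops are dropped, since the sketch assumes a simple graph.
    """
    u, v = sorted(edge)
    if u != v:
        yield u, v


def shuffle(pairs):
    """Simulated shuffle phase: group all values that share a key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_wedges(vertex, neighbors):
    """Reduce (round 1): emit each pair of larger neighbors as a 'wedge'.

    A wedge (a, b) with apex `vertex` becomes a triangle if edge (a, b) exists.
    """
    for a, b in combinations(sorted(set(neighbors)), 2):
        yield (a, b), vertex


def find_triangles(edges):
    """Run both simulated rounds and return the sorted list of triangles."""
    # Round 1: build wedges from the edge list.
    groups = shuffle(kv for e in edges for kv in map_edge(e))
    wedges = [kv for v, ns in groups.items() for kv in reduce_wedges(v, ns)]

    # Round 2: join wedges with real edges on the key (a, b).
    # A value of None marks "this key is an actual edge of the graph".
    edge_markers = [((u, v), None) for e in edges for u, v in map_edge(e)]
    joined = shuffle(wedges + edge_markers)

    triangles = []
    for (a, b), values in joined.items():
        if None in values:  # the closing edge exists
            for apex in values:
                if apex is not None:
                    triangles.append((apex, a, b))
    return sorted(triangles)


if __name__ == "__main__":
    # Hypothetical friendship graph: persons 1, 2, 3 are mutual friends.
    friendships = [(1, 2), (2, 3), (1, 3), (3, 4)]
    print(find_triangles(friendships))
```

Keying each edge by its smaller endpoint means every triangle is reported exactly once, with its smallest vertex as the apex. In a real Hadoop job, each round would be a separate map and reduce pass over data stored in the distributed file system.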