Distributed local clustering coefficient algorithm (MapReduce/Hadoop) Distributed local clustering coefficient algorithm (MapReduce/Hadoop) hadoop hadoop

Distributed local clustering coefficient algorithm (MapReduce/Hadoop)


In the paper "MapReduce in MPI for large scale graph algorithms" the authors give a nice description of a MapReduce implementation of Triangle Counting. The paper is available here: http://www.sciencedirect.com/science/article/pii/S0167819111000172 but you may need an account to access the paper. (I'm on a University system that's paid for the subscription, so I never know what I only have access to because they've already paid.) The authors may have a draft of the paper posted on the personal website(s).

There is another way you could count triangles--probably much less efficient unless your graph is fairly dense. First, construct the adjacency matrix of your graph, A. Then compute A^3 (you could do the matrix multiplication in parallel pretty easily). Then, sum up the (i,i) entries of A^3 and divide the answer by 6. That'll give you the number of triangles because the i,j entry of A^k counts the number of length k walks from i to j and since we are only looking at length 3 walks, any walk that starts at i and ends at i after 3 steps is a triangle... overcounting by a factor of 6. This is mainly less efficient because the size of the matrix will be very large compared to the size of an edgelist if your graph is sparse.