
Hadoop Matrix Multiplication


In TestMatrixMultiply.java from the site you linked, which presumably contains the code you're using to encode your matrices into the expected IndexPair-keyed SequenceFile format, there is the function writeMatrix:

    public static void writeMatrix (int[][] matrix, int rowDim, int colDim, String pathStr)
        throws IOException
    {
        Path path = new Path(pathStr);
        SequenceFile.Writer writer = SequenceFile.createWriter(fs, conf, path,
            MatrixMultiply.IndexPair.class, IntWritable.class,
            SequenceFile.CompressionType.NONE);
        MatrixMultiply.IndexPair indexPair = new MatrixMultiply.IndexPair();
        IntWritable el = new IntWritable();
        for (int i = 0; i < rowDim; i++) {
            for (int j = 0; j < colDim; j++) {
                int v = matrix[i][j];
                if (v != 0) { // !!! well, that would be why we aren't writing 0s
                    indexPair.index1 = i;
                    indexPair.index2 = j;
                    el.set(v);
                    writer.append(indexPair, el);
                }
            }
        }
        writer.close();
    }

Note the comment I inserted on the second line of the inner for loop.

Your mapper is reading no 0s because your input files contain no 0s.

The code is designed around the assumption that any missing value is 0 (a sparse representation), and it performs additional checks to avoid emitting 0s in order to minimize network traffic.
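If you genuinely wanted explicit 0 entries in your input files, that guard is the line to change. A minimal sketch of the inner loops of writeMatrix with the guard removed (reusing the indexPair, el, and writer set up above), though doing this defeats the sparse design just described:

    // Sketch: write every element, including 0s, by dropping the v != 0 guard.
    // This inflates the input files and the network traffic the sparse
    // representation was meant to avoid.
    for (int i = 0; i < rowDim; i++) {
        for (int j = 0; j < colDim; j++) {
            indexPair.index1 = i;
            indexPair.index2 = j;
            el.set(matrix[i][j]);
            writer.append(indexPair, el); // append unconditionally, 0 or not
        }
    }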

Everything below here is mostly wrong because I misunderstood the algorithm
(the part above is still useful, though)

From the linked page, you're using Strategy 3, which relies on partitioner behavior and sort order. Unfortunately, the partitioner is wrong, and that is a separate problem from the 0s not being written. You're getting matrices full of 0s because each reducer is multiplying against block data that was initialized to 0 and never overwritten with the correct values for that block. The bug is hidden in single-machine operation, where every key goes to the lone reducer and the partitioner is effectively a no-op, but it breaks on most clusters.

The partitioner maps an intermediate key (kb, jb, ib) to a reducer r as follows:

r = (jb*KB + kb) mod R
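In Hadoop terms, that formula corresponds to a Partitioner along these lines. This is a minimal sketch, not the project's actual class: BlockKey stands in for the job's real composite key type (which would also have to implement WritableComparable), and the KB constant is an assumed block count.

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.mapreduce.Partitioner;

    // Hypothetical composite key (kb, jb, ib); the real job defines its own
    // WritableComparable key type carrying the same three components.
    class BlockKey {
        int kb, jb, ib;
    }

    // The partitioner as described: r = (jb*KB + kb) mod R, where R is the
    // number of reducers.
    public class Strategy3Partitioner extends Partitioner<BlockKey, IntWritable> {
        private static final int KB = 4; // assumed number of blocks along k

        @Override
        public int getPartition(BlockKey key, IntWritable value, int numReducers) {
            return (key.jb * KB + key.kb) % numReducers;
        }
    }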

It needs to guarantee that all keys for the same block are assigned to the same reducer. Unfortunately, it guarantees that this will not happen unless KB % numReducers == 0:

    map (key, value)
        if from matrix A with key = (i, k) and value = a(i, k)
            for 0 <= jb < NJB
                emit (k/KB, jb, i/IB), (i mod IB, k mod KB, a(i, k))  // compare this...
        if from matrix B with key = (k, j) and value = b(k, j)
            emit (k/KB, j/JB, -1), (k mod KB, j mod JB, b(k, j))      // ...to this

For matrix A, the key component jb is iterated over; for matrix B, it is calculated from j. Since assignment to a reducer is round-robin over these key values, this guarantees that the A-matrix keys will not be assigned to the same reducer as the corresponding B-matrix keys. The algorithm therefore fails, since it assumes something about the assignment and ordering of keys that is provably incorrect. Its assumption about key order holds if and only if there is a single reducer to which all keys are assigned, and the partitioner does not guarantee that.

The partitioner must be modified to use kb % numReducers for Strategy 3. This is not a very good partitioner, but it is the only one that will work, because non-identical keys must be sorted to the same reducer in a particular order.
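Concretely, the fix looks something like this (same hypothetical BlockKey as in the sketch above), so that every key with a given kb, regardless of jb and ib, lands on the same reducer:

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.mapreduce.Partitioner;

    // Fixed Strategy 3 partitioner: partition on kb alone. All keys sharing
    // a kb go to one reducer, where the framework's key sort can then deliver
    // them in the order the algorithm depends on.
    public class Strategy3FixedPartitioner extends Partitioner<BlockKey, IntWritable> {
        @Override
        public int getPartition(BlockKey key, IntWritable value, int numReducers) {
            return key.kb % numReducers;
        }
    }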

The code you actually posted in the question is unrelated to where the bug exists.