Hadoop send record to all reducers


The partitioner doesn't work that way. Its job is to look at the key (usually) and the value (rarely) and decide which reducer the pair should be sent to. It returns exactly one reducer index per pair, so it can't fan a single record out to every reducer. This happens after the mapper and before the reducer.

Instead, do the fan-out in the mapper. The mapper should be able to ask its context for the configuration, which can tell it the total number of reducers (partitions). The mapper can then output a composite key made up of the actual key you want plus a target partition number, writing the pair once for each reducer. All the partitioner has to do is break down the composite key, extract the target reducer index, and return that index.
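The idea can be sketched in plain Java without any Hadoop dependencies. This is only an illustration under assumptions: `BroadcastSketch`, `CompositeKey` and `broadcast` are hypothetical names, and in a real job the composite key would need to be a `WritableComparable`, the reducer count would come from something like `context.getNumReduceTasks()`, and the last method would live in a `Partitioner` subclass as `getPartition(key, value, numPartitions)`.

```java
import java.util.ArrayList;
import java.util.List;

public class BroadcastSketch {
    // Hypothetical composite key: the target reducer index plus the real key.
    static final class CompositeKey {
        final int partition;
        final String realKey;

        CompositeKey(int partition, String realKey) {
            this.partition = partition;
            this.realKey = realKey;
        }
    }

    // Mapper side: build one copy of the key per reducer.
    // Assumption: in a real mapper, numReducers is read from the context.
    static List<CompositeKey> broadcast(String realKey, int numReducers) {
        List<CompositeKey> out = new ArrayList<>();
        for (int p = 0; p < numReducers; p++) {
            out.add(new CompositeKey(p, realKey));
        }
        return out;
    }

    // Partitioner side: just return the index carried in the key.
    // The modulo only guards against an index outside the valid range.
    static int getPartition(CompositeKey key, int numPartitions) {
        return key.partition % numPartitions;
    }

    public static void main(String[] args) {
        int numReducers = 3;
        for (CompositeKey k : broadcast("total-count", numReducers)) {
            System.out.println(k.realKey + " -> reducer " + getPartition(k, numReducers));
        }
    }
}
```

The point of the design is that the routing decision is made in the mapper and merely carried out by the partitioner, so the partitioner stays trivial.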

By the way, this means that if you're using this technique to send out counts (if you're sorting) or other metadata to be used later in the processing, then your real data keys have to follow the same composite format, since every key the mapper emits goes through the same partitioner. In fact, you'll probably have to include in the composite key an indicator describing the kind of key/value pair it is (e.g. 1=real data, 0=processing metadata).
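A composite key with a kind indicator might look roughly like this, again in plain Java rather than real Hadoop types. Everything here is hypothetical (`TaggedKeySketch`, `TaggedKey`, `forData`, `forMetadata`), and two assumptions are baked in: real data is routed by hashing the real key, the way Hadoop's default `HashPartitioner` does, and the kind is compared before the real key so that metadata (0) sorts ahead of real data (1) within a partition. Whether you want that ordering depends on your job.

```java
public class TaggedKeySketch {
    static final int METADATA = 0;   // e.g. counts sent to every reducer
    static final int REAL_DATA = 1;

    // Hypothetical composite key shared by metadata and real data.
    static final class TaggedKey implements Comparable<TaggedKey> {
        final int partition;
        final int kind;
        final String realKey;

        TaggedKey(int partition, int kind, String realKey) {
            this.partition = partition;
            this.kind = kind;
            this.realKey = realKey;
        }

        // Real data: pick the reducer by hashing the real key (assumption).
        static TaggedKey forData(String realKey, int numReducers) {
            int p = (realKey.hashCode() & Integer.MAX_VALUE) % numReducers;
            return new TaggedKey(p, REAL_DATA, realKey);
        }

        // Metadata: the mapper chooses the target reducer explicitly.
        static TaggedKey forMetadata(String name, int targetReducer) {
            return new TaggedKey(targetReducer, METADATA, name);
        }

        // Order by partition, then kind, then real key.
        @Override
        public int compareTo(TaggedKey other) {
            int c = Integer.compare(partition, other.partition);
            if (c != 0) return c;
            c = Integer.compare(kind, other.kind);
            if (c != 0) return c;
            return realKey.compareTo(other.realKey);
        }
    }

    public static void main(String[] args) {
        TaggedKey meta = TaggedKey.forMetadata("total-count", 2);
        TaggedKey data = new TaggedKey(2, REAL_DATA, "apple");
        System.out.println("metadata sorts first: " + (meta.compareTo(data) < 0));
        System.out.println("data partition: " + TaggedKey.forData("apple", 4).partition);
    }
}
```

With a key like this the partitioner is unchanged: it still returns the `partition` field, and the reducer can check `kind` to tell metadata apart from real records.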