Spark: Custom key compare method for reduceByKey
You can't override the comparison used by reduceByKey, because it relies on the key itself to hash-partition your data across the executors in your cluster. You can, however, change the key (and be aware that, depending on the transformations/actions you use, this is likely to re-shuffle the data around).
There is a nifty method on RDD to do this called keyBy, so you can do something like this:
val data: RDD[MyClass] = ... // Same code you have now
val byId2 = data.keyBy(_.id2) // Assuming your ids are Longs, produces an RDD[(Long, MyClass)]
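As a plain-Scala sketch of what keyBy does (no Spark needed to run it; the MyClass fields here are made up for illustration), keyBy(f) simply pairs each element x with its derived key f(x):

```scala
// Hypothetical element type; only id2 matters for this example
case class MyClass(id1: Long, id2: Long, payload: String)

val data = Seq(MyClass(1L, 10L, "a"), MyClass(2L, 20L, "b"))

// Equivalent of data.keyBy(_.id2): each element becomes (key, element)
val byId2 = data.map(x => (x.id2, x))

println(byId2)
```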
If you are able to alter your class, then note that reduceByKey uses equals and hashCode. So, you can make sure those are defined consistently, and that will result in the correct comparisons being used.
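To illustrate why this works (plain Scala, no Spark required; the class and fields below are made up), hash-based grouping of the kind reduceByKey performs relies on exactly equals and hashCode, so defining both over id2 makes two instances with the same id2 collapse to one key:

```scala
// Hypothetical key class that compares only by id2
class Id2Key(val id1: Int, val id2: Int) {
  override def equals(other: Any): Boolean = other match {
    case that: Id2Key => this.id2 == that.id2
    case _            => false
  }
  override def hashCode: Int = id2.hashCode
}

val pairs = Seq(
  (new Id2Key(1, 10), 2),
  (new Id2Key(2, 10), 3), // same id2 => same key under equals/hashCode
  (new Id2Key(3, 20), 5)
)

// groupBy uses equals/hashCode, just like Spark's hash partitioning
val reduced = pairs.groupBy(_._1).map { case (k, vs) => (k.id2, vs.map(_._2).sum) }

println(reduced.toList.sortBy(_._1))
```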
Can't you just map the RDD so that the first element of the pair is the key you want to use?
case class MyClass(id1: Int, id2: Int)

val rddToReduce: RDD[(MyClass, String)] = ... // An RDD with MyClass as key

rddToReduce
  .map { case (MyClass(id1, id2), value) =>
    (id2, (id1, value)) // now the key is id2
  }
  .reduceByKey { case ((id1, value1), (_, value2)) =>
    // reduceByKey combines two values at a time, so do the combination here,
    // e.g. (id1, value1 + value2)
    (id1, value1 + value2)
  }
  .map { case (id2, (id1, combinedValue)) =>
    (MyClass(id1, id2), combinedValue) // rearrange so that MyClass is the key again
  }
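To see the key round-trip end to end without a Spark cluster, here is a plain-Scala sketch that simulates reduceByKey with groupBy plus reduce (the sample values and the string-concatenation combiner are made up for illustration):

```scala
case class MyClass(id1: Int, id2: Int)

val input = Seq(
  (MyClass(1, 10), "a"),
  (MyClass(2, 10), "b"), // same id2 as above => should be combined
  (MyClass(3, 20), "c")
)

// Step 1: re-key by id2, tucking id1 into the value
val reKeyed = input.map { case (MyClass(id1, id2), value) => (id2, (id1, value)) }

// Step 2: simulate reduceByKey, which combines two values at a time per key
val reduced = reKeyed
  .groupBy(_._1)
  .map { case (id2, pairs) =>
    (id2, pairs.map(_._2).reduce { case ((id1, v1), (_, v2)) => (id1, v1 + v2) })
  }

// Step 3: restore MyClass as the key
val result = reduced.map { case (id2, (id1, combined)) => (MyClass(id1, id2), combined) }

println(result.toList.sortBy(_._1.id2))
```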