
Spark - How to count number of records by key


You're nearly there! All you need is a countByValue:

val countOfFemalesByCountry = femaleOnly.map(_(13)).countByValue()
// Prints (Australia, 230), (America, 23242), etc.

(In your example, I assume you meant x(10) rather than x._10)

All together:

sc.textFile("/home/cloudera/desktop/file.txt")    .map(_.split(","))    .filter(x => x(10) == "Female")    .map(_(13))    .countByValue()


Have you considered manipulating your RDD using the DataFrames API?

It looks like you're loading a CSV file, which you can do with spark-csv.

Then it's a simple matter (if your CSV has a header row with the obvious column names) of:

import com.databricks.spark.csv._
import sqlContext.implicits._   // for the $"column" syntax

val countryGender = sqlContext.csvFile("/home/cloudera/desktop/file.txt") // already split into columns
  .filter($"gender" === "Female")
  .groupBy("country")
  .count()

countryGender.show()

If you want to go deeper into this kind of manipulation, here's the guide: https://spark.apache.org/docs/latest/sql-programming-guide.html
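If you happen to be on Spark 2.x or later, a similar sketch using the built-in CSV reader (assuming the file has a header row with gender and country columns; the app name is just a placeholder):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("female-counts-by-country").getOrCreate()
import spark.implicits._   // for $"column"

spark.read
  .option("header", "true")   // first line contains the column names
  .csv("/home/cloudera/desktop/file.txt")
  .filter($"gender" === "Female")
  .groupBy("country")
  .count()
  .show()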


You can easily create a key; it doesn't have to exist in the file/database. For example:

val countryGender = sc.textFile("/home/cloudera/desktop/file.txt")
  .map(_.split(","))
  .filter(x => x(10) == "Female")
  .map(x => (x(13), x(10)))   // <<<< here you generate a new key: (country, gender)
  .groupByKey()
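Keep in mind that groupByKey only collects the matching rows per key; to get the actual counts per country you would still need a step like this small sketch (or use reduceByKey as shown earlier to avoid shuffling the full groups):

// Turn the grouped rows into per-country counts
val countsByCountry = countryGender.mapValues(_.size)
countsByCountry.collect().foreach(println)   // e.g. (Australia,230), (America,23242)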