
aggregate function Count usage with groupBy in Spark


count() can be used inside agg() together with the other aggregations, since the groupBy expression is the same.
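The per-language snippets below use the asker's new_log_df and encodeUDF, which are not defined in this post. As a self-contained illustration (hypothetical column names and values: timePeriod, DOWNSTREAM_SIZE), a minimal Java sketch of count() sitting inside agg() next to other aggregates might look like this:

import static org.apache.spark.sql.functions.*;

import java.util.Arrays;
import java.util.List;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class CountInsideAgg {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("count-inside-agg")
        .master("local[*]")
        .getOrCreate();

    // Tiny in-memory stand-in for new_log_df (assumed schema and data).
    StructType schema = new StructType()
        .add("timePeriod", DataTypes.StringType)
        .add("DOWNSTREAM_SIZE", DataTypes.LongType);

    List<Row> rows = Arrays.asList(
        RowFactory.create("morning", 100L),
        RowFactory.create("morning", 300L),
        RowFactory.create("evening", 200L));

    Dataset<Row> df = spark.createDataFrame(rows, schema);

    // count() is just another aggregate inside agg(); the grouping
    // expression is the same as it would be for groupBy(...).count().
    df.groupBy("timePeriod")
      .agg(
          mean("DOWNSTREAM_SIZE").alias("Mean"),
          stddev("DOWNSTREAM_SIZE").alias("Stddev"),
          count(lit(1)).alias("Num Of Records"))
      .show(false);

    spark.stop();
  }
}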

With Python

import pyspark.sql.functions as func

new_log_df.cache().withColumn("timePeriod", encodeUDF(new_log_df["START_TIME"])) \
    .groupBy("timePeriod") \
    .agg(
        func.mean("DOWNSTREAM_SIZE").alias("Mean"),
        func.stddev("DOWNSTREAM_SIZE").alias("Stddev"),
        func.count(func.lit(1)).alias("Num Of Records")
    ) \
    .show(20, False)

See the PySpark SQL functions documentation.

With Scala

import org.apache.spark.sql.functions._  // for count()

new_log_df.cache().withColumn("timePeriod", encodeUDF(col("START_TIME")))
  .groupBy("timePeriod")
  .agg(
    mean("DOWNSTREAM_SIZE").alias("Mean"),
    stddev("DOWNSTREAM_SIZE").alias("Stddev"),
    count(lit(1)).alias("Num Of Records")
  )
  .show(20, false)

count(lit(1)) counts the literal 1 for every row, so per group it equals the total number of records. Here that is the same result as count("timePeriod"), because the grouping column is never null; count(column) would skip rows where that column is null.
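To make that null-skipping behaviour concrete, here is a small hypothetical Java example (assumed data, including a null DOWNSTREAM_SIZE value): count(lit(1)) counts every row in the group, while count("DOWNSTREAM_SIZE") skips the null row, and count("timePeriod") matches count(lit(1)) only because the grouping column has no nulls.

import static org.apache.spark.sql.functions.*;

import java.util.Arrays;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class CountSemantics {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("count-semantics").master("local[*]").getOrCreate();

    StructType schema = new StructType()
        .add("timePeriod", DataTypes.StringType)
        .add("DOWNSTREAM_SIZE", DataTypes.LongType);

    Dataset<Row> df = spark.createDataFrame(Arrays.asList(
        RowFactory.create("morning", 100L),
        RowFactory.create("morning", (Long) null),   // null DOWNSTREAM_SIZE
        RowFactory.create("evening", 200L)), schema);

    df.groupBy("timePeriod")
      .agg(
          count(lit(1)).alias("all_rows"),                 // 2 for "morning"
          count("DOWNSTREAM_SIZE").alias("non_null_rows"), // 1 for "morning" (null skipped)
          count("timePeriod").alias("group_col_rows"))     // 2 for "morning" (never null)
      .show(false);

    spark.stop();
  }
}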

With Java

import static org.apache.spark.sql.functions.*;

new_log_df.cache().withColumn("timePeriod", encodeUDF(col("START_TIME")))
    .groupBy("timePeriod")
    .agg(
        mean("DOWNSTREAM_SIZE").alias("Mean"),
        stddev("DOWNSTREAM_SIZE").alias("Stddev"),
        count(lit(1)).alias("Num Of Records"))
    .show(20, false);