Column alias after groupBy in pyspark

python scala apache-spark pyspark apache-spark-sql

You can use agg instead of calling the max method directly, which lets you alias the aggregated column in the same step:

from pyspark.sql.functions import max

joined_df.groupBy(temp1.datestamp).agg(max("diff").alias("maxDiff"))

Similarly in Scala:

import org.apache.spark.sql.functions.max

joined_df.groupBy($"datestamp").agg(max("diff").alias("maxDiff"))

or, equivalently, with as:

joined_df.groupBy($"datestamp").agg(max("diff").as("maxDiff"))


This happens because you are aliasing the whole DataFrame object, not the Column. Here is an example of how to alias only the Column:

import pyspark.sql.functions as func

grpdf = joined_df \
    .groupBy(temp1.datestamp) \
    .max('diff') \
    .select(func.col("max(diff)").alias("maxDiff"))


In addition to the answers already here, the following approaches are convenient if you know the name of the aggregated column, and they do not require importing anything from pyspark.sql.functions:

# Backticks make Spark read max(diff) as a column name rather than a function call
grouped_df = joined_df.groupBy(temp1.datestamp) \
                      .max('diff') \
                      .selectExpr('`max(diff)` AS maxDiff')

See docs for info on .selectExpr()

grouped_df = joined_df.groupBy(temp1.datestamp) \
                      .max('diff') \
                      .withColumnRenamed('max(diff)', 'maxDiff')

See docs for info on .withColumnRenamed()

This answer here goes into more detail: https://stackoverflow.com/a/34077809
