
Filter Pyspark dataframe column with None value


You can use Column.isNull / Column.isNotNull:

df.where(col("dt_mvmt").isNull())df.where(col("dt_mvmt").isNotNull())

If you simply want to drop NULL values, you can use na.drop with the subset argument:

df.na.drop(subset=["dt_mvmt"])

Equality-based comparisons with NULL won't work, because in SQL NULL is undefined, so any attempt to compare it with another value returns NULL:

sqlContext.sql("SELECT NULL = NULL").show()## +-------------+## |(NULL = NULL)|## +-------------+## |         null|## +-------------+sqlContext.sql("SELECT NULL != NULL").show()## +-------------------+## |(NOT (NULL = NULL))|## +-------------------+## |               null|## +-------------------+

The only valid way to compare a value with NULL is IS / IS NOT, which are equivalent to the isNull / isNotNull method calls.
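
For contrast with the queries above, IS / IS NOT do produce a real boolean rather than NULL (a sketch, still using sqlContext):

sqlContext.sql("SELECT NULL IS NULL").show()       # single row: true
sqlContext.sql("SELECT NULL IS NOT NULL").show()   # single row: false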


Try simply using the isNotNull function.

df.filter(df.dt_mvmt.isNotNull()).count()
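
Note that filter and where are aliases in the DataFrame API, so this is equivalent to:

df.where(df.dt_mvmt.isNotNull()).count()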


To obtain entries whose values in the dt_mvmt column are not null, we have

df.filter("dt_mvmt is not NULL")

and for entries which are null, we have

df.filter("dt_mvmt is NULL")