Filtering a PySpark DataFrame using isin by exclusion
It looks like the ~ operator gives the functionality I need, but I have yet to find any appropriate documentation for it.
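(The question's df isn't shown, so here is a minimal sketch that reproduces the output below; the ids and values are assumptions chosen so that only rows 4 and 5 survive the exclusion filter:)

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

# Hypothetical reconstruction of the question's df
spark = SparkSession.builder.master("local").appName("isin-exclusion").getOrCreate()
df = spark.createDataFrame(
    [(1, 'a'), (2, 'a'), (3, 'b'), (4, 'c'), (5, 'd')],
    ['id', 'bar'],
)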
df.filter(~col('bar').isin(['a','b'])).show()

+---+---+
| id|bar|
+---+---+
|  4|  c|
|  5|  d|
+---+---+
It could also be written like this:
df.filter(col('bar').isin(['a','b']) == False).show()
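Note that filter also accepts a SQL expression string, so an equivalent spelling (against the same assumed df) is:

df.filter("bar NOT IN ('a', 'b')").show()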
Here's a gotcha for those with their headspace in Pandas who are moving to PySpark.
import pandas as pd
from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext

spark_conf = SparkConf().setMaster("local").setAppName("MyAppName")
sc = SparkContext(conf=spark_conf)
sqlContext = SQLContext(sc)

records = [
    {"colour": "red"},
    {"colour": "blue"},
    {"colour": None},
]

pandas_df = pd.DataFrame.from_dict(records)
pyspark_df = sqlContext.createDataFrame(records)
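(As an aside, SQLContext is a legacy entry point; on Spark 2.x and later the same setup is usually written with SparkSession. A sketch, reusing the records list above:)

from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local").appName("MyAppName").getOrCreate()
pyspark_df = spark.createDataFrame(records)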
So if we wanted the rows that are not red:
pandas_df[~pandas_df["colour"].isin(["red"])]
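This keeps both the blue row and the None row, since Pandas treats the missing value as simply "not in" the list (output sketched from the records above, not taken from the original post):

  colour
1   blue
2   None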
Looking good. And in our PySpark DataFrame:
pyspark_df.filter(~pyspark_df["colour"].isin(["red"])).collect()
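Only [Row(colour='blue')] comes back (result sketched): the row with colour None has been silently dropped.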
This happens because, under SQL semantics, comparing NULL with anything, including via NOT IN, yields NULL rather than true, and filter drops rows where the condition is not true. After some digging, I found this: https://issues.apache.org/jira/browse/SPARK-20617 So to include nothingness in our results:
pyspark_df.filter(~pyspark_df["colour"].isin(["red"]) | pyspark_df["colour"].isNull()).show()
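That should return both the blue and the null rows (output sketched):

+------+
|colour|
+------+
|  blue|
|  null|
+------+

On Spark 2.3+, Column.eqNullSafe offers another null-safe route when excluding a single value; a sketch that gives the same result here:

pyspark_df.filter(~pyspark_df["colour"].eqNullSafe("red")).show()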