Access element of a vector in a Spark DataFrame (Logistic Regression probability vector) [duplicate]

python apache-spark pyspark spark-dataframe apache-spark-ml

Update:

It seems like there is a bug in Spark that prevents you from accessing individual elements of a dense vector in a select statement. Normally you would be able to access them just as you would the elements of a numpy array, but when you run the code previously posted, you may get the error pyspark.sql.utils.AnalysisException: "Can't extract value from probability#12;"

So, one way to work around this bug is to use a udf. As in the other question, you can define a udf in the following way:

    from pyspark.sql.functions import udf
    from pyspark.sql.types import FloatType

    firstelement = udf(lambda v: float(v[0]), FloatType())
    cv_predictions_prod.select(firstelement('probability')).show()

Behind the scenes this still accesses the elements of the DenseVector like a numpy array, but because the indexing happens inside the udf rather than in the select expression, it doesn't raise the same error as before.
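If you need an element other than the first, the same idea can be generalized. The following is only a sketch, not part of the original answer: the helper names element_at and make_element_udf are made up for illustration, and it returns DoubleType instead of FloatType on the assumption that you want to keep the full precision of the probabilities. It also passes nulls through instead of failing on them.

```python
def element_at(index):
    """Build a plain Python function that pulls element `index` out of a
    vector-like value (DenseVector, numpy array, list) as a Python float."""
    def extract(v):
        # Pass nulls through instead of raising inside the udf.
        if v is None:
            return None
        return float(v[index])
    return extract


def make_element_udf(index):
    """Wrap element_at(index) as a Spark udf (hypothetical helper name)."""
    # Imported here so element_at can be used and tested without Spark.
    from pyspark.sql.functions import udf
    from pyspark.sql.types import DoubleType
    return udf(element_at(index), DoubleType())


# Usage with the DataFrame from the question:
# second = make_element_udf(1)
# cv_predictions_prod.select(second('probability')).show()
```

Keeping the extraction logic in an ordinary function means you can check it on a plain list or a DenseVector before handing it to Spark.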

Since this is getting a lot of upvotes, I figured I should strike through the incorrect portion of this answer.

~~Original answer: A dense vector is just a wrapper for a numpy array. So you can access the elements in the same way that you would access the elements of a numpy array.~~

There are several ways to access individual elements of an array in a dataframe. One is to explicitly call the column cv_predictions_prod['probability'] in your select statement. By explicitly calling the column, you can perform operations on that column, like selecting the first element in the array. For example:

    cv_predictions_prod.select(cv_predictions_prod['probability'][0]).show()

~~should solve the problem.~~
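As a side note, if you are on Spark 3.0 or later, the bracket syntax above can be made to work by first converting the vector column to an ordinary array column with pyspark.ml.functions.vector_to_array. The sketch below assumes that Spark version; the helper name select_vector_element is made up for illustration and is not part of the original answer.

```python
def select_vector_element(df, vector_col, index, alias=None):
    """Return a one-column DataFrame holding element `index` of a vector column.

    Assumes Spark 3.0 or later, where pyspark.ml.functions.vector_to_array exists.
    """
    # Imported inside the function so that defining it does not require Spark.
    from pyspark.ml.functions import vector_to_array
    from pyspark.sql.functions import col

    # After the conversion the column is array<double>, so [index] is allowed.
    element = vector_to_array(col(vector_col))[index]
    return df.select(element.alias(alias or f"{vector_col}_{index}"))


# Usage with the DataFrame from the question:
# select_vector_element(cv_predictions_prod, 'probability', 0).show()
```

This avoids a Python udf entirely, which may matter on large DataFrames, but on older Spark versions the udf approach is still the one to use.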
