Pyspark: Exception: Java gateway process exited before sending the driver its port number
One possible reason is that JAVA_HOME is not set because Java is not installed.
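If you want to verify this from Python before creating the SparkContext, here is a minimal sketch; the JDK path in the last line is only an illustration and has to be adjusted to wherever your JDK actually lives:

import os
import shutil

# Check whether a java executable is visible to the driver process.
if shutil.which("java") is None:
    print("No java executable found on PATH - install a JDK first")

# Point PySpark at a specific JDK; this path is just an example.
os.environ.setdefault("JAVA_HOME", "/usr/lib/jvm/java-8-openjdk-amd64")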
I encountered the same issue. At sc = pyspark.SparkConf() it says:

Exception in thread "main" java.lang.UnsupportedClassVersionError: org/apache/spark/launcher/Main : Unsupported major.minor version 51.0
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:643)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:277)
    at java.net.URLClassLoader.access$000(URLClassLoader.java:73)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:212)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:296)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
    at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:406)
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/opt/spark/python/pyspark/conf.py", line 104, in __init__
    SparkContext._ensure_initialized()
  File "/opt/spark/python/pyspark/context.py", line 243, in _ensure_initialized
    SparkContext._gateway = gateway or launch_gateway()
  File "/opt/spark/python/pyspark/java_gateway.py", line 94, in launch_gateway
    raise Exception("Java gateway process exited before sending the driver its port number")
Exception: Java gateway process exited before sending the driver its port number

I solved it by running

sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java8-installer
which is from https://www.digitalocean.com/community/tutorials/how-to-install-java-with-apt-get-on-ubuntu-16-04 . This should help you too.
One solution is adding pyspark-shell to the shell environment variable PYSPARK_SUBMIT_ARGS:
export PYSPARK_SUBMIT_ARGS="--master local[2] pyspark-shell"
There is a change in python/pyspark/java_gateway.py which requires PYSPARK_SUBMIT_ARGS to include pyspark-shell if a PYSPARK_SUBMIT_ARGS variable is set by a user.
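If you would rather set this from Python (for example in a notebook) than in the shell, a minimal sketch, assuming the variable is set before the first SparkContext is created:

import os

# Must be set before the JVM gateway is launched; note the trailing pyspark-shell.
os.environ["PYSPARK_SUBMIT_ARGS"] = "--master local[2] pyspark-shell"

from pyspark import SparkContext
sc = SparkContext.getOrCreate()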
I had this error message running pyspark on Ubuntu and got rid of it by installing the openjdk-8-jdk package:
from pyspark import SparkConf, SparkContext
sc = SparkContext(conf=SparkConf().setAppName("MyApp").setMaster("local"))
^^^ error
Install Open JDK 8:
apt-get install openjdk-8-jdk-headless -qq
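After the install, you can double-check from Python that a JVM is now reachable; a small sketch using only the standard library (java -version writes to stderr, hence the last line):

import subprocess

# Run `java -version` and show its output; it goes to stderr, not stdout.
result = subprocess.run(["java", "-version"], capture_output=True, text=True)
print(result.stderr)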
On MacOS
Same on Mac OS. I typed in a terminal:

$ java -version
No Java runtime present, requesting install.
I was prompted to install Java from Oracle's download site, chose the MacOS installer, clicked on jdk-13.0.2_osx-x64_bin.dmg and after that checked that Java was installed:

$ java -version
java version "13.0.2" 2020-01-14
EDIT: To install JDK 8 you need to go to https://www.oracle.com/java/technologies/javase-jdk8-downloads.html (login required).
After that I was able to start a Spark context with pyspark.
Checking if it works
In Python:
from pyspark import SparkContext

sc = SparkContext.getOrCreate()

# check that it really works by running a job
# example from http://spark.apache.org/docs/latest/rdd-programming-guide.html#parallelized-collections
data = range(10000)
distData = sc.parallelize(data)
distData.filter(lambda x: not x & 1).take(10)
# Out: [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
Note that you might need to set the environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON, and they have to be the same Python version as the Python (or IPython) you're using to run pyspark (the driver).
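For instance, a minimal sketch that points both variables at the interpreter the driver itself is running, so the versions match by construction:

import os
import sys

# Use the same interpreter for the PySpark workers and the driver.
os.environ["PYSPARK_PYTHON"] = sys.executable
os.environ["PYSPARK_DRIVER_PYTHON"] = sys.executable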