
Importing PySpark in the Python shell


Here is a simple method (if you don't want to bother with how it works!)

Use findspark

  1. Install findspark from your system shell (pip runs outside the Python shell):

    pip install findspark

  2. In your Python shell, initialize findspark and import the necessary modules:

    import findspark
    findspark.init()

    from pyspark import SparkContext
    from pyspark import SparkConf

  3. Done! (A combined, runnable sketch is shown below.)
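
For reference, here is a minimal, self-contained sketch of those steps as one script. It assumes Spark is installed locally and findspark was installed with pip; the application name and the sample data are illustrative, not part of the original answer:

    import findspark
    findspark.init()  # locates SPARK_HOME and puts pyspark on sys.path

    from pyspark import SparkConf, SparkContext

    # "findspark-demo" is an illustrative app name; local[*] uses all cores
    conf = SparkConf().setAppName("findspark-demo").setMaster("local[*]")
    sc = SparkContext(conf=conf)

    # sanity check: distribute a small list and sum it
    print(sc.parallelize([1, 2, 3, 4]).sum())  # prints 10
    sc.stop()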


If you see an error like this:

ImportError: No module named py4j.java_gateway

Please add $SPARK_HOME/python/build to PYTHONPATH:

export SPARK_HOME=/Users/pzhang/apps/spark-1.1.0-bin-hadoop2.4
export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/build:$PYTHONPATH
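
If you prefer not to edit your shell profile, a rough equivalent can be done from inside Python before importing pyspark. This is only a sketch, reusing the Spark location from the export above; substitute your own installation path:

    import os
    import sys

    # same path as the export above; replace with your Spark installation
    spark_home = "/Users/pzhang/apps/spark-1.1.0-bin-hadoop2.4"
    os.environ["SPARK_HOME"] = spark_home

    # mirror the PYTHONPATH export: both python/ and python/build/
    sys.path.insert(0, os.path.join(spark_home, "python"))
    sys.path.insert(0, os.path.join(spark_home, "python", "build"))

    from pyspark import SparkContext  # py4j should now resolve as well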


It turns out that the pyspark launcher script starts Python with the correct library paths already set. Check out $SPARK_HOME/bin/pyspark:

# Add the PySpark classes to the Python path:
export PYTHONPATH=$SPARK_HOME/python/:$PYTHONPATH

I added this line to my .bashrc file and the modules are now correctly found!
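
As a quick sanity check (just a sketch), open a fresh Python shell and confirm which pyspark gets picked up:

    import os
    print(os.environ.get("PYTHONPATH"))  # should contain $SPARK_HOME/python

    import pyspark
    print(pyspark.__file__)  # shows which installation was imported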