Importing PySpark in the Python shell
Here is a simple method (if you don't care how it works!):
Use findspark
Install findspark from your terminal, then open your Python shell:

pip install findspark

import findspark
findspark.init()
Import the necessary modules:
from pyspark import SparkContext
from pyspark import SparkConf
Done!!!
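For context, here is a minimal sketch of the whole flow in one Python session. The app name, master setting, and the tiny sum job are just illustrative, and findspark.init() assumes SPARK_HOME is set or Spark lives in a standard location:

import findspark
findspark.init()  # locates Spark and adds pyspark/py4j to sys.path

from pyspark import SparkConf, SparkContext

# Illustrative local configuration; adjust the app name and master as needed.
conf = SparkConf().setAppName("findspark-demo").setMaster("local[*]")
sc = SparkContext(conf=conf)

# Quick sanity check that the context actually works.
print(sc.parallelize(range(10)).sum())
sc.stop()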
If you see an error like this:
ImportError: No module named py4j.java_gateway
Please add $SPARK_HOME/python/build to PYTHONPATH:
export SPARK_HOME=/Users/pzhang/apps/spark-1.1.0-bin-hadoop2.4
export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/build:$PYTHONPATH
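If you would rather not touch your shell configuration, a rough equivalent from inside a Python session is to extend sys.path yourself. The SPARK_HOME value below is just the example path from above; note that on newer Spark releases py4j ships as a versioned zip under $SPARK_HOME/python/lib rather than a build directory:

import glob
import os
import sys

# Example path from above; point this at your own Spark installation.
spark_home = os.environ.get("SPARK_HOME", "/Users/pzhang/apps/spark-1.1.0-bin-hadoop2.4")

# Make pyspark itself importable.
sys.path.insert(0, os.path.join(spark_home, "python"))

# py4j may live in python/build (older releases) or as a zip under python/lib.
sys.path.extend(glob.glob(os.path.join(spark_home, "python", "build")))
sys.path.extend(glob.glob(os.path.join(spark_home, "python", "lib", "py4j-*.zip")))

from pyspark import SparkContext  # should now import without the py4j error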
It turns out that the pyspark launcher loads Python and sets the correct library paths automatically. Check out $SPARK_HOME/bin/pyspark:
# Add the PySpark classes to the Python path:
export PYTHONPATH=$SPARK_HOME/python/:$PYTHONPATH
I added this line to my .bashrc file and the modules are now correctly found!
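With that line in .bashrc (and SPARK_HOME exported), a quick way to verify from a fresh Python shell is a sketch like the one below; the paths printed will of course depend on your own installation:

import os
import pyspark

# If this import succeeds, PYTHONPATH is picking up $SPARK_HOME/python.
print("SPARK_HOME  =", os.environ.get("SPARK_HOME"))
print("pyspark from:", pyspark.__file__)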