Difference between SparkContext, JavaSparkContext, SQLContext, and SparkSession?


SparkContext is the Scala entry point to Spark, and JavaSparkContext is a Java wrapper around SparkContext.

SQLContext is the entry point of Spark SQL and can be obtained from a SparkContext. Prior to 2.x, RDD, DataFrame, and Dataset were three different data abstractions. Since Spark 2.x, the three data abstractions are unified, and SparkSession is the unified entry point of Spark.

An additional note: RDDs are meant for unstructured, strongly typed data, while DataFrames are for structured, loosely typed data.
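To make that distinction concrete, here is a minimal Java sketch contrasting a strongly typed JavaRDD with a loosely typed DataFrame (the Person bean, app name, and sample values are purely illustrative):

import java.io.Serializable;
import java.util.Arrays;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class TypedVsUntyped {
  // Simple bean used only for illustration
  public static class Person implements Serializable {
    private String name;
    private int age;
    public Person() {}
    public Person(String name, int age) { this.name = name; this.age = age; }
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public int getAge() { return age; }
    public void setAge(int age) { this.age = age; }
  }

  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .master("local[*]").appName("typed-vs-untyped").getOrCreate();
    JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());

    // Strongly typed: an RDD of Person objects, no schema attached
    JavaRDD<Person> peopleRdd = jsc.parallelize(
        Arrays.asList(new Person("Ann", 30), new Person("Bob", 25)));

    // Loosely typed: a DataFrame (Dataset<Row>) with a schema inferred from the bean at runtime
    Dataset<Row> peopleDf = spark.createDataFrame(peopleRdd, Person.class);
    peopleDf.filter("age > 26").show();

    spark.stop();
  }
}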

Is there any method to convert or create a Context using SparkSession?

Yes: sparkSession.sparkContext(), and for SQL, sparkSession.sqlContext().
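A minimal Java sketch of this (the master and app name are placeholders):

import org.apache.spark.SparkContext;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.SparkSession;

public class ContextsFromSession {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .master("local[*]")           // placeholder master
        .appName("contexts-demo")     // placeholder app name
        .getOrCreate();

    // The Scala SparkContext that backs the session
    SparkContext sc = spark.sparkContext();

    // Java-friendly wrapper around that same SparkContext
    JavaSparkContext jsc = new JavaSparkContext(sc);

    // Legacy SQLContext, kept around for backward compatibility
    SQLContext sqlContext = spark.sqlContext();

    spark.stop();
  }
}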

Can I completely replace all the Contexts using one single entry point, SparkSession?

Yes. You can get the respective contexts from SparkSession.

Are all the functions in SQLContext, SparkContext, JavaSparkContext, etc. added to SparkSession?

Not directly. You have to get the respective context and use it, somewhat like backward compatibility.

How to use such functions in SparkSession?

Get the respective context and use it.
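For example, SparkSession itself has no accumulator or broadcast methods, so you reach through the underlying context. A sketch in Java (names and sample data are illustrative):

import java.util.Arrays;

import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.broadcast.Broadcast;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.util.LongAccumulator;

public class ViaUnderlyingContext {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .master("local[*]").appName("context-functions-demo").getOrCreate();

    // accumulator() and broadcast() live on the contexts, not on SparkSession
    LongAccumulator aboveThreshold = spark.sparkContext().longAccumulator("aboveThreshold");
    JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());
    Broadcast<Integer> threshold = jsc.broadcast(10);

    jsc.parallelize(Arrays.asList(1, 5, 20, 30))
       .filter(x -> x > threshold.value())
       .foreach(x -> aboveThreshold.add(1));

    System.out.println("values above threshold: " + aboveThreshold.value());
    spark.stop();
  }
}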

How to create the following using SparkSession? (A Java sketch follows this list.)

  1. RDD: can be created from sparkSession.sparkContext.parallelize(???)
  2. JavaRDD: the same applies, but via the Java implementation (JavaSparkContext)
  3. JavaPairRDD: sparkSession.sparkContext.parallelize(???).map(...), where mapping your data to key-value pairs is one way
  4. Dataset: what SparkSession returns is a Dataset when the data is structured.
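A minimal Java sketch of all four (app name and sample values are placeholders):

import java.util.Arrays;

import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.rdd.RDD;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.SparkSession;

import scala.Tuple2;

public class CreateFromSession {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .master("local[*]").appName("create-demo").getOrCreate();
    JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());

    // 2. JavaRDD via the JavaSparkContext wrapper
    JavaRDD<String> javaRdd = jsc.parallelize(Arrays.asList("a", "b", "c"));

    // 1. The underlying Scala RDD is one call away
    RDD<String> rdd = javaRdd.rdd();

    // 3. JavaPairRDD by mapping each element to a key-value pair
    JavaPairRDD<String, Integer> pairRdd = javaRdd.mapToPair(s -> new Tuple2<>(s, 1));

    // 4. Dataset directly from the SparkSession
    Dataset<String> ds = spark.createDataset(Arrays.asList("a", "b", "c"), Encoders.STRING());
    ds.show();

    spark.stop();
  }
}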


Explanation from the Spark source code under branch-2.1

SparkContext: Main entry point for Spark functionality. A SparkContext represents the connection to a Spark cluster, and can be used to create RDDs, accumulators and broadcast variables on that cluster.

Only one SparkContext may be active per JVM. You must stop() the active SparkContext before creating a new one. This limitation may eventually be removed; see SPARK-2243 for more details.

JavaSparkContext: A Java-friendly version of [[org.apache.spark.SparkContext]] that returns [[org.apache.spark.api.java.JavaRDD]]s and works with Java collections instead of Scala ones.

Only one SparkContext may be active per JVM. You must stop() the active SparkContext before creating a new one. This limitation may eventually be removed; see SPARK-2243 for more details.

SQLContext: The entry point for working with structured data (rows and columns) in Spark 1.x.

As of Spark 2.0, this is replaced by [[SparkSession]]. However, we are keeping the class here for backward compatibility.

SparkSession: The entry point to programming Spark with the Dataset and DataFrame API.


I will talk about Spark version 2.x only.

SparkSession: It's the main entry point of your Spark application. To run any code on Spark, this is the first thing you should create.

from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .master("local") \
    .appName("Word Count") \
    .config("spark.some.config.option", "some-value") \
    .getOrCreate()

SparkContext: It's an inner object (property) of SparkSession. It's used to interact with the low-level API; through SparkContext you can create RDDs, accumulators, and broadcast variables.

For most cases you won't need SparkContext. You can get SparkContext from SparkSession:

val sc = spark.sparkContext