What are SparkSession Config Options

SparkSession

To get all the "various Spark parameters as key-value pairs" for a SparkSession, "The entry point to programming Spark with the Dataset and DataFrame API," run the following (this uses the Spark Python API; Scala would be very similar).

import pyspark
from pyspark import SparkConf
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
SparkConf().getAll()

or without importing SparkConf:

spark.sparkContext.getConf().getAll()

You can get a lower-level list of SparkSession configuration options by running the code below, which reads the private _conf attribute. Most entries are the same, but there are a few extra ones. I am not sure if you can change these.

spark.sparkContext._conf.getAll()
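To see which entries are the extra ones, the two lists can be compared by key. The following is a minimal sketch in plain Python. The helper name `extra_keys` and the sample pairs are made up for illustration; on a live session you would pass the results of `spark.sparkContext.getConf().getAll()` and `spark.sparkContext._conf.getAll()` instead.

```python
def extra_keys(base_pairs, other_pairs):
    """Return the keys present in other_pairs but missing from base_pairs, sorted."""
    base = {key for key, _ in base_pairs}
    other = {key for key, _ in other_pairs}
    return sorted(other - base)

# Hypothetical sample data standing in for the two getAll() results.
public_pairs = [("spark.app.name", "test"), ("spark.master", "local[*]")]
private_pairs = public_pairs + [("spark.some.config.option", "some-value")]

print(extra_keys(public_pairs, private_pairs))
```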

SparkContext

To get all the "various Spark parameters as key-value pairs" for a SparkContext, the "Main entry point for Spark functionality," ... "connection to a Spark cluster," ... and "to create RDDs, accumulators and broadcast variables on that cluster," run the following.

import pyspark
from pyspark import SparkConf, SparkContext
spark_conf = SparkConf().setAppName("test")
spark = SparkContext(conf=spark_conf)
SparkConf().getAll()

Spark parameters

You should get a list of tuples that contain the "various Spark parameters as key-value pairs" similar to the following:

[(u'spark.eventLog.enabled', u'true'),
 (u'spark.yarn.appMasterEnv.PYSPARK_PYTHON', u'/<yourpath>/parcels/Anaconda-4.2.0/bin/python'),
 ...
 ...
 (u'spark.yarn.jars', u'local:/<yourpath>/lib/spark2/jars/*')]
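Because the result is a plain list of (key, value) tuples, it converts directly to a dict, which makes lookups and filtering easier. Here is a minimal sketch, assuming a hypothetical helper `with_prefix` and sample pairs modeled on the output above. Only two entries are used, since the middle entries are elided there.

```python
def with_prefix(pairs, prefix):
    """Keep only the parameters whose key starts with prefix, as a dict."""
    return {key: value for key, value in pairs if key.startswith(prefix)}

# Sample pairs shaped like the getAll() output; the values are illustrative.
pairs = [
    ("spark.eventLog.enabled", "true"),
    ("spark.yarn.jars", "local:/<yourpath>/lib/spark2/jars/*"),
]

params = dict(pairs)                      # full lookup table
print(params["spark.eventLog.enabled"])   # value for a single key
print(with_prefix(pairs, "spark.yarn."))  # only the YARN-related settings
```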

For a complete list of Spark properties, see:
http://spark.apache.org/docs/latest/configuration.html#viewing-spark-properties

Setting Spark parameters

Each tuple is ("spark.some.config.option", "some-value") which you can set in your application with:

SparkSession

spark = (
    SparkSession
    .builder
    .appName("Your App Name")
    .config("spark.some.config.option1", "some-value")
    .config("spark.some.config.option2", "some-value")
    .getOrCreate())
sc = spark.sparkContext

SparkContext

spark_conf = (
    SparkConf()
    .setAppName("Your App Name")
    .set("spark.some.config.option1", "some-value")
    .set("spark.some.config.option2", "some-value"))
sc = SparkContext(conf=spark_conf)

spark-defaults

You can also set the Spark parameters in a spark-defaults.conf file:

spark.some.config.option1 some-value
spark.some.config.option2 "some-value"

then run your Spark application with spark-submit (pyspark):

spark-submit \
    --properties-file path/to/your/spark-defaults.conf \
    --name "Your App Name" \
    --py-files path/to/your/supporting/pyspark_files.zip \
    path/to/your/pyspark_main.py
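Individual properties can also be passed on the command line with spark-submit's `--conf key=value` flag rather than through a properties file. As a sketch, a launcher script might assemble such a command from a dict. `build_submit_command` is a hypothetical helper, and the property name is the same placeholder used above.

```python
import shlex

def build_submit_command(app_path, app_name, conf):
    """Build a spark-submit argument list with one --conf flag per property."""
    command = ["spark-submit", "--name", app_name]
    for key, value in conf.items():
        command += ["--conf", f"{key}={value}"]
    command.append(app_path)
    return command

command = build_submit_command(
    "path/to/your/pyspark_main.py",
    "Your App Name",
    {"spark.some.config.option1": "some-value"},
)
print(shlex.join(command))  # shell-quoted form, handy for logging
# To actually launch it: subprocess.run(command, check=True)
```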

json apache-spark spark-notebook

This is how I added Spark and Hive settings in my Scala code:

{
    val spark = SparkSession
        .builder()
        .appName("StructStreaming")
        .master("yarn")
        .config("hive.merge.mapfiles", "false")
        .config("hive.merge.tezfiles", "false")
        .config("parquet.enable.summary-metadata", "false")
        .config("spark.sql.parquet.mergeSchema", "false")
        .config("hive.merge.smallfiles.avgsize", "160000000")
        .enableHiveSupport()
        .config("hive.exec.dynamic.partition", "true")
        .config("hive.exec.dynamic.partition.mode", "nonstrict")
        .config("spark.sql.orc.impl", "native")
        .config("spark.sql.parquet.binaryAsString", "true")
        .config("spark.sql.parquet.writeLegacyFormat", "true")
        //.config("spark.sql.streaming.checkpointLocation", "hdfs://pp/apps/hive/warehouse/dev01_landing_initial_area.db")
        .getOrCreate()
}

The easiest way to set some config:

spark.conf.set("spark.sql.shuffle.partitions", 500)

Here spark refers to a SparkSession, so you can set configs at runtime. This is really useful when you want to change configs repeatedly to tune Spark parameters for specific queries.
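When a setting should apply to one query only, it helps to restore the previous value afterwards. Below is a minimal sketch of a context manager for that. `temporary_conf` is a hypothetical helper, not part of PySpark. It assumes only that the object passed in exposes `conf.get(key, default)`, `conf.set(key, value)` and `conf.unset(key)`, as a SparkSession's runtime config does.

```python
from contextlib import contextmanager

@contextmanager
def temporary_conf(spark, overrides):
    """Apply overrides to spark.conf, then restore the earlier values on exit."""
    # Remember what was explicitly set before (None means "not set").
    previous = {key: spark.conf.get(key, None) for key in overrides}
    for key, value in overrides.items():
        spark.conf.set(key, value)
    try:
        yield spark
    finally:
        for key, old_value in previous.items():
            if old_value is None:
                spark.conf.unset(key)  # the key was not set before
            else:
                spark.conf.set(key, old_value)

# Usage with a live SparkSession named spark (not created here):
# with temporary_conf(spark, {"spark.sql.shuffle.partitions": "500"}):
#     ...  # run the query that needs the tuned setting
```

The restore step runs in a `finally` block, so the earlier values come back even if the query raises.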
