
spark 2.1.0 session config settings (pyspark)


You aren't actually overwriting anything with this code. Just so you can see for yourself, try the following.

As soon as you start the pyspark shell, type:

sc.getConf().getAll()

This will show you all of the current config settings. Then try your code and do it again. Nothing changes.
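
For reference, getAll() returns a list of (key, value) string pairs, so you can also print them one per line; the sample values in the comments are only illustrative, not guaranteed:

for key, value in sc.getConf().getAll():
    print(key, value)
# e.g.
# spark.app.name PySparkShell
# spark.master local[*]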

What you should do instead is create a new configuration and use that to create a SparkContext. Do it like this:

import pyspark

conf = pyspark.SparkConf().setAll([('spark.executor.memory', '8g'),
                                   ('spark.executor.cores', '3'),
                                   ('spark.cores.max', '3'),
                                   ('spark.driver.memory', '8g')])
sc.stop()
sc = pyspark.SparkContext(conf=conf)

Then you can check for yourself, just as above, with:

sc.getConf().getAll()

This should reflect the configuration you wanted.
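
If you only care about the keys you just set, a small sketch like this (using the same key names as the setAll call above) pulls them out of getAll():

current = dict(sc.getConf().getAll())
for key in ('spark.executor.memory', 'spark.executor.cores',
            'spark.cores.max', 'spark.driver.memory'):
    print(key, '=', current.get(key))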


Update configuration in Spark 2.3.1

To change the default Spark configuration, you can follow these steps:

Import the required classes

from pyspark.conf import SparkConf
from pyspark.sql import SparkSession

Get the default configurations

spark.sparkContext._conf.getAll()

Update the default configurations

conf = spark.sparkContext._conf.setAll([('spark.executor.memory', '4g'), ('spark.app.name', 'Spark Updated Conf'), ('spark.executor.cores', '4'), ('spark.cores.max', '4'), ('spark.driver.memory','4g')])

Stop the current Spark Session

spark.sparkContext.stop()

Create a Spark Session

spark = SparkSession.builder.config(conf=conf).getOrCreate()
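
To double-check that the new session picked the values up, you can read them back; the expected results in the comments simply mirror the setAll call from the earlier step:

spark.sparkContext.getConf().get('spark.app.name')         # 'Spark Updated Conf'
spark.sparkContext.getConf().get('spark.executor.memory')  # '4g'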


You can also set configuration properties when you start pyspark, just as you would with spark-submit:

pyspark --conf property=value

Here is one example:

-bash-4.2$ pyspark
Python 3.6.8 (default, Apr 25 2019, 21:02:35)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-36)] on linux
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.4.0-cdh6.2.0
      /_/

Using Python version 3.6.8 (default, Apr 25 2019 21:02:35)
SparkSession available as 'spark'.
>>> spark.conf.get('spark.eventLog.enabled')
'true'
>>> exit()
-bash-4.2$ pyspark --conf spark.eventLog.enabled=false
Python 3.6.8 (default, Apr 25 2019, 21:02:35)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-36)] on linux
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.4.0-cdh6.2.0
      /_/

Using Python version 3.6.8 (default, Apr 25 2019 21:02:35)
SparkSession available as 'spark'.
>>> spark.conf.get('spark.eventLog.enabled')
'false'