PySpark - Interview Questions
How to create SparkSession?
To create a SparkSession, we use the builder pattern. The SparkSession class from the pyspark.sql module exposes a builder object, and the builder's getOrCreate() method creates a new SparkSession if none exists or returns the existing SparkSession object otherwise. The following code is an example of creating a SparkSession:
from pyspark.sql import SparkSession
# Parentheses let the chained builder calls span several lines
spark = (
    SparkSession.builder
    .master("local[1]")
    .appName("InterviewBitSparkSession")
    .getOrCreate()
)

Here,

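* master() : Sets the master URL, which determines where the application runs: on a cluster (pass the cluster manager's master URL, such as spark://host:port for a Spark standalone cluster, or yarn) or locally on a single machine. For local mode, we pass local[x], where x is the number of worker threads to use. This thread count also sets the default number of partitions used when RDDs and DataFrames are created without an explicit partition count. The value of x is ideally the number of CPU cores available; local[*] uses all of them.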

* appName() : Sets the application name, which is shown in the Spark web UI.

* getOrCreate() : Returns the SparkSession object. It creates a new session if none exists; if one already exists, it simply returns that.


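If we want a new SparkSession object every time, we can call the newSession() method on an existing session. The new session shares the same SparkContext but has its own SQL configuration, temporary views and registered functions:
from pyspark.sql import SparkSession
# newSession() is an instance method, so we need an existing session first
spark = SparkSession.builder.getOrCreate()
spark_session = spark.newSession()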