How can we create DataFrames in PySpark?

We can create a DataFrame by calling the createDataFrame() method of the SparkSession and passing it the rows along with the column names.

# Assumes an existing SparkSession named `spark`
data = [('Harry', 20),
        ('Ron', 20),
        ('Hermione', 20)]
columns = ["Name", "Age"]
df = spark.createDataFrame(data=data, schema=columns)

This creates the DataFrame shown below:

+-----------+----------+
| Name      | Age      |
+-----------+----------+
| Harry     | 20       |
| Ron       | 20       |
| Hermione  | 20       |
+-----------+----------+

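We can inspect the schema of the DataFrame by using df.printSchema(). Since only column names were passed, Spark infers the column types from the Python values, so the integer ages are stored as long:

>>> df.printSchema()
root
 |-- Name: string (nullable = true)
 |-- Age: long (nullable = true)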