PySpark - Interview Questions
How can we create DataFrames in PySpark?
We can create one from local Python data with the createDataFrame() method of the SparkSession, passing the rows as data and the column names as schema.
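from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()  # reuses the active session if one exists
data = [('Harry', 20),
        ('Ron', 20),
        ('Hermoine', 20)]
columns = ["Name", "Age"]
df = spark.createDataFrame(data=data, schema=columns)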

This creates the dataframe as shown below:
+-----------+----------+
| Name      | Age      |
+-----------+----------+
| Harry     | 20       |
| Ron       | 20       |
| Hermoine  | 20       |
+-----------+----------+

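We can get the schema of the DataFrame by using df.printSchema(). Because only column names were supplied, Spark infers the types from the data, so the Python int values in Age are reported as long:
>>> df.printSchema()
root
 |-- Name: string (nullable = true)
 |-- Age: long (nullable = true)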