PySpark - Interview Questions
Is it possible to create PySpark DataFrame from external data sources?
Yes, it is! Real-time applications often read data from external storage systems such as the local file system, HDFS, HBase, MySQL tables, Amazon S3, and Azure storage. The following example shows how to create a DataFrame by reading a CSV file from the local file system:
df = spark.read.csv("/path/to/file.csv")

PySpark supports CSV, JSON, text, Avro, Parquet, ORC, and many other formats. Tab-separated files are read with the same CSV reader by passing a tab as the delimiter, e.g. spark.read.csv(path, sep="\t").