PySpark - Interview Questions
Is it possible to create PySpark DataFrame from external data sources?
Yes, it is! Real-time applications often read data from external storage systems such as the local file system, HDFS, HBase, MySQL tables, Amazon S3, and Azure storage. The following example shows how to create a DataFrame by reading a CSV file from the local file system:
df = spark.read.csv("/path/to/file.csv")

PySpark supports CSV, JSON, text, Avro, Parquet, ORC, and many other formats. Tab-separated files are read with the same CSV reader by passing a tab as the delimiter, e.g. spark.read.csv(path, sep="\t").