PySpark - Interview Questions
What is the common workflow of a Spark program?
The most common workflow of a Spark program is as follows (a minimal sketch appears after the list):

* The first step is to create input RDDs from external data. The data can be obtained from different data sources.
* After the RDDs are created, RDD transformation operations such as filter() or map() are run to create new RDDs according to the business logic.
* If any intermediate RDDs need to be reused later, they can be persisted.
* Lastly, if action operations such as first() or count() are present, Spark launches them to initiate the parallel computation.
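Here is a minimal sketch of this workflow in PySpark. The input file name (logs.txt), the tab-separated log format, and the "ERROR" filter are assumptions made for illustration, not part of the original answer.

```python
from pyspark import SparkContext

sc = SparkContext(appName="WorkflowSketch")

# 1. Create an input RDD from an external data source (hypothetical file).
lines = sc.textFile("logs.txt")

# 2. Run transformations such as filter() and map() to derive new RDDs.
errors = lines.filter(lambda line: "ERROR" in line)
error_messages = errors.map(lambda line: line.split("\t")[-1])

# 3. Persist an intermediate RDD that will be reused by multiple actions.
error_messages.persist()

# 4. Actions such as count() and first() trigger the parallel computation.
print(error_messages.count())
print(error_messages.first())

sc.stop()
```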