Google News
logo
PySpark - Interview Questions
What are the advantages of PySpark RDD?
PySpark RDDs have the following advantages :

In-Memory Processing : PySpark’s RDD helps in loading data from the disk to the memory. The RDDs can even be persisted in the memory for reusing the computations.
 
Immutability : The RDDs are immutable which means that once created, they cannot be modified. While applying any transformation operations on the RDDs, a new RDD would be created.

Fault Tolerance : The RDDs are fault-tolerant. This means that whenever an operation fails, the data gets automatically reloaded from other available partitions. This results in seamless execution of the PySpark applications.

Lazy Evolution : The PySpark transformation operations are not performed as soon as they are encountered. The operations would be stored in the DAG and are evaluated once it finds the first RDD action.
Partitioning: Whenever RDD is created from any data, the elements in the RDD are partitioned to the cores available by default.
Advertisement