PySpark - Quiz(MCQ)
Which of the following is used to cache an RDD in memory in PySpark?
A) persist()
B) cache()
C) collect()
D) saveAsTextFile()

Correct Answer: A) persist()


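Explanation: persist() marks an RDD for caching, so that once an action has computed its partitions they are kept and reused by later operations instead of being recomputed. It accepts a storage level that controls whether the data is held in memory, on disk, or both; called without arguments on an RDD it uses MEMORY_ONLY. Note that cache() (option B) also caches an RDD: it is shorthand for persist() with the default storage level, so persist() is the more general call and the one the answer key expects. collect() returns the RDD's elements to the driver and saveAsTextFile() writes them to files; neither caches anything. Other RDD operations in PySpark include mapPartitions(), sortByKey(), reduceByKey(), and aggregateByKey().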
