Spark - Interview Questions
What are the different levels of persistence in Spark?
DISK_ONLY : Stores the RDD partitions only on disk
 
MEMORY_ONLY_SER : Stores the RDD as serialized Java objects, with one byte array per partition
 
MEMORY_ONLY : Stores the RDD as deserialized Java objects in the JVM. If the RDD does not fit in the available memory, the partitions that don't fit are not cached and are recomputed each time they are needed
 
OFF_HEAP : Works like MEMORY_ONLY_SER but stores the data in off-heap memory, which requires off-heap memory to be enabled
 
MEMORY_AND_DISK : Stores the RDD as deserialized Java objects in the JVM. If the RDD does not fit in memory, the partitions that don't fit are stored on disk and read from there when needed
 
MEMORY_AND_DISK_SER : Like MEMORY_ONLY_SER, except that partitions that don't fit in memory are spilled to disk instead of being recomputed each time they are needed
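
The levels above differ mainly along two axes: where a partition may live (memory, disk, or both) and whether it is kept as deserialized objects or as serialized bytes. The sketch below is a plain-Python model of those axes, not Spark's actual StorageLevel class; the names Level, LEVELS and on_memory_pressure are hypothetical and exist only for illustration. OFF_HEAP is left out because its behavior depends on off-heap memory being enabled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Level:
    """Illustrative stand-in for a persistence level (not the real Spark class)."""
    use_disk: bool
    use_memory: bool
    deserialized: bool


# Flags chosen to mirror the descriptions in the list above.
LEVELS = {
    "DISK_ONLY":           Level(use_disk=True,  use_memory=False, deserialized=False),
    "MEMORY_ONLY":         Level(use_disk=False, use_memory=True,  deserialized=True),
    "MEMORY_ONLY_SER":     Level(use_disk=False, use_memory=True,  deserialized=False),
    "MEMORY_AND_DISK":     Level(use_disk=True,  use_memory=True,  deserialized=True),
    "MEMORY_AND_DISK_SER": Level(use_disk=True,  use_memory=True,  deserialized=False),
}


def on_memory_pressure(name: str) -> str:
    """Describe what happens to a partition that does not fit in memory."""
    level = LEVELS[name]
    if not level.use_memory:
        return "n/a (never held in memory)"
    return "spill to disk" if level.use_disk else "recompute when needed"


if __name__ == "__main__":
    for name, level in LEVELS.items():
        form = "deserialized objects" if level.deserialized else "serialized bytes"
        print(f"{name:<20} {form:<21} {on_memory_pressure(name)}")
```

In a real Spark job a level is chosen by passing it to persist, for example rdd.persist(StorageLevel.MEMORY_AND_DISK); calling cache() on an RDD uses MEMORY_ONLY.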