PySpark - Interview Questions
What do you understand by Lineage Graph in PySpark?
The lineage graph is the graph of dependencies between RDDs: for each RDD it records the parent RDDs and the transformations it was derived from. Every Spark application has its own lineage graphs. Spark uses the lineage to recompute RDDs on demand and to recover lost data, for example when a partition of a persisted RDD is lost. In other words, an RDD lineage graph lets us build a new RDD or rebuild the data of a lost persisted RDD. It is built up as transformations are applied to RDDs, and it forms the logical execution plan.
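The recovery idea can be illustrated with a minimal plain-Python sketch. It is a toy model, not Spark's implementation: `ToyRDD`, its methods, and the dataset names are hypothetical and exist only for illustration. Each object stores its parent and the transformation that produced it, so a lost cached result can be rebuilt by walking back to the source. In real PySpark, the lineage of an RDD can be inspected with `RDD.toDebugString()`.

```python
class ToyRDD:
    """Hypothetical stand-in for an RDD: it stores how to build its data."""

    def __init__(self, name, parent=None, fn=None, source=None):
        self.name = name
        self.parent = parent    # dependency on the parent dataset
        self.fn = fn            # transformation applied to the parent's data
        self.source = source    # only set for the root dataset
        self.cache = None       # persisted result, which may be lost

    def map(self, f, name):
        # Record the transformation; nothing is computed yet.
        return ToyRDD(name, parent=self, fn=lambda data: [f(x) for x in data])

    def filter(self, pred, name):
        return ToyRDD(name, parent=self,
                      fn=lambda data: [x for x in data if pred(x)])

    def persist(self):
        # Compute once and keep the result.
        self.cache = self.compute()
        return self

    def compute(self):
        if self.cache is not None:
            return self.cache
        if self.parent is None:
            return list(self.source)
        # Rebuild from the parent by replaying the recorded transformation.
        return self.fn(self.parent.compute())

    def lineage(self):
        # Walk the dependency chain from this dataset back to the root.
        chain, node = [], self
        while node is not None:
            chain.append(node.name)
            node = node.parent
        return chain


numbers = ToyRDD("numbers", source=range(1, 6))
squares = numbers.map(lambda x: x * x, "squares")
even_squares = squares.filter(lambda x: x % 2 == 0, "even_squares").persist()

even_squares.cache = None            # simulate losing the persisted data
recovered = even_squares.compute()   # rebuilt by replaying the lineage
print(even_squares.lineage())
print(recovered)
```

The sketch keeps only a single parent per dataset for brevity; actual RDDs can depend on several parents and are recovered partition by partition.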