Which of the following is used to cache an RDD in memory in PySpark? - PySpark Quiz

PySpark - Quiz (MCQ)

Which of the following is used to cache an RDD in memory in PySpark?

A)

persist()

B)

cache()

C)

collect()

D)

saveAsTextFile()

Correct Answer : persist()

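Explanation : persist() marks an RDD to be cached so that it can be reused efficiently in subsequent operations. It accepts a storage level, so the RDD can be kept in memory, on disk, or both; for RDDs the default is memory only. Note that cache() also caches an RDD in memory: it is shorthand for persist() with that default storage level. collect() returns the RDD's elements to the driver and saveAsTextFile() writes them out as text files; neither one caches anything. Other RDD operations in PySpark include mapPartitions(), sortByKey(), reduceByKey(), and aggregateByKey().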

Recently Updated in PySpark Questions

____ represents a set of named columns and distributed data.

A)

pyspark.sql.DataFrame

B)

pyspark.sql.Row

C)

pyspark.sql.Column

D)

pyspark.sql.GroupedData

Correct Answer : pyspark.sql.DataFrame

Explanation : pyspark.sql.DataFrame represents a distributed collection of data grouped into named columns.

DataFrame and SQL functionality is accessed through ____.

A)

pyspark.sql.Row

B)

pyspark.sql.Column

C)

pyspark.sql.DataFrame

D)

pyspark.sql.SparkSession

Correct Answer : pyspark.sql.SparkSession

Explanation : DataFrame and SQL functionality is accessed through pyspark.sql.SparkSession, which is the entry point for creating DataFrames and running SQL queries.

A UDF extends Spark SQL's DSL vocabulary for transforming DataFrames by defining a new ____-based function.

A)

Tuple

B)

Row

C)

Column

D)

None of the above

Correct Answer : Column

Explanation : A UDF extends Spark SQL's DSL vocabulary for transforming DataFrames by defining a new column-based function.
