Which of the following is a transformation operation that shuffles data in PySpark? - PySpark Quiz

PySpark - Quiz(MCQ)

Which of the following is a transformation operation that shuffles data in PySpark?

A)

groupByKey()

B)

map()

C)

filter()

D)

reduce()

Correct Answer : groupByKey()

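Explanation : groupByKey() is a transformation operation that shuffles data in PySpark: records with the same key are moved across partitions so that they end up together. It groups the values of each key in an RDD and returns a new RDD of (key, iterable of values) pairs. map() and filter() are narrow transformations that need no shuffle, and reduce() is an action rather than a transformation. Other shuffling operations in PySpark include sortByKey(), reduceByKey(), and aggregateByKey().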
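To make the shuffle concrete, the sketch below models it in plain Python rather than calling Spark. Each (key, value) record is routed to an output partition chosen from its key alone, so all values for one key land in the same place and can then be grouped. The helper names (partition_for, shuffle_by_key, group_values) and the CRC32-based partitioner are illustrative assumptions, not Spark's internals. The last function shows what the corresponding RDD call might look like, assuming a SparkContext named sc already exists.

```python
import zlib
from collections import defaultdict


def partition_for(key, num_partitions):
    """Pick an output partition from the key alone (illustrative partitioner)."""
    return zlib.crc32(str(key).encode("utf-8")) % num_partitions


def shuffle_by_key(input_partitions, num_partitions):
    """Move each (key, value) record to the partition its key maps to."""
    output = [[] for _ in range(num_partitions)]
    for partition in input_partitions:
        for key, value in partition:
            output[partition_for(key, num_partitions)].append((key, value))
    return output


def group_values(partition):
    """Group the values of each key inside one already-shuffled partition."""
    grouped = defaultdict(list)
    for key, value in partition:
        grouped[key].append(value)
    return dict(grouped)


def group_with_spark(sc, pairs):
    """Corresponding RDD call; `sc` is assumed to be a live SparkContext."""
    return sc.parallelize(pairs).groupByKey().mapValues(list).collect()


# Records for the same key start out spread across two input partitions.
input_partitions = [
    [("a", 1), ("b", 2)],
    [("a", 3), ("b", 4), ("c", 5)],
]
shuffled = shuffle_by_key(input_partitions, num_partitions=2)
grouped_partitions = [group_values(p) for p in shuffled]
```

After the shuffle, every key lives in exactly one output partition, which is what lets the grouping step run independently per partition.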

Recently Updated in PySpark Questions

____ represents a set of named columns and distributed data.

A)

pyspark.sql.DataFrame

B)

pyspark.sql.Row

C)

pyspark.sql.Column

D)

pyspark.sql.GroupedData

Correct Answer : pyspark.sql.DataFrame

Explanation : pyspark.sql.DataFrame represents a distributed collection of data grouped into named columns.

DataFrame and SQL functionality is accessed through ____.

A)

pyspark.sql.Row

B)

pyspark.sql.Column

C)

pyspark.sql.DataFrame

D)

pyspark.sql.SparkSession

Correct Answer : pyspark.sql.SparkSession

Explanation : pyspark.sql.SparkSession is the main entry point for DataFrame and SQL functionality; DataFrames are created and SQL queries are run through a session object.
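A minimal sketch of what going through the session can look like is given below. The sample rows, the view name people, and the helper names get_session and names_older_than are made up for illustration. The PySpark import is placed inside get_session only so that the rest of the snippet can be read without a Spark installation.

```python
PEOPLE = [("Alice", 34), ("Bob", 45)]  # made-up sample rows


def get_session(app_name="quiz-demo"):
    """Create (or reuse) a local SparkSession; requires PySpark to be installed."""
    from pyspark.sql import SparkSession  # imported lazily on purpose

    return SparkSession.builder.master("local[1]").appName(app_name).getOrCreate()


def names_older_than(spark, min_age):
    """Build a DataFrame and query it with SQL, both through `spark`."""
    # DataFrame creation goes through the session...
    df = spark.createDataFrame(PEOPLE, ["name", "age"])
    df.createOrReplaceTempView("people")
    # ...and so does running SQL against the registered view.
    rows = spark.sql(
        f"SELECT name FROM people WHERE age > {int(min_age)}"
    ).collect()
    return [row["name"] for row in rows]
```

With a working installation, names_older_than(get_session(), 40) would select the names of the sample rows whose age exceeds 40.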

A UDF extends Spark SQL's DSL vocabulary for transforming DataFrames by defining a new ____-based function.

A)

Tuple

B)

Row

C)

Column

D)

None of the above

Correct Answer : Column

Explanation : A UDF extends Spark SQL's DSL vocabulary for transforming DataFrames by defining a new column-based function.
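As a hedged illustration of "column-based", the sketch below wraps an ordinary Python function with pyspark.sql.functions.udf so that it can be applied to a DataFrame column. The function initial, the helper add_initial_column, and the column names are hypothetical, and df is assumed to be an existing DataFrame with a string column.

```python
def initial(name):
    """Plain Python function: first letter of a name, upper-cased."""
    return name[:1].upper() if name is not None else None


def add_initial_column(df, source_col="name"):
    """Wrap `initial` as a UDF and apply it column-wise to a DataFrame."""
    # Imported lazily so `initial` itself can be used without PySpark.
    from pyspark.sql.functions import col, udf
    from pyspark.sql.types import StringType

    # udf() turns the Python function into one that takes and returns Columns.
    initial_udf = udf(initial, StringType())
    return df.withColumn("initial", initial_udf(col(source_col)))
```

Keeping the logic in a plain function makes it easy to check on its own before it is registered as a UDF. The None check is there because Spark passes nulls to Python UDFs as None.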
