Correct Answer : 2009
Explanation : Spark began in 2009 as a research project at UC Berkeley's AMPLab, started by Matei Zaharia. It was open-sourced in 2010 and later became an Apache project.
Correct Answer : Spark SQL
Explanation : Spark SQL introduced a data abstraction called SchemaRDD (renamed DataFrame in Spark 1.3), which supports structured and semi-structured data.
Explanation : Spark can be deployed in three ways: Standalone, Hadoop YARN, and Spark in MapReduce (SIMR).
Correct Answer : DataFrame
Correct Answer : All of the above
Explanation : Apache Spark has the following features: speed, support for multiple languages, and advanced analytics.
Correct Answer : 2
Explanation : Spark uses Hadoop in two ways: for storage and for processing.
Correct Answer : GraphX
Explanation : GraphX began as a research project at UC Berkeley's AMPLab and Databricks, and was later donated to the Spark project.
Correct Answer : Pascal
Explanation : The Spark engine runs in a variety of environments, from cloud services to Hadoop or Mesos clusters.
Correct Answer : Executor Nodes
Correct Answer : SIMR
Explanation : With SIMR, users can start experimenting with Spark and use its shell within a couple of minutes after downloading it.
Correct Answer : Real-time
Explanation : Spark is well suited to real-time (streaming) data, whereas Hadoop MapReduce is oriented toward batch processing of stored data.
Correct Answer : True
Correct Answer : Execution
Correct Answer : False
Correct Answer : DataFrame in Apache Spark is behind RDD
Correct Answer : RDD
Correct Answer : No
Correct Answer : Scala
Correct Answer : 100 times faster
Correct Answer : Decision Trees
Correct Answer : Logistic Regression
Correct Answer : DataFrames provide a more user-friendly API than RDDs.
Correct Answer : The ways to send result from executors to the driver
Correct Answer : The data required to compute resides on the single partition.
Correct Answer : Both 2 and 3
Correct Answer : Dataset
Correct Answer : Sqoop
Correct Answer : DStream
Correct Answer : Tanimoto distance
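For reference, the Tanimoto distance between two numeric vectors is commonly defined as 1 - (a·b) / (|a|² + |b|² - a·b). The following is a minimal plain-Python sketch of that formula, not Spark or MLlib code; the function name `tanimoto_distance` and the handling of two all-zero vectors are assumptions made for illustration.

```python
def tanimoto_distance(a, b):
    """Return 1 - (a.b) / (|a|^2 + |b|^2 - a.b) for two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    denom = sum(x * x for x in a) + sum(y * y for y in b) - dot
    if denom == 0:
        # Both vectors are all zeros; this sketch treats them as identical.
        return 0.0
    return 1.0 - dot / denom
```

Identical vectors give a distance of 0, and vectors with no overlapping non-zero components give a distance of 1.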
Correct Answer : Spark Streaming
Correct Answer : DAG execution engine and in-memory computation
Correct Answer : Spark is an open source framework which is written in Java
Correct Answer : It enables users to run SQL / HQL queries on the top of Spark.
Correct Answer : It is the kernel of Spark
Correct Answer : SparkSession
Correct Answer : Abstract syntax tree
Correct Answer : acbd
Correct Answer : Apache Flink
Correct Answer : Using SQL, we can query data only from inside a Spark program and not from external tools.
Correct Answer : Either DataFrame or Dataset
Correct Answer : Java and Scala
Correct Answer : The optimizer helps us to run queries much faster than their RDD counterparts.
Correct Answer : Map allows returning 0, 1 or more elements from map function.
Correct Answer : mapPartitionsWithIndex()
Correct Answer : countByValue()
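In Spark, countByValue() is an action that returns the count of each distinct value to the driver as a map. The result it produces can be sketched in plain Python with `collections.Counter`; the helper name `count_by_value` is hypothetical and this is not the Spark implementation.

```python
from collections import Counter

def count_by_value(values):
    """Return a dict mapping each distinct value to its number of occurrences."""
    return dict(Counter(values))
```

Because the whole result is collected on the driver, this kind of action is best kept for data with a modest number of distinct values.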
Correct Answer : foreach()
Correct Answer : The processing of each batch has no dependency on the data of previous batches.
Correct Answer : Uses data or intermediate results from previous batches and computes the result of the current batch.
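The two answers above describe stateless and stateful streaming transformations. A rough plain-Python sketch of the difference, assuming each batch is simply a list of words (the function names are illustrative and this is not the Spark Streaming API):

```python
from collections import Counter

def stateless_counts(batches):
    """Count words per batch; each result depends only on its own batch."""
    return [dict(Counter(batch)) for batch in batches]

def stateful_counts(batches):
    """Keep a running count; each result also depends on earlier batches."""
    running = Counter()
    results = []
    for batch in batches:
        running.update(batch)
        # Snapshot the running totals after this batch has been folded in.
        results.append(dict(running))
    return results
```

In the stateless version any batch could be processed in isolation, while the stateful version has to carry `running` forward from one batch to the next.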
Correct Answer : It is the scalable machine learning library which delivers efficiencies
Correct Answer : Both (A) and (B)
Correct Answer : Upon action
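"Upon action" refers to lazy evaluation: transformations are only recorded, and nothing is computed until an action asks for a result. A toy plain-Python sketch of that idea follows; the class `LazyPipeline` is hypothetical and only mimics the behaviour, it is not how Spark is implemented.

```python
class LazyPipeline:
    """Records map/filter steps and runs them only when collect() is called."""

    def __init__(self, data, steps=()):
        self._data = data
        self._steps = tuple(steps)

    def map(self, fn):
        # Transformation: return a new pipeline with one more recorded step.
        return LazyPipeline(self._data, self._steps + (("map", fn),))

    def filter(self, pred):
        # Transformation: also just recorded, nothing is evaluated yet.
        return LazyPipeline(self._data, self._steps + (("filter", pred),))

    def collect(self):
        # Action: apply every recorded step in order and materialise a list.
        items = iter(self._data)
        for kind, fn in self._steps:
            items = map(fn, items) if kind == "map" else filter(fn, items)
        return list(items)
```

Calling `map` or `filter` returns immediately without touching the data; only `collect()` triggers the work.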
Correct Answer : Either fine-grained or coarse-grained
Correct Answer : Coarse-grained
Correct Answer : DAG (Directed Acyclic Graph)
Correct Answer : Takes an RDD as input and produces one or more RDDs as output.
Correct Answer : Only one
Correct Answer : 3
Correct Answer : It is cost efficient
Correct Answer : Window length, sliding interval
Correct Answer : reduceByKeyAndWindow
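A window operation is defined by its window length and sliding interval, both of which must be multiples of the batch interval in Spark Streaming. The sketch below imitates a windowed reduce-by-key in plain Python. It assumes each batch is a list of (key, value) pairs, measures both parameters in batches rather than seconds, sums the values, and emits only full windows; the function name and these simplifications are illustrative and not Spark's actual behaviour.

```python
from collections import Counter

def reduce_by_key_and_window(batches, window_length, slide_interval):
    """Sum (key, value) pairs over sliding windows of batches.

    window_length and slide_interval are both counted in batches here.
    """
    results = []
    # Emit one result every slide_interval batches, covering the last
    # window_length batches that end at position `end`.
    for end in range(window_length, len(batches) + 1, slide_interval):
        totals = Counter()
        for batch in batches[end - window_length:end]:
            for key, value in batch:
                totals[key] += value
        results.append(dict(totals))
    return results
```

With a window length of 2 and a sliding interval of 1, consecutive windows overlap by one batch; with a sliding interval of 2 they do not overlap at all.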
Correct Answer : MEMORY_ONLY
Correct Answer : Both have their own file system