Hadoop - Quiz (MCQ)
A)
Bell Labs
B)
Sun Microsystems
C)
Apache Software Foundation
D)
Hadoop Software Foundation

Correct Answer :   Apache Software Foundation


Explanation :

According to its co-founders, Doug Cutting and Mike Cafarella, the genesis of Hadoop was the Google File System paper that was published in October 2003.
 
The Apache Software Foundation has stated that only software officially released by the Apache Hadoop Project can be called Apache Hadoop or Distributions of Apache Hadoop. The naming of products and derivative works from other vendors and the term "compatible" are somewhat controversial within the Hadoop developer community.

A)
1st April 2005
B)
1st April 2006
C)
1st April 2007
D)
1st April 2008

Correct Answer :   1st April 2006

A)
Apache License 1.0
B)
Apache License 2.0
C)
Apache License 2.3
D)
Apache License 2.7

Correct Answer :   Apache License 2.0


Explanation : Hadoop is Open Source, released under Apache 2 license.

A)
C
B)
C++
C)
Python
D)
Java

Correct Answer :   Java


Explanation : The Hadoop framework itself is mostly written in the Java programming language, with some native code in C and command-line utilities written as shell scripts.

A)
Unix-like
B)
Debian
C)
Bare metal
D)
Cross-platform

Correct Answer :   Cross-platform


Explanation : Hadoop supports cross-platform operating systems.

A)
ZFS
B)
Operating system
C)
RAID
D)
Standard RAID levels

Correct Answer :   RAID


Explanation : With the default replication value, 3, data is stored on three nodes: two on the same rack, and one on a different rack.

A)
MapReduce
B)
Google
C)
Facebook
D)
Functional programming

Correct Answer :   MapReduce


Explanation : The MapReduce engine is used to distribute work around a cluster.

A)
Artificial intelligence
B)
Machine learning
C)
Pattern recognition
D)
Statistical classification

Correct Answer :   Machine learning


Explanation : The Apache Mahout project’s goal is to build a scalable machine learning tool.

A)
JAX-RS
B)
Distributed file system
C)
Java Message Service
D)
Relational Database Management System

Correct Answer :   Distributed file system


Explanation : The Hadoop Distributed File System (HDFS) is designed to store very large data sets reliably, and to stream those data sets at high bandwidth to the user.

A)
Google
B)
Google Variations
C)
Google Latitude
D)
Android (operating system)

Correct Answer :   Google


Explanation : Google and IBM announced a university initiative to address Internet-scale computing.

A)
Improved data warehousing functionality
B)
Improved data storage and information retrieval
C)
Improved security, workload management, and SQL support
D)
Improved extract, transform and load features for data integration

Correct Answer :   Improved security, workload management, and SQL support


Explanation : Adding security to Hadoop is challenging because all the interactions do not follow the classic client-server pattern.

A)
Management of Hadoop clusters
B)
Collecting and storing unstructured data
C)
Data warehousing and business intelligence
D)
Big data management and data mining

Correct Answer :   Big data management and data mining


Explanation : Data warehousing integrated with Hadoop would give a better understanding of data.

A)
MapReduce, Heron and Trumpet
B)
MapReduce, Hummer and Iguana
C)
MapReduce, MySQL and Google Apps
D)
MapReduce, Hive and HBase

Correct Answer :   MapReduce, Hive and HBase


Explanation : To use Hive with HBase you’ll typically want to launch two clusters, one to run HBase and the other to run Hive.

A)
Cutting’s high school rock band
B)
The toy elephant of Cutting’s son
C)
Creator Doug Cutting’s favorite circus act
D)
A sound Cutting’s laptop made during Hadoop development

Correct Answer :   The toy elephant of Cutting’s son


Explanation : Doug Cutting, Hadoop creator, named the framework after his child’s stuffed toy elephant.

A)
Real-time
B)
Java-based
C)
Open-source
D)
Distributed computing approach

Correct Answer :   Real-time


Explanation : Apache Hadoop is an open-source software framework for distributed storage and distributed processing of Big Data on clusters of commodity hardware.

A)
Oozie
B)
Mahout
C)
MapReduce
D)
All of the above

Correct Answer :   MapReduce


Explanation : MapReduce is a programming model and an associated implementation for processing and generating large data sets with a parallel, distributed algorithm.

A)
Apple
B)
Datamatics
C)
Facebook
D)
None of the above

Correct Answer :   Facebook


Explanation : Facebook has many Hadoop clusters, the largest among them is the one that is used for Data warehousing.

A)
Pig
B)
Hive
C)
Oozie
D)
Pig Latin

Correct Answer :   Pig


Explanation : Apache Pig is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs.

A)
Scalding
B)
Cascalog
C)
HCatalog
D)
All of the above

Correct Answer :   Cascalog


Explanation : Cascalog also adds Logic Programming concepts inspired by Datalog. Hence the name “Cascalog” is a contraction of Cascading and Datalog.

A)
Scalding
B)
HCatalog
C)
Cascalog
D)
Cascading

Correct Answer :   Cascading


Explanation : Cascading hides many of the complexities of MapReduce programming behind more intuitive pipes and data flow abstractions.

A)
Drill
B)
Mapreduce
C)
Oozie
D)
None of the above

Correct Answer :   Mapreduce


Explanation : MapReduce provides a flexible and scalable foundation for analytics, from traditional reporting to leading-edge machine learning algorithms.

A)
XML
B)
JSON
C)
SQL
D)
All of the above

Correct Answer :   SQL


Explanation : Pig Latin, in essence, is designed to fill the gap between the declarative style of SQL and the low-level procedural style of MapReduce.

A)
Avro
B)
Drill
C)
BigTop
D)
Chukwa

Correct Answer :   Avro


Explanation : In the context of Hadoop, Avro can be used to pass data from one program or language to another.

A)
TaskTracker
B)
Mapper
C)
JobTracker
D)
MapReduce

Correct Answer :   TaskTracker


Explanation : The TaskTracker receives the information necessary for the execution of a task from the JobTracker, executes the task, and sends the results back to the JobTracker.

A)
Hadoop Stream
B)
Hadoop Strdata
C)
Hadoop Streaming
D)
None of the above

Correct Answer :   Hadoop Streaming


Explanation : Hadoop streaming is one of the most important utilities in the Apache Hadoop distribution.

A)
Reducer
B)
Mapper
C)
Both (A) and (B)
D)
None of the above

Correct Answer :   Mapper


Explanation : Maps are the individual tasks that transform input records into intermediate records.
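
For illustration, here is a minimal sketch of an old-API (org.apache.hadoop.mapred) Mapper that turns each input line into intermediate (word, 1) records; the class name TokenMapper is hypothetical:

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    // Hypothetical example: transforms each input line into (word, 1) intermediate records.
    public class TokenMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, IntWritable> {

      private static final IntWritable ONE = new IntWritable(1);
      private final Text word = new Text();

      public void map(LongWritable key, Text value,
                      OutputCollector<Text, IntWritable> output, Reporter reporter)
          throws IOException {
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
          word.set(itr.nextToken());
          output.collect(word, ONE);   // emit an intermediate record
        }
      }
    }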

A)
tasks
B)
outputs
C)
Both (A) and (B)
D)
inputs

Correct Answer :   inputs


Explanation : Total size of inputs means the total number of blocks of the input files.

A)
HashPar
B)
Partitioner
C)
HashPartitioner
D)
None of the above

Correct Answer :   HashPartitioner


Explanation : The default partitioner in Hadoop is HashPartitioner, which uses its getPartition method to choose the partition for each key.
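
As a sketch of the same logic the default HashPartitioner applies (the key's hash, made non-negative, modulo the number of reducers), written against the old mapred Partitioner interface; the class name is hypothetical:

    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.Partitioner;

    // Mirrors the behaviour of the default HashPartitioner.
    public class HashLikePartitioner<K, V> implements Partitioner<K, V> {
      public void configure(JobConf job) { }

      public int getPartition(K key, V value, int numReduceTasks) {
        // Non-negative hash of the key, modulo the number of reduce tasks.
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
      }
    }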

A)
JobConfigurable.configure
B)
JobConfigure.configure
C)
JobConfigurable.configurable
D)
None of the above

Correct Answer :   JobConfigurable.configure


Explanation : Implementations override the JobConfigurable.configure method to initialize themselves.

A)
Shuffle
B)
Reducer
C)
Mapper
D)
All of the above

Correct Answer :   Reducer


Explanation : In the Shuffle phase the framework fetches the relevant partition of the output of all the mappers, via HTTP.

A)
Mapper
B)
Scalding
C)
Cascader
D)
None of the above

Correct Answer :   None of the above


Explanation : The output of the reduce task is typically written to the FileSystem. The output of the Reducer is not sorted.

A)
Shuffle and Sort
B)
Shuffle and Map
C)
Reduce and Sort
D)
All of the above

Correct Answer :   Shuffle and Sort


Explanation : The shuffle and sort phases occur simultaneously; while map-outputs are being fetched they are merged.

A)
Reporter
B)
Partitioner
C)
OutputCollector
D)
All of the above

Correct Answer :   Reporter


Explanation : Reporter is a facility for MapReduce applications to report progress, set application-level status messages and update Counters.

A)
MemoryConf
B)
Map Parameters
C)
JobConf
D)
None of the above

Correct Answer :   JobConf


Explanation : JobConf represents a MapReduce job configuration.
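
For illustration, a minimal driver sketch that builds a MapReduce job configuration with JobConf (old API); the mapper/reducer classes and paths are hypothetical:

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class WordCountDriver {
      public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCountDriver.class);
        conf.setJobName("wordcount");              // job name shown by the JobTracker

        conf.setMapperClass(TokenMapper.class);    // hypothetical mapper
        conf.setReducerClass(SumReducer.class);    // hypothetical reducer
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        conf.setNumReduceTasks(2);                 // number of reduce tasks

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);                    // submit and wait for completion
      }
    }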

A)
SQL
B)
NoSQL
C)
NewSQL
D)
All of the above

Correct Answer :   NoSQL


Explanation : NoSQL systems make the most sense whenever the application is based on data with varying data types and the data can be stored in key-value notation.

A)
Hive
B)
Hbase
C)
Both (A) and (B)
D)
HCatalog

Correct Answer :   HCatalog


Explanation : Other means of tagging the values also can be used.

A)
Scale up
B)
Scale out
C)
Both Scale up and out
D)
None of the above

Correct Answer :   Scale out


Explanation : HDFS and NoSQL file systems focus almost exclusively on adding nodes to increase performance (scale-out) but even they require node configuration with elements of scale up.

A)
Hbase
B)
Cassandra
C)
MongoDB
D)
None of the above

Correct Answer :   Hbase


Explanation : HBase is the Hadoop database: a distributed, scalable Big Data store that lets you host very large tables — billions of rows multiplied by millions of columns — on clusters built with commodity hardware.

A)
DataCache
B)
DistributedData
C)
DistributedCache
D)
All of the above

Correct Answer :   DistributedCache


Explanation : The child-jvm always has its current working directory added to the java.library.path and LD_LIBRARY_PATH.
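
A short sketch of adding a file to the DistributedCache with the old API; the HDFS path and symlink name are hypothetical:

    import java.net.URI;

    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.mapred.JobConf;

    public class CacheSetup {
      public static void addLookupFile(JobConf conf) throws Exception {
        // Hypothetical HDFS path; the "#lookup" fragment creates a symlink
        // named "lookup" in the task's working directory.
        DistributedCache.addCacheFile(new URI("/apps/data/lookup.txt#lookup"), conf);
        DistributedCache.createSymlink(conf);
      }
    }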

A)
Bigtable
B)
BigTop
C)
TopTable
D)
None of the above

Correct Answer :   Bigtable


Explanation : Google Bigtable leverages the distributed data storage provided by the Google File System.

A)
tool
B)
task
C)
library
D)
generic

Correct Answer :   generic


Explanation : Place the generic options before the streaming options, otherwise the command will fail.

A)
mapper executable
B)
input directoryname
C)
output directoryname
D)
All of the above

Correct Answer :   All of the above


Explanation : The required parameters specify the mapper executable and the input and output locations.

A)
-cmenv EXAMPLE_DIR=/home/example/dictionaries/
B)
-cmdenv EXAMPLE_DIR=/home/example/dictionaries/
C)
-cmden EXAMPLE_DIR=/home/example/dictionaries/
D)
-cmdev EXAMPLE_DIR=/home/example/dictionaries/

Correct Answer :   -cmdenv EXAMPLE_DIR=/home/example/dictionaries/


Explanation : Environment variables are set using the -cmdenv option.

A)
Copy
B)
Paste
C)
Cut
D)
Move

Correct Answer :   Cut


Explanation : The map function defined in the class treats each input key/value pair as a list of fields.

A)
KeyFieldBasedComparator
B)
KeyFieldBased
C)
KeyFieldComparator
D)
All of the above

Correct Answer :   KeyFieldBasedComparator


Explanation : Hadoop has a library class, KeyFieldBasedComparator, that is useful for many applications.

A)
Map
B)
Reduce
C)
Reducer
D)
None of the above

Correct Answer :   Reducer


Explanation : Aggregate provides a special reducer class and a special combiner class, and a list of simple aggregators that perform aggregations such as “sum”, “max”, “min” and so on over a sequence of values.

A)
KeyFieldBased
B)
KeyFieldPartitioner
C)
Both (A) and (B)
D)
KeyFieldBasedPartitioner

Correct Answer :   KeyFieldBasedPartitioner


Explanation : The primary key is used for partitioning, and the combination of the primary and secondary keys is used for sorting.

A)
Replication
B)
NameNode
C)
Data Node
D)
Data block

Correct Answer :   NameNode


Explanation : All the metadata related to HDFS including the information about data nodes, files stored on HDFS, and Replication, etc. are stored and maintained on the NameNode.

A)
worker/slave
B)
master-worker
C)
master-slave
D)
All of the above

Correct Answer :   master-worker


Explanation : The NameNode serves as the master and each DataNode serves as a worker/slave.

A)
Data
B)
Rack
C)
Secondary
D)
None of the above

Correct Answer :   Secondary


Explanation : The Secondary NameNode periodically merges the NameNode's edit log with the filesystem image, keeping an up-to-date checkpoint that improves reliability.

A)
HDFS is not suitable for scenarios requiring multiple/simultaneous writes to the same file
B)
HDFS is suitable for storing data related to applications requiring low latency data access
C)
HDFS is suitable for storing data related to applications requiring low latency data access
D)
None of the above

Correct Answer :   HDFS is not suitable for scenarios requiring multiple/simultaneous writes to the same file


Explanation : HDFS can be used for storing archive data since it is cheaper as HDFS allows storing the data on low cost commodity hardware while ensuring a high degree of fault-tolerance.

A)
DataNode
B)
Data block
C)
Replication
D)
NameNode

Correct Answer :   DataNode


Explanation : A DataNode stores data in the [HadoopFileSystem]. A functional filesystem has more than one DataNode, with data replicated across them.

A)
"DFS Shell"
B)
"FS Shell"
C)
"HDFS Shell"
D)
None of the above

Correct Answer :   "FS Shell"


Explanation : The File System (FS) shell includes various shell-like commands that directly interact with the Hadoop Distributed File System (HDFS).

A)
Data Node
B)
NameNode
C)
Replication
D)
Resource

Correct Answer :   Resource


Explanation : All the metadata related to HDFS including the information about data nodes, files stored on HDFS, and Replication, etc. are stored and maintained on the NameNode.

A)
DataNode
B)
ActionNode
C)
NameNode
D)
None of the above

Correct Answer :   NameNode


Explanation : Because HDFS is implemented in Java, any computer that can run Java can host a NameNode or DataNode.

A)
Pig
B)
Hive
C)
Lucene
D)
MapReduce

Correct Answer :   MapReduce


Explanation : MapReduce is the heart of Hadoop.

A)
job-tracker
B)
map-tracker
C)
reduce-tracker
D)
all of the above

Correct Answer :   job-tracker


Explanation : MapReduce jobs are submitted to the JobTracker.

A)
TaskTracker
B)
DataNodes
C)
ActionNodes
D)
All of the above

Correct Answer :   DataNodes


Explanation : A heartbeat is sent from the TaskTracker to the JobTracker every few minutes to check its status whether the node is dead or alive.

A)
puts
B)
gets
C)
getSplits
D)
all of the above

Correct Answer :   getSplits


Explanation : getSplits() computes the input splits; the JobTracker then uses their storage locations to schedule map tasks to process them on the TaskTrackers.

A)
InputFormat
B)
TextFormat
C)
TextInputFormat
D)
All of the above

Correct Answer :   TextInputFormat


Explanation : A RecordReader is little more than an iterator over records, and the map task uses one to generate record key-value pairs.

A)
shuffling
B)
forking
C)
reducing
D)
secondary sorting

Correct Answer :   shuffling


Explanation : All values corresponding to the same key will go to the same reducer.

A)
outstream
B)
inputstream
C)
datastream
D)
filesystem

Correct Answer :   filesystem


Explanation : An input stream opened from the FileSystem (an FSDataInputStream) is used to read data from a file.

A)
Utils
B)
IOUtils
C)
IUtils
D)
All of the above

Correct Answer :   IOUtils


Explanation : IOUtils is a utility class of static methods for common I/O operations.
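
A sketch, based on the common Hadoop pattern, of streaming an HDFS file to standard output with the IOUtils static helpers; the file URI is supplied on the command line:

    import java.io.InputStream;
    import java.net.URI;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;

    public class FileSystemCat {
      public static void main(String[] args) throws Exception {
        String uri = args[0];                       // e.g. hdfs://namenode/user/data.txt
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(uri), conf);
        InputStream in = null;
        try {
          in = fs.open(new Path(uri));              // returns an FSDataInputStream
          IOUtils.copyBytes(in, System.out, 4096, false);  // static helper: copy stream
        } finally {
          IOUtils.closeStream(in);                  // static helper: close quietly
        }
      }
    }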

A)
write()
B)
read()
C)
readwrite()
D)
All of the above

Correct Answer :   write()


Explanation : The readFully() method can also be used instead of the read() method.

A)
Mapper
B)
Writable
C)
Reducer
D)
Readable

Correct Answer :   Reducer


Explanation : Reducer implementations can access the JobConf for the job.

A)
OutputCollect
B)
InputCollector
C)
OutputCollector
D)
All of the above

Correct Answer :   OutputCollector


Explanation : In the reduce phase, the reduce(Object, Iterator, OutputCollector, Reporter) method is called for each <key, (list of values)> pair in the grouped inputs.
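
A minimal old-API Reducer sketch whose reduce method receives each key with an iterator over its grouped values and emits output through the OutputCollector; the class name SumReducer is hypothetical:

    import java.io.IOException;
    import java.util.Iterator;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reducer;
    import org.apache.hadoop.mapred.Reporter;

    // Hypothetical example: sums the grouped values for each key.
    public class SumReducer extends MapReduceBase
        implements Reducer<Text, IntWritable, Text, IntWritable> {

      public void reduce(Text key, Iterator<IntWritable> values,
                         OutputCollector<Text, IntWritable> output, Reporter reporter)
          throws IOException {
        int sum = 0;
        while (values.hasNext()) {
          sum += values.next().get();
        }
        output.collect(key, new IntWritable(sum));  // one output record per key
      }
    }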

A)
classes
B)
methods
C)
commands
D)
None of the above

Correct Answer :   None of the above


Explanation : Hadoop I/O consists of primitives for serialization and deserialization.

A)
Putfile
B)
SequenceFile
C)
GetFile
D)
All of the above

Correct Answer :   SequenceFile


Explanation : SequenceFile is append-only.

A)
3
B)
4
C)
5
D)
6

Correct Answer :   3


Explanation : SequenceFile has three available formats: an "Uncompressed" format, a "Record Compressed" format and a "Block-Compressed" format.
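
A short sketch of creating and appending to a SequenceFile with the classic createWriter call; the output path and records are hypothetical:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;

    public class SequenceFileWriteDemo {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path path = new Path("numbers.seq");        // hypothetical output file

        SequenceFile.Writer writer = null;
        try {
          // Key and value classes are fixed when the file is created.
          writer = SequenceFile.createWriter(fs, conf, path, IntWritable.class, Text.class);
          for (int i = 0; i < 5; i++) {
            writer.append(new IntWritable(i), new Text("record-" + i));  // append-only
          }
        } finally {
          if (writer != null) {
            writer.close();
          }
        }
      }
    }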

A)
Uncompressed
B)
Block-Compressed
C)
Partition Compressed
D)
Record Compressed

Correct Answer :   Block-Compressed


Explanation : The SequenceFile metadata is a list of key/value pairs (for example Text/Text) that is written to the file during the initialization that happens in the SequenceFile.

A)
Array
B)
Index
C)
Immutable
D)
All of the above

Correct Answer :   Index


Explanation : The index doesn’t contain all the keys, just a fraction of them.

A)
SetFile
B)
BloomMapFile
C)
ArrayFile
D)
None of the above

Correct Answer :   ArrayFile


Explanation : SetFile, instead of append(key, value), has just the key field, append(key); the value is always the NullWritable instance.

A)
Avro
B)
Oozie
C)
cTakes
D)
Lucene

Correct Answer :   Avro


Explanation : Avro is a splittable data format with a metadata section at the beginning followed by a sequence of Avro-serialized objects.

A)
JS
B)
XML
C)
XHTML
D)
JSON

Correct Answer :   JSON


Explanation : The JSON schema content is put into a file.

A)
DatumReader
B)
DatumRead
C)
DatReader
D)
None of the above

Correct Answer :   DatumReader


Explanation : DatumReader reads the content through the DataFileReader implementation.
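
A sketch of reading Avro records by passing a GenericDatumReader to a DataFileReader; the file name is hypothetical:

    import java.io.File;

    import org.apache.avro.file.DataFileReader;
    import org.apache.avro.generic.GenericDatumReader;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.avro.io.DatumReader;

    public class AvroReadDemo {
      public static void main(String[] args) throws Exception {
        File file = new File("users.avro");  // hypothetical Avro data file

        // The DatumReader deserializes records; DataFileReader iterates over the file.
        DatumReader<GenericRecord> datumReader = new GenericDatumReader<GenericRecord>();
        DataFileReader<GenericRecord> dataFileReader =
            new DataFileReader<GenericRecord>(file, datumReader);
        try {
          while (dataFileReader.hasNext()) {
            GenericRecord record = dataFileReader.next();
            System.out.println(record);
          }
        } finally {
          dataFileReader.close();
        }
      }
    }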

A)
Mapper
B)
AvroMapper
C)
AvroReducer
D)
None of the above

Correct Answer :   AvroMapper


Explanation : AvroMapper is used to provide the ability to collect or map data.

A)
Mapper
B)
AvroReducer
C)
AvroMapper
D)
None of the above

Correct Answer :   AvroReducer


Explanation : AvroReducer summarizes them by looping through the values.

A)
kafka
B)
Lucene
C)
MapReduce
D)
None of the above

Correct Answer :   MapReduce


Explanation : You can use Avro and MapReduce together to process many items serialized with Avro’s small binary format.

A)
Snappy
B)
Snapcheck
C)
FileCompress
D)
None of the above

Correct Answer :   Snappy


Explanation : Snappy has fast compression and decompression speeds.

A)
Gzip
B)
Bzip2
C)
Both (A) and (B)
D)
LZO

Correct Answer :   LZO


Explanation : LZO is only really desirable if you need to compress text files.

A)
LZO
B)
Gzip
C)
Bzip2
D)
All of the above

Correct Answer :   LZO


Explanation : LZO enables the parallel processing of compressed text file splits by your MapReduce jobs.

A)
.g
B)
.gz
C)
.gzp
D)
.gzip

Correct Answer :   .gz


Explanation : You can use the gunzip command to decompress files that were created by a number of compression utilities, including Gzip.

A)
LZO
B)
Bzip2
C)
Gzip
D)
All of the above

Correct Answer :   Gzip


Explanation : gzip is based on the DEFLATE algorithm, which is a combination of LZ77 and Huffman Coding.

A)
24k
B)
36k
C)
128k
D)
256k

Correct Answer :   256k


Explanation :  LZO was designed with speed in mind : it decompresses about twice as fast as gzip, meaning it’s fast enough to keep up with hard drive read speeds.

A)
parity
B)
checksum
C)
metastore
D)
none of the above

Correct Answer :   checksum


Explanation : When a client creates an HDFS file, it computes a checksum of each block of the file and stores these checksums in a separate hidden file in the same HDFS namespace.

A)
NameNode
B)
ActionNode
C)
DataNode
D)
All of the above

Correct Answer :   NameNode


Explanation : If the NameNode machine fails, manual intervention is necessary. Currently, automatic restart and failover of the NameNode software to another machine is not supported.

A)
FsImage
B)
DsImage
C)
FsImages
D)
All of the above

Correct Answer :   FsImage


Explanation : A corruption of these files can cause the HDFS instance to be non-functional.

A)
Datanots
B)
Snapshots
C)
Data Image
D)
All of the above

Correct Answer :   Snapshots


Explanation : One usage of the snapshot feature may be to roll back a corrupted HDFS instance to a previously known good point in time.

A)
end
B)
failover
C)
scalability
D)
all of the above

Correct Answer :   failover


Explanation : If the NameNode machine fails, manual intervention is necessary.

A)
1, 2
B)
2, 3
C)
3, 2
D)
All of the above

Correct Answer :   3, 2


Explanation : HDFS has a simple yet robust architecture that was explicitly designed for data reliability in the face of faults and failures in disks, nodes and networks.

A)
ActionNode
B)
DataNode
C)
Both (A) and (B)
D)
NameNode

Correct Answer :   NameNode


Explanation : HDFS tolerates failures of storage servers (called DataNodes) and its disks.

A)
Dynamic typing
B)
Untagged data
C)
No manually-assigned field IDs
D)
All of the above

Correct Answer :   Dynamic typing


Explanation : Avro does not require that code be generated.

A)
Avro
B)
Thrift
C)
Protocol Buffers
D)
None of the above

Correct Answer :   Avro


Explanation : Avro is optimized to minimize the disk space needed by our data and it is flexible.

A)
UID
B)
Static number
C)
Name
D)
None of the above

Correct Answer :   Static number


Explanation : Avro resolves possible conflicts through the name of the field.

A)
RDC
B)
RMC
C)
RPC
D)
All of the above

Correct Answer :   RPC


Explanation : When Avro is used in RPC, the client and server exchange schemas in the connection handshake.

A)
AvroJob.Reflect(jConf);
B)
AvroJob.setReflect(jConf);
C)
Job.setReflect(jConf);
D)
None of the above

Correct Answer :   Job.setReflect(jConf);


Explanation : For strongly typed languages like Java, it also provides a code generation layer, including RPC services code generation.

A)
alter
B)
set
C)
reset
D)
select

Correct Answer :   alter


Explanation : Alter is the command used to make changes to an existing table.

A)
MAX_FILESIZE
B)
MEMSTORE_FLUSH
C)
MEMSTORE_FLUSHSIZE
D)
All of the above

Correct Answer :   MEMSTORE_FLUSH


Explanation : Using alter, you can set and remove table scope operators such as MAX_FILESIZE, READONLY, MEMSTORE_FLUSHSIZE, DEFERRED_LOG_FLUSH, etc.

A)
delColumn()
B)
removeColumn()
C)
Both (A) and (B)
D)
deleteColumn()

Correct Answer :   deleteColumn()

A)
Collector
B)
Configuration
C)
Component
D)
None of the above

Correct Answer :   Configuration


Explanation : You can create a configuration object using the create() method of the HBaseConfiguration class.

A)
hbase.xml
B)
hbase-site-conf.xml
C)
hbase-site.xml
D)
None of the above

Correct Answer :   hbase-site.xml


Explanation : Set the data directory to an appropriate location by opening the HBase home folder in /usr/local/HBase.

A)
map
B)
reduce
C)
reducer
D)
mapper

Correct Answer :   map


Explanation : The Mapper outputs are sorted and then partitioned per Reducer.

A)
InputSplit
B)
OutputSplit
C)
InputSplitStream
D)
All of the mentioned

Correct Answer :   InputSplit


Explanation : Mapper implementations are passed the JobConf for the job via the JobConfigurable.configure(JobConf) method and override it to initialize themselves.

A)
Reporter
B)
Partitioner
C)
OutputSplit
D)
All of the above

Correct Answer :   Partitioner


Explanation : Users can control the grouping by specifying a Comparator via JobConf.setOutputKeyComparatorClass(Class).

A)
Reporter
B)
Partitioner
C)
OutputSplit
D)
All of the above

Correct Answer :   Reporter


Explanation : Reporter is also used to update Counters, or just indicate that they are alive.

A)
OutputCollector.put
B)
OutputCollector.get
C)
OutputCollector.receive
D)
OutputCollector.collect

Correct Answer :   OutputCollector.collect

A)
JobConf.setNumTasks(int)
B)
JobConf.setNumMapTasks(int)
C)
Both (A) and (B)
D)
JobConf.setNumReduceTasks(int)

Correct Answer :   JobConf.setNumReduceTasks(int)


Explanation : Reducer has 3 primary phases : Shuffle, Sort and Reduce.

A)
MergePartitioner
B)
HashedPartitioner
C)
HashPartitioner
D)
None of the above

Correct Answer :   HashPartitioner


Explanation : The total number of partitions is the same as the number of reduce tasks for the job.

A)
Collector
B)
Partitioner
C)
Compactor
D)
All of the above

Correct Answer :   Partitioner


Explanation : Partitioner controls the partitioning of the keys of the intermediate map-outputs.

A)
JobConf
B)
JobConfig
C)
JobConfiguration
D)
All of the above

Correct Answer :   JobConf


Explanation : JobConf is typically used to specify the Mapper, combiner (if any), Partitioner, Reducer, InputFormat, OutputFormat and OutputCommitter implementations.

A)
io.sort.factor
B)
mapred.inmem.merge.threshold
C)
mapred.job.shuffle.merge.percent
D)
mapred.job.reduce.input.buffer.percent

Correct Answer :   mapred.job.reduce.input.buffer.percent


Explanation : When the reduce begins, map outputs will be merged to disk until those that remain are under the resource limit this defines.

A)
Config
B)
Configuration
C)
OutputConfig
D)
None of the above

Correct Answer :   Configuration


Explanation : Configurations are specified by resources.

A)
coredefault.xml
B)
core-default.xml
C)
core-site.xml
D)
All of the above

Correct Answer :   core-site.xml


Explanation : core-default.xml contains the read-only defaults for Hadoop.

A)
Clear
B)
getClass
C)
addResource
D)
None of the above

Correct Answer :   Clear


Explanation : getClass is used to get the value of the name property as a Class.

A)
isDeprecatedif
B)
isDeprecated
C)
setDeprecated
D)
All of the above

Correct Answer :   isDeprecated


Explanation : The method returns true if the key is deprecated and false otherwise.

A)
addResource
B)
addDefaultResource
C)
None of the above
D)
setDeprecatedProperties

Correct Answer :   setDeprecatedProperties


Explanation : setDeprecatedProperties sets all deprecated properties that are not currently set but have a corresponding new property that is set.

A)
addDeprecation
B)
addDefaultResource
C)
setDeprecatedProperties
D)
addResource

Correct Answer :   addResource


Explanation : The properties of this resource will override the properties of previously added resources unless they were marked final. addResource adds a configuration resource.
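
A sketch of layering configuration resources with addResource; later resources override earlier ones unless a property was marked final. The file paths are hypothetical:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;

    public class ConfDemo {
      public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Defaults are loaded first; each added resource overrides earlier ones
        // unless a property was marked <final>true</final>.
        conf.addResource(new Path("/etc/hadoop/conf/core-site.xml"));   // hypothetical path
        conf.addResource(new Path("/etc/hadoop/conf/hdfs-site.xml"));   // hypothetical path

        System.out.println(conf.get("fs.defaultFS"));
      }
    }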

A)
Hiver
B)
Serde
C)
Mover
D)
None of the above

Correct Answer :   Mover


Explanation : Mover periodically scans the files in HDFS to check if the block placement satisfies the storage policy.

A)
hdfs storagepolicies
B)
hdfs storage
C)
hd storagepolicies
D)
All of the above

Correct Answer :   hdfs storagepolicies


Explanation : The hdfs storagepolicies command takes no arguments.

A)
getPriority()
B)
getJobState()
C)
Both (A) and (B)
D)
getJobName()

Correct Answer :   getJobName()

A)
getPriority()
B)
getJobState()
C)
getJobName()
D)
getTaskCompletionEvents(int startFrom)

Correct Answer :   getTaskCompletionEvents(int startFrom)

A)
0.1.-1.0
B)
0.0-1.0
C)
1.0-2.0
D)
2.0-3.0

Correct Answer :   0.0-1.0


Explanation : mapProgress() is used to get the progress of the job’s map-tasks, as a float between 0.0 and 1.0.

A)
SSL
B)
SSH
C)
Kerberos
D)
None of the above

Correct Answer :   Kerberos


Explanation : Each service reads its authentication information from a keytab file with appropriate permissions.

A)
SSL
B)
SSH
C)
Kerberos
D)
None of the above

Correct Answer :   SSL


Explanation : AES offers the greatest cryptographic strength and the best performance.

A)
WebProxy
B)
ProxyServer
C)
WebAppProxy
D)
None of the above

Correct Answer :   WebAppProxy


Explanation : If security is enabled it will warn users before accessing a potentially unsafe web application. Authentication and authorization using the proxy is handled just like any other privileged web application.

A)
LinuxController
B)
LinuxTaskController
C)
TaskController
D)
None of the above

Correct Answer :   LinuxTaskController


Explanation : The LinuxTaskController keeps track of all paths and directories on the DataNode.

A)
NodeManager
B)
DataManager
C)
ValidationManager
D)
None of the above

Correct Answer :   NodeManager


Explanation : To recap, local file-system permissions need to be modified.

A)
ROM_DISK
B)
RAM_DISK
C)
ARCHIVE
D)
All of the above

Correct Answer :   RAM_DISK


Explanation : DISK is the default storage type.

A)
ARCHIVE
B)
ROM_DISK
C)
RAM_DISK
D)
All of the above

Correct Answer :   ARCHIVE


Explanation : ARCHIVE storage has high density but little compute power, and is added for supporting archival storage.

A)
Hot
B)
All_SSD
C)
Lazy_Persist
D)
One_SSD

Correct Answer :   One_SSD


Explanation : The remaining replicas are stored in DISK.

A)
Hot
B)
All_SSD
C)
One_SSD
D)
Lazy_Persist

Correct Answer :   Lazy_Persist


Explanation : The replica is first written in RAM_DISK and then it is lazily persisted in DISK.

A)
Hive
B)
Chuckwa
C)
YARN
D)
Incubator

Correct Answer :   YARN


Explanation : YARN is the prerequisite for Enterprise Hadoop, providing resource management and a central platform to deliver consistent operations, security, and data governance tools across Hadoop clusters.

A)
Hive
B)
Imphala
C)
MapReduce
D)
All of the above

Correct Answer :   MapReduce


Explanation : Multi-tenant data processing improves an enterprise’s return on its Hadoop investments.

A)
NodeManager
B)
ApplicationMaster
C)
ResourceManager
D)
All of the above

Correct Answer :   ApplicationMaster


Explanation : Each ApplicationMaster has the responsibility for negotiating appropriate resource containers from the Scheduler.

A)
0.23
B)
0.24
C)
0.26
D)
0.30

Correct Answer :   0.23


Explanation : The fundamental idea of MRv2 is to split up the two major functionalities of the JobTracker.

A)
Master
B)
Manager
C)
Both (A) and (B)
D)
Scheduler

Correct Answer :   Scheduler


Explanation : The Scheduler is a pure scheduler in the sense that it performs no monitoring or tracking of status for the application.

A)
Partition
B)
Networked
C)
Hierarchical
D)
None of the above

Correct Answer :   Hierarchical


Explanation : The Scheduler has a pluggable policy plugin, which is responsible for partitioning the cluster resources among the various queues, applications etc.

A)
bin
B)
hive
C)
home
D)
hadoop

Correct Answer :   bin


Explanation : Running the yarn script without any arguments prints the description for all commands.

A)
rear
B)
root
C)
domain
D)
All of the above

Correct Answer :   root

A)
25%
B)
50%
C)
75%
D)
100%

Correct Answer :   100%


Explanation : Queues cannot be deleted, only the addition of new queues is supported.

A)
xml
B)
jar
C)
java
D)
C code

Correct Answer :   jar


Explanation : Usage: yarn jar <jar> [mainClass] args…

A)
-format-state
B)
-form-state-store
C)
-format-state-store
D)
None of the above

Correct Answer :   -format-state-store


Explanation : -format-state-store formats the RMStateStore.

A)
run
B)
admin
C)
proxyserver
D)
rmadmin

Correct Answer :   rmadmin

A)
TextInputFormat
B)
OutputInputFormat
C)
TextOutputFormat
D)
None of the above

Correct Answer :   TextInputFormat

A)
Streaming
B)
Mapreduce
C)
Orchestration
D)
All of the above

Correct Answer :   Streaming

A)
datanode
B)
split
C)
textformat
D)
None of the above

Correct Answer :   split


Explanation : Each split is divided into records, and the map processes each record—a key-value pair—in turn.

A)
TextInputFormat
B)
TextOutputFormat
C)
OutputInputFormat
D)
InputFormat

Correct Answer :   InputFormat


Explanation : As a MapReduce application writer, you don’t need to deal with InputSplits directly, as they are created by an InputFormat.

A)
MultithreadedMap
B)
MultithreadedRunner
C)
MultithreadedMapRunner
D)
SinglethreadedMapRunner

Correct Answer :   MultithreadedMapRunner


Explanation : A RecordReader is little more than an iterator over records, and the map task uses one to generate record key-value pairs, which it passes to the map function.

A)
FileTextFormat
B)
FileInputFormat
C)
FileOutputFormat
D)
None of the above

Correct Answer :   FileInputFormat


Explanation : FileInputFormat provides implementation for generating splits for the input files.

A)
TextFileInputFormat
B)
CombineFileOutputFormat
C)
CombineFileInputFormat
D)
None of the above

Correct Answer :   CombineFileInputFormat


Explanation : CombineFileInputFormat does not compromise the speed at which it can process the input in a typical MapReduce job.

A)
LongWritable
B)
ShortReadable
C)
LongReadable
D)
All of the above

Correct Answer :   LongWritable


Explanation : The value is the contents of the line, excluding any line terminators (newline, carriage return), and is packaged as a Text object.

A)
FileValueTextInputFormat
B)
KeyValueTextInputFormat
C)
KeyValueTextOutputFormat
D)
All of the above

Correct Answer :   KeyValueTextOutputFormat


Explanation : To interpret such files correctly, KeyValueTextInputFormat is appropriate.

A)
HDFS
B)
Library
C)
Generic
D)
Task

Correct Answer :   Task


Explanation : FileInputFormat splits only large files (here "large" means larger than an HDFS block).

A)
fsk
B)
fsck
C)
fetchdt
D)
None of the above

Correct Answer :   fsck


Explanation : fsck is designed for reporting problems with various files, for example, missing blocks for a file or under-replicated blocks.

A)
rec
B)
fsk
C)
fetdt
D)
fetchdt

Correct Answer :   fetchdt


Explanation : The delegation token can later be used to access a secure server from a non-secure client.

A)
full
B)
partial
C)
commit
D)
recovery

Correct Answer :   recovery


Explanation : Because recovery mode can cause data loss, you should always back up your edit log and fsimage before using it.

A)
Safe
B)
Recover
C)
Rollback
D)
None of the above

Correct Answer :   Rollback


Explanation : dfsadmin runs an HDFS dfsadmin client.

A)
jobtracker
B)
mradmin
C)
tasktracker
D)
None of the above

Correct Answer :   jobtracker

A)
Lack of tools
B)
Lack of web interface
C)
Lack of configuration management
D)
None of the above

Correct Answer :   Lack of configuration management


Explanation : Without a centralized configuration management framework, you end up with a number of issues that can cascade just as your usage picks up.

A)
Alex
B)
Puppet
C)
Acem
D)
None of the above

Correct Answer :   Puppet


Explanation : Administrators may use configuration management systems such as Puppet and Chef to manage processes.

A)
Upgrade Hadoop
B)
React to incidents
C)
Remove worker nodes
D)
All of the above

Correct Answer :   All of the above


Explanation : The most common reason administrators restart Hadoop processes is to enact configuration changes.

A)
JVM
B)
JMX
C)
JVX
D)
None of the above

Correct Answer :   JMX


Explanation : Hadoop includes several managed beans (MBeans), which expose Hadoop metrics to JMX-aware applications.

A)
Two
B)
Three
C)
Four
D)
Five

Correct Answer :   Two


Explanation : You can run Pig (execute Pig Latin statements and Pig commands) in two modes: interactive mode and batch mode.

A)
A LOAD statement to read data from the file system
B)
A series of “transformation” statements to process the data
C)
A DUMP statement to view results or a STORE statement to save the results
D)
All of the above

Correct Answer :   All of the above


Explanation : A DUMP or STORE statement is required to generate output.

A)
LOAD
B)
READ
C)
WRITE
D)
None of the above

Correct Answer :   LOAD


Explanation : PigStorage is the default load function.

A)
$ pig …
B)
$ pig -x local ...
C)
$ pig -x tez_local …
D)
None of the above

Correct Answer :   $ pig -x local ...


Explanation : Specify local mode using the -x flag (pig -x local).

A)
LoadCaster
B)
LoadPushDown
C)
LoadMetadata
D)
All of the above

Correct Answer :   LoadMetadata


Explanation : Most loader implementations don’t need to implement this unless they interact with some metadata system.

A)
getShipFiles()
B)
getCacheFiles()
C)
relativeToAbsolutePath()
D)
setUdfContextSignature()

Correct Answer :   setUdfContextSignature()


Explanation : The signature can be used to store into the UDFContext any information which the Loader needs to store between various method invocations in the front end and back end.

A)
getShipFiles()
B)
getCacheFiles()
C)
relativeToAbsolutePath()
D)
setUdfContextSignature()

Correct Answer :   getShipFiles()


Explanation : The default implementation provided in LoadFunc handles this for FileSystem locations.

A)
getCacheFiles()
B)
relativeToAbsolutePath()
C)
setLocation()
D)
setUdfContextSignature()

Correct Answer :   setLocation()


Explanation : setLocation() method is called by Pig to communicate the load location to the loader.

A)
getNext()
B)
prepareToRead()
C)
relativeToAbsolutePath()
D)
All of the above

Correct Answer :   prepareToRead()


Explanation : The RecordReader can then be used by the implementation in getNext() to return a tuple representing a record of data back to Pig.
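
As an illustrative sketch (not a production loader), a custom Pig LoadFunc showing where setLocation(), prepareToRead() and getNext() fit; it delegates record reading to TextInputFormat and returns each line as a single-field tuple:

    import java.io.IOException;

    import org.apache.hadoop.mapreduce.InputFormat;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.RecordReader;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
    import org.apache.pig.LoadFunc;
    import org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSplit;
    import org.apache.pig.data.Tuple;
    import org.apache.pig.data.TupleFactory;

    // Hypothetical loader: returns each input line as a single-field tuple.
    public class SimpleLineLoader extends LoadFunc {
      private RecordReader reader;
      private final TupleFactory tupleFactory = TupleFactory.getInstance();

      @Override
      public void setLocation(String location, Job job) throws IOException {
        // Pig calls this to communicate the load location to the loader.
        FileInputFormat.setInputPaths(job, location);
      }

      @Override
      public InputFormat getInputFormat() throws IOException {
        return new TextInputFormat();
      }

      @Override
      public void prepareToRead(RecordReader reader, PigSplit split) throws IOException {
        this.reader = reader;   // kept for use by getNext()
      }

      @Override
      public Tuple getNext() throws IOException {
        try {
          if (!reader.nextKeyValue()) {
            return null;        // end of data
          }
          Tuple t = tupleFactory.newTuple(1);
          t.set(0, reader.getCurrentValue().toString());
          return t;
        } catch (InterruptedException e) {
          throw new IOException(e);
        }
      }
    }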

A)
LoadCaster
B)
LoadMetadata
C)
LoadPushDown
D)
All of the above

Correct Answer :   LoadCaster


Explanation : LoadCaster has methods to convert byte arrays to specific types.

A)
DUMP
B)
STORE
C)
EXPLAIN
D)
DESCRIBE

Correct Answer :   DESCRIBE


Explanation : DESCRIBE returns the schema of a relation.

A)
DUMP
B)
EXPLAIN
C)
STORE
D)
DESCRIBE

Correct Answer :   EXPLAIN

A)
STORE
B)
EXPLAIN
C)
DESCRIBE
D)
ILLUSTRATE

Correct Answer :   ILLUSTRATE


Explanation : ILLUSTRATE allows you to test your programs on small datasets and get faster turnaround times.

A)
Pig Stats
B)
PStatistics
C)
Pig Statistics
D)
None of the above

Correct Answer :   Pig Statistics


Explanation : The new Pig statistics and the existing Hadoop statistics can also be accessed via the Hadoop job history file.

A)
$pig_ ant pigunit-jar
B)
$pig_tr ant pigunit-jar
C)
$pig_trunk ant pigunit-jar
D)
None of the above

Correct Answer :   $pig_trunk ant pigunit-jar


Explanation : The compile will create the pigunit.jar file.

A)
\q
B)
\d alias
C)
\de alias
D)
None of the above

Correct Answer :   \d alias


Explanation : If the alias is omitted, the last defined alias will be used.

A)
exec
B)
throw
C)
error
D)
execute

Correct Answer :   exec


Explanation : With the exec command, store statements will not trigger execution; rather, the entire script is parsed before execution starts.

A)
pig.jar
B)
tutorial.jar
C)
excite.log.bz2
D)
script2-local.pig

Correct Answer :   tutorial.jar


Explanation : tutorial.jar also contains Java classes.

A)
{%declare | %default}
B)
{%declare | %default} param_name param_value
C)
{%declare | %default} param_name param_value cmd
D)
pig {-param param_name = param_value | -param_file file_name} [-debug | -dryrun] script

Correct Answer :   pig {-param param_name = param_value | -param_file file_name} [-debug | -dryrun] script


Explanation : Parameter Substitution is used to substitute values for parameters at run time.

A)
Parameter files
B)
Command line parameters
C)
Declare and default preprocessors
D)
Both parameter files and command line parameters

Correct Answer :   Both parameter files and command line parameters


Explanation : Parameters and command parameters are scanned in FIFO manner.

A)
functional
B)
declarative
C)
procedural
D)
All of the above

Correct Answer :   procedural


Explanation : In SQL users can specify that data from two tables must be joined, but not what join implementation to use.

A)
ETL
B)
Lazy evaluation
C)
Supports pipeline splits
D)
All of the above

Correct Answer :   All of the above


Explanation : Pig Latin’s ability to include user code at any point in the pipeline is useful for pipeline development.

A)
pig.input.dirs
B)
pig.job
C)
pig.feature
D)
None of the above

Correct Answer :   pig.input.dirs


Explanation : pig.input.dirs contains comma-separated list of input directories for the job.

A)
logj4
B)
log4j
C)
log4i
D)
log4l

Correct Answer :   log4j


Explanation : By default Hive will use hive-log4j.default in the conf/ directory of the Hive installation.

187.
What does the hive.root.logger property specify in the following statement?
$HIVE_HOME/bin/hive --hiveconf hive.root.logger=INFO,console
A)
Log level
B)
Log source
C)
Log modes
D)
All of the above

Correct Answer :   Log level


Explanation : hive.root.logger specifies the logging level as well as the log destination. Specifying console as the target sends the logs to standard error.

A)
SqlLine
B)
CLilLine
C)
BeeLine
D)
HiveLine

Correct Answer :   BeeLine


Explanation : Beeline is a JDBC client based on SQLLine.

A)
0.9.0
B)
0.11.0
C)
0.10.0
D)
0.12.0

Correct Answer :   0.11.0


Explanation : hcat commands can be issued as hive commands, and vice versa.

A)
set hive.variable
B)
set hive.variable.substitute=true;
C)
set hive.variable.substitutevalues=false;
D)
set hive.variable.substitute=false;

Correct Answer :   set hive.variable.substitute=false;


Explanation : Variable substitution is on by default (hive.variable.substitute=true).

A)
Remote
B)
HTTP
C)
Interactive
D)
Embedded

Correct Answer :   Remote


Explanation : In HTTP mode, the message body contains Thrift payloads.

A)
“STORED AS AVRO”
B)
“STORED AS HIVE”
C)
“STORED AS SERDE”
D)
“STORED AS AVROHIVE”

Correct Answer :   “STORED AS AVRO”


Explanation : AvroSerDe takes care of creating the appropriate Avro schema from the Hive table schema.

A)
Set
B)
Intersection
C)
Union
D)
All of the above

Correct Answer :   Union


Explanation : A null in a field that is not so defined will result in an exception during the save. No changes need be made to the Hive schema to support this, as all fields in Hive can be null.

A)
Avro
B)
Hive
C)
Map Reduce
D)
All of the above

Correct Answer :   Hive


Explanation : If you copy these files out, you’ll likely want to rename them with .avro.

A)
row.literal
B)
schema.lit
C)
schema.literal
D)
all of the above

Correct Answer :   schema.literal


Explanation : You can embed the schema directly into the create statement.

A)
$ROW
B)
$SCHEMA
C)
$NAMESPACES
D)
$SCHEMASPACES

Correct Answer :   $SCHEMA


Explanation : Use none to ignore either avro.schema.literal or avro.schema.url.

A)
ORC
B)
OPC
C)
ODC
D)
None of the above

Correct Answer :   ORC


Explanation : The Optimized Row Columnar (ORC) file format provides a highly efficient way to store Hive data.

A)
Row
B)
Row-oriented
C)
Tuple-oriented
D)
Column-oriented

Correct Answer :   Column-oriented


Explanation : HBase is a data model similar to Google’s Bigtable, designed to provide quick random access to huge amounts of structured data.

A)
Bigtable
B)
BigTop
C)
Scanner
D)
FoundationDB

Correct Answer :   Bigtable


Explanation : Bigtable works on top of the Google File System; likewise, Apache HBase works on top of Hadoop and HDFS.

A)
user
B)
status
C)
whoami
D)
version

Correct Answer :   whoami


Explanation : status command provides the status of HBase, for example, the number of servers.

A)
HTable
B)
HDescriptor
C)
HTabDescriptor
D)
HTableDescriptor

Correct Answer :   HTableDescriptor


Explanation : Java provides an Admin API to achieve DDL functionalities through programming.

A)
Put
B)
Result
C)
Get
D)
Value

Correct Answer :   Result


Explanation : Get the result by passing your Get class instance to the get method of the HTable class. This method returns the Result class object, which holds the requested result.
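
A sketch using the classic HTable/Get/Result client API; the table, column family and qualifier names are hypothetical:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HBaseReadDemo {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();    // reads hbase-site.xml
        HTable table = new HTable(conf, "employee");          // hypothetical table

        Get get = new Get(Bytes.toBytes("row1"));              // hypothetical row key
        Result result = table.get(get);                        // returns a Result object

        byte[] value = result.getValue(Bytes.toBytes("personal"), Bytes.toBytes("name"));
        System.out.println(Bytes.toString(value));

        table.close();
      }
    }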

A)
Htable
B)
Master Server
C)
Region Server
D)
All of the above

Correct Answer :   Region Server


Explanation : The Region Server handles read and write requests for all the regions under it.

A)
Scala
B)
Hadoop
C)
Imphala
D)
Hive

Correct Answer :   Hadoop


Explanation : The data storage will be in the form of regions (tables). These regions will be split up and stored in region servers.

A)
ensemble
B)
chunks
C)
subdomains
D)
None of the above

Correct Answer :   ensemble


Explanation : As long as a majority of the servers are available, the ZooKeeper service will be available.

A)
Reliability
B)
Flexibility
C)
Scalability
D)
Interactivity

Correct Answer :   Reliability


Explanation : Once an update has been applied, it will persist from that time forward until a client overwrites the update.

A)
write
B)
read-write
C)
read-dominant
D)
none of the above

Correct Answer :   read-dominant


Explanation : ZooKeeper applications run on thousands of machines, and it performs best where reads are more common than writes, at ratios of around 10:1.

A)
2.0.0
B)
3.0.0
C)
4.0.0
D)
6.0.0

Correct Answer :   3.0.0


Explanation : Old pre-3.0.0 clients are not guaranteed to operate against upgraded 3.0.0 servers and vice-versa.

A)
rnodes
B)
hnodes
C)
vnodes
D)
znodes

Correct Answer :   znodes


Explanation : Every znode is identified by a path, with path elements separated by a slash.
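
A sketch of creating and reading a znode with the ZooKeeper Java client; the connect string and znode path are hypothetical:

    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    public class ZnodeDemo {
      public static void main(String[] args) throws Exception {
        // Hypothetical ensemble address; 3000 ms session timeout, no watcher.
        ZooKeeper zk = new ZooKeeper("localhost:2181", 3000, null);

        // Every znode is identified by a slash-separated path.
        zk.create("/demo", "v1".getBytes(),
                  ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);

        byte[] data = zk.getData("/demo", false, null);
        System.out.println(new String(data));

        zk.close();
      }
    }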

A)
iwrite
B)
iread
C)
icount
D)
inotify

Correct Answer :   inotify


Explanation : A client can request that ZooKeeper generate the node name to avoid collisions.

A)
availability
B)
flexibility
C)
scalability
D)
interactivity

Correct Answer :   availability


Explanation : The clients can thus ask another ZooKeeper master if the first fails to answer.

A)
Neo4j
B)
101tec
C)
Katta
D)
Helprace

Correct Answer :   Katta


Explanation : Zookeeper is used for node, master and index management in the grid.

A)
Katta
B)
Rackspace
C)
Helprace
D)
None of the above

Correct Answer :   Rackspace


Explanation : ZooKeeper also provides distributed locking for connections to prevent a cluster from overwhelming servers.

A)
3-node
B)
4-node
C)
5-node
D)
6-node

Correct Answer :   3-node


Explanation : ZooKeeper is used to manage a system built out of Hadoop, Katta, Oracle batch jobs and a web component.

A)
BigInt
B)
SmallInt
C)
MediumInt
D)
BigInteger

Correct Answer :   BigInteger


Explanation : The BigDecimal/BigInteger can also return itself as a ‘long’ value.

A)
LobSerializer
B)
LargeObjectLoader
C)
FieldMapProcessor
D)
DelimiterSet

Correct Answer :   DelimiterSet


Explanation : Delimiter set is created with the specified delimiters.

A)
DelimiterSet
B)
SmallObjectLoader
C)
JdbcWritableBridge
D)
FieldMapProcessor

Correct Answer :   JdbcWritableBridge


Explanation : JdbcWritableBridge class contains a set of methods which can read db columns from a ResultSet into Java types.

A)
FIELD_LIMITER
B)
RECORD_DELIMITER
C)
FIELD_DELIMITER
D)
None of the above

Correct Answer :   RECORD_DELIMITER


Explanation : Class RecordParser parses a record containing one or more fields.

A)
RecordParser
B)
LargeObjectLoader
C)
ProcessingException
D)
None of the above

Correct Answer :   RecordParser


Explanation : Multiple threads must use separate instances of RecordParser.

A)
SqoopWrite
B)
SqoopRead
C)
SqoopRecord
D)
None of the above

Correct Answer :   SqoopRecord


Explanation : SqoopRecord is an interface implemented by the classes generated by Sqoop’s orm.ClassWriter.

A)
IBM
B)
Cloudera
C)
Microsoft
D)
All of the above

Correct Answer :   Microsoft


Explanation : Sqoop allows users to import data from their relational databases into HDFS and vice versa.

A)
Hive
B)
BigTOP
C)
Imphala
D)
Map reduce

Correct Answer :   Map reduce


Explanation : While fetching, it throttles the number of mappers accessing data on RDBMS to avoid DDoS.

A)
Hive
B)
Sqoop
C)
Oozie
D)
Imphala

Correct Answer :   Sqoop


Explanation : Sqoop is a connectivity tool for moving data from non-Hadoop data stores – such as relational databases and data warehouses – into Hadoop.

A)
Oracle
B)
MySQL
C)
PostreSQL
D)
SQL Server

Correct Answer :   SQL Server


Explanation : Sqoop is a command-line interface application for transferring data between relational databases and Hadoop.

A)
BLOB
B)
CLOB
C)
LONGVARBINARY
D)
All of the above

Correct Answer :   All of the above


Explanation : Use JDBC-based imports for these columns; do not supply the --direct argument to the import tool.

A)
10.2.0
B)
10.3.0
C)
11.2.0
D)
12.2.0

Correct Answer :   10.2.0


Explanation : Oracle is notable in its different approach to SQL from the ANSI standard and its non-standard JDBC driver. Therefore, several features work differently.

A)
JBat
B)
JBash
C)
JBatch
D)
None of the above

Correct Answer :   JBatch


Explanation : BatchEE provides a set of useful extensions for this specification.

A)
Blur
B)
Calcite
C)
JBatch
D)
All of the above

Correct Answer :   Calcite


Explanation : Calcite also provides advanced query optimization, for data not residing in a traditional database.

A)
Ignite
B)
Droids
C)
DataFu
D)
Corinthia

Correct Answer :   Corinthia


Explanation : The toolkit is small, portable, and flexible, with minimal dependencies.

A)
Lens
B)
Kylin
C)
MRQL
D)
log4cxx2

Correct Answer :   Kylin


Explanation : MRQL is a query processing and optimization system for large-scale, distributed data analysis.

A)
set
B)
relational
C)
structured
D)
flow-based

Correct Answer :   flow-based


Explanation : NiFi entered the Apache Incubator with Billie Rinaldi as its champion.

A)
HTML5
B)
C++
C)
Java
D)
Javascript

Correct Answer :   HTML5


Explanation : Ripple is a cross platform and cross runtime testing/debugging tool.

A)
Zeppelin
B)
ACE
C)
Abdera
D)
Accumulo

Correct Answer :   Zeppelin


Explanation : Zeppelin is used for general-purpose data processing systems such as Apache Spark, Apache Flink, etc.

A)
Abdera
B)
Zeppelin
C)
ACE
D)
Accumulo

Correct Answer :   ACE


Explanation : ACE allows you to manage and distribute artifacts.

A)
Buildr
B)
Bloodhound
C)
Cassandra
D)
All of the above

Correct Answer :   Bloodhound


Explanation : Buildr is a simple and intuitive build system for Java projects written in Ruby.

A)
Cazerra
B)
Cordova
C)
CouchDB
D)
All of the above

Correct Answer :   Cordova


Explanation : The project entered incubation as Callback, but decided to change its name to Cordova on 2011-11-28.

A)
CXF
B)
DeltaSpike
C)
DeltaCloud
D)
None of the above

Correct Answer :   CXF


Explanation : DeltaSpike is a collection of JSR-299 (CDI) Extensions for building applications on the Java SE and EE platforms.

A)
Oozie
B)
BigTop
C)
Imphala
D)
Chukwa

Correct Answer :   Chukwa


Explanation : Chukwa is built on top of the Hadoop distributed filesystem (HDFS) and MapReduce framework and inherits Hadoop’s scalability and robustness.

A)
0.90.5+.
B)
0.10.4+.
C)
0.90.4+.
D)
None of the above

Correct Answer :   0.90.4+.


Explanation : The Chukwa cluster management scripts rely on ssh; these scripts, however, are not required if you have some alternate mechanism for starting and stopping daemons.

A)
Agents
B)
HCatalog
C)
Collectors
D)
HBase Table

Correct Answer :   Agents


Explanation : Setting the option chukwaAgent.control.remote will disallow remote connections to the agent control socket.

A)
Agents
B)
Collectors
C)
HBase Table
D)
None of the above

Correct Answer :   Collectors


Explanation : Most commonly, collectors simply write all received data to HBase or HDFS.

A)
8008
B)
8080
C)
8070
D)
None of the above

Correct Answer :   8080


Explanation : The port number can be configured in chukwa-collector.conf.xml.

A)
PipelineWriter
B)
PipelineStageWriter
C)
SocketTeeWriter
D)
None of the above

Correct Answer :   SocketTeeWriter


Explanation : PipelineStageWriter lets you string together a series of PipelineableWriters for pre-processing or post-processing incoming data.

A)
Hive
B)
Oozie
C)
Imphala
D)
Ambari

Correct Answer :   Ambari


Explanation : The Apache Ambari project is aimed at making Hadoop management simpler by developing software for provisioning, managing, and monitoring Apache Hadoop clusters.

A)
Ganglia
B)
Nagios
C)
Nagaond
D)
All of the above

Correct Answer :   Ganglia


Explanation : Ganglia is a scalable distributed monitoring system for high-performance computing systems such as clusters and Grids.

A)
view
B)
trigger
C)
schema
D)
none of the above

Correct Answer :   view

A)
RestLess
B)
RESTful
C)
Web Service
D)
None of the above

Correct Answer :   RESTful


Explanation : RESTful APIs enable integration with enterprise systems.

A)
SSL
B)
SSH
C)
Kerberos
D)
REST

Correct Answer :   Kerberos


Explanation : Kerberos requires a client side library and complex client side configuration.

A)
collector
B)
comparator
C)
load balancer
D)
all of the above

Correct Answer :   load balancer


Explanation : Knox is a stateless reverse proxy framework.

A)
SSL
B)
SSO
C)
SSH
D)
Kerberos

Correct Answer :   SSO


Explanation : Knox allows identities from those enterprise systems to be used for seamless, secure access to Hadoop clusters.

A)
SSH
B)
SSO
C)
SSL
D)
All of the above

Correct Answer :   SSH

A)
Zero
B)
Double
C)
Multiple
D)
Single

Correct Answer :   Single


Explanation : The Apache Knox Gateway is a system that provides a single point of authentication and access for Apache.

A)
AFS
B)
HCC
C)
ADF
D)
ASF

Correct Answer :   ASF


Explanation : Projects undergoing incubation at the Apache Software Foundation (ASF) are sponsored by the Apache Incubator PMC.

A)
3.4
B)
3.5
C)
3.6
D)
3.7

Correct Answer :   3.6


Explanation : The user should be able to install using a single update site for all Hadoop-related Eclipse tools.

A)
Indiavo
B)
Indigo
C)
Hadovo
D)
Rainbow

Correct Answer :   Indigo


Explanation : HDT aims at bringing plugins in eclipse to simplify development on Hadoop platform.

A)
RAP
B)
RBP
C)
RVP
D)
RXP

Correct Answer :   RAP


Explanation : RCP/RAP developers package has the core Eclipse PDE tools.

A)
Driver
B)
Mapper
C)
Reducer
D)
All of the above

Correct Answer :   All of the above


Explanation : HDT provides wizards for the creation of Hadoop Based Projects.

A)
Stonebraker
B)
Doug Cutting
C)
Matei Zaharia
D)
Mahek Zaharia

Correct Answer :   Matei Zaharia


Explanation : Apache Spark is an open-source cluster computing framework originally developed in the AMPLab at UC Berkeley.

A)
RDDs
B)
Spark SQL
C)
Spark Streaming
D)
All of the above

Correct Answer :   RDDs


Explanation : Spark SQL provides SQL language support, with command-line interfaces and ODBC/JDBC server.

A)
MLlib
B)
RDDs
C)
GraphX
D)
Spark Streaming

Correct Answer :   Spark Streaming


Explanation : Spark Streaming ingests data in mini-batches and performs RDD transformations on those mini-batches of data.

A)
RDDs
B)
MLlib
C)
GraphX
D)
Spark Streaming

Correct Answer :   MLlib


Explanation : MLlib implements many common machine learning and statistical algorithms to simplify large scale machine learning pipelines.

A)
10
B)
20
C)
30
D)
80

Correct Answer :   10


Explanation : Spark architecture has proven scalability to over 8000 nodes in production.

A)
NF
B)
NR
C)
NG
D)
ND

Correct Answer :   NG


Explanation : Flume 1.3.0 has been put through many stress and regression tests, is stable, production-ready software, and is backwards-compatible with Flume 1.2.0.

A)
Sinks
B)
Decorator
C)
Source
D)
All of the above

Correct Answer :   Source


Explanation : A source can be any data source, and Flume has many predefined source adapters.

A)
Basic
B)
Mid
C)
Collector Tier Event
D)
Agent Tier Event

Correct Answer :   Agent Tier Event


Explanation : All agents in a specific tier could be given the same name; one configuration file with … Clients send events to agents; agents host a number of Flume components.

A)
Basic
B)
Agent Tier Event
C)
Collector Tier Event
D)
None of the above

Correct Answer :   Basic

A)
Lucy
B)
Lucene Core
C)
Solr
D)
All of the above

Correct Answer :   Lucene Core


Explanation : Lucene provides spellchecking, hit highlighting and advanced analysis/tokenization capabilities.

A)
Lucy
B)
Solr
C)
PyLucene
D)
Lucene Core

Correct Answer :   Solr


Explanation : Solr provides hit highlighting, faceted search, caching, replication, and a web admin interface.

A)
OPR
B)
ORS
C)
ORP
D)
OSR

Correct Answer :   ORP


Explanation : The Open Relevance Project is used for relevance and performance testing.

A)
PMC
B)
RPC
C)
CPM
D)
All of the above

Correct Answer :   PMC


Explanation : PyLucene was previously hosted at the Open Source Applications Foundation.

A)
2 TB
B)
1 TB
C)
500 GB
D)
150GB

Correct Answer :   150GB


Explanation : Lucene offers powerful features through a simple API.

A)
20%
B)
50%
C)
70%
D)
100%

Correct Answer :   20%


Explanation : Lucene provides incremental indexing as fast as batch indexing.

A)
2
B)
3
C)
4
D)
5

Correct Answer :   3


Explanation : Just like Hadoop, Hama distinguishes between three modes.

A)
Local Mode
B)
Distributed Mode
C)
Pseudo Distributed Mode
D)
All of the above

Correct Answer :   Local Mode


Explanation : This mode is configured by setting the bsp.master.address property to local.

A)
groom
B)
grsvers
C)
grervers
D)
groomservers

Correct Answer :   groomservers


Explanation : Distributed Mode is used when you have multiple machines.

A)
ISP
B)
USP
C)
BSP
D)
MPP

Correct Answer :   BSP


Explanation : Running, completed, and failed jobs are detailed in the web UI.

A)
Pig
B)
Hama
C)
Hive
D)
Hadoop

Correct Answer :   Hama


Explanation : HAMA is a distributed framework on Hadoop for massive matrix algorithms.

A)
SerDE
B)
DocSear
C)
SaerDear
D)
All of the above

Correct Answer :   SerDE


Explanation : By default, HCatalog supports RCFile, CSV, JSON, and SequenceFile, and ORC file formats. To use a custom format, you must provide the InputFormat, OutputFormat, and SerDe.

A)
DCL
B)
DDL
C)
DML
D)
TCL

Correct Answer :   DDL


Explanation : HCatalog provides read and write interfaces for Pig and MapReduce and uses Hive’s command line interface for issuing data definition and metadata exploration commands.

A)
HCatLoad
B)
HCLoader
C)
Both (A) and (B)
D)
HCatLoader

Correct Answer :   HCatLoader


Explanation : HCatLoader accepts a table to read data from; you can indicate which partitions to scan by immediately following the load statement with a partition filter statement.

A)
InputFormat
B)
OutputFormat
C)
HCatInputFormat
D)
HCatOutputFormat

Correct Answer :   HCatInputFormat


Explanation : The HCatalog interface for MapReduce — HCatInputFormat and HCatOutputFormat — is an implementation of Hadoop InputFormat and OutputFormat.
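
For example, a MapReduce job can read an HCatalog-managed table by configuring HCatInputFormat. A minimal sketch, assuming the org.apache.hive.hcatalog.mapreduce package and a hypothetical table mytable in the default database:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hive.hcatalog.mapreduce.HCatInputFormat;

    public class HCatReadSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = Job.getInstance(conf, "hcat-read");
            // Tell HCatalog which database and table to read from.
            HCatInputFormat.setInput(job, "default", "mytable");
            // HCatInputFormat is a regular Hadoop InputFormat implementation.
            job.setInputFormatClass(HCatInputFormat.class);
            // ... set the mapper, reducer and output as usual, then submit.
        }
    }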

A)
put
B)
get
C)
setOutput
D)
setOut

Correct Answer :   setOutput


Explanation : You can write to multiple partitions if the partition key(s) are columns in the data being stored.
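
A minimal write-side sketch, assuming the same org.apache.hive.hcatalog.mapreduce package, a hypothetical table mytable and a hypothetical partition column named date; setOutput must be the first HCatOutputFormat call made on the job:

    import java.util.HashMap;
    import java.util.Map;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hive.hcatalog.mapreduce.HCatOutputFormat;
    import org.apache.hive.hcatalog.mapreduce.OutputJobInfo;

    public class HCatWriteSketch {
        public static void configure(Job job) throws Exception {
            // Writing to a single partition: supply the partition key and value.
            Map<String, String> partition = new HashMap<>();
            partition.put("date", "20240101");   // hypothetical partition column
            HCatOutputFormat.setOutput(job,
                    OutputJobInfo.create("default", "mytable", partition));
            // Passing null instead of the map writes to multiple partitions,
            // provided the partition key(s) are columns in the stored data.
            job.setOutputFormatClass(HCatOutputFormat.class);
        }
    }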

A)
Hive
B)
Pig
C)
Hama
D)
Oozie

Correct Answer :   Hive


Explanation : Partitions are multi-dimensional and not hierarchical. Records are divided into columns.

A)
hcat.append.limit
B)
hcat.desired.partition.num.splits
C)
hcatalog.hive.client.cache.disabled
D)
hcatalog.hive.client.cache.expiry.time

Correct Answer :   hcatalog.hive.client.cache.expiry.time


Explanation : This property is an integer and specifies a number of seconds.
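
For instance, the expiry can be set on the job configuration; a small sketch with an arbitrary value of 120 seconds:

    import org.apache.hadoop.conf.Configuration;

    public class HCatClientCacheSketch {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // Expire cached Hive clients after 120 seconds (the value is in seconds).
            conf.setInt("hcatalog.hive.client.cache.expiry.time", 120);
        }
    }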

A)
HCatStam
B)
HCatStorer
C)
HamaStorer
D)
All of the above

Correct Answer :   HCatStorer


Explanation : HCatStorer is accessed via a Pig store statement.

A)
short
B)
datetime
C)
decimal
D)
biginteger

Correct Answer :   short


Explanation : Hive 0.12.0 and earlier releases support writing Pig primitive data types with HCatStorer.

A)
setOut
B)
OutputSchema
C)
setOutput
D)
setOutputSchema

Correct Answer :   setOutput


Explanation : Any other call will throw an exception saying the output format is not initialized.

A)
DROP TABLE
B)
CREATE VIEW
C)
SHOW FUNCTIONS
D)
ALTER INDEX … REBUILD

Correct Answer :   ALTER INDEX … REBUILD


Explanation : Any command which is not supported throws an exception with the message “Operation Not Supported”.

A)
Perl
B)
Java
C)
Python
D)
Javascript

Correct Answer :   Java


Explanation : Math operations are focused on linear algebra and statistics.

A)
collocation
B)
collection
C)
compaction
D)
none of the above

Correct Answer :   collocation


Explanation : The log-likelihood score indicates the relative usefulness of a collocation with regards other term combinations in the text.

A)
Collfilter
B)
ShngleFil
C)
SingleFilter
D)
ShingleFilter

Correct Answer :   ShingleFilter


Explanation : The tools in which the collocation identification algorithm is embedded either consume tokenized text as input or allow an implementation of the Lucene Analyzer class to be specified to perform tokenization and form n-grams.

A)
lcr
B)
lbr
C)
llr
D)
lar

Correct Answer :   llr


Explanation : The –minLLR option can be used to control the cutoff that prevents collocations below the specified LLR score from being emitted.

A)
CarDriver
B)
CollocDriver
C)
CollocationDriver
D)
All of the above

Correct Answer :   CollocDriver


Explanation : Each call to the mapper passes in the full set of tokens for the corresponding document using a StringTuple.

A)
CollocMerger
B)
CollocReducer
C)
CollocCombiner
D)
None of the above

Correct Answer :   CollocCombiner


Explanation : The combiner treats the entire GramKey as the key, so identical tuples from separate documents are passed into a single call to the combiner’s reduce method; their frequencies are summed and a single tuple is emitted via the collector.

A)
semi-structured
B)
structured
C)
unstructured
D)
none of the above

Correct Answer :   semi-structured


Explanation : Drill is an Apache open-source SQL query engine for Big Data exploration.

A)
Oozie
B)
MapR
C)
Impala
D)
All of the above

Correct Answer :   MapR


Explanation : The MapR Sandbox with Apache Drill is a fully functional single-node cluster that can be used to get an overview on Apache Drill in a Hadoop environment.

A)
JDBC
B)
ODBC
C)
ODBC-JDBC
D)
None of the above

Correct Answer :   ODBC


Explanation : Drill conforms to the stringent ANSI SQL standards ensuring compatibility with existing BI environments as well as Hive deployments.

A)
Oozie
B)
Mahout
C)
Both (A) and (B)
D)
Drill

Correct Answer :   Drill


Explanation : Users can explore live data on their own as it arrives versus spending weeks or months on data preparation, modeling, ETL and subsequent schema management.

A)
int
B)
simple
C)
nested
D)
all of the above

Correct Answer :   nested


Explanation : Users can also plug-and-play with Hive environments to enable ad-hoc low latency queries on existing Hive tables and reuse Hive’s metadata, hundreds of file formats and UDFs out of the box.

A)
union
B)
OR
C)
intersection
D)
None of the above

Correct Answer :   union


Explanation : Union operation takes a series of distinct PCollections that all have the same data type and treats them as a single virtual PCollection.
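
A minimal Crunch sketch of the union operation, assuming two hypothetical input files that contain records of the same type:

    import org.apache.crunch.PCollection;
    import org.apache.crunch.Pipeline;
    import org.apache.crunch.impl.mr.MRPipeline;

    public class UnionSketch {
        public static void main(String[] args) {
            Pipeline pipeline = new MRPipeline(UnionSketch.class);
            PCollection<String> logsA = pipeline.readTextFile("/data/logs-a");
            PCollection<String> logsB = pipeline.readTextFile("/data/logs-b");
            // Both inputs have the same data type, so they can be treated as
            // a single virtual PCollection.
            PCollection<String> all = logsA.union(logsB);
            pipeline.writeTextFile(all, "/data/logs-all");
            pipeline.done();
        }
    }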

A)
Pipeline
B)
WritePipe
C)
MyPipeline
D)
ReadPipeline

Correct Answer :   MyPipeline


Explanation : Inner classes contain references to their parent outer classes, so unless MyPipeline implements the Serializable interface, the NotSerializableException will be thrown when Crunch tries to serialize the inner DoFn.
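
One common fix, sketched below with illustrative names, is to define the DoFn as a static nested (or top-level) class so it holds no hidden reference to the enclosing pipeline object:

    import org.apache.crunch.DoFn;
    import org.apache.crunch.Emitter;

    public class MyPipeline {
        // A static nested DoFn carries no reference to MyPipeline, so Crunch
        // can serialize it without throwing NotSerializableException.
        static class ToUpperFn extends DoFn<String, String> {
            @Override
            public void process(String input, Emitter<String> emitter) {
                emitter.emit(input.toUpperCase());
            }
        }
    }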

A)
TaskInputContext
B)
TaskInputOutputContext
C)
TaskOutputContext
D)
All of the above

Correct Answer :   TaskInputOutputContext


Explanation : There are also a number of helper methods for working with the objects associated with the TaskInputOutputContext.

A)
org.apache.scrunch
B)
org.apache.kcrunch
C)
Both (A) and (B)
D)
org.apache.crunch

Correct Answer :   org.apache.crunch


Explanation : Each of these specialized DoFn implementations has associated methods on the PCollection, PTable, and PGroupedTable interfaces to support common data processing steps.
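
For instance, FilterFn is one such specialized DoFn, paired with the filter method on PCollection; a small sketch with an illustrative predicate:

    import org.apache.crunch.FilterFn;
    import org.apache.crunch.PCollection;

    public class FilterSketch {
        // FilterFn is a specialized DoFn: implement accept() instead of process().
        static class NonEmptyFn extends FilterFn<String> {
            @Override
            public boolean accept(String input) {
                return input != null && !input.isEmpty();
            }
        }

        public static PCollection<String> keepNonEmpty(PCollection<String> lines) {
            // PCollection.filter() is the helper method backed by FilterFn.
            return lines.filter(new NonEmptyFn());
        }
    }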

A)
NLineInputFormat
B)
LineInputFormat
C)
InputLineFormat
D)
None of the above

Correct Answer :   NLineInputFormat


Explanation : We can set the value of this parameter via the Source interface’s inputConf method.

A)
Grouping
B)
RowGrouping
C)
GroupingOptions
D)
None of the above

Correct Answer :   GroupingOptions


Explanation : The GroupingOptions class is immutable.
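
Because the class is immutable, instances are assembled with its builder; a minimal sketch using an arbitrary reducer count:

    import org.apache.crunch.GroupingOptions;

    public class GroupingOptionsSketch {
        public static GroupingOptions tenReducers() {
            // The builder collects settings; build() returns an immutable object.
            return GroupingOptions.builder()
                    .numReducers(10)
                    .build();
        }
    }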

A)
Oozie v2
B)
Oozie v3
C)
Oozie v4
D)
Oozie v5

Correct Answer :   Oozie v4


Explanation : Oozie combines multiple jobs sequentially into one logical unit of work.

A)
Elliptical
B)
Acyclical
C)
Cyclical
D)
All of the above

Correct Answer :   Acyclical


Explanation : Oozie is a framework that allows multiple Map/Reduce jobs to be combined into one logical unit of work.

A)
START
B)
PREP
C)
RESUME
D)
END

Correct Answer :   PREP


Explanation : Possible states for a workflow jobs are: PREP, RUNNING, SUSPENDED, SUCCEEDED, KILLED and FAILED.

A)
BigTop
B)
Oozie
C)
Flume
D)
Impala

Correct Answer :   BigTop


Explanation : Bigtop supports a wide range of components/projects, including, but not limited to, Hadoop, HBase and Spark.

A)
SUSE
B)
Ubuntu
C)
Fedora
D)
Solaris

Correct Answer :   Solaris


Explanation : Bigtop components power the leading Hadoop distros and support many Operating Systems, including Debian/Ubuntu, CentOS, Fedora, SUSE and many others.

A)
“Bigtop”
B)
“Big-trunk”
C)
“Bigtop-trunk”
D)
None of the above

Correct Answer :   “Bigtop-trunk”


Explanation : The Jenkins server in turn runs several test jobs.

A)
Bigtop-VM-matrix
B)
Bigtop-trunk-repository
C)
Bigtop-trunk-packagetest
D)
None of the above

Correct Answer :   Bigtop-trunk-repository


Explanation : Bigtop-trunk-packagetest runs the package tests.

A)
BigTop
B)
Impala
C)
Oozie
D)
Lucene

Correct Answer :   Impala

A)
Cloudera
B)
IBM
C)
MicroSoft
D)
All of the above

Correct Answer :   Cloudera


Explanation : Impala is open source (Apache License), so you can self-support in perpetuity if you wish.

A)
Kafka
B)
BigTop
C)
Storm
D)
Lucene

Correct Answer :   Storm


Explanation : Storm on YARN is powerful for scenarios requiring real-time analytics, machine learning and continuous monitoring of operations.

A)
YARN
B)
Scheduler
C)
Both (A) and (B)
D)
Compaction

Correct Answer :   Compaction

A)
MapR
B)
Cloudera
C)
Hortonworks
D)
Local Cloudera

Correct Answer :   Hortonworks


Explanation : The Storm community is working to improve capabilities related to three important themes: business continuity, operations and developer productivity.

A)
Nimbus
B)
Supervisor
C)
Zookeeper
D)
None of the above

Correct Answer :   Supervisor


Explanation : ZooKeeper nodes coordinate the Storm cluster.

A)
log aggregation
B)
collection
C)
compaction
D)
all of the above

Correct Answer :   log aggregation


Explanation : Log aggregation typically collects physical log files off servers and puts them in a central place.

A)
BigTop
B)
Impala
C)
ActiveMQ
D)
Zookeeper

Correct Answer :   Zookeeper


Explanation : You can use the convenience script packaged with Kafka to get a quick-and-dirty single-node ZooKeeper instance.

A)
log.index.enable
B)
log.retention
C)
log.cleaner.enable
D)
log.flush.interval.message

Correct Answer :   log.retention


Explanation : The log.cleaner.enable configuration must be set to true for log compaction to run.
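
A sketch of the broker-side setting, which normally lives in the broker's server.properties file but is shown here as Java properties for illustration:

    import java.util.Properties;

    public class BrokerConfigSketch {
        public static Properties compactionEnabled() {
            Properties props = new Properties();
            // Log compaction only runs when the log cleaner is enabled.
            props.put("log.cleaner.enable", "true");
            return props;
        }
    }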

A)
Drill
B)
Directory
C)
DeviceMap
D)
DirectMemory

Correct Answer :   DeviceMap


Explanation : Drill is a distributed system for interactive analysis of large-scale datasets.

A)
ESME
B)
Directory
C)
Empire-db
D)
All of the above

Correct Answer :   ESME


Explanation : ESME allows people to discover and meet one another and get controlled access to other sources of information, all in a business process context.

A)
Etch
B)
ESME
C)
Empire-db
D)
DirectoryMap

Correct Answer :   Etch


Explanation : Etch is a cross-platform, language- and transport-independent framework.

A)
Flex
B)
Flume
C)
Flink
D)
ESME

Correct Answer :   Flink


Explanation : Stratosphere combines the scalability and programming flexibility of distributed MapReduce-like platforms with efficiency and out-of-core execution.

A)
Oozie
B)
FtpServer
C)
Giraph
D)
Gereition

Correct Answer :   FtpServer


Explanation : Giraph is a large-scale, fault-tolerant, Bulk Synchronous Parallel (BSP)-based graph processing framework.

A)
iBix
B)
iBAT
C)
Helix
D)
iBATIS

Correct Answer :   iBATIS


Explanation : iBATIS couples objects with stored procedures or SQL statements using an XML descriptor.

A)
Nutch
B)
Oozie
C)
Imphala
D)
Manmgy

Correct Answer :   Nutch


Explanation : Oozie is server-based workflow scheduling and coordination system to manage data processing jobs for Apache Hadoop.

A)
Olingo
B)
Bigred
C)
Nuvem
D)
Onami

Correct Answer :   Onami


Explanation : Apache Onami aims to create a community focused on the development and maintenance of a set of Google Guice extensions.

A)
RAT
B)
RTA
C)
Qpid
D)
All of the above

Correct Answer :   Qpid


Explanation : RAT became part of new Apache Creadur TLP.

A)
Rave
B)
Samza
C)
ServiceMix
D)
All of the above

Correct Answer :   ServiceMix


Explanation : The ServiceMix project is related to Geronimo and was developed by James.

A)
ADH
B)
CDH
C)
BDH
D)
MDH

Correct Answer :   CDH


Explanation : Cloudera’s open-source Apache Hadoop distribution, CDH (Cloudera Distribution Including Apache Hadoop), targets enterprise-class deployments of that technology.

A)
Manager
B)
Express
C)
Standard
D)
Enterprise

Correct Answer :   Manager


Explanation : All versions may be downloaded from Cloudera’s website.

A)
Zero
B)
One
C)
Two
D)
Three

Correct Answer :   Three


Explanation : Cloudera Enterprise comes in three editions: Basic, Flex, and Data Hub.

A)
flexibilty
B)
scalability
C)
multi-tenancy
D)
all of the above

Correct Answer :   multi-tenancy


Explanation : Cloudera Express offers the fastest and easiest way to get your Hadoop cluster up and running and to explore your first use cases.

A)
Ubuntu
B)
Windows 7
C)
Windows 8
D)
Windows Server

Correct Answer :   Windows Server


Explanation : Win32 is supported as a development platform.

A)
Collection
B)
Streaming
C)
Orchestration
D)
All of the above

Correct Answer :   Streaming


Explanation : These external jobs can be written in various programming languages such as Python or Ruby.

A)
two
B)
three
C)
four
D)
five

Correct Answer :   two


Explanation : One is the standard Hadoop Hive console; the other is unique in the Hadoop world and is based on JavaScript.

A)
had
B)
start
C)
hadoop
D)
hadstrat

Correct Answer :   hadoop


Explanation : HDInsight is the framework for the Microsoft Azure cloud implementation of Hadoop.

A)
.NET
B)
Ubuntu
C)
Hadoop
D)
None of the above

Correct Answer :   .NET


Explanation : The Microsoft .NET Library for Avro implements the Apache Avro compact binary data interchange format for serialization for the Microsoft .NET environment.

A)
Pig
B)
Hive
C)
Ubuntu
D)
Windows Server

Correct Answer :   Hive


Explanation : Amazon EMR supports several versions of Hive, which you can install on any running cluster.

A)
org.apache.hadoop.hive.ql.io.CombineFormat
B)
org.apache.hadoop.hive.ql.iont.CombineHiveInputFormat
C)
org.apache.hadoop.hive.ql.io.CombineHiveInputFormat
D)
All of the above

Correct Answer :   org.apache.hadoop.hive.ql.io.CombineHiveInputFormat
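
This class is supplied through Hive's hive.input.format property; a hedged Java sketch that sets it on a Hadoop Configuration (the same value is commonly applied with a Hive set command):

    import org.apache.hadoop.conf.Configuration;

    public class HiveInputFormatSketch {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // Ask Hive to combine small input files into larger splits.
            conf.set("hive.input.format",
                    "org.apache.hadoop.hive.ql.io.CombineHiveInputFormat");
        }
    }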

A)
EC2
B)
EC3
C)
EC4
D)
None of the above

Correct Answer :   EC2


Explanation : Amazon EMR has made enhancements to Hadoop and other open-source applications to work seamlessly with AWS.

A)
AMR
B)
ASQ
C)
AWES
D)
AWS

Correct Answer :   AWS


Explanation : Amazon Elastic MapReduce (Amazon EMR) is a web service that makes it easy to process large amounts of data efficiently.

A)
MPA
B)
MAP
C)
MPP
D)
None of the above

Correct Answer :   MPP


Explanation : Impala avoids Hive’s overhead from creating MapReduce jobs, giving it faster query times than Hive.

A)
IamWatch
B)
CloudWatch
C)
AmWatch
D)
All of the above

Correct Answer :   CloudWatch


Explanation : The current AMIs for all CoreOS channels and EC2 regions are updated frequently.

A)
EC2
B)
EC3
C)
EC4
D)
All of the above

Correct Answer :   EC2


Explanation : Amazon EC2 changes the economics of computing by allowing you to pay only for capacity that you actually use.

A)
EC2
B)
EC3
C)
EC4
D)
All of the above

Correct Answer :   EC3


Explanation : Amazon EC2 enables you to scale up or down to handle changes in requirements or spikes in popularity, reducing your need to forecast traffic.

A)
S5
B)
S4
C)
S3
D)
S2

Correct Answer :   S2


Explanation : Amazon S3 stands for Amazon Simple Storage Service.

A)
InfoData
B)
InfoSphere
C)
InfoStream
D)
InfoSurface

Correct Answer :   InfoStream


Explanation : The InfoStream platform provides an enterprise-class foundation for information-intensive projects, delivering the performance, scalability, reliability and acceleration needed to simplify difficult challenges and deliver trusted information to your business faster.

A)
1
B)
2
C)
3
D)
4

Correct Answer :   3


Explanation : InfoSphere DataStage also facilitates extended metadata management and enterprise connectivity.

A)
Solaris
B)
Debian
C)
Ubuntu
D)
Windows

Correct Answer :   Windows


Explanation : IBM InfoSphere DataStage can integrate high volumes of data on demand across multiple data sources and target applications using a high-performance parallel framework.

A)
TX
B)
MVS Edition
C)
Server Edition
D)
Enterprise Edition

Correct Answer :   Enterprise Edition


Explanation : DataStage 5 added Sequence Jobs and DataStage 6 added Parallel Jobs via Enterprise Edition.

A)
TX
B)
PX
C)
MVS Edition
D)
Server Edition

Correct Answer :   TX


Explanation : MVS Edition jobs are developed on a Windows or Unix/Linux platform and transferred to the mainframe as compiled mainframe jobs.

A)
Info Server
B)
Information Server
C)
Data Server
D)
All of the above

Correct Answer :   Information Server


Explanation : IBM InfoSphere Information Server is a market-leading data integration platform which includes a family of products that enable you to understand, cleanse, monitor, transform, and deliver data.