Explore topic-wise MCQs in Apache Hadoop.

This section includes 50 curated multiple-choice questions to sharpen your Apache Hadoop knowledge and support exam preparation. Work through the questions below to get started.

1.

The framework groups Reducer inputs by key in _________ stage.

A. sort
B. shuffle
C. reduce
D. none of the mentioned
Answer» A. sort
2.

The Hadoop MapReduce framework spawns one map task for each __________ generated by the InputFormat for the job.

A. OutputSplit
B. InputSplit
C. InputSplitStream
D. All of the mentioned
Answer» B. InputSplit
3.

The output of the reduce task is typically written to the FileSystem via _____________

A. OutputCollector.collect
B. OutputCollector.get
C. OutputCollector.receive
D. OutputCollector.put
Answer» A. OutputCollector.collect
4.

Applications can use the ____________ to report progress and set application-level status messages.

A. Partitioner
B. OutputSplit
C. Reporter
D. All of the mentioned
Answer» C. Reporter
5.

Users can control which keys (and hence records) go to which Reducer by implementing a custom:

A. Partitioner
B. OutputSplit
C. Reporter
D. All of the mentioned
Answer» A. Partitioner
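
For illustration, a minimal custom Partitioner sketch against the classic org.apache.hadoop.mapred API; the class name and the first-letter routing rule are hypothetical choices, not anything the question prescribes:

```java
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.Partitioner;

// Hypothetical example: route every key that starts with the same letter
// to the same Reducer, so related records are grouped on one reduce task.
public class FirstLetterPartitioner implements Partitioner<Text, IntWritable> {
    @Override
    public void configure(JobConf job) {
        // No per-job setup needed for this sketch.
    }

    @Override
    public int getPartition(Text key, IntWritable value, int numPartitions) {
        String s = key.toString();
        int first = s.isEmpty() ? 0 : s.charAt(0); // char promotes to a non-negative int
        return first % numPartitions;
    }
}
```

It would be registered with JobConf.setPartitionerClass(FirstLetterPartitioner.class); without one, the default HashPartitioner (see question 30) decides the routing.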
6.

The number of reduces for the job is set by the user via:

A. JobConf.setNumTasks(int)
B. JobConf.setNumReduceTasks(int)
C. JobConf.setNumMapTasks(int)
D. All of the mentioned
Answer» B. JobConf.setNumReduceTasks(int)
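
A short sketch of how this setter is typically called from a job driver (the class name is hypothetical):

```java
import org.apache.hadoop.mapred.JobConf;

public class ReduceCountExample {
    public static void main(String[] args) {
        JobConf conf = new JobConf(ReduceCountExample.class);
        // The user sets the reduce-side parallelism explicitly.
        conf.setNumReduceTasks(4);
        // setNumMapTasks(int) also exists but is only a hint: the number
        // of maps is driven by the InputSplits (see questions 2 and 32).
    }
}
```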
7.

The right level of parallelism for maps seems to be around _________ maps per-node.

A. 1-10
B. 10-100
C. 100-150
D. 150-200
Answer» B. 10-100
8.

The Mapper implementation processes one line at a time via _________ method.

A. map
B. reduce
C. mapper
D. reducer
Answer» A. map
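
The behaviour described in questions 3, 4 and 8 is easiest to see in the canonical word-count Mapper from the MapReduce tutorial; a minimal sketch using the old org.apache.hadoop.mapred API (class name hypothetical):

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// With TextInputFormat, the framework calls map() once per line of input.
public class LineMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(LongWritable offset, Text line,
                    OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        reporter.setStatus("mapping line at offset " + offset); // see question 4
        StringTokenizer tokens = new StringTokenizer(line.toString());
        while (tokens.hasMoreTokens()) {
            word.set(tokens.nextToken());
            output.collect(word, ONE); // emitted via OutputCollector.collect (question 3)
        }
    }
}
```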
9.

Map output larger than ___ percent of the memory allocated to copying map outputs will be written directly to disk without first staging through memory.

A. 10
B. 15
C. 25
D. 35
Answer» C. 25
10.

______________ is the percentage of memory, relative to the maximum heap size, in which map outputs may be retained during the reduce.

A. mapred.job.shuffle.merge.percent
B. mapred.job.reduce.input.buffer.percent
C. mapred.inmem.merge.threshold
D. io.sort.factor
Answer» B. mapred.job.reduce.input.buffer.percent
11.

Which of the following is the default Partitioner for MapReduce?

A. MergePartitioner
B. HashedPartitioner
C. HashPartitioner
D. None of the mentioned
Answer» C. HashPartitioner
12.

____________ specifies the number of segments on disk to be merged at the same time.

A. mapred.job.shuffle.merge.percent
B. mapred.job.reduce.input.buffer.percent
C. mapred.inmem.merge.threshold
D. io.sort.factor
Answer» D. io.sort.factor
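
Both properties from questions 10 and 12 are ordinary job-configuration knobs. A sketch of setting them (the values are purely illustrative, not tuning advice):

```java
import org.apache.hadoop.mapred.JobConf;

public class ShuffleTuningExample {
    public static void main(String[] args) {
        JobConf conf = new JobConf(ShuffleTuningExample.class);
        // Retain map outputs in memory during the reduce, up to 70% of heap.
        conf.setFloat("mapred.job.reduce.input.buffer.percent", 0.70f);
        // Merge at most 10 on-disk segments at the same time.
        conf.setInt("io.sort.factor", 10);
    }
}
```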
13.

Running a ___________ program involves running mapping tasks on many or all of the nodes in our cluster.

A. MapReduce
B. Map
C. Reducer
D. All of the mentioned
Answer» A. MapReduce
14.

________ is a utility which allows users to create and run jobs with any executable as the mapper and/or the reducer.

A. Hadoop Strdata
B. Hadoop Streaming
C. Hadoop Stream
D. None of the mentioned
Answer» B. Hadoop Streaming
15.

Which of the following nodes is responsible for executing a Task assigned to it by the JobTracker?

A. MapReduce
B. Mapper
C. TaskTracker
D. JobTracker
Answer» C. TaskTracker
16.

___________ is used for writing blocks with a single replica in memory.

A. Hot
B. Lazy_Persist
C. One_SSD
D. All_SSD
Answer» B. Lazy_Persist
17.

____________ is used for storing one of the replicas in SSD.

A. Hot
B. Lazy_Persist
C. One_SSD
D. All_SSD
Answer» C. One_SSD
18.

__________ storage is a solution to decouple growing storage capacity from compute capacity.

A. DataNode
B. Archival
C. Policy
D. None of the mentioned
Answer» B. Archival
19.

The configuration file must be owned by the user running:

A. DataManager
B. NodeManager
C. ValidationManager
D. None of the mentioned
Answer» B. NodeManager
20.

The ____________ requires that paths including and leading up to the directories specified in yarn.nodemanager.local-dirs be set with 755 permissions.

A. TaskController
B. LinuxTaskController
C. LinuxController
D. None of the mentioned
Answer» B. LinuxTaskController
21.

_________ is useful for iterating the properties when all deprecated properties for currently set properties need to be present.

A. addResource
B. setDeprecatedProperties
C. addDefaultResource
D. none of the mentioned
Answer» B. setDeprecatedProperties
22.

Which of the following adds a configuration resource?

A. addResource
B. setDeprecatedProperties
C. addDefaultResource
D. None of the mentioned
Answer» A. addResource
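
A small sketch of adding configuration resources (the second path is hypothetical):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

public class ConfResourceExample {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // addResource() adds a configuration resource; resources added
        // later override properties set by earlier ones.
        conf.addResource("core-site.xml");                     // loaded from the classpath
        conf.addResource(new Path("/etc/hadoop/my-site.xml")); // hypothetical local path
        System.out.println(conf.get("fs.defaultFS"));
    }
}
```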
23.

Which of the following writes MapFiles as output?

A. DBInputFormat
B. MapFileOutputFormat
C. SequenceFileAsBinaryOutputFormat
D. None of the mentioned
Answer» B. MapFileOutputFormat
24.

_________ is the base class for all implementations of InputFormat that use files as their data source.

A. FileTextFormat
B. FileInputFormat
C. FileOutputFormat
D. None of the mentioned
Answer» B. FileInputFormat
25.

Which of the following methods adds a path or paths to the list of inputs?

A. setInputPaths()
B. addInputPath()
C. setInput()
D. none of the mentioned
Answer» B. addInputPath()
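
A minimal sketch contrasting the two FileInputFormat path methods (paths and class name are hypothetical):

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobConf;

public class PathSetupExample {
    public static void main(String[] args) {
        JobConf conf = new JobConf(PathSetupExample.class);
        // addInputPath() appends one path to the list of inputs and can
        // be called repeatedly to accumulate several inputs.
        FileInputFormat.addInputPath(conf, new Path("/data/part1"));
        FileInputFormat.addInputPath(conf, new Path("/data/part2"));
        // setInputPaths() instead replaces the whole list in one call.
        FileOutputFormat.setOutputPath(conf, new Path("/data/out"));
    }
}
```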
26.

Hadoop has a library class, org.apache.hadoop.mapred.lib.FieldSelectionMapReduce, that effectively allows you to process text data like the Unix ______ utility.

A. Copy
B. Cut
C. Paste
D. Move
Answer» B. Cut
27.

HBase provides ___________-like capabilities on top of Hadoop and HDFS.

A. TopTable
B. BigTop
C. Bigtable
D. None of the mentioned
Answer» C. Bigtable
28.

_______ refers to incremental costs with no major impact on solution design, performance and complexity.

A. Scale-out
B. Scale-down
C. Scale-up
D. None of the mentioned
Answer» A. Scale-out
29.

__________ is a generalization of the facility provided by the MapReduce framework to collect data output by the Mapper or the Reducer.

A. Partitioner
B. OutputCollector
C. Reporter
D. All of the mentioned
Answer» B. OutputCollector
30.

_________ is the default Partitioner for partitioning key space.

A. HashPar
B. Partitioner
C. HashPartitioner
D. None of the mentioned
Answer» C. HashPartitioner
31.

_________ is the primary interface for a user to describe a MapReduce job to the Hadoop framework for execution.

A. Map Parameters
B. JobConf
C. MemoryConf
D. None of the mentioned
Answer» B. JobConf
32.

The number of maps is usually driven by the total size of:

A. inputs
B. outputs
C. tasks
D. None of the mentioned
Answer» A. inputs
33.

__________ maps input key/value pairs to a set of intermediate key/value pairs.

A. Mapper
B. Reducer
C. Both Mapper and Reducer
D. None of the mentioned
Answer» A. Mapper
34.

_________ function is responsible for consolidating the results produced by each of the Map() functions/tasks.

A. Reduce
B. Map
C. Reducer
D. All of the mentioned
Answer» A. Reduce
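
The word-count Reducer that pairs with the Mapper sketched after question 8 shows this consolidation (class name hypothetical):

```java
import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

// Consolidates the per-word counts produced by the Map() tasks.
public class SumReducer extends MapReduceBase
        implements Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    public void reduce(Text key, Iterator<IntWritable> values,
                       OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next().get();
        }
        output.collect(key, new IntWritable(sum)); // written to the FileSystem (question 3)
    }
}
```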
35.

The _____________ can also be used to distribute both jars and native libraries for use in the map and/or reduce tasks.

A. DistributedLog
B. DistributedCache
C. DistributedJars
D. None of the mentioned
Answer» B. DistributedCache
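
A hedged sketch of shipping a side file, a jar and an archive of native libraries with DistributedCache; all paths below are hypothetical HDFS URIs:

```java
import java.net.URI;

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;

public class CacheSetupExample {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(CacheSetupExample.class);
        // Read-only side file made available on every task node.
        DistributedCache.addCacheFile(new URI("/lookup/terms.txt"), conf);
        // Jar added to the classpath of the map/reduce tasks.
        DistributedCache.addFileToClassPath(new Path("/libs/helper.jar"), conf);
        // Archive (e.g. bundled native libraries) unpacked on each node.
        DistributedCache.addCacheArchive(new URI("/libs/native.tgz"), conf);
    }
}
```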
36.

__________ is used to filter log files from the output directory listing.

A. OutputLog
B. OutputLogFilter
C. DistributedLog
D. DistributedJars
Answer» B. OutputLogFilter
37.

A ________ node acts as the Slave and is responsible for executing a Task assigned to it by the JobTracker.

A. MapReduce
B. Mapper
C. TaskTracker
D. JobTracker
Answer» C. TaskTracker
38.

Jobs can enable task JVMs to be reused by specifying the job configuration:

A. mapred.job.recycle.jvm.num.tasks
B. mapissue.job.reuse.jvm.num.tasks
C. mapred.job.reuse.jvm.num.tasks
D. all of the mentioned
Answer» C. mapred.job.reuse.jvm.num.tasks
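
A short sketch of enabling JVM reuse; -1 is the documented "unlimited" value, while the default of 1 disables reuse:

```java
import org.apache.hadoop.mapred.JobConf;

public class JvmReuseExample {
    public static void main(String[] args) {
        JobConf conf = new JobConf(JvmReuseExample.class);
        // Reuse each task JVM for an unlimited number of tasks of this job.
        conf.setInt("mapred.job.reuse.jvm.num.tasks", -1);
        // Equivalent convenience setter on JobConf:
        conf.setNumTasksToExecutePerJvm(-1);
    }
}
```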
39.

The ___________ part of MapReduce is responsible for processing one or more chunks of data and producing the output results.

A. Maptask
B. Mapper
C. Task execution
D. All of the mentioned
Answer» A. Maptask
40.

Although the Hadoop framework is implemented in Java, MapReduce applications need not be written in:

A. Java
B. C
C. C#
D. None of the mentioned
Answer» A. Java
41.

________ is a utility which allows users to create and run jobs with any executables as the mapper and/or the reducer.

A. Hadoop Strdata
B. Hadoop Streaming
C. Hadoop Stream
D. None of the mentioned
Answer» B. Hadoop Streaming
42.

The standard output (stdout) and error (stderr) streams of the task are read by the TaskTracker and logged to:

A. ${HADOOP_LOG_DIR}/user
B. ${HADOOP_LOG_DIR}/userlogs
C. ${HADOOP_LOG_DIR}/logs
D. None of the mentioned
Answer» B. ${HADOOP_LOG_DIR}/userlogs
43.

__________ will clear the RMStateStore and is useful if past applications are no longer needed.

A. -format-state
B. -form-state-store
C. -format-state-store
D. none of the mentioned
Answer» C. -format-state-store
44.

The __________ is responsible for allocating resources to the various running applications subject to familiar constraints of capacities, queues etc.

A. Manager
B. Master
C. Scheduler
D. None of the mentioned
Answer» C. Scheduler
45.

The queue definitions and properties such as ________ and ACLs can be changed at runtime.

A. tolerant
B. capacity
C. speed
D. all of the mentioned
Answer» B. capacity
46.

The updated queue configuration should be a valid one, i.e., the queue capacity at each level should be equal to:

A. 0.5
B. 0.75
C. 1
D. 0
Answer» C. 1
47.

Which of the following commands runs the ResourceManager admin client?

A. proxyserver
B. run
C. admin
D. rmadmin
Answer» D. rmadmin
48.

Users can bundle their YARN code in a _________ file and execute it using the jar command.

A. java
B. jar
C. C code
D. xml
Answer» B. jar
49.

Yarn commands are invoked by the ________ script.

A. hive
B. bin
C. hadoop
D. home
Answer» B. bin
50.

The CapacityScheduler has a predefined queue called:

A. domain
B. root
C. rear
D. all of the mentioned
Answer» B. root