Explore topic-wise MCQs in Apache Hadoop.

This section includes 50 curated multiple-choice questions to sharpen your Apache Hadoop knowledge and support exam preparation. Work through the questions below to get started.

1.

The framework groups Reducer inputs by key in _________ stage.

A. sort
B. shuffle
C. reduce
D. none of the mentioned
Answer» A. sort
2.

The Hadoop MapReduce framework spawns one map task for each __________ generated by the InputFormat for the job.

A. OutputSplit
B. InputSplit
C. InputSplitStream
D. All of the mentioned
Answer» B. InputSplit
3.

The output of the reduce task is typically written to the FileSystem via _____________

A. OutputCollector.collect
B. OutputCollector.get
C. OutputCollector.receive
D. OutputCollector.put
Answer» A. OutputCollector.collect
4.

Applications can use the ____________ to report progress and set application-level status messages.

A. Partitioner
B. OutputSplit
C. Reporter
D. All of the mentioned
Answer» C. Reporter
5.

Users can control which keys (and hence records) go to which Reducer by implementing a custom:

A. Partitioner
B. OutputSplit
C. Reporter
D. All of the mentioned
Answer» A. Partitioner
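
For illustration, a minimal custom Partitioner sketch against the classic org.apache.hadoop.mapred API; the class name and the first-letter routing rule are hypothetical choices, not anything the question prescribes:

```java
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.Partitioner;

// Hypothetical example: route every key that starts with the same letter
// to the same Reducer, so related records are grouped on one reduce task.
public class FirstLetterPartitioner implements Partitioner<Text, IntWritable> {
    @Override
    public void configure(JobConf job) {
        // No per-job setup needed for this sketch.
    }

    @Override
    public int getPartition(Text key, IntWritable value, int numPartitions) {
        String s = key.toString();
        int first = s.isEmpty() ? 0 : s.charAt(0); // char promotes to a non-negative int
        return first % numPartitions;
    }
}
```

It would be registered with JobConf.setPartitionerClass(FirstLetterPartitioner.class); without one, the default HashPartitioner (see question 30) decides the routing.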
6.

The number of reduces for the job is set by the user via:

A. JobConf.setNumTasks(int)
B. JobConf.setNumReduceTasks(int)
C. JobConf.setNumMapTasks(int)
D. All of the mentioned
Answer» B. JobConf.setNumReduceTasks(int)
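
A short sketch of how this setter is typically called from a job driver (the class name is hypothetical):

```java
import org.apache.hadoop.mapred.JobConf;

public class ReduceCountExample {
    public static void main(String[] args) {
        JobConf conf = new JobConf(ReduceCountExample.class);
        // The user sets the reduce-side parallelism explicitly.
        conf.setNumReduceTasks(4);
        // setNumMapTasks(int) also exists but is only a hint: the number
        // of maps is driven by the InputSplits (see questions 2 and 32).
    }
}
```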
7.

The right level of parallelism for maps seems to be around _________ maps per-node.

A. 1-10
B. 10-100
C. 100-150
D. 150-200
Answer» B. 10-100
8.

The Mapper implementation processes one line at a time via _________ method.

A. map
B. reduce
C. mapper
D. reducer
Answer» A. map
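
The behaviour described in questions 3, 4 and 8 is easiest to see in the canonical word-count Mapper from the MapReduce tutorial; a minimal sketch using the old org.apache.hadoop.mapred API (class name hypothetical):

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// With TextInputFormat, the framework calls map() once per line of input.
public class LineMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(LongWritable offset, Text line,
                    OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        reporter.setStatus("mapping line at offset " + offset); // see question 4
        StringTokenizer tokens = new StringTokenizer(line.toString());
        while (tokens.hasMoreTokens()) {
            word.set(tokens.nextToken());
            output.collect(word, ONE); // emitted via OutputCollector.collect (question 3)
        }
    }
}
```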
9.

Map output larger than ___ percent of the memory allocated to copying map outputs will be written directly to disk without first staging through memory.

A. 10
B. 15
C. 25
D. 35
Answer» C. 25
10.

______________ is the percentage of memory, relative to the maximum heap size, in which map outputs may be retained during the reduce.

A. mapred.job.shuffle.merge.percent
B. mapred.job.reduce.input.buffer.percent
C. mapred.inmem.merge.threshold
D. io.sort.factor
Answer» B. mapred.job.reduce.input.buffer.percent
11.

Which of the following is the default Partitioner for MapReduce?

A. MergePartitioner
B. HashedPartitioner
C. HashPartitioner
D. None of the mentioned
Answer» C. HashPartitioner
12.

____________ specifies the number of segments on disk to be merged at the same time.

A. mapred.job.shuffle.merge.percent
B. mapred.job.reduce.input.buffer.percent
C. mapred.inmem.merge.threshold
D. io.sort.factor
Answer» D. io.sort.factor
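
Both properties from questions 10 and 12 are ordinary job-configuration knobs. A sketch of setting them (the values are purely illustrative, not tuning advice):

```java
import org.apache.hadoop.mapred.JobConf;

public class ShuffleTuningExample {
    public static void main(String[] args) {
        JobConf conf = new JobConf(ShuffleTuningExample.class);
        // Retain map outputs in memory during the reduce, up to 70% of heap.
        conf.setFloat("mapred.job.reduce.input.buffer.percent", 0.70f);
        // Merge at most 10 on-disk segments at the same time.
        conf.setInt("io.sort.factor", 10);
    }
}
```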
13.

Running a ___________ program involves running mapping tasks on many or all of the nodes in our cluster.

A. MapReduce
B. Map
C. Reducer
D. All of the mentioned
Answer» A. MapReduce
14.

________ is a utility which allows users to create and run jobs with any executable as the mapper and/or the reducer.

A. Hadoop Strdata
B. Hadoop Streaming
C. Hadoop Stream
D. None of the mentioned
Answer» B. Hadoop Streaming
15.

Which of the following nodes is responsible for executing a Task assigned to it by the JobTracker?

A. MapReduce
B. Mapper
C. TaskTracker
D. JobTracker
Answer» C. TaskTracker
16.

___________ is used for writing blocks with a single replica in memory.

A. Hot
B. Lazy_Persist
C. One_SSD
D. All_SSD
Answer» B. Lazy_Persist
17.

____________ is used for storing one of the replicas in SSD.

A. Hot
B. Lazy_Persist
C. One_SSD
D. All_SSD
Answer» C. One_SSD
18.

__________ storage is a solution to decouple growing storage capacity from compute capacity.

A. DataNode
B. Archival
C. Policy
D. None of the mentioned
Answer» B. Archival
19.

The configuration file must be owned by the user running:

A. DataManager
B. NodeManager
C. ValidationManager
D. None of the mentioned
Answer» B. NodeManager
20.

The ____________ requires that paths including and leading up to the directories specified in yarn.nodemanager.local-dirs be set with 755 permissions.

A. TaskController
B. LinuxTaskController
C. LinuxController
D. None of the mentioned
Answer» B. LinuxTaskController
21.

_________ is useful for iterating the properties when all deprecated properties for currently set properties need to be present.

A. addResource
B. setDeprecatedProperties
C. addDefaultResource
D. none of the mentioned
Answer» B. setDeprecatedProperties
22.

Which of the following adds a configuration resource?

A. addResource
B. setDeprecatedProperties
C. addDefaultResource
D. None of the mentioned
Answer» A. addResource
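
A small sketch of adding configuration resources (the second path is hypothetical):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

public class ConfResourceExample {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // addResource() adds a configuration resource; resources added
        // later override properties set by earlier ones.
        conf.addResource("core-site.xml");                     // loaded from the classpath
        conf.addResource(new Path("/etc/hadoop/my-site.xml")); // hypothetical local path
        System.out.println(conf.get("fs.defaultFS"));
    }
}
```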
23.

Which of the following writes MapFiles as output?

A. DBInputFormat
B. MapFileOutputFormat
C. SequenceFileAsBinaryOutputFormat
D. None of the mentioned
Answer» B. MapFileOutputFormat
24.

_________ is the base class for all implementations of InputFormat that use files as their data source.

A. FileTextFormat
B. FileInputFormat
C. FileOutputFormat
D. None of the mentioned
Answer» B. FileInputFormat
25.

Which of the following methods adds a path or paths to the list of inputs?

A. setInputPaths()
B. addInputPath()
C. setInput()
D. none of the mentioned
Answer» B. addInputPath()
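
A minimal sketch contrasting the two FileInputFormat path methods (paths and class name are hypothetical):

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobConf;

public class PathSetupExample {
    public static void main(String[] args) {
        JobConf conf = new JobConf(PathSetupExample.class);
        // addInputPath() appends one path to the list of inputs and can
        // be called repeatedly to accumulate several inputs.
        FileInputFormat.addInputPath(conf, new Path("/data/part1"));
        FileInputFormat.addInputPath(conf, new Path("/data/part2"));
        // setInputPaths() instead replaces the whole list in one call.
        FileOutputFormat.setOutputPath(conf, new Path("/data/out"));
    }
}
```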
26.

Hadoop has a library class, org.apache.hadoop.mapred.lib.FieldSelectionMapReduce, that effectively allows you to process text data like the Unix ______ utility.

A. Copy
B. Cut
C. Paste
D. Move
Answer» B. Cut
27.

HBase provides ___________-like capabilities on top of Hadoop and HDFS.

A. TopTable
B. BigTop
C. Bigtable
D. None of the mentioned
Answer» C. Bigtable
28.

_______ refers to incremental costs with no major impact on solution design, performance and complexity.

A. Scale-out
B. Scale-down
C. Scale-up
D. None of the mentioned
Answer» A. Scale-out
29.

__________ is a generalization of the facility provided by the MapReduce framework to collect data output by the Mapper or the Reducer.

A. Partitioner
B. OutputCollector
C. Reporter
D. All of the mentioned
Answer» B. OutputCollector
30.

_________ is the default Partitioner for partitioning key space.

A. HashPar
B. Partitioner
C. HashPartitioner
D. None of the mentioned
Answer» C. HashPartitioner
31.

_________ is the primary interface for a user to describe a MapReduce job to the Hadoop framework for execution.

A. Map Parameters
B. JobConf
C. MemoryConf
D. None of the mentioned
Answer» B. JobConf
32.

The number of maps is usually driven by the total size of:

A. inputs
B. outputs
C. tasks
D. None of the mentioned
Answer» A. inputs
33.

__________ maps input key/value pairs to a set of intermediate key/value pairs.

A. Mapper
B. Reducer
C. Both Mapper and Reducer
D. None of the mentioned
Answer» A. Mapper
34.

_________ function is responsible for consolidating the results produced by each of the Map() functions/tasks.

A. Reduce
B. Map
C. Reducer
D. All of the mentioned
Answer» A. Reduce
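
The word-count Reducer that pairs with the Mapper sketched after question 8 shows this consolidation (class name hypothetical):

```java
import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

// Consolidates the per-word counts produced by the Map() tasks.
public class SumReducer extends MapReduceBase
        implements Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    public void reduce(Text key, Iterator<IntWritable> values,
                       OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next().get();
        }
        output.collect(key, new IntWritable(sum)); // written to the FileSystem (question 3)
    }
}
```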
35.

The _____________ can also be used to distribute both jars and native libraries for use in the map and/or reduce tasks.

A. DistributedLog
B. DistributedCache
C. DistributedJars
D. None of the mentioned
Answer» B. DistributedCache
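
A hedged sketch of shipping a side file, a jar and an archive of native libraries with DistributedCache; all paths below are hypothetical HDFS URIs:

```java
import java.net.URI;

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;

public class CacheSetupExample {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(CacheSetupExample.class);
        // Read-only side file made available on every task node.
        DistributedCache.addCacheFile(new URI("/lookup/terms.txt"), conf);
        // Jar added to the classpath of the map/reduce tasks.
        DistributedCache.addFileToClassPath(new Path("/libs/helper.jar"), conf);
        // Archive (e.g. bundled native libraries) unpacked on each node.
        DistributedCache.addCacheArchive(new URI("/libs/native.tgz"), conf);
    }
}
```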
36.

__________ is used to filter log files from the output directory listing.

A. OutputLog
B. OutputLogFilter
C. DistributedLog
D. DistributedJars
Answer» B. OutputLogFilter
37.

A ________ node acts as the Slave and is responsible for executing a Task assigned to it by the JobTracker.

A. MapReduce
B. Mapper
C. TaskTracker
D. JobTracker
Answer» C. TaskTracker
38.

Jobs can enable task JVMs to be reused by specifying the job configuration:

A. mapred.job.recycle.jvm.num.tasks
B. mapissue.job.reuse.jvm.num.tasks
C. mapred.job.reuse.jvm.num.tasks
D. all of the mentioned
Answer» C. mapred.job.reuse.jvm.num.tasks
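
A short sketch of enabling JVM reuse; -1 is the documented "unlimited" value, while the default of 1 disables reuse:

```java
import org.apache.hadoop.mapred.JobConf;

public class JvmReuseExample {
    public static void main(String[] args) {
        JobConf conf = new JobConf(JvmReuseExample.class);
        // Reuse each task JVM for an unlimited number of tasks of this job.
        conf.setInt("mapred.job.reuse.jvm.num.tasks", -1);
        // Equivalent convenience setter on JobConf:
        conf.setNumTasksToExecutePerJvm(-1);
    }
}
```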
39.

The ___________ part of MapReduce is responsible for processing one or more chunks of data and producing the output results.

A. Maptask
B. Mapper
C. Task execution
D. All of the mentioned
Answer» A. Maptask
40.

Although the Hadoop framework is implemented in Java, MapReduce applications need not be written in:

A. Java
B. C
C. C#
D. None of the mentioned
Answer» A. Java
41.

________ is a utility which allows users to create and run jobs with any executables as the mapper and/or the reducer.

A. Hadoop Strdata
B. Hadoop Streaming
C. Hadoop Stream
D. None of the mentioned
Answer» B. Hadoop Streaming
42.

The standard output (stdout) and error (stderr) streams of the task are read by the TaskTracker and logged to:

A. ${HADOOP_LOG_DIR}/user
B. ${HADOOP_LOG_DIR}/userlogs
C. ${HADOOP_LOG_DIR}/logs
D. None of the mentioned
Answer» B. ${HADOOP_LOG_DIR}/userlogs
43.

__________ will clear the RMStateStore and is useful if past applications are no longer needed.

A. -format-state
B. -form-state-store
C. -format-state-store
D. none of the mentioned
Answer» C. -format-state-store
44.

The __________ is responsible for allocating resources to the various running applications subject to familiar constraints of capacities, queues etc.

A. Manager
B. Master
C. Scheduler
D. None of the mentioned
Answer» C. Scheduler
45.

The queue definitions and properties such as ________ and ACLs can be changed at runtime.

A. tolerant
B. capacity
C. speed
D. all of the mentioned
Answer» B. capacity
46.

The updated queue configuration should be a valid one, i.e., the queue capacity at each level should be equal to:

A. 0.5
B. 0.75
C. 1
D. 0
Answer» C. 1
47.

Which of the following commands runs the ResourceManager admin client?

A. proxyserver
B. run
C. admin
D. rmadmin
Answer» D. rmadmin
48.

Users can bundle their YARN code in a _________ file and execute it using the jar command.

A. java
B. jar
C. C code
D. xml
Answer» B. jar
49.

Yarn commands are invoked by the ________ script.

A. hive
B. bin
C. hadoop
D. home
Answer» B. bin
50.

The CapacityScheduler has a predefined queue called:

A. domain
B. root
C. rear
D. all of the mentioned
Answer» B. root