230+ MCQs in Hadoop (Technical Programming), Page 2, McqOptions

51.	____________ is a subproject with the aim of collecting and distributing free materials.
A.	OSR
B.	OPR
C.	ORP
D.	ORS
Answer» C. ORP

52.	____________ sink can be a text file, the console display, a simple HDFS path, or a null bucket where the data is simply deleted.
A.	Collector Tier Event
B.	Agent Tier Event
C.	Basic
D.	None of the mentioned
Answer» C. Basic

53.	Hama requires JRE _______ or higher and ssh to be set up between nodes in the cluster.
A.	1.6
B.	1.7
C.	1.8
D.	2.0
Answer» A. 1.6

54.	HCatalog is built on top of the Hive metastore and incorporates Hive’s ____________.
A.	DDL
B.	DML
C.	TCL
D.	DCL
Answer» A. DDL

55.	___________ is the type supported for storing values in HCatalog tables.
A.	HCatRecord
B.	HCatColumns
C.	HCatValues
D.	All of the mentioned
Answer» A. HCatRecord

56.	_________________ property allows users to override the expiry time specified.
A.	hcat.desired.partition.num.splits
B.	hcatalog.hive.client.cache.expiry.time
C.	hcatalog.hive.client.cache.disabled
D.	hcat.append.limit
Answer» B. hcatalog.hive.client.cache.expiry.time
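
These two properties govern how long HCatalog keeps cached Hive metastore clients. The expiry time overrides the cache lifetime, and the disabled flag turns caching off entirely. The sketch below is a hypothetical stand-in, not HCatalog's implementation. The class name, the injectable clock, and the assumed 120-second default are all illustrative. It shows a time-to-live cache driven by the two property names.

```python
import time


class ClientCache:
    """Hypothetical client cache driven by HCatalog-style property names."""

    def __init__(self, conf, clock=time.monotonic):
        # Assumption: expiry is in seconds, defaulting to 120 when unset.
        self.ttl = int(conf.get("hcatalog.hive.client.cache.expiry.time", 120))
        disabled = conf.get("hcatalog.hive.client.cache.disabled", "false")
        self.disabled = str(disabled).lower() == "true"
        self.clock = clock
        self.entries = {}  # key -> (client, creation time)

    def get(self, key, factory):
        """Return a live cached client for key, or build one with factory()."""
        if self.disabled:
            return factory()  # cache turned off: always build a fresh client
        now = self.clock()
        cached = self.entries.get(key)
        if cached is not None and now - cached[1] < self.ttl:
            return cached[0]  # still within the expiry window
        client = factory()
        self.entries[key] = (client, now)
        return client
```

For example, passing {"hcatalog.hive.client.cache.expiry.time": "10"} makes a cached client expire ten seconds after it was created.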

57.	Sally in data processing uses __________ to cleanse and prepare the data.
A.	Pig
B.	Hive
C.	HCatalog
D.	Impala
Answer» A. Pig

58.	_________ mode is used when you just have a single server and want to launch all the daemon processes.
A.	Local Mode
B.	Pseudo Distributed Mode
C.	Distributed Mode
D.	All of the mentioned
Answer» B. Pseudo Distributed Mode

59.	With HCatalog, _________ does not need to modify the table structure.
A.	Partition
B.	Columns
C.	Robert
D.	All of the mentioned
Answer» C. Robert

60.	____________ generates NGrams and counts frequencies for ngrams, head and tail subgrams.
A.	CollocationDriver
B.	CollocDriver
C.	CarDriver
D.	All of the mentioned
Answer» B. CollocDriver
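
The ngram, head and tail counting described in this question can be sketched in plain Python. This is a minimal illustration, not Mahout's CollocDriver, which runs as MapReduce jobs. It assumes whitespace tokenization and is restricted to bigrams: for the bigram "a b", "a" is the head subgram and "b" is the tail subgram. The function name is hypothetical.

```python
from collections import Counter


def bigram_subgram_counts(tokens):
    """Count bigrams plus how often each token appears as a head or tail."""
    ngrams, heads, tails = Counter(), Counter(), Counter()
    for head, tail in zip(tokens, tokens[1:]):
        ngrams[f"{head} {tail}"] += 1  # full ngram frequency
        heads[head] += 1               # head subgram frequency
        tails[tail] += 1               # tail subgram frequency
    return ngrams, heads, tails
```

For the tokens "a b a b c", the bigram "a b" is counted twice, and "a" appears twice as a head.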

61.	Ambari makes Hadoop management simpler by providing a consistent, secure platform for operational control.
A.	True
B.	False
C.	May be True or False
D.	Can't Say
Answer» A. True

62.	Capacity Scheduler View helps a Hadoop operator set up YARN workload management easily to enable multi-tenant and multi-workload processing.
A.	True
B.	False
C.	May be True or False
D.	Can't Say
Answer» A. True

63.	HCatalog supports the same data types as _________
A.	Pig
B.	Hama
C.	Hive
D.	Oozie
Answer» C. Hive

64.	____________ is used when you want the sink to be the input source for another operation.
A.	Collector Tier Event
B.	Agent Tier Event
C.	Basic
D.	All of the mentioned
Answer» B. Agent Tier Event

65.	__________ is a single-threaded server using standard blocking I/O.
A.	TNonblockingServer
B.	TSimpleServer
C.	TSocket
D.	None of the mentioned
Answer» B. TSimpleServer

66.	Hama is a general ________________ computing engine on top of Hadoop.
A.	BSP
B.	ASP
C.	MPP
D.	None of the mentioned
Answer» A. BSP

67.	____________ Collection API allows for even distribution of custom replica properties.
A.	BALANUNIQUE
B.	BALANCESHARDUNIQUE
C.	BALANCEUNIQUE
D.	None of the mentioned
Answer» B. BALANCESHARDUNIQUE

68.	A __________ in a social graph is a group of people who interact frequently with each other and less frequently with others.
A.	semi-cluster
B.	partial cluster
C.	full cluster
D.	none of the mentioned
Answer» A. semi-cluster
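
Google's Pregel paper describes the semi-clustering algorithm, and Hama's graph package follows that model. The paper scores a candidate semi-cluster c as S_c = (I_c - f_B * B_c) / (V_c * (V_c - 1) / 2), where:

- I_c is the total weight of edges inside c.
- B_c is the total weight of edges leaving c.
- V_c is the number of vertices in c.
- f_B is a user-chosen boundary factor, usually between 0 and 1.

The sketch below computes that score for an undirected weighted graph. The edge-dict representation, the function name, and the 0.5 default for f_B are assumptions for illustration, not Hama's API.

```python
def semi_cluster_score(cluster, edges, boundary_factor=0.5):
    """Score a candidate semi-cluster with the Pregel-paper formula.

    cluster: set of vertex ids in the semi-cluster
    edges: dict mapping frozenset({u, v}) -> weight (undirected graph)
    boundary_factor: f_B, the boundary edge score factor (assumed default)
    """
    v_c = len(cluster)
    if v_c < 2:
        raise ValueError("this sketch needs at least two vertices")
    internal = boundary = 0.0
    for pair, weight in edges.items():
        inside = len(pair & cluster)  # endpoints that fall inside the cluster
        if inside == 2:
            internal += weight        # contributes to I_c
        elif inside == 1:
            boundary += weight        # contributes to B_c
    # Normalize by the number of edges in a clique of size V_c.
    return (internal - boundary_factor * boundary) / (v_c * (v_c - 1) / 2)
```

The normalization keeps large clusters from scoring high just because they contain more edges.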

69.	In how many ways does Spark use Hadoop?
A.	2
B.	3
C.	4
D.	5
Answer» A. 2

70.	Drill is designed from the ground up to support high-performance analysis on the ____________ data.
A.	semi-structured
B.	structured
C.	unstructured
D.	none of the mentioned
Answer» A. semi-structured

71.	A __________ server and a data node should be run on one physical node.
A.	groom
B.	web
C.	client
D.	all of the mentioned
Answer» A. groom

72.	A key of type ___________ is generated which is used later to join ngrams with their heads and tails in the reducer phase.
A.	GramKey
B.	Primary
C.	Secondary
D.	None of the mentioned
Answer» A. GramKey

73.	Which of the following performs compression using zlib?
A.	TZlibTransport
B.	TFramedTransport
C.	TMemoryTransport
D.	None of the mentioned
Answer» A. TZlibTransport

74.	________ phase merges the counts for unique ngrams or ngram fragments across multiple documents.
A.	CollocCombiner
B.	CollocReducer
C.	CollocMerger
D.	None of the mentioned
Answer» A. CollocCombiner

75.	Mahout provides ____________ libraries for common and primitive Java collections.
A.	Java
B.	Javascript
C.	Perl
D.	Python
Answer» A. Java

76.	_______ transport is required when using a non-blocking server.
A.	TZlibTransport
B.	TFramedTransport
C.	TMemoryTransport
D.	None of the mentioned
Answer» B. TFramedTransport
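
A non-blocking server reads whole messages before dispatching them, so it needs framing: each message is prefixed with its length, which tells the server when a message is complete. The sketch below imitates that idea with only the standard library, using a 4-byte big-endian length prefix in front of each payload. It also offers optional zlib compression in the spirit of TZlibTransport. It is an illustration, not Thrift's implementation. The helper names are hypothetical, and compressing each frame independently is a simplification.

```python
import struct
import zlib


def write_frame(payload: bytes, compress: bool = False) -> bytes:
    """Prefix a message with its length as a 4-byte big-endian int."""
    if compress:
        payload = zlib.compress(payload)  # simplified stand-in for TZlibTransport
    return struct.pack(">i", len(payload)) + payload


def read_frames(buffer: bytes, compressed: bool = False) -> list:
    """Split a byte buffer into complete frames; reject a truncated tail."""
    frames, pos = [], 0
    while pos < len(buffer):
        if pos + 4 > len(buffer):
            raise ValueError("truncated frame header")
        (size,) = struct.unpack_from(">i", buffer, pos)
        pos += 4
        body = buffer[pos:pos + size]
        if len(body) != size:
            raise ValueError("truncated frame body")
        pos += size
        frames.append(zlib.decompress(body) if compressed else body)
    return frames
```

Because the length arrives first, a reader can buffer bytes until a full frame is present before handing it to a processor.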

77.	A ________ is used to manage the efficient barrier synchronization of the BSPPeers.
A.	GroomServers
B.	BSPMaster
C.	Zookeeper
D.	None of the mentioned
Answer» C. Zookeeper
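
In a BSP superstep, each peer computes locally and sends messages, then waits at a barrier until every peer has arrived. Hama coordinates that barrier across the cluster through ZooKeeper. The sketch below substitutes Python's threading.Barrier as an in-process stand-in. Its hypothetical peers propagate the maximum value around a ring, one hop per superstep.

```python
import threading


def bsp_ring_max(initial):
    """Run len(initial) - 1 supersteps of max-propagation around a ring."""
    n = len(initial)
    values = list(initial)
    inbox = [None] * n                # one message slot per peer
    barrier = threading.Barrier(n)    # stand-in for ZooKeeper coordination

    def peer(i):
        for _ in range(n - 1):
            inbox[(i + 1) % n] = values[i]          # send phase
            barrier.wait()                           # all messages delivered
            values[i] = max(values[i], inbox[i])     # compute phase
            barrier.wait()                           # finish before next sends

    threads = [threading.Thread(target=peer, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return values
```

The second barrier in each superstep keeps a fast peer from overwriting a neighbour's inbox before that neighbour has read it.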

78.	For ___________ partitioning jobs, simply specifying a custom directory is not good enough.
A.	static
B.	semi cluster
C.	dynamic
D.	all of the mentioned
Answer» C. dynamic

79.	The Crunch APIs are modeled after _________ which is the library that Google uses for building data pipelines on top of their own implementation of MapReduce.
A.	FlagJava
B.	FlumeJava
C.	FlakeJava
D.	All of the mentioned
Answer» B. FlumeJava

80.	Drill analyzes semi-structured/nested data coming from _________ applications.
A.	RDBMS
B.	NoSQL
C.	NewSQL
D.	None of the mentioned
Answer» B. NoSQL

81.	Drill integrates with BI tools using a standard __________ connector.
A.	JDBC
B.	ODBC
C.	ODBC-JDBC
D.	All of the mentioned
Answer» B. ODBC

82.	MapR __________ Solution Earns Highest Score in Gigaom Research Data Warehouse Interoperability Report.
A.	SQL-on-Hadoop
B.	Hive-on-Hadoop
C.	Pig-on-Hadoop
D.	All of the mentioned
Answer» A. SQL-on-Hadoop

83.	The web UI provides information about ________ job statistics of the Hama cluster.
A.	MPP
B.	BSP
C.	USP
D.	ISP
Answer» B. BSP

84.	The tokens are passed through a Lucene ____________ to produce NGrams of the desired length.
A.	ShngleFil
B.	ShingleFilter
C.	SingleFilter
D.	Collfilter
Answer» B. ShingleFilter
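
A shingle filter turns a token stream into overlapping word n-grams. The plain-Python stand-in below is not Lucene's API, and its function name and parameters are hypothetical. It emits every run of min_size to max_size consecutive tokens, joined by a single space, and does not pass the original single tokens through.

```python
def shingles(tokens, min_size=2, max_size=2):
    """Return word n-grams of length min_size..max_size, in token order."""
    out = []
    for start in range(len(tokens)):
        for size in range(min_size, max_size + 1):
            if start + size <= len(tokens):  # skip n-grams running past the end
                out.append(" ".join(tokens[start:start + size]))
    return out
```

With the defaults, "please divide this" yields the bigrams "please divide" and "divide this".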

85.	Which of the following is true about Mahout?
A.	A mahout is one who drives an elephant as its master.
B.	Apache Mahout is an open source project that is primarily used for creating scalable machine learning algorithms
C.	Mahout lets applications analyze large sets of data effectively and quickly.
D.	All of the above
Answer» D. All of the above

86.	Which of the following languages is not supported by Spark?
A.	Java
B.	Pascal
C.	Scala
D.	Python
Answer» B. Pascal

87.	Which of the following projects is an interface definition language for Hadoop?
A.	Oozie
B.	Mahout
C.	Thrift
D.	Impala
Answer» C. Thrift

88.	____________ can be used to generate stats over the results of arbitrary numeric functions.
A.	stats.field
B.	sta.field
C.	stats.value
D.	none of the mentioned
Answer» A. stats.field

89.	The ________ class allows developers to exercise precise control over how data is partitioned, sorted, and grouped by the underlying execution engine.
A.	Grouping
B.	GroupingOptions
C.	RowGrouping
D.	None of the mentioned
Answer» B. GroupingOptions

90.	Hive does not have a data type corresponding to the ____________ type in Pig.
A.	decimal
B.	short
C.	biginteger
D.	datetime
Answer» C. biginteger

91.	New ____________ type enables indexing and searching of date ranges, particularly multi-valued ones.
A.	RangeField
B.	DateField
C.	DateRangeField
D.	All of the mentioned
Answer» C. DateRangeField

92.	Crunch uses Java serialization to serialize the contents of all of the ______ in a pipeline definition.
A.	Transient
B.	DoFns
C.	Configuration
D.	All of the mentioned
Answer» B. DoFns

93.	PCollection, PTable, and PGroupedTable all support a __________ operation.
A.	intersection
B.	union
C.	OR
D.	None of the mentioned
Answer» B. union

94.	Drill provides a __________ like internal data model to represent and process data.
A.	XML
B.	JSON
C.	TIFF
D.	None of the mentioned
Answer» B. JSON

95.	What is true about Apache Flume?
A.	Apache Flume is a reliable and distributed system for collecting, aggregating and moving massive quantities of log data.
B.	It has a simple yet flexible architecture based on streaming data flows.
C.	Apache Flume is used to collect log data present in log files from web servers and aggregate it into HDFS for analysis.
D.	All of the above
Answer» D. All of the above

96.	Lucene index size is roughly _______ of the size of the text indexed.
A.	10%
B.	20%
C.	50%
D.	70%
Answer» B. 20%

97.	Apache Hama provides a complete clone of _________.
A.	Pragmatic
B.	Pregel
C.	ServePreg
D.	All of the mentioned
Answer» B. Pregel

98.	Which of the following uses JSON to encode data?
A.	TCompactProtocol
B.	TDenseProtocol
C.	TBinaryProtocol
D.	None of the mentioned
Answer» D. None of the mentioned

99.	DoFns provide direct access to the __________ object that is used within a given Map or Reduce task via the getContext method.
A.	TaskInputContext
B.	TaskInputOutputContext
C.	TaskOutputContext
D.	All of the mentioned
Answer» B. TaskInputOutputContext

100.	The _________ collocation identifier is integrated into the process that is used to create vectors from sequence files of text keys and values.
A.	lbr
B.	lcr
C.	llr
D.	lar
Answer» C. llr
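
The log-likelihood ratio (llr) scores a candidate collocation "A B" from a 2x2 contingency table:

- k11 counts "A B".
- k12 counts A followed by something other than B.
- k21 counts B preceded by something other than A.
- k22 counts all remaining bigrams.

The sketch below assumes Ted Dunning's entropy-based formulation of the statistic. It is an illustration, not Mahout's code, and the function names are hypothetical.

```python
import math


def x_log_x(x):
    """x * ln(x), with the convention 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    """Unnormalized Shannon entropy: N*ln(N) - sum(k*ln(k))."""
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)


def log_likelihood_ratio(k11, k12, k21, k22):
    """G^2 statistic for a 2x2 table of co-occurrence counts."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    matrix = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point round-off.
    return max(0.0, 2.0 * (row + col - matrix))
```

A table whose counts match independence scores close to 0. A pair that almost always co-occurs, such as (10, 0, 0, 10), scores 40 ln 2 (about 27.7).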
