27+ MCQs in Big Data

1.	Which of the following focuses on the discovery of (previously) unknown properties in the data?
A.	Data mining
B.	BigData
C.	Data wrangling
D.	Machine Learning
Answer» A. Data mining

2.	What percentage of the world's total data has been created within just the past two years?
A.	80%
B.	85%
C.	90%
D.	95%
Answer» C. 90%

3.	___________ is a general-purpose computing model and runtime system for distributed data analytics.
A.	MapReduce
B.	Drill
C.	Oozie
D.	None of the above
Answer» A. MapReduce

4.	The examination of large amounts of data to see what patterns or other useful information can be found is known as
A.	Data examination
B.	Information analysis
C.	Big data analytics
D.	Data analysis
Answer» C. Big data analytics

5.	What are the new sources of big data that will trigger a Big Data revolution in the years to come?
A.	Business transactions
B.	Social media
C.	Transactional data and sensor data
D.	RDBMS
Answer» C. Transactional data and sensor data

6.	All of the following are among the three steps followed to deploy a Big Data solution EXCEPT:
A.	Data Processing
B.	Data dissemination
C.	Data Storage
D.	Data Ingestion
Answer» B. Data dissemination

7.	______ is the term used to describe data that is high volume, high velocity and/or high variety.
A.	Analytics
B.	Bigdata
C.	Hadoop Data
D.	Bigdata analytics
Answer» B. Bigdata

8.	To find the minimum or the maximum of a function, we set the gradient to zero because:
A.	The value of the gradient at extrema of a function is always zero
B.	Depends on the type of problem
C.	Both A and B
D.	None of the above
Answer» A. The value of the gradient at extrema of a function is always zero
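
A short sketch of the idea, using a made-up function f(x, y) = (x - 1)^2 + (y + 2)^2 (an assumption for illustration): at its minimum every partial derivative is zero, so solving for a zero gradient, or iterating until the gradient vanishes, locates it. The second part shows the usual caveat: a zero gradient marks a stationary point, which can also be a saddle point, so it is necessary but not sufficient for an extremum, and it only applies where the function is differentiable.

```python
# Illustrative function (assumed for this example): f(x, y) = (x - 1)^2 + (y + 2)^2.
def grad_f(x, y):
    # Analytic gradient of f; it is (0, 0) only at (1, -2), the minimum.
    return 2 * (x - 1), 2 * (y + 2)


def gradient_descent(grad, x, y, lr=0.1, steps=200):
    # Step against the gradient; the iterates stop moving where the gradient is ~0.
    for _ in range(steps):
        gx, gy = grad(x, y)
        x, y = x - lr * gx, y - lr * gy
    return x, y


x_min, y_min = gradient_descent(grad_f, 5.0, 5.0)
print(round(x_min, 6), round(y_min, 6))  # 1.0 -2.0


# Caveat: g(x, y) = x^2 - y^2 also has a zero gradient at (0, 0),
# but that point is a saddle, not a minimum or a maximum.
def grad_g(x, y):
    return 2 * x, -2 * y


print(grad_g(0, 0))  # (0, 0)
```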

9.	Which of the following can be used to create subsamples using a maximum dissimilarity approach?
A.	minDissim
B.	maxDissim
C.	inmaxDissim
D.	All of the Mentioned
Answer» B. maxDissim
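
The maximum-dissimilarity idea can be sketched in plain Python. Starting from a seed set, each step adds the candidate whose nearest already-selected point is farthest away (a max-min rule). This mirrors the approach behind caret's `maxDissim` in R, whose default objective is, as we understand it, the minimum dissimilarity; the function name `max_dissim_sample`, the Euclidean distance, and the example points are assumptions for illustration, not caret's implementation.

```python
import math


def max_dissim_sample(seed, pool, n):
    """Greedy max-min dissimilarity subsampling (illustrative sketch).

    seed: points already selected; pool: candidate points; n: how many to add.
    Each step picks the candidate whose *nearest* selected point is farthest away.
    """
    selected = list(seed)
    remaining = list(pool)
    chosen = []
    for _ in range(min(n, len(remaining))):
        # Ties go to the earlier candidate, because max() keeps the first maximum.
        best = max(remaining, key=lambda p: min(math.dist(p, s) for s in selected))
        selected.append(best)
        remaining.remove(best)
        chosen.append(best)
    return chosen


seed = [(0.0, 0.0)]
pool = [(0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (0.0, 4.0), (4.0, 0.0)]
print(max_dissim_sample(seed, pool, 3))  # [(5.1, 5.0), (0.0, 4.0), (4.0, 0.0)]
```

Near-duplicates such as (0.1, 0.0) and (5.0, 5.0) are skipped because each sits right next to a point that is already in the subsample.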

10.	Which of the following plots is often used for checking randomness in time series?
A.	Autocausation
B.	Autorank
C.	Autocorrelation
D.	None of the above
Answer» C. Autocorrelation
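
A lag-k sample autocorrelation can be computed directly; an autocorrelation plot simply draws these values against k. For a random (white-noise) series the values should mostly fall inside roughly ±2/√N, while structured series stand out. The helper `acf` and the three example series are illustrative assumptions.

```python
import math
import random


def acf(series, max_lag):
    """Sample autocorrelation r_k for k = 1..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
        for k in range(1, max_lag + 1)
    ]


n = 100
rng = random.Random(0)
noise = [rng.gauss(0, 1) for _ in range(n)]     # should look random
trend = [float(t) for t in range(n)]            # strong positive autocorrelation
alternating = [(-1.0) ** t for t in range(n)]   # strong negative lag-1 autocorrelation

bound = 2 / math.sqrt(n)  # approximate 95% band for white noise
for name, s in [("noise", noise), ("trend", trend), ("alternating", alternating)]:
    r = acf(s, 3)
    print(name, [round(v, 3) for v in r], "outside band:", [abs(v) > bound for v in r])
```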

11.	Which of the following is a reasonable way to select the number of principal components "k"?
A.	Choose k to be the smallest value so that at least 99% of the variance is retained.
B.	Choose k to be 99% of m (k = 0.99*m, rounded to the nearest integer).
C.	Choose k to be the largest value so that 99% of the variance is retained.
D.	Use the elbow method.
Answer» A. Choose k to be the smallest value so that at least 99% of the variance is retained.
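
The "smallest k that retains at least 99% of the variance" rule is straightforward to compute once the variance captured by each principal component (the eigenvalues of the covariance matrix) is known. The function name `choose_k` and the eigenvalues below are made-up values for illustration.

```python
def choose_k(eigenvalues, retained=0.99):
    """Smallest k whose top-k components keep at least `retained` of the total variance."""
    values = sorted(eigenvalues, reverse=True)
    total = sum(values)
    cumulative = 0.0
    for k, v in enumerate(values, start=1):
        cumulative += v
        if cumulative / total >= retained:
            return k
    return len(values)


# Hypothetical component variances (they sum to 100).
eigs = [50.0, 30.0, 15.0, 4.5, 0.3, 0.2]
print(choose_k(eigs))        # 4  (the top 4 components keep 99.5%)
print(choose_k(eigs, 0.90))  # 3
```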

12.	Which of the following techniques cannot be used for normalization in text mining?
A.	Stemming
B.	Lemmatization
C.	Stop Word Removal
D.	None of the above
Answer» C. Stop Word Removal

13.	When performing regression or classification, which of the following is the correct way to preprocess the data?
A.	Normalize the data -> PCA -> training
B.	PCA -> normalize PCA output -> training
C.	Normalize the data -> PCA -> normalize PCA output -> training
D.	None of the above
Answer» A. Normalize the data -> PCA -> training
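
Normalizing before PCA matters because PCA follows variance, so a feature measured on a large scale dominates the first component. The sketch below uses two made-up features and the closed-form eigenvalues of a 2×2 covariance matrix; `standardize`, `cov`, and `pc1_ratio` are hypothetical helpers. In practice the scaling parameters (mean and standard deviation) would be fitted on the training data only and reused for new data.

```python
import math


def standardize(column):
    # z-score: subtract the mean, divide by the (population) standard deviation
    mean = sum(column) / len(column)
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / len(column))
    return [(v - mean) / std for v in column]


def cov(a, b):
    # Population covariance of two equal-length columns
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)


def pc1_ratio(x1, x2):
    """Share of total variance captured by the first principal component (2 features)."""
    a, b, c = cov(x1, x1), cov(x1, x2), cov(x2, x2)
    # Eigenvalues of the symmetric matrix [[a, b], [b, c]]
    mid = (a + c) / 2
    radius = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    largest, smallest = mid + radius, mid - radius
    return largest / (largest + smallest)


x1 = [1, 2, 3, 4, 5]                  # small-scale feature
x2 = [1000, 3000, 2000, 5000, 4000]   # large-scale feature
print(round(pc1_ratio(x1, x2), 4))                            # 1.0, driven by x2's scale
print(round(pc1_ratio(standardize(x1), standardize(x2)), 4))  # 0.9
```

Without scaling, the first component is essentially just `x2`; after standardization it reflects the correlation (0.8) between the two features instead.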

14.	The categories that make up a model of language do not include ________.
A.	System Unit
B.	structural units
C.	data units
D.	empirical units
Answer» C. data units

15.	The modern conception of data science as an independent discipline is sometimes attributed to?
A.	William S. Cleveland
B.	John McCarthy
C.	Arthur Samuel
D.	Satoshi Nakamoto
Answer» A. William S. Cleveland

16.	The goal of business intelligence is to allow easy interpretation of large volumes of data to identify new opportunities.
A.	TRUE
B.	FALSE
C.	Can be true or false
D.	Can not say
Answer» A. TRUE

17.	In which of the following cases will K-means clustering fail to give good results? 1) Data points with outliers 2) Data points with different densities 3) Data points with nonconvex shapes
A.	1 and 2
B.	2 and 3
C.	1 and 3
D.	All of the above
Answer» D. All of the above
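
A minimal one-dimensional sketch of Lloyd's algorithm (the standard K-means iteration) shows the outlier failure mode: with k = 2, the single extreme point captures a centroid of its own and the two genuine groups are merged. Different densities and non-convex shapes cause trouble for a related reason: assigning each point to its nearest centroid can only produce convex (Voronoi) regions and implicitly favors compact clusters of similar spread. The data, the function name `kmeans_1d`, and the fixed starting centroids are illustrative assumptions.

```python
def kmeans_1d(points, init, max_iter=100):
    """Lloyd's algorithm on 1-D data: assign to the nearest centroid, then recompute means."""
    centroids = list(init)
    for _ in range(max_iter):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        updated = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return centroids, clusters


# Two natural groups (1-3 and 10-12) plus one outlier at 100.
data = [1, 2, 3, 10, 11, 12, 100]
centroids, clusters = kmeans_1d(data, init=[1, 12])
print(centroids)  # [6.5, 100.0]
print(clusters)   # [[1, 2, 3, 10, 11, 12], [100]]
```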

19.	Which of the following steps does a data scientist perform after acquiring the data?
A.	Data Cleaning
B.	Data Integration
C.	Data Replication
D.	All of the above
Answer» A. Data Cleaning

20.	Which of the following can be used to impute data sets based only on information in the training set?
A.	postprocess
B.	preProcess
C.	process
D.	All of the Mentioned
Answer» B. preProcess
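
The key point is that the fill value is estimated from the training rows alone and then reused unchanged on new data, so nothing about the test set leaks into preprocessing. In caret this is the role of `preProcess` (estimate on the training data) followed by `predict` (apply to any data set). The Python sketch below uses simple mean imputation; the helper names `fit_mean_imputer` and `impute` are hypothetical.

```python
def fit_mean_imputer(train_column):
    # Learn the fill value from observed (non-missing) training values only.
    observed = [v for v in train_column if v is not None]
    return sum(observed) / len(observed)


def impute(column, fill_value):
    # Apply the already-learned fill value; no statistics are recomputed here.
    return [fill_value if v is None else v for v in column]


train_rows = [2.0, None, 4.0, 6.0]
new_rows = [None, 10.0]

fill = fit_mean_imputer(train_rows)   # 4.0, learned from training rows only
print(impute(train_rows, fill))       # [2.0, 4.0, 4.0, 6.0]
print(impute(new_rows, fill))         # [4.0, 10.0]
```

The value 10.0 in the new rows does not influence the fill value, which is exactly what "based only on information in the training set" means.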

21.	In descriptive statistics, data from an entire population or a sample is summarized with:
A.	integer descriptors
B.	floating descriptors
C.	numerical descriptors
D.	decimal descriptors
Answer» C. numerical descriptors
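
Python's standard `statistics` module computes the usual numerical descriptors of central tendency and spread; the sample values below are made up for illustration.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

summary = {
    "mean": statistics.mean(data),       # central tendency
    "median": statistics.median(data),   # middle value
    "mode": statistics.mode(data),       # most frequent value
    "pstdev": statistics.pstdev(data),   # spread (population standard deviation)
    "range": max(data) - min(data),      # spread (max - min)
}
print(summary)
```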

22.	Which of the following models includes a backwards elimination feature selection routine?
A.	MCV
B.	MARS
C.	MCRS
D.	All of the Mentioned
Answer» B. MARS

23.	Which of the following is a widely used and effective machine learning algorithm based on the idea of bagging?
A.	Decision Tree
B.	Regression
C.	Classification
D.	Random Forest
Answer» D. Random Forest
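
Bagging trains each model on a bootstrap sample (rows drawn with replacement) and combines them by majority vote. A random forest applies this to full decision trees and additionally picks a random subset of features at each split; the sketch below shows only the bagging part, using depth-1 trees (stumps) on a single made-up feature, so feature subsampling is omitted. All function names are hypothetical.

```python
import random
from collections import Counter


def fit_stump(xs, ys):
    """Fit a 1-D decision stump: predict `hi` when x > t, else `lo`."""
    values = sorted(set(xs))
    # Candidate thresholds: one below all values, plus midpoints between neighbors.
    candidates = [values[0] - 1] + [(a + b) / 2 for a, b in zip(values, values[1:])]
    best = None
    for t in candidates:
        for lo, hi in ((0, 1), (1, 0)):
            errors = sum((hi if x > t else lo) != y for x, y in zip(xs, ys))
            if best is None or errors < best[0]:
                best = (errors, t, lo, hi)
    _, t, lo, hi = best
    return lambda x: hi if x > t else lo


def fit_bagged_stumps(xs, ys, n_models=25, seed=0):
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        # Bootstrap sample: draw n row indices with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def predict(models, x):
    # Aggregate the ensemble by majority vote.
    return Counter(m(x) for m in models).most_common(1)[0][0]


xs = list(range(1, 21))
ys = [0] * 10 + [1] * 10
forest = fit_bagged_stumps(xs, ys)
print(predict(forest, 2), predict(forest, 19))  # 0 1
```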

24.	The branch of statistics which deals with development of particular statistical methods is classified as
A.	industry statistics
B.	economic statistics
C.	applied statistics
D.	applied statistics
Answer» C. applied statistics

25.	In model-based learning methods, an iterative process takes place on ML models built with various model parameters. These parameters are called:
A.	mini-batches
B.	optimized parameters
C.	hyperparameters
D.	superparameters
Answer» C. hyperparameters

26.	__________ can best be described as a programming model used to develop Hadoop-based applications that can process massive amounts of data.
A.	MapReduce
B.	Mahout
C.	Oozie
D.	All of the mentioned
Answer» A. MapReduce
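
The three phases of the MapReduce model (map, shuffle/group by key, reduce) can be imitated in a single Python process with the classic word-count example. This is only a conceptual sketch: a real Hadoop job runs many mappers and reducers in parallel across a cluster, and the function names here are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    # Mapper: emit (word, 1) for every word in one input split.
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    # Shuffle: group all values by key, as the framework does between the phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Reducer: combine all values emitted for one key.
    return key, sum(values)


docs = ["big data big ideas", "Data wrangling and big data"]
mapped = chain.from_iterable(map_phase(d) for d in docs)
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)  # {'big': 3, 'data': 3, 'ideas': 1, 'wrangling': 1, 'and': 1}
```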

27.	According to analysts, for what can traditional IT systems provide a foundation when they're integrated with big data technologies like Hadoop?
A.	Big data management and data mining
B.	Data warehousing and business intelligence
C.	Management of Hadoop clusters
D.	Collecting and storing unstructured data
Answer» B. Data warehousing and business intelligence
