93+ MCQs in Big Data

1.	Which is used to find the factor congruence coefficients?
A.	factor.mosaicplot
B.	factor.xyplot
C.	factor.congruence
D.	factor.cumsum
Answer» C. factor.congruence

2.	What is true about Data Visualization?
A.	Data Visualization is used to communicate information clearly and efficiently to users by the usage of information graphics such as tables and charts.
B.	Data Visualization helps users in analyzing a large amount of data in a simpler way.
C.	Data Visualization makes complex data more accessible, understandable, and usable.
D.	All of the above
Answer» D. All of the above

3.	The new source of big data that will trigger a Big Data revolution in the years to come is?
A.	Business transactions
B.	Social media
C.	Transactional data and sensor data
D.	RDBMS
Answer» C. Transactional data and sensor data

4.	What is a sentence parser typically used for?
A.	It is used to parse sentences to check if they are utf-8 compliant.
B.	It is used to parse sentences to derive their most likely syntax tree structures.
C.	It is used to parse sentences to assign POS tags to all tokens.
D.	It is used to check if sentences can be parsed into meaningful tokens.
Answer» B. It is used to parse sentences to derive their most likely syntax tree structures.

5.	In descriptive statistics, data from the entire population or a sample is summarized with?
A.	integer descriptors
B.	floating descriptors
C.	numerical descriptors
D.	decimal descriptors
Answer» C. numerical descriptors

6.	Which of the following can be used to impute data sets based only on information in the training set?
A.	postprocess
B.	preProcess
C.	process
D.	All of the Mentioned
Answer» B. preProcess

7.	Which of the following is a reasonable way to select the number of principal components "k"?
A.	Choose k to be the smallest value so that at least 99% of the variance is retained.
B.	Choose k to be 99% of m (k = 0.99*m, rounded to the nearest integer).
C.	Choose k to be the largest value so that 99% of the variance is retained.
D.	Use the elbow method.
Answer» A. Choose k to be the smallest value so that at least 99% of the variance is retained.
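
The "smallest k that retains at least 99% of the variance" rule can be sketched as follows. This is a minimal illustration, assuming the per-component variances (the eigenvalues of the covariance matrix) are already available; the function name `smallest_k` and the sample values are hypothetical.

```python
def smallest_k(eigenvalues, retained=0.99):
    """Return the smallest k whose top-k components retain
    at least `retained` of the total variance."""
    # Principal components are ordered by decreasing variance.
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for k, value in enumerate(ordered, start=1):
        running += value
        if running / total >= retained:
            return k
    return len(ordered)


# Hypothetical per-component variances for a 5-feature data set.
variances = [6.0, 3.0, 0.95, 0.04, 0.01]
k = smallest_k(variances)  # the first three components retain about 99.5%
print(k)
```

Picking the largest such k, or a fixed fraction of the number of features m, would not reduce dimensionality in a principled way, which is why those options are not reasonable.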

8.	The branch of statistics which deals with the development of particular statistical methods is classified as
A.	industry statistics
B.	economic statistics
C.	applied statistics
D.	applied statistics
Answer» D. applied statistics

9.	Numbers, text, image, audio, and video data are examples of ____
A.	Volume
B.	Value
C.	Varity
D.	Variety
Answer» D. Variety

10.	Which of the following is a tool for checking normality?
A.	qqline()
B.	qline()
C.	anova()
D.	lm()
Answer» A. qqline()
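
In R, qqline() adds a reference line to a normal Q-Q plot. The idea behind such a plot can be sketched in Python with the standard library alone: sort the sample and pair each value with the standard normal quantile at the same rank. The helper name `normal_qq_pairs` and the plotting position (i - 0.5) / n are assumptions made for this sketch, not a reproduction of R's exact computation.

```python
from statistics import NormalDist


def normal_qq_pairs(sample):
    """Pair each sorted observation with a standard normal quantile.

    If the sample is roughly normal, the pairs fall close to a
    straight line when plotted.
    """
    ordered = sorted(sample)
    n = len(ordered)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))


# Hypothetical sample values.
pairs = normal_qq_pairs([2.1, 1.9, 2.4, 1.6, 2.0])
for theoretical_q, observed in pairs:
    print(f"{theoretical_q:+.3f}  {observed}")
```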

11.	Files containing R scripts end with the extension _______.
A.	.R
B.	.S
C.	.bigdata
D.	All of the above
Answer» A. .R

12.	Common use cases for data visualization include?
A.	Politics
B.	Sales and marketing
C.	Healthcare
D.	All of the above
Answer» D. All of the above

13.	According to analysts, for what can traditional IT systems provide a foundation when they’re integrated with big data technologies like Hadoop?
A.	Big data management and data mining
B.	Data warehousing and business intelligence
C.	Management of Hadoop clusters
D.	Collecting and storing unstructured data
Answer» A. Big data management and data mining

14.	The examination of large amounts of data to see what patterns or other useful information can be found is known as
A.	Data examination
B.	Information analysis
C.	Big data analytics
D.	Data analysis
Answer» C. Big data analytics

15.	______ is the term that is used to describe data that is high volume, high velocity and/or high variety.
A.	Analytics
B.	Bigdata
C.	Hadoop Data
D.	Bigdata analytics
Answer» B. Bigdata

16.	Raw data should be processed only one time.
A.	True
B.	False
C.	Can be true or false
D.	Can not say
Answer» B. False

17.	Which of the following are ML methods?
A.	based on human supervision
B.	supervised Learning
C.	semi-reinforcement Learning
D.	All of the above
Answer» A. based on human supervision

18.	Which of the following are components of data science?
A.	Data Engineering
B.	Advanced Computing
C.	Domain expertise
D.	All of the above
Answer» D. All of the above

19.	Which of the following techniques cannot be used for normalization in text mining?
A.	Stemming
B.	Lemmatization
C.	Stop Word Removal
D.	None of the above
Answer» C. Stop Word Removal

20.	In model-based learning methods, an iterative process takes place on the ML models that are built based on various model parameters, called?
A.	mini-batches
B.	optimized parameters
C.	hyperparameters
D.	superparameters
Answer» C. hyperparameters

21.	What is true about Machine Learning?
A.	Machine Learning (ML) is that field of computer science
B.	ML is a type of artificial intelligence that extracts patterns out of raw data by using an algorithm or method.
C.	The main focus of ML is to allow computer systems to learn from experience without being explicitly programmed and without human intervention.
D.	All of the above
Answer» D. All of the above

22.	Data analysis was defined by which statistician?
A.	William S.
B.	Hans Peter Luhn
C.	Gregory Piatetsky-Shapiro
D.	John Tukey
Answer» D. John Tukey

23.	A __________ begins by hypothesizing a sentence (the symbol S) and successively predicting lower level constituents until individual preterminal symbols are written.
A.	bottom-up parser
B.	top parser
C.	top-down parser
D.	bottom parser
Answer» C. top-down parser

24.	Which of the following steps is performed by a data scientist after acquiring the data?
A.	Data Cleaning
B.	Data Integration
C.	Data Replication
D.	All of the above
Answer» A. Data Cleaning

25.	The modern conception of data science as an independent discipline is sometimes attributed to?
A.	William S.
B.	John McCarthy
C.	Arthur Samuel
D.	Satoshi Nakamoto
Answer» A. William S.

26.	Which of the following is true about regression analysis?
A.	answering yes/no questions about the data
B.	estimating numerical characteristics of the data
C.	modeling relationships within the data
D.	describing associations within the data
Answer» C. modeling relationships within the data

27.	With how many layers are deep learning algorithms constructed?
A.	2
B.	3
C.	4
D.	5
Answer» B. 3

28.	Which of the following is a subset of machine learning?
A.	Numpy
B.	SciPy
C.	Deep Learning
D.	All of the above
Answer» C. Deep Learning

29.	Who popularized the term "big data"?
A.	John deere
B.	John Mashey
C.	johny Mashe
D.	Jhon Mash
Answer» B. John Mashey

30.	___________ is a general-purpose computing model and runtime system for distributed data analytics.
A.	Mapreduce
B.	Drill
C.	Oozie
D.	None of the above
Answer» A. Mapreduce

31.	In which of the following cases will K-means clustering fail to give good results? 1) Data points with outliers 2) Data points with different densities 3) Data points with nonconvex shapes
A.	1 and 2
B.	2 and 3
C.	1 and 3
D.	All of the above
Answer» D. All of the above

32.	Data science is the processing of a diverse set of data through?
A.	organizing data
B.	processing data
C.	analysing data
D.	All of the above
Answer» D. All of the above

33.	To find the minimum or the maximum of a function, we set the gradient to zero because:
A.	The value of the gradient at extrema of a function is always zero
B.	Depends on the type of problem
C.	Both A and B
D.	None of the above
Answer» A. The value of the gradient at extrema of a function is always zero
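
A small numerical check can make this concrete. The sketch below assumes differentiable one-variable functions and uses a central-difference approximation; the helper `derivative` and the two example functions are hypothetical. It also shows that the converse does not hold: a zero gradient marks a candidate point, which is not necessarily an extremum.

```python
def derivative(f, x, h=1e-6):
    """Central-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


def square(x):
    # Minimum at x = 3, where the derivative vanishes.
    return (x - 3.0) ** 2


def cube(x):
    # Derivative vanishes at x = 0, but 0 is not an extremum.
    return x ** 3


print(derivative(square, 3.0))  # approximately 0 at the minimum
print(derivative(square, 4.0))  # approximately 2, away from the minimum
print(derivative(cube, 0.0))    # approximately 0 at a non-extremum
```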

34.	To find the minimum or the maximum of a function, we set the gradient to zero because:
A.	The value of the gradient at extrema of a function is always zero
B.	Depends on the type of problem
C.	Both A and B
D.	None of the above
Answer» A. The value of the gradient at extrema of a function is always zero

35.	Which of the following are the Data Sources in data science?
A.	Structured
B.	Unstructured
C.	Both A and B
D.	None of the above
Answer» C. Both A and B

36.	Training the model with all the data in one single batch is known as?
A.	Batch learning
B.	Offline learning
C.	Both A and B
D.	None of the above
Answer» C. Both A and B

37.	Data can be visualized using?
A.	graphs
B.	charts
C.	maps
D.	All of the above
Answer» D. All of the above

38.	The goal of business intelligence is to allow easy interpretation of large volumes of data to identify new opportunities.
A.	TRUE
B.	FALSE
C.	Can be true or false
D.	Can not say
Answer» A. TRUE

39.	Which of the following models includes a backwards elimination feature selection routine?
A.	MCV
B.	MARS
C.	MCRS
D.	All of the Mentioned
Answer» B. MARS

40.	Which method shows hierarchical data in a nested format?
A.	Treemaps
B.	Scatter plots
C.	Population pyramids
D.	Area charts
Answer» A. Treemaps

41.	Which is used for inference on one proportion using the normal approximation?
A.	fisher.test()
B.	chisq.test()
C.	Lm.test()
D.	prop.test()
Answer» D. prop.test()
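
The normal approximation behind a one-proportion test can be sketched as a z-test. This is a minimal illustration, not a reimplementation of R's prop.test(), which reports a chi-squared statistic and applies a continuity correction by default, so its numbers can differ slightly. The function name and the sample counts are hypothetical.

```python
import math
from statistics import NormalDist


def one_proportion_z_test(successes, n, p0):
    """Two-sided z-test of H0: p = p0 using the normal approximation."""
    p_hat = successes / n
    # Standard error of the sample proportion under the null hypothesis.
    se = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical data: 60 successes in 100 trials, tested against p0 = 0.5.
z, p = one_proportion_z_test(60, 100, 0.5)
print(f"z = {z:.3f}, p-value = {p:.4f}")
```

The approximation is usually considered reasonable only when both n * p0 and n * (1 - p0) are large enough; for small samples an exact binomial test is the safer choice.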

42.	Which of the following is the common goal of statistical modelling?
A.	Inference
B.	Summarizing
C.	Subsetting
D.	None of the above
Answer» A. Inference

43.	When performing regression or classification, which of the following is the correct way to preprocess the data?
A.	Normalize the data -> PCA -> training
B.	PCA -> normalize PCA output -> training
C.	Normalize the data -> PCA -> normalize PCA output -> training
D.	None of the above
Answer» A. Normalize the data -> PCA -> training

44.	Which of the following is one of the key data science skills?
A.	Statistics
B.	Machine Learning
C.	Data Visualization
D.	All of the above
Answer» D. All of the above

45.	Text analytics is also referred to as text mining.
A.	TRUE
B.	FALSE
C.	Can be true or false
D.	Can not say
Answer» A. TRUE

46.	Which of the following techniques cannot be used for normalization in text mining?
A.	Stemming
B.	Lemmatization
C.	Stop Word Removal
D.	None of the above
Answer» C. Stop Word Removal

47.	Real time data is ______.
A.	Field
B.	Primary Key
C.	unique
D.	record
Answer» C. unique

48.	Which of the following models is usually a gold standard for data analysis?
A.	Inferential
B.	Descriptive
C.	Causal
D.	All of the above
Answer» C. Causal

49.	Which of the following is a key characteristic of a hacker?
A.	Afraid to say they don't know the answer
B.	Willing to find answers on their own
C.	Not Willing to find answers on their own
D.	All of the above
Answer» B. Willing to find answers on their own

50.	In which of the following cases will K-means clustering fail to give good results? 1) Data points with outliers 2) Data points with different densities 3) Data points with nonconvex shapes
A.	1 and 2
B.	2 and 3
C.	1 and 3
D.	All of the above
Answer» D. All of the above
