940 + Mcqs in Machine Learning Page 7 McqOptions

301.	Suppose you are given ‘n’ predictions on test data by ‘n’ different models (M1, M2, …, Mn) respectively. Which of the following method(s) can be used to combine the predictions of these models?
Note: We are working on a regression problem.
1. Median
2. Product
3. Average
4. Weighted sum
5. Minimum and Maximum
6. Generalized mean rule
A.	1, 3 and 4
B.	1,3 and 6
C.	1,3, 4 and 6
D.	all of above
Answer» D. all of above
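
A few of the combination rules listed in question 301 can be sketched in plain Python. The prediction values and weights below are made up for illustration, and `generalized_mean` is a hypothetical helper name, not a library function.

```python
import statistics

# Hypothetical predictions for one test row from three regression models.
predictions = [10.0, 12.0, 17.0]
# Hypothetical weights (assumed to sum to 1), e.g. based on validation accuracy.
weights = [0.5, 0.3, 0.2]

median_pred = statistics.median(predictions)
average_pred = statistics.fmean(predictions)
weighted_pred = sum(w * p for w, p in zip(weights, predictions))


def generalized_mean(values, r):
    """Power mean of positive values; r=1 gives the arithmetic mean."""
    return (sum(v ** r for v in values) / len(values)) ** (1 / r)


print(median_pred, average_pred, weighted_pred, generalized_mean(predictions, 2))
```

With r = 1 the generalized mean reduces to the plain average; larger r pulls the result toward the maximum prediction.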

302.	How can we assign the weights to output of different models in an ensemble?
1. Use an algorithm to return the optimal weights
2. Choose the weights using cross validation
3. Give high weights to more accurate models
A.	1 and 2
B.	1 and 3
C.	2 and 3
D.	all of above
Answer» D. all of above

303.	Which of the following is true about averaging ensemble?
A.	it can only be used in classification problem
B.	it can only be used in regression problem
C.	it can be used in both classification as well as regression
D.	none of these
Answer» C. it can be used in both classification as well as regression

304.	Which of the following is true about weighted majority votes?
1. We want to give higher weights to better performing models
2. Inferior models can overrule the best model if collective weighted votes for inferior models is higher than best model
3. Voting is special case of weighted voting
A.	1 and 3
B.	2 and 3
C.	1 and 2
D.	1, 2 and 3
Answer» D. 1, 2 and 3
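
The three statements in question 304 can be checked with a small sketch of weighted voting. The labels and weights are invented for illustration, and `weighted_vote` is a hypothetical helper.

```python
from collections import defaultdict


def weighted_vote(labels, weights):
    """Return the label whose supporters have the largest total weight."""
    totals = defaultdict(float)
    for label, weight in zip(labels, weights):
        totals[label] += weight
    return max(totals, key=totals.get)


# The first model is assumed to be the best one; it predicts "spam".
labels = ["spam", "ham", "ham"]

best_wins = weighted_vote(labels, [0.6, 0.2, 0.2])        # 0.6 vs 0.4
overruled = weighted_vote(labels, [0.4, 0.35, 0.35])      # 0.4 vs 0.7
plain_majority = weighted_vote(labels, [1.0, 1.0, 1.0])   # equal weights
print(best_wins, overruled, plain_majority)
```

Setting all weights equal reduces the rule to ordinary majority voting, which is why plain voting is a special case of weighted voting.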

305.	Which of the following are correct statement(s) about stacking?
1. A machine learning model is trained on predictions of multiple machine learning models
2. A Logistic regression will definitely work better in the second stage as compared to other classification methods
3. First stage models are trained on full / partial feature space of training data
A.	1 and 2
B.	2 and 3
C.	1 and 3
D.	all of above
Answer» C. 1 and 3

306.	Which of the following are advantages of stacking?
1. More robust model
2. Better prediction
3. Lower time of execution
A.	1 and 2
B.	2 and 3
C.	1 and 3
D.	all of the above
Answer» A. 1 and 2

307.	Which of the following can be one of the steps in stacking?
1. Divide the training data into k folds
2. Train k models on each k-1 folds and get the out of fold predictions for remaining one fold
3. Divide the test data set in “k” folds and get individual fold predictions by different algorithms
A.	1 and 2
B.	2 and 3
C.	1 and 3
D.	all of above
Answer» A. 1 and 2
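
Steps 1 and 2 of question 307 (split the training data into k folds, then collect out-of-fold predictions) can be sketched as follows. This is a toy illustration: the "model" simply predicts the mean of its training targets, features are ignored, and `k_fold_indices` and `out_of_fold_predictions` are hypothetical helper names.

```python
def k_fold_indices(n_rows, k):
    """Split row indices 0..n_rows-1 into k contiguous folds."""
    fold_size, remainder = divmod(n_rows, k)
    folds, start = [], 0
    for i in range(k):
        stop = start + fold_size + (1 if i < remainder else 0)
        folds.append(list(range(start, stop)))
        start = stop
    return folds


def out_of_fold_predictions(y, k, fit, predict):
    """Train on k-1 folds and predict the held-out fold, for every fold."""
    oof = [None] * len(y)
    for held_out in k_fold_indices(len(y), k):
        held = set(held_out)
        train_targets = [y[i] for i in range(len(y)) if i not in held]
        model = fit(train_targets)
        for i in held_out:
            oof[i] = predict(model)
    return oof


# Toy base learner (an assumption): always predict the training mean.
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
oof = out_of_fold_predictions(
    y, k=3,
    fit=lambda targets: sum(targets) / len(targets),
    predict=lambda model: model,
)
print(oof)
```

Each row's prediction comes from a model that never saw that row, which is what lets the second-stage model train on these predictions without leakage.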

308.	Which of the following is the difference between stacking and blending?
A.	stacking has less stable cv compared to blending
B.	in blending, you create out of fold prediction
C.	stacking is simpler than blending
D.	none of these
Answer» D. none of these

309.	Suppose you are using stacking with n different machine learning algorithms with k folds on data. Which of the following is true about one-level (m base models + 1 stacker) stacking?
Note:
Here, we are working on a binary classification problem
All base models are trained on all features
You are using k folds for base models
A.	you will have only k features after the first stage
B.	you will have only m features after the first stage
C.	you will have k+m features after the first stage
D.	you will have k*n features after the first stage
Answer» B. you will have only m features after the first stage

310.	Which of the following is true about bagging?
1. Bagging can be parallel
2. The aim of bagging is to reduce bias not variance
3. Bagging helps in reducing overfitting
A.	1 and 2
B.	2 and 3
C.	1 and 3
D.	all of these
Answer» C. 1 and 3

311.	True or False: In boosting, individual base learners can be parallel.
A.	true
B.	false
Answer» B. false

312.	Below are the two ensemble models:
1. E1(M1, M2, M3)
2. E2(M4, M5, M6)
Above, Mx denotes the individual base models. Which of the following are you more likely to choose, given the following conditions for E1 and E2?
E1: Individual model accuracies are high, but the models are of the same type (in other words, less diverse)
E2: Individual model accuracies are high, and the models are of different types (in other words, highly diverse)
A.	e1
B.	e2
C.	any of e1 and e2
D.	none of these
Answer» B. e2

313.	Suppose, you have 2000 different models with their predictions and want to ensemble predictions of best x models. Now, which of the following can be a possible method to select the best x models for an ensemble?
A.	step wise forward selection
B.	step wise backward elimination
C.	both
D.	none of above
Answer» C. both

314.	Suppose you want to apply a stepwise forward selection method for choosing the best models for an ensemble model. Which of the following is the correct order of the steps?
Note: You have predictions from more than 1000 models.
1. Add the models' predictions (in other words, take the average) one by one to the ensemble, keeping those that improve the metric on the validation set
2. Start with empty ensemble
3. Return the ensemble from the nested set of ensembles that has maximum performance on the validation set
A.	1-2-3
B.	1-3-4
C.	2-1-3
D.	none of above
Answer» C. 2-1-3
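
The procedure in question 314 can be sketched as a greedy loop. This is a simplified illustration, assuming mean squared error as the validation metric, selection without replacement, and stopping at the first step that does not improve the score. The model names and numbers are invented, and `forward_select` is a hypothetical helper.

```python
def mse(pred, target):
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def average(columns):
    """Row-wise average of several prediction lists."""
    return [sum(vals) / len(vals) for vals in zip(*columns)]


def forward_select(model_preds, target):
    chosen = []                      # step 2: start with an empty ensemble
    best_score = float("inf")
    remaining = dict(model_preds)
    while remaining:
        # Step 1: score the ensemble with each remaining model added.
        candidates = {
            name: mse(average([model_preds[c] for c in chosen] + [preds]), target)
            for name, preds in remaining.items()
        }
        best_name = min(candidates, key=candidates.get)
        if candidates[best_name] >= best_score:
            break                    # no remaining model improves the metric
        chosen.append(best_name)
        best_score = candidates[best_name]
        del remaining[best_name]
    return chosen, best_score        # step 3: best ensemble found


# Hypothetical validation targets and model predictions.
target = [1.0, 2.0, 3.0]
model_preds = {
    "a": [1.5, 2.5, 3.5],
    "b": [0.5, 1.5, 2.5],
    "c": [3.0, 3.0, 3.0],
}
chosen, score = forward_select(model_preds, target)
print(chosen, score)
```

Because the loop stops as soon as the score stops improving, the ensemble it returns is the best one in the nested sequence it built.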

315.	True or False: Dropout is a computationally expensive technique compared with bagging.
A.	true
B.	false
Answer» B. false

316.	How is the model capacity affected with dropout rate (where model capacity means the ability of a neural network to approximate complex functions)?
A.	model capacity increases with an increase in dropout rate
B.	model capacity decreases with an increase in dropout rate
C.	model capacity is not affected by an increase in dropout rate
D.	none of these
Answer» B. model capacity decreases with an increase in dropout rate

317.	In machine learning, an algorithm (or learning algorithm) is said to be unstable if a small change in the training data causes a large change in the learned classifiers.
True or False: Bagging of unstable classifiers is a good idea.
A.	true
B.	false
Answer» A. true

318.	Suppose there are 25 base classifiers. Each classifier has an error rate of e = 0.35. Suppose you are using averaging as the ensemble technique. What is the probability that the ensemble of these 25 classifiers will make a wrong prediction?
Note: All classifiers are independent of each other.
A.	0.05
B.	0.06
C.	0.07
D.	0.09
Answer» B. 0.06
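
The figure in question 318 can be reproduced with a binomial tail sum. This sketch assumes the ensemble decides by majority vote, so it is wrong only when at least 13 of the 25 independent classifiers are wrong; `majority_error` is a hypothetical helper name.

```python
from math import comb


def majority_error(n_models, error_rate):
    """Probability that more than half of n independent models are wrong."""
    k_min = n_models // 2 + 1  # smallest number of wrong votes that loses
    return sum(
        comb(n_models, k) * error_rate ** k * (1 - error_rate) ** (n_models - k)
        for k in range(k_min, n_models + 1)
    )


p_wrong = majority_error(25, 0.35)
print(round(p_wrong, 2))
```

The ensemble error is far below the 0.35 error of each individual classifier, which is the point of the independence assumption in the note.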

319.	In an election, N candidates are competing against each other and people are voting for either of the candidates. Voters don’t communicate with each other while casting their votes. Which of the following ensemble methods works similarly to the election procedure discussed above?
Hint: Persons are like base models of an ensemble method.
A.	bagging
B.	boosting
C.	a or b
D.	none of these
Answer» A. bagging

320.	Generally, an ensemble method works better if the individual base models have ____________?
Note: Suppose each individual base model has accuracy greater than 50%.
A.	less correlation among predictions
B.	high correlation among predictions
C.	correlation does not have any impact on ensemble output
D.	none of the above
Answer» A. less correlation among predictions

321.	If you use an ensemble of different base models, is it necessary to tune the hyper parameters of all base models to improve the ensemble performance?
A.	yes
B.	no
C.	can’t say
Answer» B. no

322.	True or False: Ensemble of classifiers may or may not be more accurate than any of its individual model.
A.	true
B.	false
Answer» A. true

323.	Which of the following is / are true about weak learners used in ensemble model?
1. They have low variance and they don’t usually overfit
2. They have high bias, so they can not solve hard learning problems
3. They have high variance and they don’t usually overfit
A.	1 and 2
B.	1 and 3
C.	2 and 3
D.	none of these
Answer» A. 1 and 2

324.	True or False: Ensembles will yield bad results when there is significant diversity among the models.
Note: All individual models have meaningful and good predictions.
A.	true
B.	false
Answer» B. false

325.	True or False: Ensemble learning can only be applied to supervised learning methods.
A.	true
B.	false
Answer» B. false

326.	Which of the following can be true for selecting base learners for an ensemble?
1. Different learners can come from same algorithm with different hyper parameters
2. Different learners can come from different algorithms
3. Different learners can come from different training spaces
A.	1
B.	2
C.	1 and 3
D.	1, 2 and 3
Answer» D. 1, 2 and 3

327.	Which of the following option is / are correct regarding benefits of ensemble model?
1. Better performance
2. Generalized models
3. Better interpretability
A.	1 and 3
B.	2 and 3
C.	1 and 2
D.	1, 2 and 3
Answer» C. 1 and 2

328.	What is true about an ensembled classifier?
1. Classifiers that are more “sure” can vote with more conviction
2. Classifiers can be more “sure” about a particular part of the space
3. Most of the times, it performs better than a single classifier
A.	1 and 2
B.	1 and 3
C.	2 and 3
D.	all of the above
Answer» D. all of the above

329.	The F-test
A.	an omnibus test
B.	considers the reduction in error when moving from the complete model to the reduced model
C.	considers the reduction in error when moving from the reduced model to the complete model
D.	can only be conceptualized as a reduction in error
Answer» C. considers the reduction in error when moving from the reduced model to the complete model

330.	Increase in size of a convolutional kernel would necessarily increase the performance of a convolutional network.
A.	true
B.	false
Answer» B. false

331.	A 4-input neuron has weights 1, 2, 3 and 4. The transfer function is linear with the constant of proportionality being equal to 2. The inputs are 4, 10, 10 and 30 respectively. What will be the output?
A.	238
B.	76
C.	248
D.	348
Answer» D. 348
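
The arithmetic behind question 331 is a weighted sum followed by a linear transfer function. A minimal sketch, using the weights, inputs and proportionality constant from the question (the variable names are illustrative):

```python
weights = [1, 2, 3, 4]
inputs = [4, 10, 10, 30]
proportionality = 2  # linear transfer function: output = 2 * weighted sum

# 1*4 + 2*10 + 3*10 + 4*30 = 174
weighted_sum = sum(w * x for w, x in zip(weights, inputs))
output = proportionality * weighted_sum
print(weighted_sum, output)
```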

332.	What are the steps for using a gradient descent algorithm?
1) Calculate error between the actual value and the predicted value
2) Reiterate until you find the best weights of network
3) Pass an input through the network and get values from output layer
4) Initialize random weight and bias
5) Go to each neuron that contributes to the error and change its respective values to reduce the error
A.	1, 2, 3, 4, 5
B.	4, 3, 1, 5, 2
C.	3, 2, 1, 5, 4
D.	5, 4, 3, 2, 1
Answer» B. 4, 3, 1, 5, 2
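
The order 4, 3, 1, 5, 2 from question 332 can be traced in a toy example. This sketch assumes a single linear unit y = w*x + b trained with mean squared error on invented data; the learning rate, epoch count and the `train` helper are illustrative choices, not part of the question.

```python
import random


def train(xs, ys, lr=0.05, epochs=1000, seed=0):
    rng = random.Random(seed)
    # Step 4: initialize random weight and bias.
    w, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
    for _ in range(epochs):                            # step 2: reiterate
        preds = [w * x + b for x in xs]                # step 3: forward pass
        errors = [p - y for p, y in zip(preds, ys)]    # step 1: compute error
        # Step 5: adjust each parameter in the direction that reduces the error.
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / len(xs)
        grad_b = 2 * sum(errors) / len(xs)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy data generated from y = 2x + 1.
w, b = train([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(round(w, 3), round(b, 3))
```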

333.	Given above is a description of a neural network. When does a neural network model become a deep learning model?
A.	when you add more hidden layers and increase depth of neural network
B.	when there is higher dimensionality of data
C.	when the problem is an image recognition problem
D.	when there is lower dimensionality of data
Answer» A. when you add more hidden layers and increase depth of neural network

334.	Which of the following are correct statement(s) about stacking?
1. A machine learning model is trained on predictions of multiple machine learning models
2. A Logistic regression will definitely work better in the second stage as compared to other classification methods
3. First stage models are trained on full / partial feature space of training data
A.	1 and 2
B.	2 and 3
C.	1 and 3
D.	1,2 and 3
Answer» C. 1 and 3

335.	In which neural net architecture, does weight sharing occur?
A.	recurrent neural network
B.	convolutional neural network
C.	fully connected neural network
D.	both a and b
Answer» D. both a and b

336.	What is the sequence of the following tasks in a perceptron?
1. Initialize weights of perceptron randomly
2. Go to the next batch of dataset
3. If the prediction does not match the output, change the weights
4. For a sample input, compute an output
A.	1, 4, 3, 2
B.	3, 1, 2, 4
C.	4, 3, 2, 1
D.	1, 2, 3, 4
Answer» A. 1, 4, 3, 2

337.	In an election for the head of college, N candidates are competing against each other and people are voting for either of the candidates. Voters don’t communicate with each other while casting their votes. Which of the following ensemble methods works similarly to the election procedure discussed?
A.	bagging
B.	boosting
C.	stacking
D.	randomization
Answer» A. bagging

338.	What is back propagation?
a) It is another name given to the curvy function in the perceptron
b) It is the transmission of error back through the network to adjust the inputs
c) It is the transmission of error back through the network to allow weights to be adjusted so that the network can learn
d) None of the mentioned
A.	a
B.	b
C.	c
D.	b&c
Answer» C. c

339.	Which of the following parameters can be tuned for finding good ensemble model in bagging based algorithms?
1. Max number of samples
2. Max features
3. Bootstrapping of samples
4. Bootstrapping of features
A.	1
B.	2
C.	3&4
D.	1,2,3&4
Answer» D. 1,2,3&4

340.	The network that involves backward links from output to the input and hidden layers is called
A.	self organizing maps
B.	perceptrons
C.	recurrent neural network
D.	multi layered perceptron
Answer» C. recurrent neural network

341.	Which one of the following is not a major strength of the neural network approach?
A.	neural network learning algorithms are guaranteed to converge to an optimal solution
B.	neural networks work well with datasets containing noisy data
C.	neural networks can be used for both supervised learning and unsupervised clustering
D.	neural networks can be used for applications that require a time element to be included in the data
Answer» A. neural network learning algorithms are guaranteed to converge to an optimal solution

342.	Having multiple perceptrons can actually solve the XOR problem satisfactorily: this is because each perceptron can partition off a linear part of the space itself, and they can then combine their results.
A.	true – this works always, and these multiple perceptrons learn to classify even complex problems
B.	false – perceptrons are mathematically incapable of solving linearly inseparable functions, no matter what you do
C.	true – perceptrons can do this but are unable to learn to do it – they have to be explicitly hand-coded
D.	false – just having a single perceptron is enough
Answer» C. true – perceptrons can do this but are unable to learn to do it – they have to be explicitly hand-coded

343.	Which of the following option is / are correct regarding benefits of ensemble model?
1. Better performance
2. Generalized models
3. Better interpretability
A.	1 and 3
B.	2 and 3
C.	1, 2 and 3
D.	1 and 2
Answer» D. 1 and 2

344.	In Naive Bayes equation P(C / X)= (P(X / C) *P(C) ) / P(X) which part considers "likelihood"?
A.	p(x/c)
B.	p(c/x)
C.	p(c)
D.	p(x)
Answer» A. p(x/c)

345.	8 observations are clustered into 3 clusters using the K-Means clustering algorithm. After the first iteration, clusters C1, C2 and C3 have the following observations:
C1: {(2,2), (4,4), (6,6)}
C2: {(0,4), (4,0), (2,5)}
C3: {(5,5), (9,9)}
What will be the cluster centroids if you want to proceed to the second iteration?
A.	c1: (4,4), c2: (2,2), c3: (7,7)
B.	c1: (6,6), c2: (4,4), c3: (9,9)
C.	c1: (2,2), c2: (0,0), c3: (5,5)
D.	c1: (4,4), c2: (2,3), c3: (7,7)
Answer» D. c1: (4,4), c2: (2,3), c3: (7,7)
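
The centroid update in question 345 is a coordinate-wise mean of each cluster's points. A minimal sketch using the three clusters as listed in the question (`centroid` is a hypothetical helper name):

```python
def centroid(points):
    """Coordinate-wise mean of a list of (x, y) points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


clusters = {
    "C1": [(2, 2), (4, 4), (6, 6)],
    "C2": [(0, 4), (4, 0), (2, 5)],
    "C3": [(5, 5), (9, 9)],
}
centroids = {name: centroid(points) for name, points in clusters.items()}
print(centroids)
```

For the points as listed, C2 averages to x = (0 + 4 + 2) / 3 = 2 and y = (4 + 0 + 5) / 3 = 3.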

346.	Skewness of Normal distribution is ___________
A.	negative
B.	positive
C.	0
D.	undefined
Answer» C. 0

347.	Which of the following statements about Naive Bayes is incorrect?
A.	attributes are equally important.
B.	attributes are statistically dependent of one another given the class value.
C.	attributes are statistically independent of one another given the class value.
D.	attributes can be nominal or numeric
Answer» B. attributes are statistically dependent of one another given the class value.

348.	Given a rule of the form IF X THEN Y, rule confidence is defined as the conditional probability that
A.	y is false when x is known to be false.
B.	y is true when x is known to be true.
C.	x is true when y is known to be true
D.	x is false when y is known to be false.
Answer» B. y is true when x is known to be true.

349.	Consider the following dataset. x,y,z are the features and T is a class(1/0). Classify the test data (0,0,1) as values of x,y,z respectively.
A.	0
B.	1
C.	0.1
D.	0.9
Answer» C. 0.1


350.	Which of the following quantities are minimized directly or indirectly during parameter estimation in Gaussian distribution Model?
A.	negative log-likelihood
B.	log-likelihood
C.	cross entropy
D.	residual sum of square
Answer» A. negative log-likelihood
