Explore topic-wise MCQs in Computer Science Engineering (CSE).

This section includes 88 curated multiple-choice questions to sharpen your Computer Science Engineering (CSE) knowledge and support exam preparation. Choose a topic below to get started.

51.

During the last few years, many ______ algorithms have been applied to deep neural networks to learn the best policy for playing Atari video games and to teach an agent how to associate the right action with an input representing the state.

A. logical
B. classical
C. classification
D. none of above
Answer» D. none of above
52.

Bernoulli Naïve Bayes Classifier is a ______ distribution

A. continuous
B. discrete
C. binary
Answer» C. binary
53.

Multinomial Naïve Bayes Classifier is a ______ distribution

A. continuous
B. discrete
C. binary
Answer» B. discrete
54.

Multinomial Naïve Bayes Classifier is a ______ distribution

A. continuous
B. discrete
C. binary
Answer» B. discrete
55.

Gaussian Naïve Bayes Classifier is a ______ distribution

A. continuous
B. discrete
C. binary
Answer» A. continuous
56.

Gaussian Naïve Bayes Classifier is a ______ distribution

A. continuous
B. discrete
C. binary
Answer» A. continuous
57.

Multinomial Naïve Bayes Classifier is a ______ distribution

A. continuous
B. discrete
C. binary
Answer» B. discrete
58.

Gaussian Naïve Bayes Classifier is a ______ distribution

A. continuous
B. discrete
C. binary
Answer» A. continuous
59.

How can we assign the weights to outputs of different models in an ensemble?
1. Use an algorithm to return the optimal weights
2. Choose the weights using cross validation
3. Give high weights to more accurate models

A. 1 and 2
B. 1 and 3
C. 2 and 3
D. all of above
Answer» D. all of above
60.

PCA works better if there is
1. A linear structure in the data
2. If the data lies on a curved surface and not on a flat surface
3. If variables are scaled in the same unit

A. 1 and 2
B. 2 and 3
C. 1 and 3
D. 1, 2 and 3
Answer» C. 1 and 3
61.

Let's say a linear regression model perfectly fits the training data (train error is zero). Which of the following statements is true?

A. you will always have test error zero
B. you can not have test error zero
C. none of the above
Answer» C. none of the above
62.

Bayes' Theorem is given by P(H|E) = [P(E|H) × P(H)] / P(E), where
1. P(H) is the probability of hypothesis H being true.
2. P(E) is the probability of the evidence (regardless of the hypothesis).
3. P(E|H) is the probability of the evidence given that the hypothesis is true.
4. P(H|E) is the probability of the hypothesis given that the evidence is there.

A. true
B. false
Answer» A. true
63.

What is the sequence of the following tasks in a perceptron?
1. Initialize weights of the perceptron randomly
2. Go to the next batch of the dataset
3. If the prediction does not match the output, change the weights
4. For a sample input, compute an output

A. 1, 4, 3, 2
B. 3, 1, 2, 4
C. 4, 3, 2, 1
D. 1, 2, 3, 4
Answer» A. 1, 4, 3, 2
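The sequence in question 63 can be sketched as a minimal training loop. This is an illustration only, using hypothetical AND-gate data and an arbitrary learning rate of 0.1; the numbered comments mirror the four tasks from the question.

```python
import random

random.seed(0)
# Hypothetical AND-gate data: inputs and target outputs.
data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]

# 1. Initialize weights of the perceptron randomly
w = [random.uniform(-1, 1) for _ in range(2)]
b = random.uniform(-1, 1)
lr = 0.1

for _ in range(200):  # repeated passes over the data stand in for "batches"
    for x, target in data:
        # 4. For a sample input, compute an output
        out = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
        # 3. If the prediction does not match the output, change the weights
        if out != target:
            for i in range(2):
                w[i] += lr * (target - out) * x[i]
            b += lr * (target - out)
    # 2. Go to the next batch of the dataset

preds = [1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
         for x, _ in data]
print(preds)  # the trained perceptron reproduces the AND gate: [0, 0, 0, 1]
```

Since the AND gate is linearly separable, the perceptron convergence theorem guarantees this loop settles on a correct separator.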
64.

Suppose you find that your linear regression model is underfitting the data. In such a situation, which of the following options would you consider?
1. I will add more variables
2. I will start introducing polynomial degree variables
3. I will remove some variables

A. 1 and 2
B. 2 and 3
C. 1 and 3
D. 1, 2 and 3
Answer» A. 1 and 2
65.

Which of the following parameters can be tuned for finding a good ensemble model in bagging based algorithms?
1. Max number of samples
2. Max features
3. Bootstrapping of samples
4. Bootstrapping of features

A. 1 and 3
B. 2 and 3
C. 1 and 2
D. all of above
Answer» D. all of above
66.

Having built a decision tree, we are using reduced error pruning to reduce the size of the tree. We select a node to collapse. For this particular node, on the left branch, there are 3 training data points with the following outputs: 5, 7, 9.6 and for the right branch, there are four training data points with the following outputs: 8.7, 9.8, 10.5, 11. What were the original responses for data points along the two branches (left & right respectively) and what is the new response after collapsing the node?

A. 10.8, 13.33, 14.48
B. 10.8, 13.33, 12.06
C. 7.2, 10, 8.8
D. 7.2, 10, 8.6
Answer» C. 7.2, 10, 8.8
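The arithmetic in question 66 is just three means: each branch's response is the mean of its training outputs, and the collapsed node's response is the mean of all seven points. A quick check in plain Python, with the values copied from the question:

```python
# Training outputs on the two branches, copied from the question.
left = [5, 7, 9.6]
right = [8.7, 9.8, 10.5, 11]

left_mean = sum(left) / len(left)        # original left response: 21.6 / 3 = 7.2
right_mean = sum(right) / len(right)     # original right response: 40 / 4 = 10.0
# After collapsing, one leaf predicts the mean of all points under the node.
collapsed = sum(left + right) / (len(left) + len(right))  # 61.6 / 7 = 8.8
print(left_mean, right_mean, collapsed)
```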
67.

Which of the following can be one of the steps in stacking?
1. Divide the training data into k folds
2. Train k models on each k-1 folds and get the out of fold predictions for the remaining one fold
3. Divide the test data set in k folds and get individual fold predictions by different algorithms

A. 1 and 2
B. 2 and 3
C. 1 and 3
D. all of above
Answer» A. 1 and 2
68.

Suppose you are given n predictions on test data by n different models (M1, M2, …, Mn) respectively. Which of the following method(s) can be used to combine the predictions of these models?
Note: We are working on a regression problem
1. Median
2. Product
3. Average
4. Weighted sum
5. Minimum and Maximum
6. Generalized mean rule

A. 1, 3 and 4
B. 1, 3 and 6
C. 1, 3, 4 and 6
D. all of above
Answer» D. all of above
69.

Which of the following assumptions do we make while deriving linear regression parameters?
1. The true relationship between dependent y and predictor x is linear
2. The model errors are statistically independent
3. The errors are normally distributed with a 0 mean and constant standard deviation
4. The predictor x is non-stochastic and is measured error-free

A. 1, 2 and 3
B. 1, 3 and 4
C. 1 and 3
D. all of above
Answer» D. all of above
70.

If I am using all features of my dataset and I achieve 100% accuracy on my training set, but ~70% on the validation set, what should I look out for?

A. underfitting
B. nothing, the model is perfect
C. overfitting
Answer» C. overfitting
71.

Select the correct answer for the following statements.
1. It is important to perform feature normalization before using the Gaussian kernel.
2. The maximum value of the Gaussian kernel is 1.

A. 1 is true, 2 is false
B. 1 is false, 2 is true
C. 1 is true, 2 is true
D. 1 is false, 2 is false
Answer» C. 1 is true, 2 is true
72.

Which of the following metrics do we have for finding dissimilarity between two clusters in hierarchical clustering?
1. Single-link
2. Complete-link
3. Average-link

A. 1 and 2
B. 1 and 3
C. 2 and 3
D. 1, 2 and 3
Answer» D. 1, 2 and 3
73.

Which of the following can act as possible termination conditions in K-Means?
1. A fixed number of iterations.
2. Assignment of observations to clusters does not change between iterations (except for cases with a bad local minimum).
3. Centroids do not change between successive iterations.
4. Terminate when RSS falls below a threshold.

A. 1, 3 and 4
B. 1, 2 and 3
C. 1, 2 and 4
D. 1, 2, 3 and 4
Answer» D. 1, 2, 3 and 4
74.

The minimum time complexity for training an SVM is O(n²). According to this fact, what sizes of datasets are not best suited for SVMs?

A. large datasets
B. small datasets
C. medium sized datasets
D. size does not matter
Answer» A. large datasets
75.

What is true about K-Means clustering?
1. K-means is extremely sensitive to cluster center initializations
2. Bad initialization can lead to poor convergence speed
3. Bad initialization can lead to bad overall clustering

A. 1 and 3
B. 1 and 2
C. 2 and 3
D. 1, 2 and 3
Answer» D. 1, 2 and 3
76.

Select the correct answers for the following statements.
1. Filter methods are much faster compared to wrapper methods.
2. Wrapper methods use statistical methods for evaluation of a subset of features while filter methods use cross validation.

A. both are true
B. 1 is true and 2 is false
C. both are false
D. 1 is false and 2 is true
Answer» B. 1 is true and 2 is false
77.

Below are two ensemble models:
1. E1(M1, M2, M3) and
2. E2(M4, M5, M6)
Above, Mx is an individual base model.
Which of the two are you more likely to choose if the following conditions for E1 and E2 are given?
E1: Individual model accuracies are high but the models are of the same type, in other words less diverse
E2: Individual model accuracies are high but they are of different types, in other words highly diverse in nature

A. e1
B. e2
C. any of e1 and e2
D. none of these
Answer» B. e2
78.

8 observations are clustered into 3 clusters using the K-Means clustering algorithm. After the first iteration, clusters C1, C2, C3 have the following observations:
C1: {(2,2), (4,4), (6,6)}
C2: {(0,4), (4,0), (2,5)}
C3: {(5,5), (9,9)}
What will be the cluster centroids if you want to proceed to the second iteration?

A. c1: (4,4), c2: (2,2), c3: (7,7)
B. c1: (6,6), c2: (4,4), c3: (9,9)
C. c1: (2,2), c2: (0,0), c3: (5,5)
D. c1: (4,4), c2: (3,3), c3: (7,7)
Answer» D. c1: (4,4), c2: (3,3), c3: (7,7)
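Each new centroid in question 78 is the coordinate-wise mean of the cluster's points. Computing them directly gives C1 = (4, 4), C2 = (2, 3) and C3 = (7, 7); option D matches C1 and C3 exactly, while its C2 value (3, 3) appears to be a typo in the options for the computed (2, 3).

```python
# Cluster assignments copied from the question.
clusters = {
    "C1": [(2, 2), (4, 4), (6, 6)],
    "C2": [(0, 4), (4, 0), (2, 5)],
    "C3": [(5, 5), (9, 9)],
}
# New centroid = coordinate-wise mean of the cluster's points.
centroids = {
    name: tuple(sum(coord) / len(pts) for coord in zip(*pts))
    for name, pts in clusters.items()
}
print(centroids)  # {'C1': (4.0, 4.0), 'C2': (2.0, 3.0), 'C3': (7.0, 7.0)}
```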
79.

In an election, N candidates are competing against each other and people are voting for either of the candidates. Voters don't communicate with each other while casting their votes. Which of the following ensemble methods works similar to the above-discussed election procedure?
Hint: Persons are like base models of an ensemble method.

A. bagging
B. boosting
C. a or b
D. none of these
Answer» A. bagging
80.

In an election, N candidates are competing against each other and people are voting for either of the candidates. Voters don't communicate with each other while casting their votes. Which of the following ensemble methods works similar to the above-discussed election procedure?
Hint: Persons are like base models of an ensemble method.

A. bagging
B. boosting
C. a or b
D. none of these
Answer» A. bagging
81.

We can also compute the coefficients of linear regression with the help of an analytical method called the "Normal Equation". Which of the following is/are true about the Normal Equation?
1. We don't have to choose the learning rate
2. It becomes slow when the number of features is very large
3. No need to iterate

A. 1 and 2
B. 1 and 3
C. 2 and 3
D. 1, 2 and 3
Answer» D. 1, 2 and 3
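The Normal Equation from question 81 is theta = (XᵀX)⁻¹ Xᵀy. A minimal sketch on hypothetical toy data (y = 1 + x, with a bias column in X), worked out in plain Python with an explicit 2×2 inverse so no libraries are assumed:

```python
# Hypothetical toy data: y = 1 + x, design matrix with a bias column.
X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]   # columns: [bias, x]
y = [2.0, 3.0, 4.0]

# X^T X (2x2) and X^T y (2-vector).
XtX = [[sum(row[i] * row[j] for row in X) for j in range(2)] for i in range(2)]
Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(2)]

# Invert the 2x2 matrix. This inversion is the step that becomes slow
# (roughly cubic in the number of features) when features are many --
# but note there is no learning rate and no iteration.
det = XtX[0][0] * XtX[1][1] - XtX[0][1] * XtX[1][0]
inv = [[XtX[1][1] / det, -XtX[0][1] / det],
       [-XtX[1][0] / det, XtX[0][0] / det]]

theta = [sum(inv[i][j] * Xty[j] for j in range(2)) for i in range(2)]
print(theta)  # intercept and slope both come out as 1, recovering y = 1 + x
```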
82.

Which of the following is true about bagging?
1. Bagging can be parallel
2. The aim of bagging is to reduce bias, not variance
3. Bagging helps in reducing overfitting

A. 1 and 2
B. 2 and 3
C. 1 and 3
D. all of these
Answer» C. 1 and 3
83.

What is true about an ensembled classifier?
1. Classifiers that are more sure can vote with more conviction
2. Classifiers can be more sure about a particular part of the space
3. Most of the time, it performs better than a single classifier

A. 1 and 2
B. 1 and 3
C. 2 and 3
D. all of the above
Answer» D. all of the above
84.

Suppose there are 25 base classifiers. Each classifier has an error rate of e = 0.35. Suppose you are using averaging as the ensemble technique. What will be the probability that the ensemble of the above 25 classifiers will make a wrong prediction?
Note: All classifiers are independent of each other.

A. 0.05
B. 0.06
C. 0.07
D. 0.09
Answer» B. 0.06
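The figure in question 84 comes from a binomial tail: with 25 independent classifiers that each err with probability 0.35, the majority-vote ensemble is wrong only when at least 13 of them err. A quick check with the standard library:

```python
from math import comb

n, e = 25, 0.35
# Ensemble errs when a majority (13 or more of 25) of the independent
# base classifiers are wrong: P(X >= 13) for X ~ Binomial(25, 0.35).
p_wrong = sum(comb(n, k) * e**k * (1 - e)**(n - k) for k in range(13, n + 1))
print(round(p_wrong, 2))  # ≈ 0.06
```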
85.

Which of the following is/are true about weak learners used in an ensemble model?
1. They have low variance and they don't usually overfit
2. They have high bias, so they can not solve hard learning problems
3. They have high variance and they don't usually overfit

A. 1 and 2
B. 1 and 3
C. 2 and 3
D. none of these
Answer» A. 1 and 2
86.

We can also compute the coefficients of linear regression with the help of an analytical method called the Normal Equation. Which of the following is/are true about the Normal Equation?
1. We don't have to choose the learning rate
2. It becomes slow when the number of features is very large
3. No need to iterate

A. 1 and 2
B. 1 and 3
C. 2 and 3
D. 1, 2 and 3
Answer» D. 1, 2 and 3
87.

To control the size of the tree, we need to control the number of regions. One approach to do this would be to split tree nodes only if the resultant decrease in the sum of squares error exceeds some threshold. For the described method, which among the following are true?
(a) It would, in general, help restrict the size of the trees
(b) It has the potential to affect the performance of the resultant regression/classification model
(c) It is computationally infeasible

A. a and b
B. a and d
C. b, c and d
D. all of the above
Answer» A. a and b
88.

What is the accuracy in percentage based on the following confusion matrix of a three-class classification?
Confusion Matrix C =
[14  0  0]
[ 1 15  0]
[ 0  0  6]

A. 0.75
B. 0.97
C. 0.95
D. 0.85
Answer» B. 0.97
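Accuracy from a multi-class confusion matrix is the sum of the diagonal (correct predictions) over the sum of all entries. Checking question 88 directly:

```python
# Confusion matrix copied from the question (rows = actual, columns = predicted).
C = [[14, 0, 0],
     [1, 15, 0],
     [0, 0, 6]]

correct = sum(C[i][i] for i in range(len(C)))   # diagonal: 14 + 15 + 6 = 35
total = sum(sum(row) for row in C)              # all entries: 36
accuracy = correct / total
print(round(accuracy, 2))  # 35/36 ≈ 0.97
```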