88+ MCQs in Machine Learning (ML) in Computer Science Engineering (CSE)

1.	In the given image, P(H) is the ____ probability.
A.	posterior
B.	prior
Answer» B. prior

2.	Even if there are no actual supervisors, ____ learning is also based on feedback provided by the environment.
A.	supervised
B.	reinforcement
C.	unsupervised
D.	none of the above
Answer» B. reinforcement

3.	According to ____, it's a key success factor for the survival and evolution of all species.
A.	Claude Shannon's theory
B.	Gini index
C.	Darwin's theory
D.	none of the above
Answer» C. Darwin's theory

4.	Overlearning is caused by an excessive ____.
A.	capacity
B.	regression
C.	reinforcement
D.	accuracy
Answer» A. capacity

5.	It's possible to specify whether the scaling process must include both mean and standard deviation using the parameters ____.
A.	with_mean=True/False
B.	with_std=True/False
C.	both a & b
D.	none of the mentioned
Answer» C. both a & b
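
As a rough illustration of what the two switches in question 5 control, here is a minimal standard-library sketch. The function `standardize` is hypothetical and written for this example; only the flag names `with_mean` and `with_std` come from the question, and the use of the population standard deviation is an assumption.

```python
from statistics import fmean, pstdev


def standardize(values, with_mean=True, with_std=True):
    """Scale a 1-D list; the two flags switch centering and scaling on or off."""
    center = fmean(values) if with_mean else 0.0
    scale = pstdev(values) if with_std else 1.0
    if scale == 0.0:
        # A constant column cannot be scaled; leave it unscaled.
        scale = 1.0
    return [(v - center) / scale for v in values]


data = [2.0, 4.0, 6.0, 8.0]
centered_only = standardize(data, with_std=False)  # subtract the mean only
fully_scaled = standardize(data)                   # zero mean, unit variance
```

With both flags on, the result has mean 0 and standard deviation 1; turning one flag off skips that part of the transformation.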

6.	If there is only a discrete number of possible outcomes (called categories), the process becomes a ____.
A.	regression
B.	classification
C.	model-free
D.	categories
Answer» B. classification

7.	Which of the following are advantages of stacking?
	1) More robust model
	2) Better prediction
	3) Lower time of execution
A.	1 and 2
B.	2 and 3
C.	1 and 3
D.	all of the above
Answer» A. 1 and 2

8.	____ learning can be adopted when it's necessary to categorize a large amount of data with a few complete examples or when there's the need to
A.	supervised
B.	semi-supervised
C.	reinforcement
D.	clusters
Answer» B. semi-supervised

9.	Which of the following statements are true for a design matrix $X \in \mathbb{R}^{n \times d}$ with $d > n$? (The rows are n sample points and the columns represent d features.)
A.	least-squares linear regression computes the weights $w = (X^\top X)^{-1} X^\top y$
B.	the sample points are linearly separable
C.	$X$ has exactly $d - n$ eigenvectors with eigenvalue zero
D.	at least one principal component direction is orthogonal to a hyperplane that contains all the sample points
Answer» D. at least one principal component direction is orthogonal to a hyperplane that contains all the sample points
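
A short derivation may help with question 9. It is a sketch under the usual conventions, which are assumptions here: $\mu$ denotes the sample mean, $\Sigma$ the sample covariance matrix, and $v$ a unit vector.

```latex
% Why option A fails when d > n: the d x d Gram matrix is singular.
\operatorname{rank}(X^\top X) = \operatorname{rank}(X) \le n < d
\quad\Longrightarrow\quad (X^\top X)^{-1} \text{ does not exist.}

% Why option D holds: the centered points span at most n - 1 dimensions.
\Sigma = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)(x_i - \mu)^\top,
\qquad \operatorname{rank}(\Sigma) \le n - 1 < d .

% So some unit vector v satisfies \Sigma v = 0, and therefore
0 = v^\top \Sigma v = \frac{1}{n} \sum_{i=1}^{n} \bigl( v^\top (x_i - \mu) \bigr)^2
\quad\Longrightarrow\quad v^\top x_i = v^\top \mu \ \text{ for every } i .
```

Every sample point therefore lies in the hyperplane $\{x : v^\top x = v^\top \mu\}$, and $v$ is a principal component direction (with eigenvalue zero) orthogonal to it. Option C cannot hold because $X$ is not square, so it has no eigenvectors in the usual sense.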

10.	A feature F1 can take the values A, B, C, D, E, and F and represents the grade of students from a college. Here the feature type is
A.	nominal
B.	ordinal
C.	categorical
D.	boolean
Answer» B. ordinal

11.	Which of the following properties are characteristic of decision trees?
	(a) High bias
	(b) High variance
	(c) Lack of smoothness of prediction surfaces
	(d) Unbounded parameter set
A.	a and b
B.	a and d
C.	b, c and d
D.	all of the above
Answer» C. b, c and d

12.	Which of the following sentences are correct in reference to information gain?
	a. It is biased towards single-valued attributes
	b. It is biased towards multi-valued attributes
	c. ID3 makes use of information gain
	d. The approach used by ID3 is greedy
A.	a and b
B.	a and d
C.	b, c and d
D.	all of the above
Answer» C. b, c and d

13.	Let S1 and S2 be the sets of support vectors and w1 and w2 be the learnt weight vectors for a linearly separable problem using hard and soft margin linear SVMs respectively. Which of the following are correct?
A.	S1 ⊂ S2
B.	S1 may not be a subset of S2
C.	w1 = w2
D.	all of the above
Answer» B. S1 may not be a subset of S2

14.	Imagine you are solving a classification problem with highly imbalanced classes. The majority class is observed 99% of the time in the training data. Your model has 99% accuracy after taking the predictions on test data. Which of the following is true in such a case?
	1. Accuracy metric is not a good idea for imbalanced class problems.
	2. Accuracy metric is a good idea for imbalanced class problems.
	3. Precision and recall metrics are good for imbalanced class problems.
	4. Precision and recall metrics aren't good for imbalanced class problems.
A.	1 and 3
B.	1 and 4
C.	2 and 3
D.	2 and 4
Answer» A. 1 and 3
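
The scenario in question 14 can be reproduced with a few lines of plain Python. This is a minimal sketch on made-up labels: the helper `binary_metrics` is hypothetical, and the 99-to-1 split is chosen only to mirror the question.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall) for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    # Report 0.0 when a ratio is undefined (no predicted or no actual positives).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# 99 majority-class samples, 1 minority-class sample.
y_true = [0] * 99 + [1]
# A "model" that always predicts the majority class.
y_pred = [0] * 100

accuracy, precision, recall = binary_metrics(y_true, y_pred)
```

The always-majority predictor reaches 99% accuracy while its precision and recall on the minority class are both zero, which is why accuracy alone is misleading here.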

15.	What is/are true about kernels in SVM?
	1. A kernel function maps low-dimensional data to a high-dimensional space
	2. It's a similarity function
A.	1
B.	2
C.	1 and 2
D.	none of these
Answer» C. 1 and 2

16.	We usually use feature normalization before using the Gaussian kernel in SVM. What is true about feature normalization?
	1. We do feature normalization so that no feature will dominate the others
	2. Sometimes, feature normalization is not feasible in the case of categorical variables
	3. Feature normalization always helps when we use the Gaussian kernel in SVM
A.	1
B.	1 and 2
C.	1 and 3
D.	2 and 3
Answer» B. 1 and 2

17.	We usually use feature normalization before using the Gaussian kernel in SVM. What is true about feature normalization?
	1. We do feature normalization so that no feature will dominate the others
	2. Sometimes, feature normalization is not feasible in the case of categorical variables
	3. Feature normalization always helps when we use the Gaussian kernel in SVM
A.	1
B.	1 and 2
C.	1 and 3
D.	2 and 3
Answer» B. 1 and 2

18.	____ can accept a NumPy RandomState generator or an integer seed.
A.	make_blobs
B.	random_state
C.	test_size
D.	training_size
Answer» B. random_state

19.	We usually use feature normalization before using the Gaussian kernel in SVM. What is true about feature normalization?
	1. We do feature normalization so that no feature will dominate the others
	2. Sometimes, feature normalization is not feasible in the case of categorical variables
	3. Feature normalization always helps when we use the Gaussian kernel in SVM
A.	1
B.	1 and 2
C.	1 and 3
D.	2 and 3
Answer» B. 1 and 2

20.	Suppose that, on performing reduced error pruning, we collapsed a node and observed an improvement in the prediction accuracy on the validation set. Which among the following statements are possible in light of the performance improvement observed?
	(a) The collapsed node helped overcome the effect of one or more noise-affected data points in the training set
	(b) The validation set had one or more noise-affected data points in the region corresponding to the collapsed node
	(c) The validation set did not have any data points along at least one of the collapsed branches
	(d) The validation set did have data points adversely affected by the collapsed node
A.	a and b
B.	a and d
C.	b, c and d
D.	all of the above
Answer» D. all of the above

21.	Which of the following are correct statement(s) about stacking?
	1. A machine learning model is trained on predictions of multiple machine learning models
	2. A logistic regression will definitely work better in the second stage as compared to other classification methods
	3. First stage models are trained on full / partial feature space of training data
A.	1 and 2
B.	2 and 3
C.	1 and 3
D.	all of the above
Answer» C. 1 and 3

22.	Which of the following can be true for selecting base learners for an ensemble?
	1. Different learners can come from the same algorithm with different hyperparameters
	2. Different learners can come from different algorithms
	3. Different learners can come from different training spaces
A.	1
B.	2
C.	1 and 3
D.	1, 2 and 3
Answer» D. 1, 2 and 3

23.	Suppose you are using stacking with n different machine learning algorithms with k folds on data. Which of the following is true about one-level (m base models + 1 stacker) stacking?
	Note:
	- Here, we are working on a binary classification problem
	- All base models are trained on all features
	- You are using k folds for base models
A.	you will have only k features after the first stage
B.	you will have only m features after the first stage
C.	you will have k+m features after the first stage
D.	you will have k*n features after the first stage
Answer» B. you will have only m features after the first stage
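
To see why the feature count in question 23 depends on m and not on k, consider this minimal sketch of out-of-fold prediction. The toy "base models" (`fit_majority`, `make_threshold_fitter`) and the six-row dataset are hypothetical and exist only to make the example runnable; real stacking would use proper learners.

```python
def fit_majority(X_train, y_train):
    """Hypothetical base model: always predicts the majority training label."""
    label = 1 if 2 * sum(y_train) >= len(y_train) else 0
    return lambda row: label


def make_threshold_fitter(feature_index):
    """Hypothetical base model: thresholds one feature at its training mean."""
    def fit(X_train, y_train):
        cut = sum(row[feature_index] for row in X_train) / len(X_train)
        return lambda row: 1 if row[feature_index] > cut else 0
    return fit


def out_of_fold_features(X, y, fitters, k):
    """Build the stacker's input: one column per base model, one row per sample."""
    n = len(X)
    meta = [[None] * len(fitters) for _ in range(n)]
    for fold in range(k):
        train_idx = [i for i in range(n) if i % k != fold]
        held_out = [i for i in range(n) if i % k == fold]
        X_train = [X[i] for i in train_idx]
        y_train = [y[i] for i in train_idx]
        for j, fit in enumerate(fitters):
            model = fit(X_train, y_train)
            # Each model only predicts rows it did not see during fitting.
            for i in held_out:
                meta[i][j] = model(X[i])
    return meta


X = [[0.1, 1.0], [0.4, 0.8], [0.35, 0.9], [0.8, 0.2], [0.9, 0.1], [0.7, 0.3]]
y = [0, 0, 0, 1, 1, 1]
fitters = [fit_majority, make_threshold_fitter(0), make_threshold_fitter(1)]  # m = 3

meta_k3 = out_of_fold_features(X, y, fitters, k=3)
meta_k2 = out_of_fold_features(X, y, fitters, k=2)
```

Both `meta_k3` and `meta_k2` have three columns: the folds only decide which rows each fitted model predicts, while each base model contributes exactly one feature to the second stage.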

24.	Which of the following are correct statement(s) about stacking?
	1. A machine learning model is trained on predictions of multiple machine learning models
	2. A logistic regression will definitely work better in the second stage as compared to other classification methods
	3. First stage models are trained on full / partial feature space of training data
A.	1 and 2
B.	2 and 3
C.	1 and 3
D.	1, 2 and 3
Answer» C. 1 and 3

25.	Let's say you are working with categorical feature(s) and you have not looked at the distribution of the categorical variable in the test data. You want to apply one-hot encoding (OHE) on the categorical feature(s). What challenges may you face if you have applied OHE on a categorical variable of the train dataset?
A.	Some categories of the categorical variable are not present in the test dataset.
B.	The frequency distribution of categories is different in train as compared to the test dataset.
C.	Train and test always have the same distribution.
D.	Both A and B
Answer» D. Both A and B
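
The category-mismatch problem from question 25 is easy to demonstrate by hand. The sketch below is a plain-Python illustration with made-up colour values; `fit_categories` and `one_hot` are hypothetical helpers, not a library API.

```python
def fit_categories(values):
    """Learn the encoding vocabulary from the training column only."""
    return sorted(set(values))


def one_hot(value, categories):
    """Encode one value; an unseen category becomes an all-zero row."""
    return [1 if value == c else 0 for c in categories]


train = ["red", "green", "blue", "green"]
test = ["green", "purple"]

categories = fit_categories(train)
encoded_test = [one_hot(v, categories) for v in test]

# Categories that appear only on one side of the split.
unseen_in_train = sorted(set(test) - set(categories))
missing_in_test = sorted(set(categories) - set(test))
```

Here "purple" never appeared in training, so it has no column of its own, and "blue" and "red" produce columns that are all zeros on the test side. Both effects can hurt a model that was fitted on the training encoding.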

26.	Let's say you are working with categorical feature(s) and you have not looked at the distribution of the categorical variable in the test data. You want to apply one-hot encoding (OHE) on the categorical feature(s). What challenges may you face if you have applied OHE on a categorical variable of the train dataset?
A.	some categories of the categorical variable are not present in the test dataset.
B.	the frequency distribution of categories is different in train as compared to the test dataset.
C.	train and test always have the same distribution.
D.	both a and b
Answer» D. both a and b

27.	In which of the following cases will K-Means clustering give poor results?
	1. Data points with outliers
	2. Data points with different densities
	3. Data points with round shapes
	4. Data points with non-convex shapes
A.	1 and 2
B.	2 and 3
C.	2 and 4
D.	1, 2 and 4
Answer» D. 1, 2 and 4

28.	In which of the following cases will K-Means clustering fail to give good results?
	1. Data points with outliers
	2. Data points with different densities
	3. Data points with round shapes
	4. Data points with non-convex shapes
A.	1 and 2
B.	2 and 3
C.	2 and 4
D.	1, 2 and 4
Answer» D. 1, 2 and 4

29.	What is/are true about ridge regression?
	1. When lambda is 0, the model works like a linear regression model
	2. When lambda is 0, the model doesn't work like a linear regression model
	3. When lambda goes to infinity, we get very, very small coefficients approaching 0
	4. When lambda goes to infinity, we get very, very large coefficients approaching infinity
A.	1 and 3
B.	1 and 4
C.	2 and 3
D.	2 and 4
Answer» A. 1 and 3
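
Both limits in question 29 can be checked on a one-feature model without an intercept, where minimizing sum((y - w*x)^2) + lambda * w^2 has the closed form w = sum(x*y) / (sum(x^2) + lambda). The sketch below assumes that simplified setting; `ridge_slope` is a hypothetical helper and the data are made up.

```python
def ridge_slope(xs, ys, lam):
    """Closed-form ridge coefficient for one feature and no intercept."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]  # exactly y = 2x

w_ols = ridge_slope(xs, ys, lam=0.0)     # lambda = 0: ordinary least squares
w_mid = ridge_slope(xs, ys, lam=14.0)    # moderate penalty shrinks the slope
w_huge = ridge_slope(xs, ys, lam=1e9)    # very large penalty: slope near 0
```

With lambda at 0 the estimate matches ordinary least squares (a slope of 2 here), and as lambda grows the coefficient shrinks monotonically towards 0 rather than growing.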

30.	In many classification problems, the target dataset is made up of categorical labels which cannot immediately be processed by any algorithm. An encoding is needed, and scikit-learn offers at least ____ valid options.
A.	1
B.	2
C.	3
D.	4
Answer» B. 2

31.	We have been given a dataset with n records in which we have input attribute x and output attribute y. Suppose we use a linear regression method to model this data. To test our linear regressor, we split the data into a training set and a test set randomly. Now we increase the training set size gradually. As the training set size increases, what do you expect will happen with the mean training error?
A.	increase
B.	decrease
C.	remain constant
D.	can't say
Answer» D. can't say

32.	What are the steps for using a gradient descent algorithm?
	1) Calculate the error between the actual value and the predicted value
	2) Reiterate until you find the best weights of the network
	3) Pass an input through the network and get values from the output layer
	4) Initialize random weights and biases
	5) Go to each neuron which contributes to the error and change its respective values to reduce the error
A.	1, 2, 3, 4, 5
B.	4, 3, 1, 5, 2
C.	3, 2, 1, 5, 4
D.	5, 4, 3, 2, 1
Answer» B. 4, 3, 1, 5, 2
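
The ordering 4, 3, 1, 5, 2 from question 32 can be traced in a tiny example. This is a minimal sketch for a single linear neuron trained with batch gradient descent on mean squared error; the function `train`, the learning rate, the epoch count, and the data are all assumptions made for illustration.

```python
import random


def train(xs, ys, lr=0.05, epochs=2000, seed=0):
    """Fit y ~ w*x + b with batch gradient descent on mean squared error."""
    rng = random.Random(seed)
    # Step 4: initialize random weight and bias.
    w, b = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
    n = len(xs)
    # Step 2: reiterate (a fixed number of epochs in this sketch).
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            pred = w * x + b      # Step 3: forward pass to get the output.
            err = pred - y        # Step 1: error between prediction and target.
            grad_w += err * x
            grad_b += err
        # Step 5: adjust each parameter that contributed to the error.
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


# Data generated from y = 2x + 1, so training should approach w = 2 and b = 1.
w, b = train([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```

In a multi-layer network, step 5 is where backpropagation distributes the error to the weights of earlier layers; with a single neuron the gradient can be written directly.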

33.	Regarding bias and variance, which of the following statements are true? (Here high and low are relative to the ideal model.)
	(i) Models which overfit are more likely to have high bias
	(ii) Models which overfit are more likely to have low bias
	(iii) Models which overfit are more likely to have high variance
	(iv) Models which overfit are more likely to have low variance
A.	(i) and (ii)
B.	(ii) and (iii)
C.	(iii) and (iv)
D.	none of these
Answer» B. (ii) and (iii)

34.	The most popularly used dimensionality reduction algorithm is Principal Component Analysis (PCA). Which of the following is/are true about PCA?
	1. PCA is an unsupervised method
	2. It searches for the directions in which the data have the largest variance
	3. Maximum number of principal components <= number of features
	4. All principal components are orthogonal to each other
A.	1 & 2
B.	2 & 3
C.	3 & 4
D.	all of the above
Answer» D. all of the above

35.	It is necessary to allow the model to develop a generalization ability and avoid a common problem called ____.
A.	overfitting
B.	overlearning
C.	classification
D.	regression
Answer» A. overfitting

36.	What is back propagation?
	a) It is another name given to the curvy function in the perceptron
	b) It is the transmission of error back through the network to adjust the inputs
	c) It is the transmission of error back through the network to allow weights to be adjusted so that the network can learn
	d) None of the mentioned
A.	a
B.	b
C.	c
D.	b & c
Answer» C. c

37.	Suppose you want to apply a stepwise forward selection method for choosing the best models for an ensemble model. Which of the following is the correct order of the steps?
	Note: You have predictions from more than 1000 models
	1. Add the models' predictions (or, in other terms, take the average) one by one to the ensemble when doing so improves the metrics on the validation set
	2. Start with an empty ensemble
	3. Return the ensemble from the nested set of ensembles that has maximum performance on the validation set
A.	1-2-3
B.	1-3-4
C.	2-1-3
D.	none of the above
Answer» C. 2-1-3

38.	In many classification problems, the target dataset is made up of categorical labels which cannot immediately be processed by any algorithm. An encoding is needed, and scikit-learn offers at least ____ valid options.
A.	1
B.	2
C.	3
D.	4
Answer» B. 2

39.	Which of the following is true about weighted majority votes?
	1. We want to give higher weights to better performing models
	2. Inferior models can overrule the best model if the collective weighted votes for the inferior models are higher than for the best model
	3. Voting is a special case of weighted voting
A.	1 and 3
B.	2 and 3
C.	1 and 2
D.	1, 2 and 3
Answer» D. 1, 2 and 3

40.	Given that we can select the same feature multiple times during the recursive partitioning of the input space, is it always possible to achieve 100% accuracy on the training data (given that we allow for trees to grow to their maximum size) when building decision trees?
A.	yes
B.	no
Answer» B. no
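
One way to see why the answer to question 40 is "no": any classifier that is a deterministic function of the features, including a fully grown tree, must give identical rows the same prediction. The sketch below computes the resulting upper bound on training accuracy; `best_training_accuracy` is a hypothetical helper and the four-row dataset is made up.

```python
from collections import Counter, defaultdict


def best_training_accuracy(X, y):
    """Upper bound on training accuracy for any function of the features."""
    labels_by_row = defaultdict(Counter)
    for row, label in zip(X, y):
        labels_by_row[tuple(row)][label] += 1
    # For each distinct feature vector, at best the majority label is predicted.
    correct = sum(max(counts.values()) for counts in labels_by_row.values())
    return correct / len(y)


# The first two rows have identical features but conflicting labels.
X = [[1, 0], [1, 0], [0, 1], [0, 1]]
y = [0, 1, 1, 1]

bound = best_training_accuracy(X, y)
```

Because two identical points carry different labels, at most three of the four training samples can be classified correctly, no matter how deep the tree grows.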

41.	Which of the following options is/are correct regarding the benefits of an ensemble model?
	1. Better performance
	2. Generalized models
	3. Better interpretability
A.	1 and 3
B.	2 and 3
C.	1 and 2
D.	1, 2 and 3
Answer» C. 1 and 2

42.	Which of the following options is/are correct regarding the benefits of an ensemble model?
	1. Better performance
	2. Generalized models
	3. Better interpretability
A.	1 and 3
B.	2 and 3
C.	1, 2 and 3
D.	1 and 2
Answer» D. 1 and 2

43.	We have been given a dataset with n records in which we have input attribute x and output attribute y. Suppose we use a linear regression method to model this data. To test our linear regressor, we split the data into a training set and a test set randomly. What do you expect will happen with bias and variance as you increase the size of the training data?
A.	bias increases and variance increases
B.	bias decreases and variance increases
C.	bias decreases and variance decreases
D.	bias increases and variance decreases
Answer» D. bias increases and variance decreases

44.	Which of the following statements is true about the k-NN algorithm?
	1) k-NN performs much better if all of the data have the same scale
	2) k-NN works well with a small number of input variables (p), but struggles when the number of inputs is very large
	3) k-NN makes no assumptions about the functional form of the problem being solved
A.	1 and 2
B.	1 and 3
C.	only 1
D.	1, 2 and 3
Answer» D. 1, 2 and 3

45.	Which of the following statement(s) can be true after adding a variable to a linear regression model?
	1. R-squared and adjusted R-squared both increase
	2. R-squared increases and adjusted R-squared decreases
	3. R-squared decreases and adjusted R-squared decreases
	4. R-squared decreases and adjusted R-squared increases
A.	1 and 2
B.	1 and 3
C.	2 and 4
D.	none of the above
Answer» A. 1 and 2
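
Both outcomes named in question 45 follow from the standard formula adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1), where n is the number of samples and p the number of predictors. The sketch below plugs in made-up R² values (they are illustrative numbers, not results from any dataset) to show each case.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R-squared for n samples and p predictors."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


n = 20
before = adjusted_r2(0.800, n, p=2)

# Case 2: the new variable barely helps, so R-squared creeps up
# while the penalty for the extra predictor pulls adjusted R-squared down.
weak_addition = adjusted_r2(0.801, n, p=3)

# Case 1: the new variable is genuinely useful, so both measures rise.
strong_addition = adjusted_r2(0.900, n, p=3)
```

Plain R² cannot decrease when a variable is added to an ordinary least-squares fit, which rules out statements 3 and 4.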

46.	Assume that you are given a data set and a neural network model trained on the data set. You are asked to build a decision tree model with the sole purpose of understanding/interpreting the built neural network model. In such a scenario, which among the following measures would you concentrate most on optimising?
A.	accuracy of the decision tree model on the given data set
B.	F1 measure of the decision tree model on the given data set
C.	fidelity of the decision tree model, which is the fraction of instances on which the neural network and the decision tree give the same output
D.	comprehensibility of the decision tree model, measured in terms of the size of the corresponding rule set
Answer» C. fidelity of the decision tree model, which is the fraction of instances on which the neural network and the decision tree give the same output

47.	Which of the following metrics can be used for evaluating regression models?
	i) R-squared
	ii) Adjusted R-squared
	iii) F statistics
	iv) RMSE / MSE / MAE
A.	ii and iv
B.	i and ii
C.	ii, iii and iv
D.	i, ii, iii and iv
Answer» D. i, ii, iii and iv

48.	In a simple linear regression model (one independent variable), if we change the input variable by 1 unit, how much will the output variable change?
A.	by 1
B.	no change
C.	by intercept
D.	by its slope
Answer» D. by its slope
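
A quick numeric check of question 48, using the closed-form least-squares fit for one variable. This is a minimal sketch: `fit_line` is a hypothetical helper and the four data points are made up.

```python
from statistics import fmean


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept


slope, intercept = fit_line([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])


def predict(x):
    return intercept + slope * x


# Moving the input by one unit moves the prediction by exactly the slope.
change = predict(11.0) - predict(10.0)
```

The intercept cancels in the difference, so only the slope determines how much the prediction moves.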

49.	Naive Bayes classifiers are an example of ____ learning.
A.	supervised
B.	unsupervised
C.	both
D.	none
Answer» A. supervised

50.	In the last decade, many researchers started training bigger and bigger models, built with several different layers; that's why this approach is called ____.
A.	deep learning
B.	machine learning
C.	reinforcement learning
D.	unsupervised learning
Answer» A. deep learning
