

MCQOPTIONS
This section includes 940 multiple-choice questions (MCQs) curated to sharpen your Artificial Intelligence knowledge and support exam preparation. Choose a topic below to get started.
151.
Some people are using the term ___ instead of "prediction" only to avoid the weird idea that machine learning is a sort of modern magic.
A. Inference
B. Interference
C. Accuracy
D. None of the above
Answer» A. Inference
152.
If there is only a discrete number of possible outcomes, they are called _____.
A. Modelfree
B. Categories
C. Prediction
D. None of the above
Answer» B. Categories
153.
SVMs are less effective when:
A. The data is linearly separable
B. The data is clean and ready to use
C. The data is noisy and contains overlapping points
Answer» C. The data is noisy and contains overlapping points
154.
Suppose you are building an SVM model on data X. The data X can be error-prone, which means that you should not trust any specific data point too much. Now suppose you want to build an SVM model with a quadratic kernel function (polynomial degree 2) that uses the slack-variable penalty C as one of its hyperparameters. What would happen when you use a very small C (C ~ 0)?
A. Misclassification would happen
B. Data will be correctly classified
C. Can't say
D. None of these
Answer» A. Misclassification would happen
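A minimal sketch of the effect of the slack penalty C, using scikit-learn's SVC with a degree-2 polynomial kernel. The synthetic blob dataset and all parameter values are illustrative assumptions, not taken from the question:

```python
import numpy as np
from sklearn.svm import SVC

# Two noisy classes with some overlap (illustrative synthetic data).
rng = np.random.RandomState(0)
X = np.vstack([rng.randn(50, 2), rng.randn(50, 2) + 3.0])
y = np.array([0] * 50 + [1] * 50)

# coef0=1 makes the degree-2 polynomial kernel a full quadratic.
# A very small C barely penalizes slack: the margin is wide and training
# points may be misclassified. A large C penalizes slack heavily and
# fits the training set more tightly.
loose = SVC(kernel="poly", degree=2, coef0=1.0, C=1e-3).fit(X, y)
tight = SVC(kernel="poly", degree=2, coef0=1.0, C=100.0).fit(X, y)
print(loose.score(X, y), tight.score(X, y))
```

With C near zero the optimizer prefers a wide margin over correct classification of individual training points, which is why option A is the keyed answer.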
155.
Which of the following statement(s) can be true after adding a variable to a linear regression model?
1. R-squared and adjusted R-squared both increase
2. R-squared increases and adjusted R-squared decreases
3. R-squared decreases and adjusted R-squared decreases
4. R-squared decreases and adjusted R-squared increases
A. 1 and 2
B. 1 and 3
C. 2 and 4
D. None of the above
Answer» A. 1 and 2
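A sketch of why only statements 1 and 2 are possible: OLS R-squared can never decrease when a feature is added, while adjusted R-squared carries a penalty for the predictor count and may fall. The synthetic data is an illustrative assumption:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data: one informative feature (illustrative assumption).
rng = np.random.RandomState(0)
x = rng.randn(100, 1)
y = 3.0 * x[:, 0] + 0.5 * rng.randn(100)

def adjusted_r2(r2, n, p):
    # Adjusted R-squared penalizes the number of predictors p.
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

r2_one = LinearRegression().fit(x, y).score(x, y)

# Add a pure-noise column: plain R-squared can only stay equal or rise,
# while adjusted R-squared may fall because of the p penalty.
X_two = np.hstack([x, rng.randn(100, 1)])
r2_two = LinearRegression().fit(X_two, y).score(X_two, y)

print(r2_two >= r2_one)  # True: R-squared never decreases
print(adjusted_r2(r2_one, 100, 1), adjusted_r2(r2_two, 100, 2))
```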
156.
What is/are true about kernels in SVM?
1. A kernel function maps low-dimensional data to a high-dimensional space
2. It is a similarity function
A. 1
B. 2
C. 1 and 2
D. None of these
Answer» C. 1 and 2
157.
Which of the following is true about "Ridge" or "Lasso" regression methods in the case of feature selection?
A. Ridge regression uses subset selection of features
B. Lasso regression uses subset selection of features
C. Both use subset selection of features
D. None of the above
Answer» B. Lasso regression uses subset selection of features
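A short sketch of why Lasso, but not Ridge, performs implicit subset selection: L1 regularization drives irrelevant coefficients exactly to zero, whereas L2 only shrinks them. The data and alpha values are illustrative assumptions:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Synthetic data: only the first of five features matters.
rng = np.random.RandomState(0)
X = rng.randn(200, 5)
y = 4.0 * X[:, 0] + 0.1 * rng.randn(200)

lasso = Lasso(alpha=0.5).fit(X, y)
ridge = Ridge(alpha=0.5).fit(X, y)

# L1 zeroes the irrelevant coefficients (implicit subset selection);
# L2 leaves them small but nonzero.
print(np.sum(lasso.coef_ == 0.0))  # exact zeros among the noise features
print(np.sum(ridge.coef_ == 0.0))  # typically none
```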
158.
Which of the following steps / assumptions in regression modeling impacts the trade-off between under-fitting and over-fitting the most?
A. The polynomial degree
B. Whether we learn the weights by matrix inversion or gradient descent
C. The use of a constant term
Answer» A. The polynomial degree
159.
To test the linear relationship between y (dependent) and x (independent) continuous variables, which of the following plots is best suited?
A. Scatter plot
B. Bar chart
C. Histogram
D. None of these
Answer» A. Scatter plot
160.
In a linear regression problem, we are using R-squared to measure goodness-of-fit. We add a feature to the linear regression model and retrain the same model. Which of the following options is true?
A. If R-squared increases, this variable is significant.
B. If R-squared decreases, this variable is not significant.
C. Individually, R-squared cannot tell us about variable importance. We can't say anything about it right now.
D. None of these.
Answer» C. Individually, R-squared cannot tell us about variable importance. We can't say anything about it right now.
161.
Let's say a linear regression model perfectly fits the training data (train error is zero). Which of the following statements is then true?
A. You will always have zero test error
B. You cannot have zero test error
C. None of the above
Answer» C. None of the above
162.
______ allows exploiting the natural sparsity of data while extracting principal components.
A. SparsePCA
B. KernelPCA
C. SVD
D. init parameter
Answer» A. SparsePCA
163.
The _____ parameter can assume different values which determine how the data matrix is initially processed.
A. run
B. start
C. init
D. stop
Answer» C. init
164.
In order to assess how much information is brought by each component, and the correlation among them, a useful tool is the _____.
A. Concurrent matrix
B. Convergence matrix
C. Supportive matrix
D. Covariance matrix
Answer» D. Covariance matrix
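A small sketch of the connection: the variance PCA attributes to each principal component is exactly an eigenvalue of the sample covariance matrix. The random correlated dataset is an illustrative assumption:

```python
import numpy as np
from sklearn.decomposition import PCA

# Random correlated data (illustrative assumption).
rng = np.random.RandomState(0)
X = rng.randn(300, 3) @ np.array([[2.0, 0.5, 0.0],
                                  [0.0, 1.0, 0.3],
                                  [0.0, 0.0, 0.2]])

# Eigenvalues of the sample covariance matrix, sorted descending.
cov = np.cov(X, rowvar=False)  # uses ddof=1, same convention as PCA
eigvals = np.sort(np.linalg.eigvalsh(cov))[::-1]

pca = PCA(n_components=3).fit(X)
print(np.allclose(eigvals, pca.explained_variance_))  # True
```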
165.
A(n) ______ dataset with many features contains information proportional to the independence of all features and their variance.
A. normalized
B. unnormalized
C. Both A & B
D. None of the mentioned
Answer» B. unnormalized
166.
scikit-learn also provides a class for per-sample normalization, _____.
A. Normalizer
B. Imputer
C. Classifier
D. All above
Answer» A. Normalizer
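A quick sketch of what "per-sample" means here: `Normalizer` rescales each row to unit norm, unlike the per-feature scalers. The sample matrix is an illustrative assumption:

```python
import numpy as np
from sklearn.preprocessing import Normalizer

X = np.array([[3.0, 4.0],
              [1.0, 0.0]])

# Normalizer rescales each SAMPLE (row) to unit norm, unlike scalers
# such as StandardScaler, which operate per feature (column).
X_norm = Normalizer(norm="l2").fit_transform(X)
print(X_norm)                           # rows [0.6, 0.8] and [1.0, 0.0]
print(np.linalg.norm(X_norm, axis=1))   # every row norm is 1.0
```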
167.
scikit-learn offers the class ______, which is responsible for filling the holes using a strategy based on the mean, median, or frequency.
A. LabelEncoder
B. LabelBinarizer
C. DictVectorizer
D. Imputer
Answer» D. Imputer
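A minimal imputation sketch. Note that the question refers to the older `Imputer` class; in current scikit-learn the same role is played by `SimpleImputer`, which is used below. The toy matrix is an illustrative assumption:

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Column 0 has a hole; the mean of its observed values 1.0 and 3.0 is 2.0.
X = np.array([[1.0, 10.0],
              [np.nan, 20.0],
              [3.0, 30.0]])

# "mean" can be swapped for "median" or "most_frequent".
imp = SimpleImputer(strategy="mean")
X_filled = imp.fit_transform(X)
print(X_filled[1, 0])  # 2.0
```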
168.
_______ produce sparse matrices of real numbers that can be fed into any machine learning model.
A. DictVectorizer
B. FeatureHasher
C. Both A & B
D. None of the mentioned
Answer» C. Both A & B
169.
While using _____, all labels are turned into sequential numbers.
A. LabelEncoder class
B. LabelBinarizer class
C. DictVectorizer
D. FeatureHasher
Answer» A. LabelEncoder class
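A short sketch of `LabelEncoder` turning string labels into sequential integers (the labels themselves are an illustrative assumption):

```python
from sklearn.preprocessing import LabelEncoder

labels = ["cat", "dog", "bird", "dog", "cat"]

# LabelEncoder assigns each distinct label a sequential integer,
# ordered alphabetically: bird -> 0, cat -> 1, dog -> 2.
le = LabelEncoder()
encoded = le.fit_transform(labels)
print(list(encoded))      # [1, 2, 0, 2, 1]
print(list(le.classes_))  # ['bird', 'cat', 'dog']
```

By contrast, `LabelBinarizer` would produce one-hot vectors rather than sequential numbers, which is why option A is the keyed answer.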
170.
_____ provides some built-in datasets that can be used for testing purposes.
A. scikit-learn
B. classification
C. regression
D. None of the above
Answer» A. scikit-learn
171.
Which of the following are models for feature extraction?
A. regression
B. classification
C. None of the above
Answer» C. None of the above
172.
Overlearning is caused by an excessive ______.
A. Capacity
B. Regression
C. Reinforcement
D. Accuracy
Answer» A. Capacity
173.
A supervised scenario is characterized by the concept of a _____.
A. Programmer
B. Teacher
C. Author
D. Farmer
Answer» B. Teacher
174.
Techniques that involve the usage of both labeled and unlabeled data are called ___.
A. Supervised
B. Semi-supervised
C. Unsupervised
D. None of the above
Answer» B. Semi-supervised
175.
It is necessary to allow the model to develop a generalization ability and avoid a common problem called ______.
A. Overfitting
B. Overlearning
C. Classification
D. Regression
Answer» A. Overfitting
176.
Even if there are no actual supervisors, ________ learning is also based on feedback provided by the environment.
A. Supervised
B. Reinforcement
C. Unsupervised
D. None of the above
Answer» B. Reinforcement
177.
The linear SVM classifier works by drawing a straight line between two classes.
A. True
B. False
Answer» A. True
178.
SVM is a ________ learning algorithm.
A. Supervised
B. Unsupervised
C. Both
D. None
Answer» A. Supervised
179.
SVM is a ________ algorithm.
A. Classification
B. Clustering
C. Regression
D. All
Answer» A. Classification
180.
A Gaussian distribution, when plotted, gives a bell-shaped curve which is symmetric about the _______ of the feature values.
A. Mean
B. Variance
C. Discrete
D. Random
Answer» A. Mean
181.
The Gaussian Naïve Bayes classifier assumes a ___________ distribution.
A. Continuous
B. Discrete
C. Binary
Answer» A. Continuous
182.
The Multinomial Naïve Bayes classifier assumes a ___________ distribution.
A. Continuous
B. Discrete
C. Binary
Answer» B. Discrete
183.
The Bernoulli Naïve Bayes classifier assumes a ___________ distribution.
A. Continuous
B. Discrete
C. Binary
Answer» C. Binary
184.
Bayes' theorem describes the probability of an event based on prior knowledge of conditions that might be related to the event.
A. True
B. False
Answer» A. True
185.
Conditional probability is a measure of the probability of an event given that another event has already occurred.
A. True
B. False
Answer» A. True
186.
Features being classified are __________ of each other in the Naïve Bayes classifier.
A. Independent
B. Dependent
C. Partially dependent
D. None
Answer» A. Independent
187.
Features being classified are independent of each other in the Naïve Bayes classifier.
A. False
B. True
Answer» B. True
188.
Naive Bayes classifiers use _______________ learning.
A. Supervised
B. Unsupervised
C. Both
D. None
Answer» A. Supervised
189.
Naive Bayes classifiers are a collection of ________ algorithms.
A. Classification
B. Clustering
C. Regression
D. All
Answer» A. Classification
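A tiny sketch tying together questions 181 and 188-189: `GaussianNB` is a supervised classifier (it needs labels) that models each feature per class as a continuous Gaussian. The toy data is an illustrative assumption:

```python
from sklearn.naive_bayes import GaussianNB

# Labels are provided, so this is supervised classification.
X = [[1.0], [1.2], [0.9], [5.0], [5.3], [4.8]]
y = [0, 0, 0, 1, 1, 1]

# GaussianNB fits a per-class Gaussian to each (continuous) feature.
clf = GaussianNB().fit(X, y)
print(clf.predict([[1.1], [5.1]]))  # [0 1]
```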
190.
In the mathematical equation of linear regression Y = β1 + β2X + ϵ, (β1, β2) refers to __________.
A. (X-intercept, Slope)
B. (Slope, X-intercept)
C. (Y-intercept, Slope)
D. (Slope, Y-intercept)
Answer» C. (Y-intercept, Slope)
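A quick check of the roles of β1 and β2, fitting noiseless data generated from Y = 3 + 2X so the coefficients are recovered exactly (the data is an illustrative assumption):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Noiseless data from Y = 3 + 2X.
X = np.arange(10, dtype=float).reshape(-1, 1)
y = 3.0 + 2.0 * X.ravel()

model = LinearRegression().fit(X, y)
print(round(model.intercept_, 6))  # 3.0 -> β1, the Y-intercept
print(round(model.coef_[0], 6))    # 2.0 -> β2, the slope
```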
191.
In the syntax of the linear model lm(formula, data, ...), data refers to ______.
A. Matrix
B. Vector
C. Array
D. List
Answer» B. Vector
192.
The function used for linear regression in R is __________.
A. lm(formula, data)
B. lr(formula, data)
C. lrm(formula, data)
D. regression.linear(formula, data)
Answer» A. lm(formula, data)
193.
If a linear regression model fits perfectly, i.e., train error is zero, then _____________________.
A. Test error is also always zero
B. Test error is non-zero
C. We cannot comment on the test error
D. Test error is equal to train error
Answer» C. We cannot comment on the test error
194.
In many classification problems, the target ______ is made up of categorical labels which cannot immediately be processed by any algorithm.
A. random_state
B. dataset
C. test_size
D. All above
Answer» B. dataset
195.
_______ adopts a dictionary-oriented approach, associating to each category label a progressive integer number.
A. LabelEncoder class
B. LabelBinarizer class
C. DictVectorizer
D. FeatureHasher
Answer» A. LabelEncoder class
196.
The parameter ______ allows specifying the percentage of elements to put into the test/training set.
A. test_size
B. training_size
C. All above
D. None of these
Answer» C. All above
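A minimal sketch of splitting with `train_test_split`. Note that in current scikit-learn the train-side parameter is spelled `train_size`; the option wording here follows the question's source text. The toy arrays are illustrative assumptions:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(100).reshape(-1, 1)
y = np.arange(100)

# test_size=0.25 reserves 25% of the samples for the test set;
# the train share can be specified instead (or as well) via train_size.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, random_state=0)
print(len(X_tr), len(X_te))  # 75 25
```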
197.
The ______ parameter can assume different values which determine how the data matrix is initially processed.
A. run
B. start
C. init
D. stop
Answer» C. init
198.
________ performs a PCA with non-linearly separable data sets.
A. SparsePCA
B. KernelPCA
C. SVD
D. None of the mentioned
Answer» B. KernelPCA
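A sketch of `KernelPCA` on a classic non-linearly separable dataset (two concentric circles); the kernel choice and `gamma` value are illustrative assumptions:

```python
from sklearn.datasets import make_circles
from sklearn.decomposition import KernelPCA

# Two concentric circles: not linearly separable in the original space.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# An RBF kernel lets PCA operate in an implicit high-dimensional space
# where the two rings can be pulled apart; plain PCA cannot do this.
kpca = KernelPCA(n_components=2, kernel="rbf", gamma=10.0)
X_kpca = kpca.fit_transform(X)
print(X_kpca.shape)  # (200, 2)
```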
199.
There are also many univariate methods that can be used in order to select the best features according to specific criteria based on ________.
A. F-tests and p-values
B. Chi-square
C. ANOVA
D. All above
Answer» A. F-tests and p-values
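A sketch of univariate selection with `SelectKBest` and the ANOVA F-test scorer `f_classif`, which ranks features by F-statistic and p-value. The synthetic data, where feature 0 tracks the label and feature 1 is pure noise, is an illustrative assumption:

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

# Feature 0 tracks the class label; feature 1 is pure noise.
rng = np.random.RandomState(0)
y = np.array([0] * 50 + [1] * 50)
X = np.column_stack([y + 0.1 * rng.randn(100), rng.randn(100)])

# f_classif scores each feature with an F-test; SelectKBest keeps the
# k features with the best scores (lowest p-values).
selector = SelectKBest(f_classif, k=1).fit(X, y)
print(selector.get_support())  # [ True False]
```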
200.
If you need a more powerful scaling feature, with superior control over outliers and the possibility to select a quantile range, there's also the class ________.
A. RobustScaler
B. DictVectorizer
C. LabelBinarizer
D. FeatureHasher
Answer» A. RobustScaler
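A short sketch of `RobustScaler`: it centers on the median and scales by a selectable quantile range, so a single extreme outlier barely affects the result. The toy column is an illustrative assumption:

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

# One extreme outlier; median/IQR-based statistics barely move because
# of it, whereas mean/std-based scaling would shift dramatically.
X = np.array([[1.0], [2.0], [3.0], [4.0], [1000.0]])

# Centers on the median and scales by the chosen quantile range
# (the default 25th-75th percentile range, i.e. the IQR).
scaler = RobustScaler(quantile_range=(25.0, 75.0))
X_scaled = scaler.fit_transform(X)
print(float(np.median(X_scaled)))  # 0.0 (the median maps to zero)
```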