One Pager Cheat Sheet
- By reviewing common ML questions and topics, you can prepare yourself for a successful Machine Learning job interview.
- AI is the broadest term of the three, covering everything related to making machines smart, while ML and DL are subsets of AI: ML algorithms learn to make decisions from data, and DL mimics the processing patterns of the human brain using artificial neural networks.
- Machine learning algorithms can be classified into three distinct categories: Supervised, Unsupervised, and Reinforcement learning.
- Classification is used when the target variable is categorical, and Regression is used when the target variable is continuous.
- Supervised Learning models such as Logistic Regression, Linear Regression, Decision Tree, and Random Forest are used for classification and regression, while K-means Clustering and the Apriori Algorithm are used for Unsupervised Learning tasks (see the supervised and K-means sketches after this list).
- Linear Regression and Logistic Regression model linear relationships between the input and output variables, while Decision Trees and Random Forests can also capture non-linear ones.
- The important steps involved in Machine Learning are Data collection, Data preparation, Model selection, Model training, Model evaluation, Parameter tuning, and Making predictions.
- Data is cleaned, visualized, split, and prepared in the data preparation step (see the data preparation sketch below).
- A confusion matrix is a performance measurement for a classification problem that depicts the number of true positives, true negatives, false positives, and false negatives (see the metrics sketch below).
- A false negative occurs when the model predicts negative but the actual value is positive, i.e. a positive instance is misclassified.
- The ROC Curve plots the True Positive Rate (TPR) against the False Positive Rate (FPR) to help determine the optimal threshold for separating the classes (see the ROC sketch below).
- We can evaluate our classification model's performance using popular metrics such as Precision, Recall, Accuracy, and AUC.
- We should use a validation set to check whether our model is overfitting or underfitting (see the final sketch below).
- Overfitting occurs when a model learns too many details from the training data, resulting in poor generalization to unseen data.
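
A few minimal sketches of the ideas above follow. All of them assume scikit-learn; the datasets, split ratios, and hyperparameters are illustrative choices, not prescriptions from the original cheat sheet. First, the data preparation step (cleaning via scaling, then splitting), using a built-in toy dataset:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# A small built-in dataset stands in for the "data collection" step.
X, y = load_breast_cancer(return_X_y=True)

# Split into training and test sets (80/20 is a common, arbitrary choice).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Scale features using statistics from the training split only,
# so no information leaks from the test set into training.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
```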
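Next, the supervised models named above. This sketch uses a synthetic classification dataset and default hyperparameters; Linear Regression is omitted only because the toy target here is categorical, but it exposes the same `fit`/`predict` interface:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with a categorical target, purely for illustration.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The same fit/predict interface covers all three supervised models.
for model in (LogisticRegression(max_iter=1000),
              DecisionTreeClassifier(random_state=0),
              RandomForestClassifier(random_state=0)):
    model.fit(X_train, y_train)
    print(type(model).__name__, "test accuracy:", model.score(X_test, y_test))
```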
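The unsupervised case, by contrast, receives no target variable at all. Here K-means groups the points on its own (three blobs and three clusters are assumptions for the demo); the Apriori Algorithm is not part of scikit-learn, so it is not shown:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Unlabeled data: K-means clusters the points without any target variable.
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)   # cluster assignment for each point
print(kmeans.cluster_centers_)   # learned cluster centers
```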
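The metrics sketch: a confusion matrix and the metrics derived from it. The tiny hand-written label vectors are made up solely so the counts are easy to verify by eye:

```python
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_score, recall_score)

# Toy labels: 1 = positive class, 0 = negative class.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Rows are actual classes, columns are predicted classes:
# [[TN, FP],
#  [FN, TP]]
print(confusion_matrix(y_true, y_pred))

print("accuracy :", accuracy_score(y_true, y_pred))   # (TP + TN) / total
print("precision:", precision_score(y_true, y_pred))  # TP / (TP + FP)
print("recall   :", recall_score(y_true, y_pred))     # TP / (TP + FN)
```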
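The ROC sketch: sweeping the decision threshold to trade TPR against FPR. Youden's J statistic (maximizing TPR minus FPR) is used here as one common threshold-picking heuristic, not the only option:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
scores = model.predict_proba(X_test)[:, 1]  # probability of the positive class

# roc_curve sweeps the decision threshold, returning TPR/FPR at each step.
fpr, tpr, thresholds = roc_curve(y_test, scores)
print("AUC:", roc_auc_score(y_test, scores))

# Youden's J: pick the threshold that maximizes TPR - FPR.
best = np.argmax(tpr - fpr)
print("threshold:", thresholds[best], "TPR:", tpr[best], "FPR:", fpr[best])
```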
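Finally, the validation-set check for overfitting. The depth values are arbitrary; the signal to look for is a training score noticeably higher than the validation score:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=2)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=2)

# An unconstrained tree can memorize the training data.
deep = DecisionTreeClassifier(random_state=2).fit(X_train, y_train)
print("deep tree    train:", deep.score(X_train, y_train),
      "val:", deep.score(X_val, y_val))  # a large gap suggests overfitting

# Limiting depth regularizes the model and usually narrows the gap.
shallow = DecisionTreeClassifier(max_depth=3, random_state=2).fit(X_train, y_train)
print("shallow tree train:", shallow.score(X_train, y_train),
      "val:", shallow.score(X_val, y_val))
```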