One Pager Cheat Sheet
- By reviewing common ML questions and topics, you can prepare yourself for a successful Machine Learning job interview.
- AI is the broadest term of the three, covering everything related to making machines smart. ML is a subset of AI in which systems learn from data to make decisions, and DL is a subset of ML that uses artificial neural networks to mimic the processing patterns of the human brain.
- Machine learning algorithms can be classified into three distinct categories: Supervised, Unsupervised, and Reinforcement. Within supervised learning, Classification is used when the target variable is categorical, and Regression is used when the target variable is continuous.
- Supervised Learning models such as Logistic Regression, Linear Regression, Decision Tree, and Random Forest are used for classification and regression, while K-means Clustering and the Apriori Algorithm are used for Unsupervised Learning tasks.
- Supervised learning algorithms such as Logistic Regression, Linear Regression, Decision Tree, and Random Forest model the relationship between input and output variables. Linear and Logistic Regression assume a linear relationship, while Decision Tree and Random Forest can also capture non-linear ones.
- The important steps involved in Machine Learning are Data collection, Data preparation, Model selection, Model training, Model evaluation, Parameter tuning, and Making predictions.
- Data is cleaned, visualized, split, and prepared in the data preparation step.
- A confusion matrix is a performance measurement for a classification problem that depicts the number of true positives, true negatives, false positives, and false negatives.
- A false negative occurs when the model predicts the negative class for an instance that is actually positive, so the positive instance is misclassified.
- The ROC Curve plots the True Positive Rate (TPR) against the False Positive Rate (FPR) across classification thresholds, which helps determine the optimal threshold to separate the classes.
- We can evaluate our classification model's performance using popular metrics such as Precision, Recall, Accuracy, and AUC.
- We should use a validation set to check if our model is overfitting or underfitting.
- Overfitting occurs when a model learns too many details from the training data, resulting in poor generalization with unseen data.
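
The confusion matrix and the metrics listed above can be illustrated with a small sketch. This is a minimal, dependency-free example and not a reference implementation: the function names `confusion_counts` and `classification_metrics`, the choice of `1` as the positive label, and the sample labels are all assumptions made for illustration.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1  # predicted positive, actually positive
            else:
                fp += 1  # predicted positive, actually negative
        else:
            if actual == positive:
                fn += 1  # predicted negative, actually positive
            else:
                tn += 1  # predicted negative, actually negative
    return tp, tn, fp, fn


def classification_metrics(tp, tn, fp, fn):
    """Derive common metrics from the four confusion-matrix counts."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Recall is the same quantity as the True Positive Rate (TPR).
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        # False Positive Rate (FPR), the x-axis of the ROC curve.
        "fpr": fp / (fp + tn) if (fp + tn) else 0.0,
    }


# Hypothetical labels, made up for this example.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

counts = confusion_counts(y_true, y_pred)
metrics = classification_metrics(*counts)
print(counts)
print(metrics)
```

Each point on a ROC curve corresponds to one (FPR, TPR) pair like the one computed here; sweeping the decision threshold of a probabilistic classifier and recomputing the pair traces out the full curve.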

