One Pager Cheat Sheet
- This tutorial will provide you with the necessary knowledge to answer entry-level data science interview questions, as well as tips on understanding concepts, narrowing answer choices and getting feedback when needed.
- A `p-value` of ≤ 0.05 indicates that results at least as extreme as those observed would be unlikely if the null hypothesis were true, so at the conventional 5% significance level the null hypothesis is rejected in favor of the alternative hypothesis.
- Differencing a time series removes its trend-like component, which makes the seasonal component stand out; removing the seasonality itself requires seasonal differencing, with a lag equal to the seasonal period.
- The performance of a `binary classification model` is commonly measured by sensitivity (true positive rate) and specificity (true negative rate), which trade off against each other: moving the decision threshold to raise one typically lowers the other.
- The accuracy of this model is calculated by taking the sum of the true positives and true negatives and dividing by the total number of observations (`15`), resulting in an accuracy of 75%.
- The model's accuracy was calculated as 63.8%, and its Area Under the Curve (AUC) was used to measure the model's ability to discriminate between the two classes, with a value of 1.0 indicating a perfect classifier and 0.5 indicating performance no better than random guessing.
- A model's predictions often differ from the results seen in a `real-world` environment, because `overfitting` or `underfitting` leads to inaccurate predictions on new data.
- The `ROC` plot displays the trade-off between the true positive rate (TPR) and false positive rate (FPR) of a binary classifier as its decision threshold varies.
- It is not necessary to manually create the immutable index when creating a `DataFrame` in `pandas`, since a default index is generated automatically.
- The mean of the responses from your three friends is 3.33, calculated by adding the responses and dividing by the number of responses, and the resulting sample `variance` is 2.33.
- The goodness of fit in simple linear regression (commonly reported as the coefficient of determination, R²) measures how closely the observed data points match the model's predicted values, with a value of 1 indicating a perfect fit.
- The means of a Poisson and an exponential distribution with the same `rate parameter λ` are generally not equal: the Poisson mean is λ, while the exponential mean is 1/λ, so they coincide only when λ = 1. The Poisson mean does not have to be a whole number; only the counts it describes do.
- K-means clustering is an unsupervised `machine learning` algorithm used for grouping data points into clusters; it does not reduce the number of `dimensions`.
- Data scientists play an essential role in managing and analyzing data, which requires a deep understanding of mathematics, statistics, and computer programming in order to `develop statistical models`, `numerical simulations`, `data extraction processes`, and `visualizations` that help explain the data.
- The `in` operator returns a boolean value, `True` or `False`, indicating whether an element is present in a collection, which lets data scientists check data sets for certain elements and act on the result.
- The `in` operator in Python returns `True` or `False`; it tests membership and cannot determine the probability of an event. In this case, the probability of rolling a given number on a single fair die is 1/6 for each of the dice.
- Predictive modeling in `Machine Learning` uses features as inputs to make predictions, as opposed to labels, which are the actual classifications or outcomes being predicted.
- Using a `negative index` or the `slicing technique`, the fourth item from the end of a `tuple` in `Python` can be returned.
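To make the differencing point concrete, here is a minimal sketch in plain Python. The `difference` helper and the synthetic series (a linear trend plus a repeating pattern of period 4) are assumptions made up for illustration; they are not data from the tutorial.

```python
def difference(series, lag=1):
    """Return series[t] - series[t - lag] for every t where both values exist."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]


# Hypothetical series: a trend of +2 per step plus a seasonal pattern of period 4.
pattern = [0, 5, 0, -5]
series = [2 * t + pattern[t % 4] for t in range(12)]

# Lag-1 differencing removes the trend; the repeating seasonal shape remains.
detrended = difference(series, lag=1)

# Differencing at the seasonal lag removes the repeating pattern as well.
deseasonalized = difference(series, lag=4)

print(detrended)
print(deseasonalized)
```

After lag-1 differencing the values repeat every four steps without drifting upward, and after lag-4 differencing only a constant is left.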
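The sensitivity, specificity, and accuracy definitions above can be sketched as a small function. The confusion-matrix counts below are hypothetical and chosen only for illustration; they are not the counts behind the figures quoted in this cheat sheet.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                 # true positive rate
    specificity = tn / (tn + fp)                 # true negative rate
    accuracy = (tp + tn) / (tp + fp + tn + fn)   # share of correct predictions
    return sensitivity, specificity, accuracy


# Hypothetical counts (an assumption for this example).
sensitivity, specificity, accuracy = classification_metrics(tp=8, fp=2, tn=7, fn=3)
print(f"sensitivity={sensitivity:.3f} specificity={specificity:.3f} accuracy={accuracy:.3f}")
```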
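One way to see what AUC measures is to compute it directly as the fraction of (positive, negative) pairs that the model ranks correctly. This is a sketch of that pairwise definition rather than the routine any particular library uses, and the labels and scores are invented for the example.

```python
def auc_from_scores(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count as 0.5."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical labels and predicted scores.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
auc = auc_from_scores(labels, scores)

# A classifier that scores every positive above every negative reaches 1.0.
perfect_auc = auc_from_scores([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1])
print(round(auc, 3), perfect_auc)
```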
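The mean and sample variance arithmetic can be checked with Python's standard `statistics` module. The three responses below are an assumption: the cheat sheet does not list them, and these values were picked because they reproduce the quoted mean of 3.33 and sample variance of 2.33.

```python
import statistics

# Hypothetical responses (not given in the text), chosen to match the quoted figures.
responses = [2, 3, 5]

mean = statistics.mean(responses)                 # sum of responses / number of responses
sample_variance = statistics.variance(responses)  # divides by n - 1 (sample variance)

print(round(mean, 2), round(sample_variance, 2))
```

Note that `statistics.variance` divides by n - 1; `statistics.pvariance` would divide by n and give a smaller value.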
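For the goodness-of-fit bullet, the following sketch assumes the measure in question is the coefficient of determination, R², which equals 1 when predictions match the observations exactly. The data points are made up for illustration.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean_observed = sum(observed) / len(observed)
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))
    ss_tot = sum((y - mean_observed) ** 2 for y in observed)
    return 1 - ss_res / ss_tot


# Hypothetical observations and two sets of predictions.
observed = [1.0, 2.0, 3.0, 4.0]
perfect_fit = r_squared(observed, observed)             # predictions equal observations
close_fit = r_squared(observed, [1.1, 1.9, 3.2, 3.8])   # small residuals

print(perfect_fit, round(close_fit, 3))
```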
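The membership test and the tuple indexing described above can be tried in a few lines. The tuple `rolls` is a hypothetical example.

```python
# Hypothetical tuple of die rolls.
rolls = (4, 2, 6, 1, 3, 5)

# Membership tests evaluate to a boolean.
has_six = 6 in rolls      # True: 6 is an element of the tuple
has_seven = 7 in rolls    # False: 7 is not an element

# Fourth item from the end, using a negative index.
fourth_from_end = rolls[-4]

# The same item via slicing; a slice of a tuple is itself a tuple.
fourth_from_end_slice = rolls[-4:-3]

print(has_six, has_seven, fourth_from_end, fourth_from_end_slice)
```

Indexing returns the element itself, while slicing returns a one-element tuple containing it.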