One Pager Cheat Sheet
- Scaling numerical data by normalizing or standardizing the features is important for many machine learning algorithms, because it accounts for differences in magnitude and unit across features.
- Normalization is the process of rescaling values into a common range, typically [0, 1] or [-1, 1], using the `MinMaxScaler` transformer from the Scikit-Learn package.
- Because it rescales each value using the feature's minimum and maximum, Normalization is commonly referred to as Min-Max Scaling.
- Standardization, also known as Z-Score Normalization, rescales the data based on the standard normal distribution and is performed using the `StandardScaler` transformer in Scikit-Learn.
- Standardization works by calculating each value's `z-score` relative to the mean and standard deviation of the dataset.
- Choosing between Normalization and Standardization depends on whether you know the distribution of your data and whether the algorithm makes assumptions about that distribution: `k-nearest neighbors` makes no distributional assumption, while `linear regression` assumes a Gaussian distribution, so they pair with Normalization and Standardization respectively.
- No, Normalization should not be used when data has a `Gaussian distribution`; instead, Standardization should be used for Gaussian-distributed data, where the algorithm makes assumptions about the distribution.
- Feature scaling is an important `preprocessing` step for `machine learning` models, with Normalization and Standardization being the two main techniques available.
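
The Normalization bullets above can be sketched with Scikit-Learn's `MinMaxScaler`; the tiny one-feature sample array is an illustrative assumption, not data from the lesson.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Illustrative sample data: one feature, five observations.
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])

# Min-Max Scaling: x' = (x - min) / (max - min), mapping values into [0, 1].
X_norm = MinMaxScaler().fit_transform(X)
print(X_norm.ravel())  # [0.   0.25 0.5  0.75 1.  ]

# Passing feature_range=(-1, 1) would rescale into [-1, 1] instead.
X_norm_pm1 = MinMaxScaler(feature_range=(-1, 1)).fit_transform(X)
print(X_norm_pm1.ravel())  # [-1.  -0.5  0.   0.5  1. ]
```

Note that the scale is learned from the training data's min and max, so the same fitted scaler should be reused to transform any test data.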
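
Similarly, the Standardization bullets can be sketched with `StandardScaler`, cross-checked against the manual z-score formula; the sample array is again an illustrative assumption.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Illustrative sample data: one feature, five observations.
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])

# Standardization: z = (x - mean) / std for each value.
X_std = StandardScaler().fit_transform(X)

# Manual z-score for comparison (population std, which StandardScaler uses).
z_manual = (X - X.mean()) / X.std()
print(np.allclose(X_std, z_manual))  # True
```

After the transform, the feature has mean 0 and standard deviation 1, which is what distribution-sensitive algorithms such as linear regression benefit from.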