Multivariate Analysis

Multivariate analysis ('multi' meaning 'many') examines several variables at once, together with the relationships between them. It lets us explore structure that we could not otherwise see directly, since we cannot, for example, draw or view a 4-D diagram.

To carry out this type of analysis we will use Seaborn's built-in pairplot function. A pair plot shows the pairwise relationships between all the numeric variables in our dataset.

PYTHON
import seaborn as sns
sns.pairplot(df, hue="species", height=3)  # 'height' replaces the older 'size' argument

From the plot we can see that every variable in our dataset appears along both the X and Y axes, so each off-diagonal panel is a scatter plot of one pair of features. Along the diagonal we see layered kernel density estimates (KDEs) showing the distribution of each feature, with one curve per species.

If two features are highly correlated with each other, they carry largely the same information about the output, so we can often drop one of them: a model with fewer features is usually simpler to train and to interpret.
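
One way to check for this kind of redundancy numerically is a correlation matrix. The sketch below is illustrative only: it assumes a small hand-copied subset of Iris rows (three per species) with the same column names used above, and the 0.9 cut-off is an arbitrary choice for the example, not a rule.

```python
import pandas as pd

# Assumption: a small hand-copied subset of the Iris data (three rows per
# species), used only so the example is self-contained.
rows = [
    (5.1, 3.5, 1.4, 0.2, "setosa"),
    (4.9, 3.0, 1.4, 0.2, "setosa"),
    (4.7, 3.2, 1.3, 0.2, "setosa"),
    (7.0, 3.2, 4.7, 1.4, "versicolor"),
    (6.4, 3.2, 4.5, 1.5, "versicolor"),
    (6.0, 2.7, 5.1, 1.6, "versicolor"),
    (6.3, 3.3, 6.0, 2.5, "virginica"),
    (5.8, 2.7, 5.1, 1.9, "virginica"),
    (4.9, 2.5, 4.5, 1.7, "virginica"),
]
features = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
sample = pd.DataFrame(rows, columns=features + ["species"])

# Pearson correlation between every pair of numeric features.
corr = sample[features].corr()

# Hypothetical cut-off chosen for illustration; tune it for real data.
THRESHOLD = 0.9

# List each feature pair once (upper triangle) if it is strongly correlated.
redundant_pairs = [
    (a, b, round(float(corr.loc[a, b]), 2))
    for i, a in enumerate(features)
    for b in features[i + 1:]
    if abs(corr.loc[a, b]) > THRESHOLD
]
print(redundant_pairs)
```

On the full dataset the same two lines (`df[features].corr()` and the filter) apply unchanged. A pair that shows up here is only a candidate for removal; whether dropping one feature actually helps should still be confirmed against model performance.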

Observations

  1. petal_length and petal_width are the most useful features for distinguishing the flower species.
  2. Setosa is easy to identify (it is linearly separable from the other two species), whereas Versicolor and Virginica overlap somewhat (they are almost, but not quite, linearly separable).
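
The second observation can be sanity-checked by comparing the range of petal_length for each species. This is a rough sketch, assuming a small hand-copied subset of Iris rows (three per species); the helper `ranges_overlap` is a hypothetical name introduced here. If two species' ranges do not intersect, a single threshold on that feature separates them.

```python
import pandas as pd

# Assumption: a small hand-copied subset of the Iris data (three rows per
# species), used only so the example is self-contained.
rows = [
    (5.1, 3.5, 1.4, 0.2, "setosa"),
    (4.9, 3.0, 1.4, 0.2, "setosa"),
    (4.7, 3.2, 1.3, 0.2, "setosa"),
    (7.0, 3.2, 4.7, 1.4, "versicolor"),
    (6.4, 3.2, 4.5, 1.5, "versicolor"),
    (6.0, 2.7, 5.1, 1.6, "versicolor"),
    (6.3, 3.3, 6.0, 2.5, "virginica"),
    (5.8, 2.7, 5.1, 1.9, "virginica"),
    (4.9, 2.5, 4.5, 1.7, "virginica"),
]
columns = ["sepal_length", "sepal_width", "petal_length", "petal_width", "species"]
sample = pd.DataFrame(rows, columns=columns)

# Smallest and largest petal_length observed for each species.
ranges = sample.groupby("species")["petal_length"].agg(["min", "max"])
print(ranges)


def ranges_overlap(a, b):
    """Return True if the [min, max] petal_length intervals of a and b intersect."""
    return bool(
        ranges.loc[a, "min"] <= ranges.loc[b, "max"]
        and ranges.loc[b, "min"] <= ranges.loc[a, "max"]
    )


print("setosa vs versicolor:", ranges_overlap("setosa", "versicolor"))
print("versicolor vs virginica:", ranges_overlap("versicolor", "virginica"))
```

Overlapping ranges on one feature do not prove that two species are inseparable once all features are combined; they only show that this feature alone cannot split them with a single cut.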