# machine_learning_1.py

The iris dataset in sklearn

Code source: Gaël Varoquaux
Modified for documentation by Jaques Grobler
License: BSD 3 clause
Additional code and annotations: Clifton Callender
See the original code, without the additional code and annotations, here

```python
from sklearn import datasets
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # matplotlib basic 3D plotting
from sklearn.decomposition import PCA  # Principal Component Analysis
```

import some data to play with

`iris = datasets.load_iris()`

iris dataset features

`X = iris.data[:,:2]  # we only take the first two features.`

iris dataset labels

`Y = iris.target`
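The following is a quick sanity check of what was loaded. The full feature matrix has 150 samples and four features. `X` keeps only the first two columns (sepal length and sepal width), and `Y` holds one integer class label per sample. Counting the labels with `numpy.bincount` is an addition for illustration and is not part of the original script.

```python
import numpy as np
from sklearn import datasets

iris = datasets.load_iris()
X = iris.data[:, :2]  # first two features only
Y = iris.target       # integer class label for each sample

print(iris.data.shape)         # (150, 4): 150 samples, 4 features
print(X.shape)                 # (150, 2)
print(iris.feature_names[:2])  # names of the two features kept in X
print(iris.target_names)       # ['setosa' 'versicolor' 'virginica']
print(np.bincount(Y))          # samples per class: [50 50 50]
```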

boundaries for the x- and y-axes in the 2D plot below, padded by 0.5 beyond the range of the data

```python
x_min, x_max = X[:, 0].min() - .5, X[:, 0].max() + .5
y_min, y_max = X[:, 1].min() - .5, X[:, 1].max() + .5

plt.figure(1, figsize=(8, 6))
```

Plot the training points.
Note the use of `X[:, n]` to get column `n` (counting from zero) of the 2D array.

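```python
# Color each point by its class label (Y)
plt.scatter(X[:, 0], X[:, 1], c=Y)
plt.xlabel(iris.feature_names[0])
plt.ylabel(iris.feature_names[1])

plt.xlim(x_min, x_max)
plt.ylim(y_min, y_max)
# Passing an empty tuple hides the tick marks and labels
plt.xticks(())
plt.yticks(())
plt.show()
```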
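The `X[:, n]` notation is NumPy slicing. The `:` selects every row, and `n` selects the column at index `n`, counting from zero. Here is a minimal sketch on a made-up 3×2 array (not the iris data). It also repeats the ±0.5 padding used for the plot limits:

```python
import numpy as np

# A toy 2D array: 3 rows (samples) x 2 columns (features)
A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

first_col = A[:, 0]   # every row, column 0 -> [1. 3. 5.]
second_col = A[:, 1]  # every row, column 1 -> [2. 4. 6.]

# Same padding as the plot limits: column min/max widened by 0.5
x_min, x_max = A[:, 0].min() - .5, A[:, 0].max() + .5
print(first_col, second_col, x_min, x_max)  # [1. 3. 5.] [2. 4. 6.] 0.5 5.5
```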

To get a better understanding of how the dimensions interact,
plot the first three PCA dimensions.

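```python
fig = plt.figure(2, figsize=(8, 6))
# Add 3D axes to the figure; elev and azim set the viewing angle.
# (Calling Axes3D(fig, ...) directly no longer attaches the axes to the
# figure in recent matplotlib versions.)
ax = fig.add_subplot(111, projection="3d", elev=-150, azim=110)
```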

`n_components=3` tells PCA to keep only the first three principal components

`pca = PCA(n_components=3).fit(iris.data)`

reduce the feature data from four to three dimensions by projecting it onto the fitted components

`X_reduced = pca.transform(iris.data)`
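The sketch below shows what the projection to three dimensions amounts to. With the default `whiten=False`, scikit-learn's `PCA` subtracts the per-feature mean stored in `mean_` and then multiplies by the transpose of `components_`. The check compares that manual projection against the library's own output. It is an illustration, not the library's internal code.

```python
import numpy as np
from sklearn import datasets
from sklearn.decomposition import PCA

iris = datasets.load_iris()
pca = PCA(n_components=3).fit(iris.data)
X_reduced = pca.transform(iris.data)

# Manual projection: center each feature, then project onto the components
X_manual = (iris.data - pca.mean_) @ pca.components_.T

print(X_reduced.shape)                   # (150, 3)
print(np.allclose(X_reduced, X_manual))  # True
```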

Create and label the 3D scatterplot

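```python
# Color the projected points by class label
ax.scatter(X_reduced[:, 0], X_reduced[:, 1], X_reduced[:, 2], c=Y)
ax.set_title("First three PCA directions")
# Hide tick labels; ax.w_xaxis etc. have been removed from recent
# matplotlib versions, so use ax.xaxis / ax.yaxis / ax.zaxis instead
ax.set_xlabel("1st eigenvector")
ax.xaxis.set_ticklabels([])
ax.set_ylabel("2nd eigenvector")
ax.yaxis.set_ticklabels([])
ax.set_zlabel("3rd eigenvector")
ax.zaxis.set_ticklabels([])

plt.show()
```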

`pca.components_` expresses the principal components in terms of the original
feature space

```python
print("The vectors for the three principal components, given in terms of the "
      "original 4D feature space, are:\n\n", pca.components_, "\n")
```

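Each row of `components_` is a unit-length direction in the original four-dimensional feature space, and the rows are mutually orthogonal. This short sketch checks that property. It multiplies the component matrix by its transpose, which should give approximately the 3×3 identity matrix.

```python
import numpy as np
from sklearn import datasets
from sklearn.decomposition import PCA

iris = datasets.load_iris()
pca = PCA(n_components=3).fit(iris.data)

print(pca.components_.shape)  # (3, 4): 3 components x 4 original features

# Gram matrix of the component vectors; orthonormal rows give the identity
gram = pca.components_ @ pca.components_.T
print(np.allclose(gram, np.eye(3)))  # True
```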
`pca.explained_variance_` is the variance explained by each of the
principal components

```python
print("The variance explained by each of the principal components is:\n\n",
      pca.explained_variance_, "\n")
```
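The values in `explained_variance_` are the largest eigenvalues of the sample covariance matrix of the data, computed with an `n - 1` denominator. As a sketch for comparison, the snippet below recomputes them directly with NumPy's `cov` and `linalg.eigvalsh`:

```python
import numpy as np
from sklearn import datasets
from sklearn.decomposition import PCA

iris = datasets.load_iris()
pca = PCA(n_components=3).fit(iris.data)

# np.cov treats rows as variables, so transpose; the default uses n - 1
cov = np.cov(iris.data.T)
eigvals = np.linalg.eigvalsh(cov)  # eigenvalues in ascending order
top3 = eigvals[::-1][:3]           # three largest, in descending order

print(np.allclose(pca.explained_variance_, top3))  # True
```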

`pca.explained_variance_ratio_` expresses the variance explained by each component as a fraction of the total variance

```python
# explained_variance_ratio_ holds fractions; scale by 100 for percentages
print("Variance explained expressed as a percentage:\n\n",
      100 * pca.explained_variance_ratio_)
```
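Each ratio is that component's variance divided by the total variance of the original four features. Multiplying by 100 therefore gives a percentage. The sketch below reproduces the ratio by hand, using per-feature variances with an `n - 1` denominator to match PCA's convention. It also reports the cumulative share retained by the three components.

```python
import numpy as np
from sklearn import datasets
from sklearn.decomposition import PCA

iris = datasets.load_iris()
pca = PCA(n_components=3).fit(iris.data)

# Total variance = sum of the per-feature sample variances
total_var = iris.data.var(axis=0, ddof=1).sum()
ratio_manual = pca.explained_variance_ / total_var

print(np.allclose(ratio_manual, pca.explained_variance_ratio_))  # True
print(100 * pca.explained_variance_ratio_)        # per-component percentages
print(100 * pca.explained_variance_ratio_.sum())  # cumulative percentage kept
```

Because one of the four components is dropped, the cumulative share comes out slightly below 100%.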