Korean Society for Technology of Plasticity (KSTP) Professional Training Course

Part 1: Fundamentals of Artificial Intelligence


Prof. Seungchul Lee
http://iai.postech.ac.kr/
Industrial AI Lab at POSTECH

Table of Contents

1. Machine Learning and Deep Learning




1.1. Taxonomy of AI



1.2. Scikit Learn

  • Machine Learning in Python
  • Simple and efficient tools for data mining and data analysis
  • Accessible to everybody, and reusable in various contexts
  • Built on NumPy, SciPy, and matplotlib
  • Open source, commercially usable - BSD license
  • https://scikit-learn.org/stable/index.html




1.3. Supervised Learning

  • Given training set $\left\{ \left(x^{(1)}, y^{(1)}\right), \left(x^{(2)}, y^{(2)}\right),\cdots,\left(x^{(m)}, y^{(m)}\right) \right\}$
  • Want to find a function $f_{\omega}$ with learning parameter $\omega$
    • $f_{\omega}(x)$ is desired to be as close as possible to $y$ for future $(x,y)$
    • i.e., $f_{\omega}(x) \approx y$
  • Define a loss function
$$\ell \left(f_{\omega} \left(x^{(i)}\right), y^{(i)}\right)$$
  • Solve the following optimization problem:
$$ \begin{align*} \text{minimize} &\quad \frac{1}{m} \sum_{i=1}^{m} \ell \left(f_{\omega} \left(x^{(i)}\right), y^{(i)}\right)\\ \text{subject to} &\quad \omega \in \Omega \end{align*} $$


  • Function approximation between inputs and outputs


  • Once it is learned, the model can be used to predict $y$ for new, unseen inputs $x$ (a minimal sketch of this procedure follows)
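To make the optimization above concrete, here is a minimal sketch (added for illustration, not from the original notebook) that fits a linear model $f_{\omega}(x) = \omega_1 x + \omega_0$ by minimizing the average squared loss with plain gradient descent; the data points and step size are illustrative only.

import numpy as np

# illustrative training set {(x^(i), y^(i))} and a linear model f_w(x) = w1*x + w0
x = np.array([0.1, 0.7, 1.3, 2.2, 3.0, 4.3])
y = np.array([0.5, 1.1, 1.5, 2.2, 2.7, 3.5])

w1, w0 = 0.0, 0.0       # learning parameters (omega)
lr = 0.01               # step size (assumed)

for _ in range(5000):
    # gradient of (1/m) * sum of squared losses with respect to w1 and w0
    err = (w1*x + w0) - y
    w1 -= lr * (2/len(x)) * np.sum(err * x)
    w0 -= lr * (2/len(x)) * np.sum(err)

print(w1, w0)           # learned parameters: f_w(x) ≈ y on the training data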

2. Regression

  • A set of statistical processes for estimating the relationships between a dependent variable and one or more independent variables

2.1. Linear Regression



2.2. Multivariate Linear Regression



2.3. Nonlinear Regression



2.4. Feature Selection

  • Multivariate regression


$$ \hat{y} = \theta_0 + \theta_{1}x_1 + \theta_{2}x_2 + \theta_{3}x_3 + \cdots $$

  • The process of selecting a subset of relevant features (variables, predictors) for use in model construction.
  • Feature selection techniques are used for several reasons:
    • simplification of models to make them easier to interpret,
    • shorter training times,
    • to avoid the curse of dimensionality,
    • improve data's compatibility with a learning model class,
    • encode inherent symmetries present in the input space.
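As a quick illustration of feature selection (a sketch added here, not part of the original material), scikit-learn's SelectKBest ranks features by a univariate score and keeps only the most relevant ones; the data below are synthetic.

import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression

# synthetic data: y depends on the first two features only; the third is irrelevant
np.random.seed(0)
X = np.random.rand(100, 3)
y = 2*X[:, 0] + 3*X[:, 1] + 0.1*np.random.randn(100)

selector = SelectKBest(score_func=f_regression, k=2)   # keep the 2 highest-scoring features
selector.fit(X, y)
print(selector.get_support())                          # boolean mask of the selected features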

2.5. Correlation Coefficient

  • $+1 \to$ close to a straight line with positive slope

  • $-1 \to$ close to a straight line with negative slope

  • Indicates how close the data are to a straight line, but

  • gives no information about the slope itself

$$0 \leq \left\lvert \text{correlation coefficient} \right\rvert \leq 1$$
$$0 \;(\text{uncorrelated}) \quad \longleftrightarrow \quad 1 \;(\text{linearly correlated})$$
  • Does not tell anything about causality (a quick numerical check follows)
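A quick numerical check of these properties (illustrative code, synthetic data): np.corrcoef returns the Pearson correlation coefficient, which is close to +1 for nearly linear data regardless of the actual slope.

import numpy as np

np.random.seed(0)
x = np.random.randn(200)
y = 2*x + 0.5*np.random.randn(200)      # linear relationship with slope 2 plus noise

r = np.corrcoef(x, y)[0, 1]             # Pearson correlation coefficient
print(r)                                # close to +1, yet it says nothing about the slope (2) or causality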

2.6. Correlation Coefficient Plot



2.7. Python

2.7.1. Linear Regression

In [1]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

# data points in column vector [input, output]
x = np.array([0.1, 0.4, 0.7, 1.2, 1.3, 1.7, 2.2, 2.8, 3.0, 4.0, 4.3, 4.4, 4.9]).reshape(-1, 1)
y = np.array([0.5, 0.9, 1.1, 1.5, 1.5, 2.0, 2.2, 2.8, 2.7, 3.0, 3.5, 3.7, 3.9]).reshape(-1, 1)

# to plot
plt.figure(figsize=(10, 6))
plt.title('Linear Regression', fontsize=15)
plt.xlabel('X', fontsize=15)
plt.ylabel('Y', fontsize=15)
plt.plot(x, y, 'ko', label="data")
plt.xlim([0, 5])
plt.grid(alpha=0.3)
plt.axis('scaled')
plt.show()
In [3]:
from sklearn.linear_model import LinearRegression

reg = LinearRegression()
reg.fit(x,y)
Out[3]:
LinearRegression()
In [4]:
print(reg.coef_)       # Coef
print(reg.intercept_)  # Bias
[[0.67129519]]
[0.65306531]
In [5]:
# to plot
plt.figure(figsize=(10, 6))
plt.title('Linear Regression', fontsize=15)
plt.xlabel('X', fontsize=15)
plt.ylabel('Y', fontsize=15)
plt.plot(x, y, 'ko', label="data")

# to plot a straight line (fitted line)
xp = np.arange(0, 5, 0.01).reshape(-1, 1)
yp = reg.coef_*xp + reg.intercept_

plt.plot(xp, yp, 'r', linewidth=2, label="$L_2$")
plt.legend(fontsize=15)
plt.axis('scaled')
plt.grid(alpha=0.3)
plt.xlim([0, 5])
plt.show()

2.7.2. Nonlinear Regression

In [6]:
n = 100            
x = -5 + 15*np.random.rand(n, 1)
noise = 10*np.random.randn(n, 1)
y = 10 + 1*x + 2*x**2 + noise

plt.figure(figsize=(10, 6))
plt.title('Nonlinear Regression', fontsize=15)
plt.xlabel('X', fontsize=15)
plt.ylabel('Y', fontsize=15)
plt.plot(x, y, 'o', markersize=4, label='actual')
plt.xlim([np.min(x), np.max(x)])
plt.grid(alpha=0.3)
plt.legend(fontsize=15)
plt.show()
In [7]:
from sklearn.kernel_ridge import KernelRidge

reg = KernelRidge(kernel='rbf', gamma=0.1)
reg.fit(x, y)
Out[7]:
KernelRidge(gamma=0.1, kernel='rbf')
In [8]:
p = reg.predict(x)
In [9]:
plt.figure(figsize=(10, 6))
plt.title('Nonlinear Regression', fontsize=15)
plt.xlabel('X', fontsize=15)
plt.ylabel('Y', fontsize=15)
plt.plot(x, y, 'o', markersize=4, label='actual')
plt.plot(x, p, 'ro', markersize=4, label='predict')
plt.grid(alpha=0.3)
plt.legend(fontsize=15)
plt.xlim([np.min(x), np.max(x)])
plt.show()

3. Classification

  • Classification is used when $y$ is a discrete value
    • develop a classification algorithm to determine which class a new input should fall into
  • The goal is to find a classification boundary
  • We will learn
    • Support Vector Machine (SVM)
    • Logistic Regression


3.1. Linear Classification


3.2. Non-linear Classification


3.3. Python

3.3.1. SVM

In [10]:
x1 = 8*np.random.rand(100, 1)
x2 = 7*np.random.rand(100, 1) - 4

g0 = 0.8*x1 + x2 - 3
g1 = g0 - 1
g2 = g0 + 1

C1 = np.where(g1 >= 0)[0]
C2 = np.where(g2 < 0)[0]

X1 = np.hstack([x1[C1],x2[C1]])
X2 = np.hstack([x1[C2],x2[C2]])
n = X1.shape[0]
m = X2.shape[0]
X = np.vstack([X1, X2])
y = np.vstack([np.zeros([n, 1]), np.ones([m, 1])])

plt.figure(figsize=(10, 6))
plt.plot(x1[C1], x2[C1], 'ro', label='C1')
plt.plot(x1[C2], x2[C2], 'bo', label='C2')
plt.xlabel('$x_1$', fontsize = 20)
plt.ylabel('$x_2$', fontsize = 20)
plt.legend(loc = 4)
plt.xlim([0, 8])
plt.ylim([-4, 3])
plt.show()
In [11]:
from sklearn.svm import SVC

clf = SVC(kernel='linear')
clf.fit(X, np.ravel(y))
Out[11]:
SVC(kernel='linear')
In [12]:
print(clf.coef_)
print(clf.intercept_)
[[-0.76177006 -0.93761421]]
[2.85068138]
In [13]:
xp = np.linspace(0,8,100).reshape(-1,1)
yp = -clf.coef_[0,0]/clf.coef_[0,1]*xp - clf.intercept_/clf.coef_[0,1]

plt.figure(figsize=(10, 6))
plt.plot(X[0:n, 0], X[0:n, 1], 'ro', label='C1')
plt.plot(X[n:, 0], X[n:, 1], 'bo', label='C2')
plt.plot(xp, yp, '--k', label='SVM')
plt.xlabel('$x_1$', fontsize = 20)
plt.ylabel('$x_2$', fontsize = 20)
plt.legend(loc = 4)
plt.xlim([0, 8])
plt.ylim([-4, 3])
plt.show()

3.3.2. Logistic Regression

In [14]:
m = 500

X0 = np.random.multivariate_normal([0, 0], np.eye(2), m)
X1 = np.random.multivariate_normal([10, 10], np.eye(2), m)

X = np.vstack([X0, X1])
y = np.vstack([np.zeros([m,1]), np.ones([m,1])])

plt.figure(figsize=(10, 6))
plt.plot(X0[:,0], X0[:,1], '.b', label='Class 0')
plt.plot(X1[:,0], X1[:,1], '.k', label='Class 1')

plt.title('Data Classes', fontsize=15)
plt.legend(loc='lower right', fontsize=15)
plt.xlabel('X1', fontsize=15)
plt.ylabel('X2', fontsize=15)
plt.xlim([-10,20])
plt.ylim([-4,14])
plt.grid(alpha=0.3)
plt.show()
In [15]:
from sklearn.linear_model import LogisticRegression

clf = LogisticRegression()
clf.fit(X, np.ravel(y))
Out[15]:
LogisticRegression()
In [16]:
print(clf.coef_)
print(clf.intercept_)
[[0.93938153 0.90472328]]
[-9.3291099]
In [17]:
xp = np.linspace(-10,20,100).reshape(-1,1)
yp = -clf.coef_[0,0]/clf.coef_[0,1]*xp - clf.intercept_/clf.coef_[0,1]

plt.figure(figsize=(10, 6))
plt.plot(X0[:,0], X0[:,1], '.b', label='Class 0')
plt.plot(X1[:,0], X1[:,1], '.k', label='Class 1')
plt.plot(xp, yp, '--k', label='Logistic')
plt.xlim([-10,20])
plt.ylim([-4,14])

plt.title('Data Classes', fontsize=15)
plt.legend(loc='lower right', fontsize=15)
plt.xlabel('X1', fontsize=15)
plt.ylabel('X2', fontsize=15)
plt.grid(alpha=0.3)
plt.show()
In [18]:
pred = clf.predict_proba([[0,6]])
pred
Out[18]:
array([[0.98017467, 0.01982533]])

4. Steps for Machine Learning

4.1. Model Evaluation

  • Adding more features will always decrease the training loss
  • How do we determine when an algorithm achieves “good” performance?


  • A better criterion: split the data into a
    • Training set (e.g., 70 %)
    • Testing set (e.g., 30 %)
  • Performance on the testing set is called generalization performance (see the example below)
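A minimal sketch of this train/test protocol with scikit-learn (synthetic data; the 70/30 split ratio is the one assumed above):

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# synthetic data, split into 70 % training and 30 % testing
np.random.seed(0)
X = np.random.rand(100, 1)
y = 3*X[:, 0] + 0.1*np.random.randn(100)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

reg = LinearRegression().fit(X_train, y_train)
print(reg.score(X_test, y_test))    # R^2 on the testing set = generalization performance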

4.2. Supervised Learning

  • Workflow



  • Workflow in more detail



5. Supervised Learning vs. Unsupervised Learning




6. Clustering

  • Data clustering is an unsupervised learning problem

  • Given:

    • $m$ unlabeled examples $\{x^{(1)},x^{(2)}\cdots, x^{(m)}\}$
    • the number of partitions $k$
  • Goal: group the examples into $k$ partitions


$$\{x^{(1)},x^{(2)},\cdots,x^{(m)}\} \quad \Rightarrow \quad \text{Clustering}$$


6.1. K-means



6.2. Python

In [19]:
m = 200

X0 = np.random.multivariate_normal([-1, 1], np.eye(2), m)
X1 = np.random.multivariate_normal([15, 10], np.eye(2), m)
X2 = np.random.multivariate_normal([0, 6], np.eye(2), m)
X = np.vstack([X0, X1, X2])

plt.figure(figsize=(10, 6))
plt.plot(X[:,0], X[:,1], '.b')

plt.xlim([-10,20])
plt.ylim([-4,14])
plt.grid(alpha=0.3)
plt.show()
In [20]:
from sklearn.cluster import KMeans

kmeans = KMeans(n_clusters = 3, random_state = 0)
kmeans.fit(X)
Out[20]:
KMeans(n_clusters=3, random_state=0)
In [21]:
print(kmeans.labels_)
[2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 0 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0
 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0]
In [22]:
plt.figure(figsize=(10,6))

plt.plot(X[kmeans.labels_ == 0,0],X[kmeans.labels_ == 0,1],'g.', label=0)
plt.plot(X[kmeans.labels_ == 1,0],X[kmeans.labels_ == 1,1],'k.', label=1)
plt.plot(X[kmeans.labels_ == 2,0],X[kmeans.labels_ == 2,1],'r.', label=2)

plt.xlim([-10,20])
plt.ylim([-4,14])
plt.grid(alpha=0.3)
plt.legend(loc='lower right', fontsize=15)
plt.show()

7. Dimension Reduction

  • Why dimensionality reduction?
    • insights into the low-dimensional structure of the data (visualization)
    • Fewer dimensions ⇒ Less chances of overfitting ⇒ Better generalization
    • Speeding up learning algorithms
      • Most algorithms scale badly with increasing data dimensionality
    • Less storage requirements (data compression)
    • Note: dimensionality reduction is different from feature selection
      • although the goals are similar
    • Dimensionality reduction is more like “feature extraction”
      • constructing a small set of new features from the original features
  • How?
    • idea: highly correlated data contains redundant features




7.1. Principal Component Analysis (PCA)

  • Each example $x$ has 2 features $\{x_1,x_2\}$

  • Consider ignoring the feature $x_2$ for each example

  • Each 2-dimensional example $x$ now becomes 1-dimensional $x = \{x_1\}$

  • Are we losing much information by throwing away $x_2$ ?

  • No. Most of the data spread is along $x_1$ (very little variance along $x_2$)





  • Each example $x$ has 2 features $\{x_1,x_2\}$

  • Consider ignoring the feature $x_2$ for each example

  • Each 2-dimensional example $x$ now becomes 1-dimensional $x = \{x_1\}$

  • Are we losing much information by throwing away $x_2$ ?

  • Yes, the data has substantial variance along both features (i.e. both axes)





  • Now consider a change of axes

  • Each example $x$ has 2 features $\{u_1,u_2\}$

  • Consider ignoring the feature $u_2$ for each example

  • Each 2-dimensional example $x$ now becomes 1-dimensional $x = \{u_1\}$

  • Are we losing much information by throwing away $u_2$ ?

  • No. Most of the data spread is along $u_1$ (very little variance along $u_2$)





  • Data $\rightarrow$ projection onto unit vector $\hat{u}_1$
    • PCA is used when we want projections capturing maximum variance directions
    • Principal Components (PC): directions of maximum variability in the data
    • Roughly speaking, PCA does a change of axes that can represent the data in a succinct manner




In [23]:
m = 5000
mu = np.array([0, 0])
sigma = np.array([[3, 1.5], 
                  [1.5, 1]])

X = np.random.multivariate_normal(mu, sigma, m)

fig = plt.figure(figsize=(10, 6))
plt.plot(X[:,0], X[:,1], 'k.')
plt.axis('equal')
plt.show()
In [24]:
from sklearn.decomposition import PCA

pca = PCA(n_components=2)
pca.fit(X)
Out[24]:
PCA(n_components=2)
In [25]:
plt.figure()
plt.stem(range(1,3),pca.explained_variance_ratio_)

plt.xlim([0.5, 2.5])
plt.ylim([0, 1])
plt.title('Score (%)')
plt.show()
In [26]:
principal_axis = pca.components_[0, :]
u1 = principal_axis/(np.linalg.norm(principal_axis)) 
h = u1[1]/u1[0]

xp = np.linspace(-6,6,200)
yp = xp.dot(h)

plt.figure(figsize=(10,6))
plt.plot(X[:, 0], X[:, 1],'k.')
plt.plot(xp, yp, 'r.')
plt.axis('equal')
plt.show()

8. Decision Tree

8.1. Decision Tree for Classification







In [27]:
from sklearn import tree
In [28]:
data = np.array([[0, 0, 1, 0, 0],
                [1, 0, 2, 0, 0],
                [0, 1, 2, 0, 1],
                [2, 1, 0, 2, 1],
                [0, 1, 0, 1, 1],
                [1, 1, 1, 2, 0],
                [1, 1, 0, 2, 0],
                [0, 0, 2, 1, 0]])      

x = data[:,0:4]
y = data[:,4]
print(x, '\n')
print(y)
[[0 0 1 0]
 [1 0 2 0]
 [0 1 2 0]
 [2 1 0 2]
 [0 1 0 1]
 [1 1 1 2]
 [1 1 0 2]
 [0 0 2 1]] 

[0 0 1 1 1 0 0 0]
In [29]:
clf = tree.DecisionTreeClassifier(criterion = 'entropy', max_depth = 3, random_state=0)
clf.fit(x,y)
Out[29]:
DecisionTreeClassifier(criterion='entropy', max_depth=3, random_state=0)
In [30]:
clf.predict([[0, 0, 1, 0]])
Out[30]:
array([0])

8.2. Nonlinear Classification

In [31]:
X1 = np.array([[-1.1,0],[-0.3,0.1],[-0.9,1],[0.8,0.4],[0.4,0.9],[0.3,-0.6],
               [-0.5,0.3],[-0.8,0.6],[-0.5,-0.5]])
     
X0 = np.array([[-1,-1.3], [-1.6,2.2],[0.9,-0.7],[1.6,0.5],[1.8,-1.1],[1.6,1.6],
               [-1.6,-1.7],[-1.4,1.8],[1.6,-0.9],[0,-1.6],[0.3,1.7],[-1.6,0],[-2.1,0.2]])

X1 = np.asmatrix(X1)
X0 = np.asmatrix(X0)

plt.figure(figsize=(10, 8))
plt.plot(X1[:,0], X1[:,1], 'ro', label = 'C1')
plt.plot(X0[:,0], X0[:,1], 'bo', label = 'C0')
plt.title('SVM for Nonlinear Data', fontsize = 15)
plt.xlabel(r'$x_1$', fontsize = 15)
plt.ylabel(r'$x_2$', fontsize = 15)
plt.legend(loc = 1, fontsize = 12)
plt.axis('equal')
plt.show()
In [32]:
N = X1.shape[0]
M = X0.shape[0]

X = np.vstack([X1, X0])
y = np.vstack([np.ones([N,1]), np.zeros([M,1])])
In [33]:
clf = tree.DecisionTreeClassifier(criterion = 'entropy', max_depth = 4, random_state=0)
clf.fit(X,y)
Out[33]:
DecisionTreeClassifier(criterion='entropy', max_depth=4, random_state=0)
In [34]:
clf.predict([[0, 1]])
Out[34]:
array([1.])
In [35]:
# to plot
[X1gr, X2gr] = np.meshgrid(np.arange(-3,3,0.1), np.arange(-3,3,0.1))

Xp = np.hstack([X1gr.reshape(-1,1), X2gr.reshape(-1,1)])
Xp = np.asmatrix(Xp)

q = clf.predict(Xp)
q = np.asmatrix(q).reshape(-1,1)

C1 = np.where(q == 1)[0]

plt.figure(figsize = (10, 8))
plt.plot(X1[:,0], X1[:,1], 'ro', label = 'C1')
plt.plot(X0[:,0], X0[:,1], 'bo', label = 'C0')
plt.plot(Xp[C1,0], Xp[C1,1], 'gs', markersize = 8, alpha = 0.1, label = 'Decision Tree')
plt.xlabel(r'$x_1$', fontsize = 15)
plt.ylabel(r'$x_2$', fontsize = 15)
plt.legend(loc = 1, fontsize = 12)
plt.axis('equal')
plt.show()

8.3. Multiclass Classification

In [36]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

## generate three simulated clusters
mu1 = np.array([1, 7])
mu2 = np.array([3, 4])
mu3 = np.array([6, 5])

SIGMA1 = 0.8*np.array([[1, 1.5],
                       [1.5, 3]])
SIGMA2 = 0.5*np.array([[2, 0],
                       [0, 2]])
SIGMA3 = 0.5*np.array([[1, -1],
                       [-1, 2]])

X1 = np.random.multivariate_normal(mu1, SIGMA1, 100)
X2 = np.random.multivariate_normal(mu2, SIGMA2, 100)
X3 = np.random.multivariate_normal(mu3, SIGMA3, 100)

y1 = 1*np.ones([100,1])
y2 = 2*np.ones([100,1])
y3 = 3*np.ones([100,1])

plt.figure(figsize = (10, 8))
plt.title('Generated Data', fontsize = 15)
plt.plot(X1[:,0], X1[:,1], '.', label = 'C1')
plt.plot(X2[:,0], X2[:,1], '.', label = 'C2')
plt.plot(X3[:,0], X3[:,1], '.', label = 'C3')
plt.xlabel('$X_1$', fontsize = 15)
plt.ylabel('$X_2$', fontsize = 15)
plt.legend(fontsize = 12)
plt.axis('equal')
plt.grid(alpha = 0.3)
plt.axis([-2, 10, 0, 12])
plt.show()
In [37]:
X = np.vstack([X1, X2, X3])
y = np.vstack([y1, y2, y3])

clf = tree.DecisionTreeClassifier(criterion = 'entropy', max_depth = 3, random_state = 0)
clf.fit(X,y)
Out[37]:
DecisionTreeClassifier(criterion='entropy', max_depth=3, random_state=0)
In [38]:
res = 0.3
[X1gr, X2gr] = np.meshgrid(np.arange(-2,10,res), np.arange(0,12,res))

Xp = np.hstack([X1gr.reshape(-1,1), X2gr.reshape(-1,1)])
Xp = np.asmatrix(Xp)

q = clf.predict(Xp)
q = np.asmatrix(q).reshape(-1,1)

C1 = np.where(q == 1)[0]
C2 = np.where(q == 2)[0]
C3 = np.where(q == 3)[0]

plt.figure(figsize = (10, 8))
plt.plot(X1[:,0], X1[:,1], '.', label = 'C1')
plt.plot(X2[:,0], X2[:,1], '.', label = 'C2')
plt.plot(X3[:,0], X3[:,1], '.', label = 'C3')
plt.plot(Xp[C1,0], Xp[C1,1], 's', color = 'blue', markersize = 8, alpha = 0.1)
plt.plot(Xp[C2,0], Xp[C2,1], 's', color = 'orange', markersize = 8, alpha = 0.1)
plt.plot(Xp[C3,0], Xp[C3,1], 's', color = 'green', markersize = 8, alpha = 0.1)
plt.xlabel('$X_1$', fontsize = 15)
plt.ylabel('$X_2$', fontsize = 15)
plt.legend(fontsize = 12)
plt.axis('equal')
plt.grid(alpha = 0.3)
plt.axis([-2, 10, 0, 12])
plt.show()

8.4. Decision Tree for Regression

  • Decision tree regression is used when the predicted outcome is a continuous (real-valued) quantity; a minimal example follows.
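A minimal sketch of decision tree regression with scikit-learn (synthetic 1-D data; the depth is chosen arbitrarily). The tree partitions the input space and predicts a constant value in each leaf.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

# synthetic 1-D regression data
np.random.seed(0)
x = np.sort(5*np.random.rand(80, 1), axis=0)
y = np.sin(x).ravel() + 0.1*np.random.randn(80)

reg = DecisionTreeRegressor(max_depth=3, random_state=0)
reg.fit(x, y)

xp = np.arange(0, 5, 0.01).reshape(-1, 1)
yp = reg.predict(xp)    # piecewise-constant prediction: one value per leaf
print(yp[:5])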





9. Ensemble

9.1. Ensemble Learning

  • Ensembles: collections of predictors
    • Combine their predictions to improve performance
  • Ensemble with different models
  • Assumption: the combined models make better predictions than any single good model
    • Reduces overfitting and improves accuracy





  • Ensemble with different train datasets



  • To test, run each trained model
    • For regression, each regressor predicts and the predictions are averaged
    • For classification, each classifier votes on the output and the majority class is taken (see the sketch below)
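A short sketch of this combine-by-voting idea using scikit-learn's VotingClassifier; the synthetic data set and the three base models are chosen only for illustration.

from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# synthetic classification data (illustrative only)
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

# three different models; 'hard' voting takes the majority class over their predictions
clf = VotingClassifier(estimators=[('lr', LogisticRegression()),
                                   ('svm', SVC()),
                                   ('tree', DecisionTreeClassifier(max_depth=3))],
                       voting='hard')
clf.fit(X, y)
print(clf.predict(X[:5]))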



9.2. Random Forest

  • Ensemble learning method for classification, regression and other tasks that operates by constructing a multitude of decision trees at training time
  • Random Forest for Classification



  • Random Forest for Regression




9.3. More Advanced Ensemble Algorithms

  • LightGBM (Light Gradient-Boosting Machine)
  • XGBoost (Extreme Gradient Boost)
  • CatBoost
  • ...
  • We will not cover how they work internally, but we will use them; a brief sketch follows.
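A brief sketch of how these libraries plug into the same scikit-learn-style workflow, assuming the xgboost and lightgbm packages are installed (they are separate packages, not part of scikit-learn); the data are synthetic.

# assumes: pip install xgboost lightgbm (both expose scikit-learn style estimators)
from xgboost import XGBRegressor
from lightgbm import LGBMRegressor
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=500, n_features=10, noise=5.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

for model in [XGBRegressor(random_state=0), LGBMRegressor(random_state=0)]:
    model.fit(X_train, y_train)
    print(type(model).__name__, model.score(X_test, y_test))   # R^2 on the testing set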

9.4. K-Fold Cross-Validation

  • Useful especially for a small data set
  • Advantages of cross-validation
    • It helps prevent models from overfitting
    • It helps in finding hyperparameter values that improve the performance of the algorithm (see the sketch below)
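A minimal sketch of 5-fold cross-validation with scikit-learn (synthetic data; the number of folds and the model are illustrative): each fold is held out once for validation while the remaining folds are used for training, and the scores are averaged.

from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score

# synthetic data; 5-fold CV trains on 4 folds and validates on the remaining fold, 5 times
X, y = make_regression(n_samples=200, n_features=10, noise=5.0, random_state=0)

cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(RandomForestRegressor(random_state=0), X, y, cv=cv)
print(scores, scores.mean())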




9.5. Python

In [39]:
from sklearn import ensemble

clf = ensemble.RandomForestClassifier(n_estimators = 100, max_depth = 3, random_state= 0)
clf.fit(X,np.ravel(y))
Out[39]:
RandomForestClassifier(max_depth=3, random_state=0)
In [40]:
res = 0.3
[X1gr, X2gr] = np.meshgrid(np.arange(-2,10,res), np.arange(0,12,res))

Xp = np.hstack([X1gr.reshape(-1,1), X2gr.reshape(-1,1)])
Xp = np.asmatrix(Xp)

q = clf.predict(Xp)
q = np.asmatrix(q).reshape(-1,1)

C1 = np.where(q == 1)[0]
C2 = np.where(q == 2)[0]
C3 = np.where(q == 3)[0]

plt.figure(figsize = (10, 8))
plt.plot(X1[:,0], X1[:,1], '.', label = 'C1')
plt.plot(X2[:,0], X2[:,1], '.', label = 'C2')
plt.plot(X3[:,0], X3[:,1], '.', label = 'C3')
plt.plot(Xp[C1,0], Xp[C1,1], 's', color = 'blue', markersize = 8, alpha = 0.1)
plt.plot(Xp[C2,0], Xp[C2,1], 's', color = 'orange', markersize = 8, alpha = 0.1)
plt.plot(Xp[C3,0], Xp[C3,1], 's', color = 'green', markersize = 8, alpha = 0.1)
plt.xlabel('$X_1$', fontsize = 15)
plt.ylabel('$X_2$', fontsize = 15)
plt.legend(fontsize = 12)
plt.axis('equal')
plt.grid(alpha = 0.3)
plt.axis([-2, 10, 0, 12])
plt.show()

10. Artificial Neural Networks (ANN)

  • Complex/Nonlinear universal function approximator
    • Linearly connected networks
    • Simple nonlinear neurons
  • Hidden layers
    • Autonomous feature learning




10.1. Machine Learning vs. Deep Learning

  • Machine learning




  • Deep supervised learning






10.2. ANN in Python





In [41]:
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
In [42]:
mnist = tf.keras.datasets.mnist

(train_x, train_y), (test_x, test_y) = mnist.load_data()

train_x, test_x = train_x/255.0, test_x/255.0
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
11490434/11490434 [==============================] - 0s 0us/step
In [43]:
img = train_x[5].reshape(28,28)

plt.figure(figsize = (6,6))
plt.imshow(img, 'gray')
plt.xticks([])
plt.yticks([])
plt.show()
In [44]:
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape = (28, 28)),
    tf.keras.layers.Dense(units = 100, activation = 'relu'),
    tf.keras.layers.Dense(units = 10, activation = 'softmax')
])
In [45]:
model.compile(optimizer = 'adam',
              loss = 'sparse_categorical_crossentropy',
              metrics = ['accuracy'])
In [46]:
# Train Model

loss = model.fit(train_x, train_y, epochs = 5)
Epoch 1/5
1875/1875 [==============================] - 3s 2ms/step - loss: 0.2750 - accuracy: 0.9219
Epoch 2/5
1875/1875 [==============================] - 3s 2ms/step - loss: 0.1244 - accuracy: 0.9628
Epoch 3/5
1875/1875 [==============================] - 3s 2ms/step - loss: 0.0863 - accuracy: 0.9740
Epoch 4/5
1875/1875 [==============================] - 3s 2ms/step - loss: 0.0668 - accuracy: 0.9798
Epoch 5/5
1875/1875 [==============================] - 3s 2ms/step - loss: 0.0525 - accuracy: 0.9841
In [47]:
# Evaluate Test Data

test_loss, test_acc = model.evaluate(test_x, test_y)
313/313 [==============================] - 0s 1ms/step - loss: 0.0706 - accuracy: 0.9760
In [48]:
test_img = test_x[np.random.choice(test_x.shape[0], 1)]

predict = model.predict_on_batch(test_img)
mypred = np.argmax(predict, axis = 1)

plt.figure(figsize = (12,5))

plt.subplot(1,2,1)
plt.imshow(test_img.reshape(28, 28), 'gray')
plt.axis('off')
plt.subplot(1,2,2)
plt.stem(predict[0])
plt.show()

print('Prediction : {}'.format(mypred[0]))
Prediction : 5

11. Example

  • Prediction of Young’s modulus using eddy current

  • Eddy current

    • The primary field is generated by a coil connected to an alternating-current generator, which drives an alternating magnetic field
    • The current induced by this primary field within the conductive material will itself produce a magnetic field (the secondary field) in opposition to the primary field, according to Lenz’s law
  • Eddy currents are widely used for assessing the stability of materials
    • An intrinsic characteristic of an eddy current is that it flows around cracks when cracks are present in the material
    • Eddy currents can also be used to determine the Young’s modulus of a material

picture-2

In [50]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split

# Ignore warning messages
import warnings
warnings.filterwarnings(action="ignore")
In [58]:
data = pd.read_csv("/content/drive/MyDrive/kstp/data_files/young.csv")
data.head()
Out[58]:
Di Df Dif rho VSA GSA VPOV GPOV POAV_vol_frac PONAV_vol_frac ... D_func-T-3-all D_func-S-0-all D_func-S-1-all D_func-S-2-all D_func-S-3-all D_func-alpha-0-all D_func-alpha-1-all D_func-alpha-2-all D_func-alpha-3-all Young's
0 51.40377 48.15758 50.89898 223201.0 262.877 6936.59 0.9826 25.928105 0.9826 0.0 ... -1.666667 0 -0.02 -0.04 -0.033333 0 -3.9 -7.80000 -6.50000 0.031136
1 54.93011 29.52417 54.47078 289401.0 432.081 6933.52 0.9700 15.565401 0.9700 0.0 ... -1.666667 0 -0.02 -0.04 -0.033333 0 -3.9 -7.80000 -6.50000 0.088565
2 53.04427 39.51778 53.04427 174002.0 365.088 8886.50 0.9751 23.734647 0.9751 0.0 ... 4.000000 0 -0.06 0.68 0.680000 0 -11.7 -9.81422 -9.81422 0.154541
3 45.84475 34.25428 42.18607 130709.0 421.191 6772.03 0.9740 15.660247 0.9740 0.0 ... -1.666667 0 -0.02 -0.04 -0.033333 0 -3.9 -7.80000 -6.50000 0.167602
4 51.02181 37.30721 50.44725 155241.0 407.138 8200.28 0.9671 19.478623 0.9671 0.0 ... 4.000000 0 -0.06 0.68 0.680000 0 -11.7 -9.81422 -9.81422 0.174816

5 rows × 191 columns

In [59]:
X = np.array(data.iloc[:,:-1])
y = np.array(data["Young's"])

Decision Tree Regression

picture-6

In [60]:
# train : test = 80 : 20
# random_state is only for reproducibility of the results

X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.2,random_state=42)

picture-3

In [64]:
from sklearn.ensemble import RandomForestRegressor

model = RandomForestRegressor(random_state = 42)
model.fit(X_train,y_train)

score = model.score(X_test,y_test)
print(score)
0.8825333518300653

Gradient Boosting

picture-4

In [66]:
from sklearn.ensemble import GradientBoostingRegressor

model = GradientBoostingRegressor(random_state = 42)
model.fit(X_train,y_train)

score = model.score(X_test,y_test)
print(score)
0.8335168140610975