Korean Society for Technology of Plasticity (KSTP) Professional Training Course

Part 1: Fundamentals of Artificial Intelligence


Prof. Seungchul Lee
http://iai.postech.ac.kr/
Industrial AI Lab at POSTECH

Table of Contents

1. Machine Learning and Deep Learning




1.1. Taxonomy of AI



1.2. Scikit Learn

  • Machine Learning in Python
  • Simple and efficient tools for data mining and data analysis
  • Accessible to everybody, and reusable in various contexts
  • Built on NumPy, SciPy, and matplotlib
  • Open source, commercially usable - BSD license
  • https://scikit-learn.org/stable/index.html




1.3. Supervised Learning

  • Given training set $\left\{ \left(x^{(1)}, y^{(1)}\right), \left(x^{(2)}, y^{(2)}\right),\cdots,\left(x^{(m)}, y^{(m)}\right) \right\}$
  • Want to find a function $f_{\omega}$ with learning parameter $\omega$
    • $f_{\omega}(x)$ is desired to be as close as possible to $y$ for future $(x,y)$
    • i.e., $f_{\omega}(x) \approx y$
  • Define a loss function
$$\ell \left(f_{\omega} \left(x^{(i)}\right), y^{(i)}\right)$$
  • Solve the following optimization problem:
$$ \begin{align*} \text{minimize} &\quad \frac{1}{m} \sum_{i=1}^{m} \ell \left(f_{\omega} \left(x^{(i)}\right), y^{(i)}\right)\\ \text{subject to} &\quad \omega \in \Omega \end{align*} $$


  • Function approximation between inputs and outputs


  • Once it is learned, the model can be used to predict $y$ for new, unseen inputs $x$ (a minimal sketch of this procedure follows)
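To make the optimization above concrete, here is a minimal sketch (added for illustration, not from the original notebook) that fits a linear model $f_{\omega}(x) = \omega_1 x + \omega_0$ by minimizing the average squared loss with plain gradient descent; the data points and step size are illustrative only.

import numpy as np

# illustrative training set {(x^(i), y^(i))} and a linear model f_w(x) = w1*x + w0
x = np.array([0.1, 0.7, 1.3, 2.2, 3.0, 4.3])
y = np.array([0.5, 1.1, 1.5, 2.2, 2.7, 3.5])

w1, w0 = 0.0, 0.0       # learning parameters (omega)
lr = 0.01               # step size (assumed)

for _ in range(5000):
    # gradient of (1/m) * sum of squared losses with respect to w1 and w0
    err = (w1*x + w0) - y
    w1 -= lr * (2/len(x)) * np.sum(err * x)
    w0 -= lr * (2/len(x)) * np.sum(err)

print(w1, w0)           # learned parameters: f_w(x) ≈ y on the training data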

2. Regression

  • A set of statistical processes for estimating the relationships between a dependent variable and one or more independent variables

2.1. Linear Regression



2.2. Multivariate Linear Regression



2.3. Nonlinear Regression



2.4. Feature Selection

  • Multivariate regression


$$ \hat{y} = \theta_0 + \theta_{1}x_1 + \theta_{2}x_2 + \theta_{3}x_3 + \cdots $$

  • The process of selecting a subset of relevant features (variables, predictors) for use in model construction.
  • Feature selection techniques are used for several reasons:
    • simplification of models to make them easier to interpret,
    • shorter training times,
    • to avoid the curse of dimensionality,
    • improve data's compatibility with a learning model class,
    • encode inherent symmetries present in the input space.
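As a quick illustration of feature selection (a sketch added here, not part of the original material), scikit-learn's SelectKBest ranks features by a univariate score and keeps only the most relevant ones; the data below are synthetic.

import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression

# synthetic data: y depends on the first two features only; the third is irrelevant
np.random.seed(0)
X = np.random.rand(100, 3)
y = 2*X[:, 0] + 3*X[:, 1] + 0.1*np.random.randn(100)

selector = SelectKBest(score_func=f_regression, k=2)   # keep the 2 highest-scoring features
selector.fit(X, y)
print(selector.get_support())                          # boolean mask of the selected features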

2.5. Correlation Coefficient

  • $+1 \to$ close to a straight line with positive slope

  • $-1 \to$ close to a straight line with negative slope

  • Indicates how close the data are to a straight line, but

  • gives no information about the slope itself

$$0 \leq \left\lvert \text{correlation coefficient} \right\rvert \leq 1$$
$$0 \;(\text{uncorrelated}) \quad \longleftrightarrow \quad 1 \;(\text{linearly correlated})$$
  • Does not tell anything about causality (a quick numerical check follows)
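A quick numerical check of these properties (illustrative code, synthetic data): np.corrcoef returns the Pearson correlation coefficient, which is close to +1 for nearly linear data regardless of the actual slope.

import numpy as np

np.random.seed(0)
x = np.random.randn(200)
y = 2*x + 0.5*np.random.randn(200)      # linear relationship with slope 2 plus noise

r = np.corrcoef(x, y)[0, 1]             # Pearson correlation coefficient
print(r)                                # close to +1, yet it says nothing about the slope (2) or causality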

2.6. Correlation Coefficient Plot



2.7. Python

2.7.1. Linear Regression

In [1]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

# data points in column vector [input, output]
x = np.array([0.1, 0.4, 0.7, 1.2, 1.3, 1.7, 2.2, 2.8, 3.0, 4.0, 4.3, 4.4, 4.9]).reshape(-1, 1)
y = np.array([0.5, 0.9, 1.1, 1.5, 1.5, 2.0, 2.2, 2.8, 2.7, 3.0, 3.5, 3.7, 3.9]).reshape(-1, 1)

# to plot
plt.figure(figsize=(10, 6))
plt.title('Linear Regression', fontsize=15)
plt.xlabel('X', fontsize=15)
plt.ylabel('Y', fontsize=15)
plt.plot(x, y, 'ko', label="data")
plt.xlim([0, 5])
plt.grid(alpha=0.3)
plt.axis('scaled')
plt.show()
In [3]:
from sklearn.linear_model import LinearRegression

reg = LinearRegression()
reg.fit(x,y)
Out[3]:
LinearRegression()
In [4]:
print(reg.coef_)       # Coef
print(reg.intercept_)  # Bias
[[0.67129519]]
[0.65306531]
In [5]:
# to plot
plt.figure(figsize=(10, 6))
plt.title('Linear Regression', fontsize=15)
plt.xlabel('X', fontsize=15)
plt.ylabel('Y', fontsize=15)
plt.plot(x, y, 'ko', label="data")

# to plot a straight line (fitted line)
xp = np.arange(0, 5, 0.01).reshape(-1, 1)
yp = reg.coef_*xp + reg.intercept_

plt.plot(xp, yp, 'r', linewidth=2, label="$L_2$")
plt.legend(fontsize=15)
plt.axis('scaled')
plt.grid(alpha=0.3)
plt.xlim([0, 5])
plt.show()

2.7.2. Nonlinear Regression

In [6]:
n = 100            
x = -5 + 15*np.random.rand(n, 1)
noise = 10*np.random.randn(n, 1)
y = 10 + 1*x + 2*x**2 + noise

plt.figure(figsize=(10, 6))
plt.title('Nonlinear Regression', fontsize=15)
plt.xlabel('X', fontsize=15)
plt.ylabel('Y', fontsize=15)
plt.plot(x, y, 'o', markersize=4, label='actual')
plt.xlim([np.min(x), np.max(x)])
plt.grid(alpha=0.3)
plt.legend(fontsize=15)
plt.show()
In [7]:
from sklearn.kernel_ridge import KernelRidge

reg = KernelRidge(kernel='rbf', gamma=0.1)
reg.fit(x, y)
Out[7]:
KernelRidge(gamma=0.1, kernel='rbf')
In [8]:
p = reg.predict(x)
In [9]:
plt.figure(figsize=(10, 6))
plt.title('Nonlinear Regression', fontsize=15)
plt.xlabel('X', fontsize=15)
plt.ylabel('Y', fontsize=15)
plt.plot(x, y, 'o', markersize=4, label='actual')
plt.plot(x, p, 'ro', markersize=4, label='predict')
plt.grid(alpha=0.3)
plt.legend(fontsize=15)
plt.xlim([np.min(x), np.max(x)])
plt.show()

3. Classification

  • Classification is used when $y$ is a discrete value
    • develop a classification algorithm to determine which class a new input should fall into
  • The goal is to find a classification boundary
  • We will learn
    • Support Vector Machine (SVM)
    • Logistic Regression


3.1. Linear Classification


3.2. Non-linear Classification


3.3. Python

3.3.1. SVM

In [10]:
x1 = 8*np.random.rand(100, 1)
x2 = 7*np.random.rand(100, 1) - 4

g0 = 0.8*x1 + x2 - 3
g1 = g0 - 1
g2 = g0 + 1

C1 = np.where(g1 >= 0)[0]
C2 = np.where(g2 < 0)[0]

X1 = np.hstack([x1[C1],x2[C1]])
X2 = np.hstack([x1[C2],x2[C2]])
n = X1.shape[0]
m = X2.shape[0]
X = np.vstack([X1, X2])
y = np.vstack([np.zeros([n, 1]), np.ones([m, 1])])

plt.figure(figsize=(10, 6))
plt.plot(x1[C1], x2[C1], 'ro', label='C1')
plt.plot(x1[C2], x2[C2], 'bo', label='C2')
plt.xlabel('$x_1$', fontsize = 20)
plt.ylabel('$x_2$', fontsize = 20)
plt.legend(loc = 4)
plt.xlim([0, 8])
plt.ylim([-4, 3])
plt.show()
In [11]:
from sklearn.svm import SVC

clf = SVC(kernel='linear')
clf.fit(X, np.ravel(y))
Out[11]:
SVC(kernel='linear')
In [12]:
print(clf.coef_)
print(clf.intercept_)
[[-0.76177006 -0.93761421]]
[2.85068138]
In [13]:
xp = np.linspace(0,8,100).reshape(-1,1)
yp = -clf.coef_[0,0]/clf.coef_[0,1]*xp - clf.intercept_/clf.coef_[0,1]

plt.figure(figsize=(10, 6))
plt.plot(X[0:n, 0], X[0:n, 1], 'ro', label='C1')
plt.plot(X[n:, 0], X[n:, 1], 'bo', label='C2')
plt.plot(xp, yp, '--k', label='SVM')
plt.xlabel('$x_1$', fontsize = 20)
plt.ylabel('$x_2$', fontsize = 20)
plt.legend(loc = 4)
plt.xlim([0, 8])
plt.ylim([-4, 3])
plt.show()

3.3.2. Logistic Regression

In [14]:
m = 500

X0 = np.random.multivariate_normal([0, 0], np.eye(2), m)
X1 = np.random.multivariate_normal([10, 10], np.eye(2), m)

X = np.vstack([X0, X1])
y = np.vstack([np.zeros([m,1]), np.ones([m,1])])

plt.figure(figsize=(10, 6))
plt.plot(X0[:,0], X0[:,1], '.b', label='Class 0')
plt.plot(X1[:,0], X1[:,1], '.k', label='Class 1')

plt.title('Data Classes', fontsize=15)
plt.legend(loc='lower right', fontsize=15)
plt.xlabel('X1', fontsize=15)
plt.ylabel('X2', fontsize=15)
plt.xlim([-10,20])
plt.ylim([-4,14])
plt.grid(alpha=0.3)
plt.show()
In [15]:
from sklearn.linear_model import LogisticRegression

clf = LogisticRegression()
clf.fit(X, np.ravel(y))
Out[15]:
LogisticRegression()
In [16]:
print(clf.coef_)
print(clf.intercept_)
[[0.93938153 0.90472328]]
[-9.3291099]
In [17]:
xp = np.linspace(-10,20,100).reshape(-1,1)
yp = -clf.coef_[0,0]/clf.coef_[0,1]*xp - clf.intercept_/clf.coef_[0,1]

plt.figure(figsize=(10, 6))
plt.plot(X0[:,0], X0[:,1], '.b', label='Class 0')
plt.plot(X1[:,0], X1[:,1], '.k', label='Class 1')
plt.plot(xp, yp, '--k', label='Logistic')
plt.xlim([-10,20])
plt.ylim([-4,14])

plt.title('Data Classes', fontsize=15)
plt.legend(loc='lower right', fontsize=15)
plt.xlabel('X1', fontsize=15)
plt.ylabel('X2', fontsize=15)
plt.grid(alpha=0.3)
plt.show()
In [18]:
pred = clf.predict_proba([[0,6]])
pred
Out[18]:
array([[0.98017467, 0.01982533]])

4. Steps for Machine Learning

4.1. Model Evaluation

  • Adding more features will always decrease the training loss
  • How do we determine when an algorithm achieves “good” performance?


  • A better criterion: split the data into a
    • Training set (e.g., 70 %)
    • Testing set (e.g., 30 %)
  • Performance on the testing set is called generalization performance (see the example below)
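A minimal sketch of this train/test protocol with scikit-learn (synthetic data; the 70/30 split ratio is the one assumed above):

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# synthetic data, split into 70 % training and 30 % testing
np.random.seed(0)
X = np.random.rand(100, 1)
y = 3*X[:, 0] + 0.1*np.random.randn(100)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

reg = LinearRegression().fit(X_train, y_train)
print(reg.score(X_test, y_test))    # R^2 on the testing set = generalization performance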

4.2. Supervised Learning

  • Workflow



  • Workflow in more detail



5. Supervised Learning vs. Unsupervised Learning




6. Clustering

  • Data clustering is an unsupervised learning problem

  • Given:

    • $m$ unlabeled examples $\{x^{(1)},x^{(2)}\cdots, x^{(m)}\}$
    • the number of partitions $k$
  • Goal: group the examples into $k$ partitions


$$\{x^{(1)},x^{(2)},\cdots,x^{(m)}\} \quad \Rightarrow \quad \text{Clustering}$$


6.1. K-means



6.2. Python

In [19]:
m = 200

X0 = np.random.multivariate_normal([-1, 1], np.eye(2), m)
X1 = np.random.multivariate_normal([15, 10], np.eye(2), m)
X2 = np.random.multivariate_normal([0, 6], np.eye(2), m)
X = np.vstack([X0, X1, X2])

plt.figure(figsize=(10, 6))
plt.plot(X[:,0], X[:,1], '.b')

plt.xlim([-10,20])
plt.ylim([-4,14])
plt.grid(alpha=0.3)
plt.show()
In [20]:
from sklearn.cluster import KMeans

kmeans = KMeans(n_clusters = 3, random_state = 0)
kmeans.fit(X)
Out[20]:
KMeans(n_clusters=3, random_state=0)
In [21]:
print(kmeans.labels_)
[2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 0 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0
 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0]
In [22]:
plt.figure(figsize=(10,6))

plt.plot(X[kmeans.labels_ == 0,0],X[kmeans.labels_ == 0,1],'g.', label=0)
plt.plot(X[kmeans.labels_ == 1,0],X[kmeans.labels_ == 1,1],'k.', label=1)
plt.plot(X[kmeans.labels_ == 2,0],X[kmeans.labels_ == 2,1],'r.', label=2)

plt.xlim([-10,20])
plt.ylim([-4,14])
plt.grid(alpha=0.3)
plt.legend(loc='lower right', fontsize=15)
plt.show()

7. Dimension Reduction

  • Why dimensionality reduction?
    • insights into the low-dimensional structure of the data (visualization)
    • Fewer dimensions ⇒ Less chances of overfitting ⇒ Better generalization
    • Speeding up learning algorithms
      • Most algorithms scale badly with increasing data dimensionality
    • Less storage requirements (data compression)
    • Note: dimensionality reduction is different from feature selection
      • although the goals are similar
    • Dimensionality reduction is more like “feature extraction”
      • constructing a small set of new features from the original features
  • How?
    • idea: highly correlated data contains redundant features




7.1. Principal Component Analysis (PCA)

  • Each example $x$ has 2 features $\{x_1,x_2\}$

  • Consider ignoring the feature $x_2$ for each example

  • Each 2-dimensional example $x$ now becomes 1-dimensional $x = \{x_1\}$

  • Are we losing much information by throwing away $x_2$ ?

  • No. Most of the data spread is along $x_1$ (very little variance along $x_2$)





  • Each example $x$ has 2 features $\{x_1,x_2\}$

  • Consider ignoring the feature $x_2$ for each example

  • Each 2-dimensional example $x$ now becomes 1-dimensional $x = \{x_1\}$

  • Are we losing much information by throwing away $x_2$ ?

  • Yes, the data has substantial variance along both features (i.e. both axes)





  • Now consider a change of axes

  • Each example $x$ has 2 features $\{u_1,u_2\}$

  • Consider ignoring the feature $u_2$ for each example

  • Each 2-dimensional example $x$ now becomes 1-dimensional $x = \{u_1\}$

  • Are we losing much information by throwing away $u_2$ ?

  • No. Most of the data spread is along $u_1$ (very little variance along $u_2$)





  • Data $\rightarrow$ projection onto unit vector $\hat{u}_1$
    • PCA is used when we want projections capturing maximum variance directions
    • Principal Components (PC): directions of maximum variability in the data
    • Roughly speaking, PCA does a change of axes that can represent the data in a succinct manner




In [23]:
m = 5000
mu = np.array([0, 0])
sigma = np.array([[3, 1.5], 
                  [1.5, 1]])

X = np.random.multivariate_normal(mu, sigma, m)

fig = plt.figure(figsize=(10, 6))
plt.plot(X[:,0], X[:,1], 'k.')
plt.axis('equal')
plt.show()
In [24]:
from sklearn.decomposition import PCA

pca = PCA(n_components=2)
pca.fit(X)
Out[24]:
PCA(n_components=2)
In [25]:
plt.figure()
plt.stem(range(1,3),pca.explained_variance_ratio_)

plt.xlim([0.5, 2.5])
plt.ylim([0, 1])
plt.title('Score (%)')
plt.show()
In [26]:
principal_axis = pca.components_[0, :]
u1 = principal_axis/(np.linalg.norm(principal_axis)) 
h = u1[1]/u1[0]

xp = np.linspace(-6,6,200)
yp = xp.dot(h)

plt.figure(figsize=(10,6))
plt.plot(X[:, 0], X[:, 1],'k.')
plt.plot(xp, yp, 'r.')
plt.axis('equal')
plt.show()

8. Decision Tree

8.1. Decision Tree for Classification







In [27]:
from sklearn import tree
In [28]:
data = np.array([[0, 0, 1, 0, 0],
                [1, 0, 2, 0, 0],
                [0, 1, 2, 0, 1],
                [2, 1, 0, 2, 1],
                [0, 1, 0, 1, 1],
                [1, 1, 1, 2, 0],
                [1, 1, 0, 2, 0],
                [0, 0, 2, 1, 0]])      

x = data[:,0:4]
y = data[:,4]
print(x, '\n')
print(y)
[[0 0 1 0]
 [1 0 2 0]
 [0 1 2 0]
 [2 1 0 2]
 [0 1 0 1]
 [1 1 1 2]
 [1 1 0 2]
 [0 0 2 1]] 

[0 0 1 1 1 0 0 0]
In [29]:
clf = tree.DecisionTreeClassifier(criterion = 'entropy', max_depth = 3, random_state=0)
clf.fit(x,y)
Out[29]:
DecisionTreeClassifier(criterion='entropy', max_depth=3, random_state=0)
In [30]:
clf.predict([[0, 0, 1, 0]])
Out[30]:
array([0])

8.2. Nonlinear Classification

In [31]:
X1 = np.array([[-1.1,0],[-0.3,0.1],[-0.9,1],[0.8,0.4],[0.4,0.9],[0.3,-0.6],
               [-0.5,0.3],[-0.8,0.6],[-0.5,-0.5]])
     
X0 = np.array([[-1,-1.3], [-1.6,2.2],[0.9,-0.7],[1.6,0.5],[1.8,-1.1],[1.6,1.6],
               [-1.6,-1.7],[-1.4,1.8],[1.6,-0.9],[0,-1.6],[0.3,1.7],[-1.6,0],[-2.1,0.2]])

X1 = np.asmatrix(X1)
X0 = np.asmatrix(X0)

plt.figure(figsize=(10, 8))
plt.plot(X1[:,0], X1[:,1], 'ro', label = 'C1')
plt.plot(X0[:,0], X0[:,1], 'bo', label = 'C0')
plt.title('SVM for Nonlinear Data', fontsize = 15)
plt.xlabel(r'$x_1$', fontsize = 15)
plt.ylabel(r'$x_2$', fontsize = 15)
plt.legend(loc = 1, fontsize = 12)
plt.axis('equal')
plt.show()
In [32]:
N = X1.shape[0]
M = X0.shape[0]

X = np.vstack([X1, X0])
y = np.vstack([np.ones([N,1]), np.zeros([M,1])])
In [33]:
clf = tree.DecisionTreeClassifier(criterion = 'entropy', max_depth = 4, random_state=0)
clf.fit(X,y)
Out[33]:
DecisionTreeClassifier(criterion='entropy', max_depth=4, random_state=0)
In [34]:
clf.predict([[0, 1]])
Out[34]:
array([1.])
In [35]:
# to plot
[X1gr, X2gr] = np.meshgrid(np.arange(-3,3,0.1), np.arange(-3,3,0.1))

Xp = np.hstack([X1gr.reshape(-1,1), X2gr.reshape(-1,1)])
Xp = np.asmatrix(Xp)

q = clf.predict(Xp)
q = np.asmatrix(q).reshape(-1,1)

C1 = np.where(q == 1)[0]

plt.figure(figsize = (10, 8))
plt.plot(X1[:,0], X1[:,1], 'ro', label = 'C1')
plt.plot(X0[:,0], X0[:,1], 'bo', label = 'C0')
plt.plot(Xp[C1,0], Xp[C1,1], 'gs', markersize = 8, alpha = 0.1, label = 'Decision Tree')
plt.xlabel(r'$x_1$', fontsize = 15)
plt.ylabel(r'$x_2$', fontsize = 15)
plt.legend(loc = 1, fontsize = 12)
plt.axis('equal')
plt.show()

8.3. Multiclass Classification

In [36]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

## generate three simulated clusters
mu1 = np.array([1, 7])
mu2 = np.array([3, 4])
mu3 = np.array([6, 5])

SIGMA1 = 0.8*np.array([[1, 1.5],
                       [1.5, 3]])
SIGMA2 = 0.5*np.array([[2, 0],
                       [0, 2]])
SIGMA3 = 0.5*np.array([[1, -1],
                       [-1, 2]])

X1 = np.random.multivariate_normal(mu1, SIGMA1, 100)
X2 = np.random.multivariate_normal(mu2, SIGMA2, 100)
X3 = np.random.multivariate_normal(mu3, SIGMA3, 100)

y1 = 1*np.ones([100,1])
y2 = 2*np.ones([100,1])
y3 = 3*np.ones([100,1])

plt.figure(figsize = (10, 8))
plt.title('Generated Data', fontsize = 15)
plt.plot(X1[:,0], X1[:,1], '.', label = 'C1')
plt.plot(X2[:,0], X2[:,1], '.', label = 'C2')
plt.plot(X3[:,0], X3[:,1], '.', label = 'C3')
plt.xlabel('$X_1$', fontsize = 15)
plt.ylabel('$X_2$', fontsize = 15)
plt.legend(fontsize = 12)
plt.axis('equal')
plt.grid(alpha = 0.3)
plt.axis([-2, 10, 0, 12])
plt.show()
In [37]:
X = np.vstack([X1, X2, X3])
y = np.vstack([y1, y2, y3])

clf = tree.DecisionTreeClassifier(criterion = 'entropy', max_depth = 3, random_state = 0)
clf.fit(X,y)
Out[37]:
DecisionTreeClassifier(criterion='entropy', max_depth=3, random_state=0)
In [38]:
res = 0.3
[X1gr, X2gr] = np.meshgrid(np.arange(-2,10,res), np.arange(0,12,res))

Xp = np.hstack([X1gr.reshape(-1,1), X2gr.reshape(-1,1)])
Xp = np.asmatrix(Xp)

q = clf.predict(Xp)
q = np.asmatrix(q).reshape(-1,1)

C1 = np.where(q == 1)[0]
C2 = np.where(q == 2)[0]
C3 = np.where(q == 3)[0]

plt.figure(figsize = (10, 8))
plt.plot(X1[:,0], X1[:,1], '.', label = 'C1')
plt.plot(X2[:,0], X2[:,1], '.', label = 'C2')
plt.plot(X3[:,0], X3[:,1], '.', label = 'C3')
plt.plot(Xp[C1,0], Xp[C1,1], 's', color = 'blue', markersize = 8, alpha = 0.1)
plt.plot(Xp[C2,0], Xp[C2,1], 's', color = 'orange', markersize = 8, alpha = 0.1)
plt.plot(Xp[C3,0], Xp[C3,1], 's', color = 'green', markersize = 8, alpha = 0.1)
plt.xlabel('$X_1$', fontsize = 15)
plt.ylabel('$X_2$', fontsize = 15)
plt.legend(fontsize = 12)
plt.axis('equal')
plt.grid(alpha = 0.3)
plt.axis([-2, 10, 0, 12])
plt.show()

8.4. Decision Tree for Regression

  • Decision tree regression is used when the predicted outcome is a continuous (real-valued) quantity; a minimal example follows.
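A minimal sketch of decision tree regression with scikit-learn (synthetic 1-D data; the depth is chosen arbitrarily). The tree partitions the input space and predicts a constant value in each leaf.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

# synthetic 1-D regression data
np.random.seed(0)
x = np.sort(5*np.random.rand(80, 1), axis=0)
y = np.sin(x).ravel() + 0.1*np.random.randn(80)

reg = DecisionTreeRegressor(max_depth=3, random_state=0)
reg.fit(x, y)

xp = np.arange(0, 5, 0.01).reshape(-1, 1)
yp = reg.predict(xp)    # piecewise-constant prediction: one value per leaf
print(yp[:5])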





9. Ensemble

9.1. Ensemble Learning

  • Ensembles: collections of predictors
    • Combine their predictions to improve performance
  • Ensemble with different models
  • Assumption: the combined models make better predictions than any single good model
    • Reduces overfitting and improves accuracy





  • Ensemble with different train datasets



  • To test, run each trained model
    • For regression, each regressor predicts and the predictions are averaged
    • For classification, each classifier votes on the output and the majority class is taken (see the sketch below)
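A short sketch of this combine-by-voting idea using scikit-learn's VotingClassifier; the synthetic data set and the three base models are chosen only for illustration.

from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# synthetic classification data (illustrative only)
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

# three different models; 'hard' voting takes the majority class over their predictions
clf = VotingClassifier(estimators=[('lr', LogisticRegression()),
                                   ('svm', SVC()),
                                   ('tree', DecisionTreeClassifier(max_depth=3))],
                       voting='hard')
clf.fit(X, y)
print(clf.predict(X[:5]))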



9.2. Random Forest

  • Ensemble learning method for classification, regression and other tasks that operates by constructing a multitude of decision trees at training time
  • Random Forest for Classification



  • Random Forest for Regression




9.3. More Advanced Ensemble Algorithms

  • LightGBM (Light Gradient-Boosting Machine)
  • XGBoost (Extreme Gradient Boost)
  • CatBoost
  • ...
  • We will not cover how they work internally, but we will use them; a brief sketch follows.
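A brief sketch of how these libraries plug into the same scikit-learn-style workflow, assuming the xgboost and lightgbm packages are installed (they are separate packages, not part of scikit-learn); the data are synthetic.

# assumes: pip install xgboost lightgbm (both expose scikit-learn style estimators)
from xgboost import XGBRegressor
from lightgbm import LGBMRegressor
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=500, n_features=10, noise=5.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

for model in [XGBRegressor(random_state=0), LGBMRegressor(random_state=0)]:
    model.fit(X_train, y_train)
    print(type(model).__name__, model.score(X_test, y_test))   # R^2 on the testing set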

9.4. K-Fold Cross-Validation

  • Useful especially for a small data set
  • Advantages of cross-validation
    • It helps prevent models from overfitting
    • It helps in finding hyperparameter values that improve the performance of the algorithm (see the sketch below)
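A minimal sketch of 5-fold cross-validation with scikit-learn (synthetic data; the number of folds and the model are illustrative): each fold is held out once for validation while the remaining folds are used for training, and the scores are averaged.

from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score

# synthetic data; 5-fold CV trains on 4 folds and validates on the remaining fold, 5 times
X, y = make_regression(n_samples=200, n_features=10, noise=5.0, random_state=0)

cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(RandomForestRegressor(random_state=0), X, y, cv=cv)
print(scores, scores.mean())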




9.5. Python

In [39]:
from sklearn import ensemble

clf = ensemble.RandomForestClassifier(n_estimators = 100, max_depth = 3, random_state= 0)
clf.fit(X,np.ravel(y))
Out[39]:
RandomForestClassifier(max_depth=3, random_state=0)
In [40]:
res = 0.3
[X1gr, X2gr] = np.meshgrid(np.arange(-2,10,res), np.arange(0,12,res))

Xp = np.hstack([X1gr.reshape(-1,1), X2gr.reshape(-1,1)])
Xp = np.asmatrix(Xp)

q = clf.predict(Xp)
q = np.asmatrix(q).reshape(-1,1)

C1 = np.where(q == 1)[0]
C2 = np.where(q == 2)[0]
C3 = np.where(q == 3)[0]

plt.figure(figsize = (10, 8))
plt.plot(X1[:,0], X1[:,1], '.', label = 'C1')
plt.plot(X2[:,0], X2[:,1], '.', label = 'C2')
plt.plot(X3[:,0], X3[:,1], '.', label = 'C3')
plt.plot(Xp[C1,0], Xp[C1,1], 's', color = 'blue', markersize = 8, alpha = 0.1)
plt.plot(Xp[C2,0], Xp[C2,1], 's', color = 'orange', markersize = 8, alpha = 0.1)
plt.plot(Xp[C3,0], Xp[C3,1], 's', color = 'green', markersize = 8, alpha = 0.1)
plt.xlabel('$X_1$', fontsize = 15)
plt.ylabel('$X_2$', fontsize = 15)
plt.legend(fontsize = 12)
plt.axis('equal')
plt.grid(alpha = 0.3)
plt.axis([-2, 10, 0, 12])
plt.show()

10. Artificial Neural Networks (ANN)

  • Complex/Nonlinear universal function approximator
    • Linearly connected networks
    • Simple nonlinear neurons
  • Hidden layers
    • Autonomous feature learning




10.1. Machine Learning vs. Deep Learning

  • Machine learning




  • Deep supervised learning






10.2. ANN in Python





In [41]:
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
In [42]:
mnist = tf.keras.datasets.mnist

(train_x, train_y), (test_x, test_y) = mnist.load_data()

train_x, test_x = train_x/255.0, test_x/255.0
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
11490434/11490434 [==============================] - 0s 0us/step
In [43]:
img = train_x[5].reshape(28,28)

plt.figure(figsize = (6,6))
plt.imshow(img, 'gray')
plt.xticks([])
plt.yticks([])
plt.show()
In [44]:
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape = (28, 28)),
    tf.keras.layers.Dense(units = 100, activation = 'relu'),
    tf.keras.layers.Dense(units = 10, activation = 'softmax')
])
In [45]:
model.compile(optimizer = 'adam',
              loss = 'sparse_categorical_crossentropy',
              metrics = ['accuracy'])
In [46]:
# Train Model

loss = model.fit(train_x, train_y, epochs = 5)
Epoch 1/5
1875/1875 [==============================] - 3s 2ms/step - loss: 0.2750 - accuracy: 0.9219
Epoch 2/5
1875/1875 [==============================] - 3s 2ms/step - loss: 0.1244 - accuracy: 0.9628
Epoch 3/5
1875/1875 [==============================] - 3s 2ms/step - loss: 0.0863 - accuracy: 0.9740
Epoch 4/5
1875/1875 [==============================] - 3s 2ms/step - loss: 0.0668 - accuracy: 0.9798
Epoch 5/5
1875/1875 [==============================] - 3s 2ms/step - loss: 0.0525 - accuracy: 0.9841
In [47]:
# Evaluate Test Data

test_loss, test_acc = model.evaluate(test_x, test_y)
313/313 [==============================] - 0s 1ms/step - loss: 0.0706 - accuracy: 0.9760
In [48]:
test_img = test_x[np.random.choice(test_x.shape[0], 1)]

predict = model.predict_on_batch(test_img)
mypred = np.argmax(predict, axis = 1)

plt.figure(figsize = (12,5))

plt.subplot(1,2,1)
plt.imshow(test_img.reshape(28, 28), 'gray')
plt.axis('off')
plt.subplot(1,2,2)
plt.stem(predict[0])
plt.show()

print('Prediction : {}'.format(mypred[0]))
Prediction : 5

11. Example

  • Prediction of Young’s modulus using eddy current

  • Eddy current

    • The primary field is generated by a coil connected to an alternating-current generator, which drives an alternating magnetic field
    • The current induced by this primary field within the conductive material will itself produce a magnetic field (the secondary field) in opposition to the primary field, according to Lenz’s law
  • Eddy currents are widely used for assessing the stability of materials
    • An intrinsic characteristic of an eddy current is that it flows around cracks when cracks are present in the material
    • Eddy currents can also be used to determine the Young’s modulus of a material

picture-2

In [50]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split

# Ignore warning messages
import warnings
warnings.filterwarnings(action="ignore")
In [58]:
data = pd.read_csv("/content/drive/MyDrive/kstp/data_files/young.csv")
data.head()
Out[58]:
Di Df Dif rho VSA GSA VPOV GPOV POAV_vol_frac PONAV_vol_frac ... D_func-T-3-all D_func-S-0-all D_func-S-1-all D_func-S-2-all D_func-S-3-all D_func-alpha-0-all D_func-alpha-1-all D_func-alpha-2-all D_func-alpha-3-all Young's
0 51.40377 48.15758 50.89898 223201.0 262.877 6936.59 0.9826 25.928105 0.9826 0.0 ... -1.666667 0 -0.02 -0.04 -0.033333 0 -3.9 -7.80000 -6.50000 0.031136
1 54.93011 29.52417 54.47078 289401.0 432.081 6933.52 0.9700 15.565401 0.9700 0.0 ... -1.666667 0 -0.02 -0.04 -0.033333 0 -3.9 -7.80000 -6.50000 0.088565
2 53.04427 39.51778 53.04427 174002.0 365.088 8886.50 0.9751 23.734647 0.9751 0.0 ... 4.000000 0 -0.06 0.68 0.680000 0 -11.7 -9.81422 -9.81422 0.154541
3 45.84475 34.25428 42.18607 130709.0 421.191 6772.03 0.9740 15.660247 0.9740 0.0 ... -1.666667 0 -0.02 -0.04 -0.033333 0 -3.9 -7.80000 -6.50000 0.167602
4 51.02181 37.30721 50.44725 155241.0 407.138 8200.28 0.9671 19.478623 0.9671 0.0 ... 4.000000 0 -0.06 0.68 0.680000 0 -11.7 -9.81422 -9.81422 0.174816

5 rows × 191 columns

In [59]:
X = np.array(data.iloc[:,:-1])
y = np.array(data["Young's"])

Decision Tree Regression

picture-6

In [60]:
# train : test = 80 : 20
# random_state is only for reproducibility of the results

X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.2,random_state=42)

picture-3

In [64]:
from sklearn.ensemble import RandomForestRegressor

model = RandomForestRegressor(random_state = 42)
model.fit(X_train,y_train)

score = model.score(X_test,y_test)
print(score)
0.8825333518300653

Gradient Boosting

picture-4

In [66]:
from sklearn.ensemble import GradientBoostingRegressor

model = GradientBoostingRegressor(random_state = 42)
model.fit(X_train,y_train)

score = model.score(X_test,y_test)
print(score)
0.8335168140610975