


Classification: Perceptron


By Prof. Seungchul Lee
http://iai.postech.ac.kr/
Industrial AI Lab at POSTECH

1. Classification

  • Classification: the output y is a discrete value
    • develop a classification algorithm to determine which class a new input should fall into
  • Start with binary class problems
    • later we look at the multiclass classification problem, although it is just an extension of binary classification
  • We could use linear regression
    • then threshold the classifier output (i.e., anything over some value is 'yes', otherwise 'no')
    • linear regression with thresholding seems to work
  • We will learn
    • perceptron
    • logistic regression

2. Perceptron

  • For input $x = [x_1, \cdots, x_d]$ ('attributes of a customer')
  • weights $\omega = [\omega_1, \cdots, \omega_d]$

$$\text{Approve credit if} \; \sum_{i=1}^{d} \omega_i x_i > \text{threshold}, \qquad \text{Deny credit if} \; \sum_{i=1}^{d} \omega_i x_i < \text{threshold}$$


$$h(x) = \text{sign}\left( \left( \sum_{i=1}^{d} \omega_i x_i \right) - \text{threshold} \right) = \text{sign}\left( \left( \sum_{i=1}^{d} \omega_i x_i \right) + \omega_0 \right)$$

  • Introduce an artificial coordinate $x_0 = 1$:

$$h(x) = \text{sign}\left( \sum_{i=0}^{d} \omega_i x_i \right)$$

  • In a vector form, the perceptron implements

$$h(x) = \text{sign}(\omega^T x)$$

  • Sign function

$$\text{sgn}(x) = \begin{cases} -1, & \text{if } x < 0 \\ 0, & \text{if } x = 0 \\ 1, & \text{if } x > 0 \end{cases}$$




  • Hyperplane

    • Separates a $D$-dimensional space into two half-spaces
    • Defined by an outward pointing normal vector $\omega$
    • $\omega$ is orthogonal to any vector lying on the hyperplane
    • Assume the hyperplane passes through the origin, $\omega^T x = 0$ with $x_0 = 1$



  • Sign with respect to a line

$$\omega = \begin{bmatrix} \omega_1 \\ \omega_2 \end{bmatrix}, \; x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \quad \Rightarrow \quad g(x) = \omega_0 + \omega_1 x_1 + \omega_2 x_2 = \omega_0 + \omega^T x$$

$$\omega = \begin{bmatrix} \omega_0 \\ \omega_1 \\ \omega_2 \end{bmatrix}, \; x = \begin{bmatrix} 1 \\ x_1 \\ x_2 \end{bmatrix} \quad \Rightarrow \quad g(x) = \omega_0 \cdot 1 + \omega_1 x_1 + \omega_2 x_2 = \omega^T x$$




  • Goal: learn the hyperplane $g_{\omega}(x) = 0$ using the training data
  • How to find $\omega$ (a small numerical check is sketched below)

    • All data in class 1 ($y = +1$): $g(x) > 0$
    • All data in class 0 ($y = -1$): $g(x) < 0$
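
As a quick numerical illustration (a minimal sketch; the weights and points below are made up for illustration, not taken from the lecture data), $h(x) = \text{sign}(\omega^T x)$ can be evaluated directly on augmented inputs $[1, x_1, x_2]$, and a candidate $\omega$ separates the data exactly when $y \, g(x) > 0$ for every training point:

import numpy as np

# hypothetical weights [w0, w1, w2]; w0 plays the role of -threshold
w = np.array([-3.0, 0.8, 1.0])

# made-up points in augmented form [1, x1, x2] with labels +1 / -1
X = np.array([[1.0, 6.0, 1.0],
              [1.0, 7.0, 2.0],
              [1.0, 1.0, -2.0],
              [1.0, 2.0, -1.5]])
y = np.array([1, 1, -1, -1])

g = X @ w                   # g(x) = w^T x for every point
print(np.sign(g))           # perceptron outputs h(x)
print(np.all(y*g > 0))      # True if this w separates all points correctly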

2.1. Perceptron Algorithm

The perceptron implements

$$h(x) = \text{sign}(\omega^T x)$$

Given the training set

$$(x_1, y_1), (x_2, y_2), \cdots, (x_N, y_N), \quad \text{where } y_i \in \{-1, +1\}$$

1) pick a misclassified point

$$\text{sign}(\omega^T x_n) \neq y_n$$

2) and update the weight vector

$$\omega \leftarrow \omega + y_n x_n$$





Why do perceptron updates work?

  • Let's look at a misclassified positive example ($y_n = +1$)

    • the perceptron (wrongly) thinks $\omega_{old}^T x_n < 0$

  • The update is $\omega_{new} = \omega_{old} + y_n x_n = \omega_{old} + x_n$, so

$$\omega_{new}^T x_n = (\omega_{old} + x_n)^T x_n = \omega_{old}^T x_n + x_n^T x_n$$

  • Since $x_n^T x_n > 0$, $\omega_{new}^T x_n$ is less negative than $\omega_{old}^T x_n$, i.e., the update moves the prediction toward the correct sign
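
A tiny numerical check of this argument (a sketch with made-up numbers, not from the lecture): one update on a misclassified positive point increases the score $\omega^T x_n$ by exactly $x_n^T x_n$:

import numpy as np

# made-up misclassified positive example: y_n = +1 but w_old^T x_n < 0
w_old = np.array([1.0, -2.0, 0.5])
x_n = np.array([1.0, 1.0, 1.0])
y_n = 1

w_new = w_old + y_n*x_n      # perceptron update

print(w_old @ x_n)           # -0.5, wrong sign
print(w_new @ x_n)           # -0.5 + x_n^T x_n = 2.5, now positive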

2.2. Iterations of Perceptron

  1. Randomly assign $\omega$

  2. One iteration of the PLA (perceptron learning algorithm): $\omega \leftarrow \omega + yx$, where $(x, y)$ is a misclassified training point

  3. At iteration $i = 1, 2, 3, \cdots$, pick a misclassified point from $(x_1, y_1), (x_2, y_2), \cdots, (x_N, y_N)$

  4. and run a PLA iteration on it

  5. That's it! (A compact sketch of this loop follows; Section 3 works through the full example.)
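
A compact sketch of the loop described above (an illustrative function, assuming X holds augmented inputs with a leading column of ones and y holds labels of ±1; it is not the notebook's own implementation, which appears in Section 3):

import numpy as np

def pla(X, y, max_iter=1000):
    # start from some initial weight vector
    w = np.zeros(X.shape[1])
    for _ in range(max_iter):
        # misclassified points satisfy y_n * (w^T x_n) <= 0
        wrong = np.where(y*(X @ w) <= 0)[0]
        if len(wrong) == 0:          # every point classified correctly
            return w
        n = wrong[0]                 # pick one misclassified point
        w = w + y[n]*X[n]            # PLA update
    return w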




Summary





2.3. Perceptron loss function



$$\mathcal{L}(\omega) = \sum_{n=1}^{m} \max \left\{ 0, -y_n (\omega^T x_n) \right\}$$

  • Loss $= 0$ on examples where the perceptron is correct, i.e., $y_n (\omega^T x_n) > 0$
  • Loss $> 0$ on examples where the perceptron misclassifies, i.e., $y_n (\omega^T x_n) < 0$


Note: $\text{sign}(\omega^T x_n) \neq y_n$ is equivalent to $y_n (\omega^T x_n) < 0$
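
This loss translates directly into NumPy (a minimal sketch, assuming X is in augmented form with a leading column of ones and y holds labels of ±1, as in Section 3 below):

import numpy as np

def perceptron_loss(w, X, y):
    # y_n * (w^T x_n) for every training point
    margins = y*(X @ w)
    # only misclassified points (negative margins) contribute to the loss
    return np.sum(np.maximum(0, -margins))

The loss is zero exactly when every point satisfies $y_n (\omega^T x_n) > 0$, which is the stopping condition of the PLA.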

3. Perceptron in Python


$$g(x) = \omega_0 + \omega^T x = \omega_0 + \omega_1 x_1 + \omega_2 x_2 = 0$$



$$\omega = \begin{bmatrix} \omega_0 \\ \omega_1 \\ \omega_2 \end{bmatrix}, \qquad X = \begin{bmatrix} (x^{(1)})^T \\ (x^{(2)})^T \\ (x^{(3)})^T \\ \vdots \\ (x^{(m)})^T \end{bmatrix} = \begin{bmatrix} 1 & x_1^{(1)} & x_2^{(1)} \\ 1 & x_1^{(2)} & x_2^{(2)} \\ 1 & x_1^{(3)} & x_2^{(3)} \\ \vdots & \vdots & \vdots \\ 1 & x_1^{(m)} & x_2^{(m)} \end{bmatrix}, \qquad y = \begin{bmatrix} y^{(1)} \\ y^{(2)} \\ y^{(3)} \\ \vdots \\ y^{(m)} \end{bmatrix}$$

In [1]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
In [2]:
# training data generation
m = 100
x1 = 8*np.random.rand(m, 1)
x2 = 7*np.random.rand(m, 1) - 4

g = 0.8*x1 + x2 - 3
In [3]:
C1 = np.where(g >= 1)
C0 = np.where(g < -1)
print(C1)
(array([ 3,  4,  7, 13, 17, 19, 22, 24, 27, 34, 36, 40, 42, 43, 44, 50, 59,
       63, 68, 71, 77, 78, 84, 88, 91, 92, 95, 96, 97, 98, 99],
      dtype=int64), array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int64))
In [4]:
C1 = np.where(g >= 1)[0]
C0 = np.where(g < -1)[0]
print(C1.shape)
print(C0.shape)
(31,)
(43,)
In [5]:
plt.figure(figsize=(10, 8))
plt.plot(x1[C1], x2[C1], 'ro', alpha = 0.4, label = 'C1')
plt.plot(x1[C0], x2[C0], 'bo', alpha = 0.4, label = 'C0')
plt.title('Linearly Separable Classes', fontsize = 15)
plt.legend(loc = 1, fontsize = 15)
plt.xlabel(r'$x_1$', fontsize = 15)
plt.ylabel(r'$x_2$', fontsize = 15)
plt.show()
$$X = \begin{bmatrix} (x^{(1)})^T \\ (x^{(2)})^T \\ (x^{(3)})^T \\ \vdots \\ (x^{(m)})^T \end{bmatrix} = \begin{bmatrix} 1 & x_1^{(1)} & x_2^{(1)} \\ 1 & x_1^{(2)} & x_2^{(2)} \\ 1 & x_1^{(3)} & x_2^{(3)} \\ \vdots & \vdots & \vdots \\ 1 & x_1^{(m)} & x_2^{(m)} \end{bmatrix}, \qquad y = \begin{bmatrix} y^{(1)} \\ y^{(2)} \\ y^{(3)} \\ \vdots \\ y^{(m)} \end{bmatrix}$$
In [6]:
X1 = np.hstack([np.ones([C1.shape[0],1]), x1[C1], x2[C1]])
X0 = np.hstack([np.ones([C0.shape[0],1]), x1[C0], x2[C0]])
X = np.vstack([X1, X0])

y = np.vstack([np.ones([C1.shape[0],1]), -np.ones([C0.shape[0],1])])

X = np.asmatrix(X)
y = np.asmatrix(y)
$$\omega = \begin{bmatrix} \omega_0 \\ \omega_1 \\ \omega_2 \end{bmatrix}$$


$$\omega \leftarrow \omega + yx \quad \text{where } (x, y) \text{ is a misclassified training point}$$

In [7]:
w = np.ones([3,1])
w = np.asmatrix(w)

n_iter = y.shape[0]
for k in range(n_iter):
    # one pass over the training data per outer iteration
    for i in range(n_iter):
        # PLA update on misclassified points: sign(w^T x_i) != y_i
        if y[i,0] != np.sign(X[i,:]*w)[0,0]:
            w += y[i,0]*X[i,:].T

print(w)
[[-20.        ]
 [  5.87514833]
 [  9.87930425]]
$$g(x) = \omega_0 + \omega^T x = \omega_0 + \omega_1 x_1 + \omega_2 x_2 = 0 \quad \Rightarrow \quad x_2 = -\frac{\omega_1}{\omega_2} x_1 - \frac{\omega_0}{\omega_2}$$
In [8]:
x1p = np.linspace(0,8,100).reshape(-1,1)
x2p = - w[1,0]/w[2,0]*x1p - w[0,0]/w[2,0]

plt.figure(figsize=(10, 8))
plt.plot(x1[C1], x2[C1], 'ro', alpha = 0.4, label = 'C1')
plt.plot(x1[C0], x2[C0], 'bo', alpha = 0.4, label = 'C0')
plt.plot(x1p, x2p, c = 'k', linewidth = 3, label = 'perceptron')
plt.xlim([0, 8])
plt.xlabel('$x_1$', fontsize = 15)
plt.ylabel('$x_2$', fontsize = 15)
plt.legend(loc = 1, fontsize = 15)
plt.show()

4. Perceptron using Scikit-Learn



$$X = \begin{bmatrix} (x^{(1)})^T \\ (x^{(2)})^T \\ (x^{(3)})^T \\ \vdots \\ (x^{(m)})^T \end{bmatrix} = \begin{bmatrix} x_1^{(1)} & x_2^{(1)} \\ x_1^{(2)} & x_2^{(2)} \\ x_1^{(3)} & x_2^{(3)} \\ \vdots & \vdots \\ x_1^{(m)} & x_2^{(m)} \end{bmatrix}, \qquad y = \begin{bmatrix} y^{(1)} \\ y^{(2)} \\ y^{(3)} \\ \vdots \\ y^{(m)} \end{bmatrix}$$

Note that no column of ones is added here: scikit-learn's Perceptron fits the intercept term $\omega_0$ itself (fit_intercept=True by default).

In [9]:
X1 = np.hstack([x1[C1], x2[C1]])
X0 = np.hstack([x1[C0], x2[C0]])
X = np.vstack([X1, X0])

y = np.vstack([np.ones([C1.shape[0],1]), -np.ones([C0.shape[0],1])])
In [10]:
from sklearn import linear_model

clf = linear_model.Perceptron(tol=1e-3)
clf.fit(X, np.ravel(y))
Out[10]:
Perceptron(alpha=0.0001, class_weight=None, early_stopping=False, eta0=1.0,
           fit_intercept=True, max_iter=1000, n_iter_no_change=5, n_jobs=None,
           penalty=None, random_state=0, shuffle=True, tol=0.001,
           validation_fraction=0.1, verbose=0, warm_start=False)
In [11]:
clf.predict([[3, -2]])
Out[11]:
array([-1.])
In [12]:
clf.predict([[6, 2]])
Out[12]:
array([1.])
In [13]:
clf.coef_
Out[13]:
array([[4.45077785, 8.31238814]])
In [14]:
clf.intercept_
Out[14]:
array([-17.])
$$g(x) = \omega_0 + \omega^T x = \omega_0 + \omega_1 x_1 + \omega_2 x_2 = 0 \quad \Rightarrow \quad x_2 = -\frac{\omega_1}{\omega_2} x_1 - \frac{\omega_0}{\omega_2}$$
In [15]:
w0 = clf.intercept_[0]
w1 = clf.coef_[0,0]
w2 = clf.coef_[0,1]
In [16]:
x1p = np.linspace(0,8,100).reshape(-1,1)
x2p = - w1/w2*x1p - w0/w2

plt.figure(figsize=(10, 8))
plt.plot(x1[C1], x2[C1], 'ro', alpha = 0.4, label = 'C1')
plt.plot(x1[C0], x2[C0], 'bo', alpha = 0.4, label = 'C0')
plt.plot(x1p, x2p, c = 'k', linewidth = 4, label = 'perceptron')
plt.xlim([0, 8])
plt.xlabel('$x_1$', fontsize = 15)
plt.ylabel('$x_2$', fontsize = 15)
plt.legend(loc = 1, fontsize = 15)
plt.show()

5. The Best Hyperplane Separator?

  • The perceptron finds one of the many possible hyperplanes separating the data, if one exists
  • Of the many possible choices, which one is the best?
  • Utilize distance information
  • Intuitively, we want the hyperplane having the maximum margin
  • A large margin leads to good generalization on the test data
    • we will see this formally when we cover the Support Vector Machine (SVM)



6. Video Lectures

In [17]:
%%html
<center><iframe src="https://www.youtube.com/embed/NBkNVfDoTjY?rel=0" 
width="420" height="315" frameborder="0" allowfullscreen></iframe></center>
In [18]:
%%html
<center><iframe src="https://www.youtube.com/embed/riIOSV1GSOI?rel=0" 
width="420" height="315" frameborder="0" allowfullscreen></iframe></center>
In [19]:
%%html
<center><iframe src="https://www.youtube.com/embed/4C-JXJpULe0?rel=0" 
width="420" height="315" frameborder="0" allowfullscreen></iframe></center>
In [20]:
%%html
<center><iframe src="https://www.youtube.com/embed/TpWGUQ3vudg?rel=0" 
width="420" height="315" frameborder="0" allowfullscreen></iframe></center>
In [21]:
%%html
<center><iframe src="https://www.youtube.com/embed/1-WDtNC51cg?rel=0" 
width="420" height="315" frameborder="0" allowfullscreen></iframe></center>
In [22]:
%%html
<center><iframe src="https://www.youtube.com/embed/nH_J02FnDBI?rel=0" 
width="420" height="315" frameborder="0" allowfullscreen></iframe></center>
In [23]:
%%javascript
$.getScript('https://kmahelona.github.io/ipython_notebook_goodies/ipython_notebook_toc.js')