Machine Learning for Mechanical Engineering

Clustering


Prof. Seungchul Lee
http://iailab.kaist.ac.kr/
Industrial AI Lab at KAIST
  • For your handwritten solutions, please scan or take a picture of them. Alternatively, you can write them in markdown if you prefer.

  • Only .ipynb files will be graded for your code.

    • Ensure that your NAME and student ID are included in your .ipynb files. ex) SeungchulLee_20241234_HW01.ipynb
  • Compress all the files into a single .zip file.

    • In the .zip file's name, include your NAME and student ID.

    ex) SeungchulLee_20241234_HW01.zip

    • Submit this .zip file on KLMS
  • Do not submit a printed version of your code, as it will not be graded.

Problem 1

  1. Answer true or false for each of the following statements. (Correct +1, Wrong -1)

$\quad$ a. K-means clustering always gives the same result when cluster centers are randomly initialized.

$\quad$ b. K-means clustering is a suitable method for convex-shaped clusters with different densities.

$\quad$ c. It is impossible to use K-means clustering when we don't know the number of clusters.

Problem 2

There are 6 different datasets, denoted A, B, C, D, E, and F. Each dataset is clustered using two different methods, one of which is K-means. All results are shown in the figure below. The center of each cluster is marked with an 'x', and distance is measured by the Euclidean norm.

For each dataset, determine which result is more likely to have been generated by K-means clustering.



  1. Dataset A (Write A1 or A2; answer the remaining datasets in the same way.)

  2. Dataset B

  3. Dataset C

  4. Dataset D

  5. Dataset E

  6. Dataset F

Problem 3

There are data points from four classes in 2-dimensional space. Each class has its own centroid.

  1. Plot the data points on a 2D plane and calculate the centroid of each class. (Treat this as a supervised learning problem, i.e., the class labels are known.)
In [ ]:
import numpy as np
import matplotlib.pyplot as plt
In [ ]:
data1 = np.array((np.random.normal(0, 0.2, 100), np.random.normal(0.0, 0.4, 100)))
data2 = np.array((np.random.normal(1.2, 0.3, 100), np.random.normal(1.1, 0.7, 100)))
data3 = np.array((np.random.normal(1.6, 0.2, 100), np.random.normal(1.8, 0.7, 100)))
data4 = np.array((np.random.normal(0.3, 0.1, 100), np.random.normal(0.5, 0.3, 100)))
In [ ]:
# Write your code here

c1 = [c1x, c1y]
c2 = [c2x, c2y]
c3 = [c3x, c3y]
c4 = [c4x, c4y]
print('centroid 1 : ', c1)
print('centroid 2 : ', c2)
print('centroid 3 : ', c3)
print('centroid 4 : ', c4)

plt.scatter(label ='class 1', c = 'r', s = 1)
plt.scatter(label ='class 2', c = 'b', s = 1)
plt.scatter(label ='class 3', c = 'g', s = 1)
plt.scatter(label ='class 4', c = 'c', s = 1)
plt.scatter(s = 100, c = 'r')
plt.scatter(s = 100, c = 'b')
plt.scatter(s = 100, c = 'g')
plt.scatter(s = 100, c = 'c')
plt.legend()
plt.show()
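
The centroid of a class is the mean of its points. As a minimal sketch (not the solution for the data above), the cell below computes the centroid of a hypothetical array `demo` that has the same `(2, N)` layout as `data1` to `data4`, with the x values in row 0 and the y values in row 1. The name `demo`, its distribution parameters, and the fixed seed are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

# Hypothetical class stored like data1..data4: row 0 holds x, row 1 holds y
demo = np.array((rng.normal(2.0, 0.1, 500), rng.normal(-1.0, 0.1, 500)))

# The centroid is the mean over the sample axis (axis=1 for a (2, N) array)
centroid = demo.mean(axis=1)
print('demo centroid : ', centroid)
```

With this layout, `demo[0]` and `demo[1]` can be passed to `plt.scatter` as the x and y coordinates.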
  2. Assume that the labels are unknown. Apply k-means clustering to the given data points and compare the resulting centroids with the centroids above. (Set $k = 4$.)
In [ ]:
# Write your code here
#
  3. Assume that we do not know the number of partitions $k$. Choose the number of clusters by checking the 'elbow point' in the cost versus $k$ graph.
In [ ]:
# Write your code here
#
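
One possible way to obtain the cost for each $k$ is sketched below, assuming scikit-learn is available. It uses hypothetical data `X` (three well-separated blobs, not the homework data) and takes the `inertia_` attribute of `sklearn.cluster.KMeans` as the cost, which is the sum of squared distances from each sample to its closest center. The names `X`, `blob_centers`, `ks`, and `costs` are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Hypothetical data: three well-separated blobs, stacked as (N, 2) samples
blob_centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + rng.normal(0, 0.3, size=(100, 2)) for c in blob_centers])

ks = range(1, 8)
costs = []
for k in ks:
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    costs.append(km.inertia_)  # sum of squared distances to the closest center

for k, cost in zip(ks, costs):
    print(k, round(cost, 2))
```

Note that `KMeans` expects one sample per row, so arrays stored as `(2, N)` would need to be transposed and stacked first. Plotting `costs` against `ks` gives the cost versus $k$ graph, and the elbow is where the curve stops dropping sharply.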

Problem 4

In class, we applied k-means clustering to image colors. In some special cases, k-means clustering can also be used to reduce color noise. The image below is one of Piet Mondrian's paintings.

Download data

  1. Apply k-means clustering to this image. (Choose a value of $k$ that you think is appropriate.)
In [ ]:
img = plt.imread('./image_files/Piet Mondrian.jpg')
plt.axis('off')
plt.imshow(img)
plt.show()
  2. How many clusters are needed for this image? Check the cost as $k$ increases. Then paint the image with the centroid colors.
In [ ]:
# write your code here
#
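
"Painting with the centroid colors" means replacing every pixel with the center of the cluster it was assigned to. The cell below is a minimal sketch of that step on a hypothetical two-color noisy image `img_demo` (not the Mondrian image), assuming scikit-learn is available. The value `n_clusters=2` is chosen only because the synthetic image has two base colors.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Hypothetical 40x40 RGB image: left half reddish, right half bluish, plus noise
img_demo = np.zeros((40, 40, 3))
img_demo[:, :20] = [0.9, 0.1, 0.1]
img_demo[:, 20:] = [0.1, 0.1, 0.9]
img_demo = np.clip(img_demo + rng.normal(0, 0.05, img_demo.shape), 0, 1)

pixels = img_demo.reshape(-1, 3)  # one row per pixel, columns = R, G, B
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(pixels)

# Paint every pixel with the color of the centroid it was assigned to
quantized = km.cluster_centers_[km.labels_].reshape(img_demo.shape)
print(len(np.unique(quantized.reshape(-1, 3), axis=0)))  # number of distinct colors
```

A JPEG loaded with `plt.imread` is typically a `uint8` array in the range 0 to 255, so the centroid colors would need to be cast back to `uint8` (or scaled to the range 0 to 1) before calling `plt.imshow`.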

Problem 5

In class, we used K-means to compress an image by reducing the number of colors. The same method can compress an image down to binary colors (i.e., black and white, which is the maximum compression). When I, as an instructor, prepare lecture materials, I often write equations on a whiteboard and take a picture of them to put in the lecture slides, as shown in Figure 1. However, if I use the picture as it is, it will consume a lot of ink or toner when many students print it. Since I do not want to waste resources and do want to keep our one and only Earth as sustainable as possible, I ask you to convert the picture to a binary image in which only the black parts are printed.

You can download handwritten.jpg





In [ ]:
import cv2
import matplotlib.pyplot as plt

img = cv2.imread('./image_files/k_means_handwritten.jpg')
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

plt.figure(figsize=(8,8))
plt.imshow(img)
plt.axis('off')
plt.show()
In [ ]:
## Your code here
#