For your handwritten solutions, please scan or take a picture of them. Alternatively, you can write them in markdown if you prefer.
Only .ipynb files will be graded for your code.
Compress all the files into a single .zip file.
e.g., SeungchulLee_20241234_HW01.zip
Do not submit a printed version of your code, as it will not be graded.
$\quad$ a. K-means clustering always gives the same result when cluster centers are randomly initialized.
$\quad$ b. K-means clustering is a suitable method for clusters that are convex-shaped and have different densities.
$\quad$ c. It is impossible to use K-means clustering when we don't know the number of clusters.
There are 6 different datasets, denoted A, B, C, D, E, F. Each dataset is clustered using two different methods, one of which is K-means. All results are shown in the figure below. The center of each cluster is marked with 'x', and distance is measured by the Euclidean norm.
You are required to determine which result is more likely generated by K-means clustering.
Dataset A (Write A1 or A2; same for the following questions)
Dataset B
Dataset C
Dataset D
Dataset E
Dataset F
There are data points from four classes in 2-dimensional space. Each class has its own centroid.
import numpy as np
import matplotlib.pyplot as plt
data1 = np.array((np.random.normal(0, 0.2, 100), np.random.normal(0.0, 0.4, 100)))
data2 = np.array((np.random.normal(1.2, 0.3, 100), np.random.normal(1.1, 0.7, 100)))
data3 = np.array((np.random.normal(1.6, 0.2, 100), np.random.normal(1.8, 0.7, 100)))
data4 = np.array((np.random.normal(0.3, 0.1, 100), np.random.normal(0.5, 0.3, 100)))
# Write your code here
c1 = [c1x, c1y]
c2 = [c2x, c2y]
c3 = [c3x, c3y]
c4 = [c4x, c4y]
print('centroid 1 : ', c1)
print('centroid 2 : ', c2)
print('centroid 3 : ', c3)
print('centroid 4 : ', c4)
plt.scatter(data1[0], data1[1], label = 'class 1', c = 'r', s = 1)
plt.scatter(data2[0], data2[1], label = 'class 2', c = 'b', s = 1)
plt.scatter(data3[0], data3[1], label = 'class 3', c = 'g', s = 1)
plt.scatter(data4[0], data4[1], label = 'class 4', c = 'c', s = 1)
plt.scatter(c1[0], c1[1], s = 100, c = 'r')
plt.scatter(c2[0], c2[1], s = 100, c = 'b')
plt.scatter(c3[0], c3[1], s = 100, c = 'g')
plt.scatter(c4[0], c4[1], s = 100, c = 'c')
plt.legend()
plt.show()
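If a hint is helpful for the centroid computation above: since each data array has shape (2, 100) with row 0 holding x and row 1 holding y, the centroid of a class is simply the per-coordinate mean. A minimal sketch (using a seeded generator for reproducibility; the skeleton itself uses `np.random.normal`):

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded so the sketch is reproducible
data1 = np.array((rng.normal(0, 0.2, 100), rng.normal(0.0, 0.4, 100)))

# centroid = mean over the 100 samples, one value per coordinate
c1 = data1.mean(axis=1)   # shape (2,): [mean x, mean y]
```

The same one-liner applies to `data2` through `data4`.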
# Write your code here
#
# Write your code here
#
In class, we performed K-means clustering on image colors. In some special cases, K-means clustering can also be used to reduce color noise. The image below is one of Piet Mondrian's paintings.
img = plt.imread('./image_files/Piet Mondrian.jpg')
plt.axis('off')
plt.imshow(img)
plt.show()
# write your code here
#
We examined K-means in class as a way to compress an image by reducing its number of colors. The same method can compress an image down to just two colors (i.e., black and white, the maximum compression). When I, as the instructor, prepare lecture materials, I often write equations on a whiteboard and photograph it to include in the slides, as shown in Figure 1. Used as-is, however, the photo consumes a lot of ink or toner when printed by many students. Since I do not want to waste unnecessary resources and do want to keep our only Earth as sustainable as possible, I ask you to convert it to a binary image that prints only the black parts.
You can download handwritten.jpg
import cv2
import matplotlib.pyplot as plt
img = cv2.imread('./image_files/k_means_handwritten.jpg')
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
plt.figure(figsize=(8,8))
plt.imshow(img)
plt.axis('off')
plt.show()
## Your code here
#
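As a hint for the binarization step, one sketch of the idea: run K-means with K = 2 on the pixel intensities, then map the cluster with the darker center to black and the other to white. The sketch below assumes scikit-learn is available and uses a small synthetic "whiteboard" array (dark strokes on a bright background) instead of the actual photo.

```python
import numpy as np
from sklearn.cluster import KMeans

# synthetic stand-in for the whiteboard photo: a dark pen stroke on a bright board
rng = np.random.default_rng(0)
gray = np.full((30, 30), 0.9) + rng.normal(0, 0.03, (30, 30))
gray[10:12, 5:25] = 0.15                    # the dark "handwriting"

X = gray.reshape(-1, 1)                     # one intensity value per pixel
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

dark = np.argmin(km.cluster_centers_.ravel())   # index of the darker cluster
binary = np.where(km.labels_.reshape(gray.shape) == dark, 0.0, 1.0)
```

For the real photo, the same code would run on the grayscale-converted image (e.g., via `cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)`), and the resulting binary image prints only the black parts.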