AI for Mechanical Engineering: Manufacturing


By Dr. Bumsoo Park
http://iailab.kaist.ac.kr/
Industrial AI Lab at KAIST

1. Tool Wear Prediction

  • Taylor's equation for tool wear:

$$VT^n = C, $$
  • where
    • $V$: Cutting speed
    • $T$: Tool life
    • $n, C$: Empirical constants
  • A typical problem involves predicting/evaluating the remaining tool life, given conditions ($V, n, C$)

1.1 Prediction with Data

  • What if we are only given data?

Download data here

In [ ]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import scipy
import sklearn
In [ ]:
from google.colab import drive
drive.mount('/content/drive')
Mounted at /content/drive
In [ ]:
# Load the .npy file
data_loaded = np.load('/content/drive/MyDrive/AIME_Manufacturing/tool_wear_data.npy')

# Convert the numpy array to a DataFrame
df_loaded = pd.DataFrame(data_loaded, columns=['Cutting Speed (V)', 'Tool Life (T)'])
  • Visualize loaded data
In [ ]:
# Original data
plt.scatter(df_loaded['Cutting Speed (V)'], df_loaded['Tool Life (T)'], label='Experimental Data')

plt.xlabel('Cutting Speed (m/min)')
plt.ylabel('Tool Life (min)')
plt.title('Loaded Data')
plt.legend()
plt.show()
No description has been provided for this image

Option 1

  • We know there is an exponential relationship between $V$ and $T$ (from $VT^n=C$)
  • Using this information, we first linearize in log-log scale

$$ \begin{align*} \\ \log{(VT^n)} &= \log{C}, \\ \log{(V)}+\log{(T^n)} &= \log{C}, \\ \log{(V)}+n\log{(T)} &= \log{C}, \\ \log{(T)} &= -\frac{1}{n}\log{(V)}+ -\frac{1}{n}\log{C} \\ \end{align*} $$


In [ ]:
df_loaded['log_V'] = np.log(df_loaded['Cutting Speed (V)'])
df_loaded['log_T'] = np.log(df_loaded['Tool Life (T)'])

# Plotting the original data and the linearized data
plt.figure(figsize=(12, 5))

# Original data
plt.subplot(1, 2, 1)
plt.scatter(df_loaded['Cutting Speed (V)'], df_loaded['Tool Life (T)'])
plt.xlabel('Cutting Speed')
plt.ylabel('Tool Life')
plt.title('Original Data')

# Linearized data
plt.subplot(1, 2, 2)
plt.scatter(df_loaded['log_V'], df_loaded['log_T'])
plt.xlabel('log(Cutting Speed)')
plt.ylabel('log(Tool Life)')
plt.title('Linearized Data')

plt.show()
No description has been provided for this image
  • Next perform linear regression
In [ ]:
# Perform linear regression on the log-transformed data
slope, intercept, r_value, p_value, std_err = scipy.stats.linregress(df_loaded['log_V'], df_loaded['log_T'])

# Calculate n and C from the slope and intercept
n_estimated = -1 / slope
C_estimated = np.exp(intercept * n_estimated)

# Plotting the original data and the linearized data
plt.figure(figsize=(14, 6))

# Original data
plt.subplot(1, 2, 1)
plt.scatter(df_loaded['Cutting Speed (V)'], df_loaded['Tool Life (T)'])
plt.xlabel('Cutting Speed (V)')
plt.ylabel('Tool Life (T)')
plt.title('Original Data')

# Linearized data
plt.subplot(1, 2, 2)
plt.scatter(df_loaded['log_V'], df_loaded['log_T'])
plt.plot(df_loaded['log_V'], slope * df_loaded['log_V'] + intercept, color='red', label='Fitted Line')
plt.xlabel('log(Cutting Speed (V))')
plt.ylabel('log(Tool Life (T))')
plt.title('Linearized Data')
plt.legend()

plt.show()
No description has been provided for this image
  • Then transform back to linear space
In [ ]:
# Original data
plt.scatter(df_loaded['Cutting Speed (V)'], df_loaded['Tool Life (T)'], label='Experimental Data')

# Fitted line
cutting_speeds_fit = np.linspace(50, 150, 500)
tool_life_fit = (C_estimated / cutting_speeds_fit) ** (1 / n_estimated)
plt.plot(cutting_speeds_fit, tool_life_fit, color='red', label='Fitted Line')

plt.xlabel('Cutting Speed (V)')
plt.ylabel('Tool Life (T)')
plt.title('Fitted Line in Linear Space')
plt.legend()
plt.show()

print(f"Estimated n: {n_estimated}")
print(f"Estimated C: {C_estimated}")
No description has been provided for this image
Estimated n: 0.24571963417534542
Estimated C: 283.6716060346621

Option 2

  • Use Scipy's optimize.curve_fit() function for optimization
In [ ]:
# Define Taylor's Tool Wear Equation for fitting
def taylors_equation(V, n, C):
    return (C / V) ** (1 / n)

# Initial guess for the parameters
initial_guess = [0.5, 100]

# Perform curve fitting: inputs(function, x, y, initial guess)
popt, pcov = scipy.optimize.curve_fit(taylors_equation, df_loaded['Cutting Speed (V)'],
                       df_loaded['Tool Life (T)'], p0=initial_guess)

# Extract the fitted parameters
n_fitted, C_fitted = popt

# Original data
plt.scatter(df_loaded['Cutting Speed (V)'], df_loaded['Tool Life (T)'], label='Experimental Data')

# Fitted line
cutting_speeds_fit = np.linspace(50, 150, 500)
tool_life_fit = (C_fitted / cutting_speeds_fit) ** (1 / n_fitted)
plt.plot(cutting_speeds_fit, tool_life_fit, color='red', label='Fitted Line')

plt.xlabel('Cutting Speed (m/min)')
plt.ylabel('Tool Life (min)')
plt.title('Fitted Line in Linear Space')
plt.legend()
plt.show()

print(f"Estimated n: {n_fitted}")
print(f"Estimated C: {C_fitted}")
No description has been provided for this image
Estimated n: 0.2541470894413146
Estimated C: 308.90136308585073

2. Rotary Machinery Signal Analysis

  • Assume we have time series vibration data from a rotary machine

(data from CWRU dataset: https://engineering.case.edu/bearingdatacenter/download-data-file)

  • 3 types of signals
    • Normal (Download normal data here )

    • Inner fault (Download inner fault data here)

    • Outer fault (Download outer fault data here )

In [ ]:
# Load the data from the .npy files
inner = np.load('/content/drive/MyDrive/AIME_Manufacturing/inner_split.npy')
inner = inner.reshape(1800, -1)
outer = np.load('/content/drive/MyDrive/AIME_Manufacturing/outer_split.npy')
outer = outer.reshape(1800, -1)
normal = np.load('/content/drive/MyDrive/AIME_Manufacturing/normal_split.npy')
normal = normal.reshape(1800, -1)

# Print the shapes or some basic information about the data to confirm it's loaded correctly
print("Inner Split Data Shape:", inner.shape)
print("Normal Split Data Shape:", normal.shape)
print("Outer Split Data Shape:", outer.shape)
Inner Split Data Shape: (1800, 527)
Normal Split Data Shape: (1800, 527)
Outer Split Data Shape: (1800, 527)
  • Plot the signals
In [ ]:
classes = ['Normal', 'Inner_fault', 'Outer_fault']
data = [normal, inner, outer]

for state, name in zip(data, classes):
    print('{} data shape: {}'.format(name, state.shape))
plt.figure(figsize = (9, 6))
for i in range(3):
    plt.subplot(3,1,i+1)
    plt.title('Class: {}'.format(classes[i]), fontsize = 15)
    plt.plot(data[i][0])
    plt.ylim([-4,4])

plt.tight_layout()
plt.show()
Normal data shape: (1800, 527)
Inner_fault data shape: (1800, 527)
Outer_fault data shape: (1800, 527)
No description has been provided for this image

2.1. Time Series Classification

  • Recall SVM

    • Previously, the goal was to find the classification boundary(ies), given the features (data)
  • For the data above, we need to determine features

  • Types of features we can use for time series

    • Basic features (e.g. mean, max, min, standard deviation)
    • Derived features / features based on domain knowledge (e.g. Fourier transform, autocorreclation)
    • Advanced statistical features (e.g. Kurtosis, skewness)

Basic Feature Selection

  • We will start by using the mean and median to represent all signals
  • $n=2$ features ($\mathbb{R}^{500} → \mathbb{R}^{2}$)
In [ ]:
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, accuracy_score
In [ ]:
all_data = np.vstack((normal, inner, outer))
all_data = all_data / np.max(all_data)
In [ ]:
def simple_features(trainX):
    n_data = len(trainX)
    reduced = np.zeros((n_data,2))
    for i in range(n_data):
        reduced[i,0] = np.median(trainX[i,:])
        reduced[i,1] = np.mean(trainX[i,:])

    return reduced

x = simple_features(all_data)
y = [0]*len(normal) + [1]*len(inner) + [2]*len(outer)
y = np.array(y)[:, np.newaxis]

scaler = StandardScaler()
train_X_scaled = scaler.fit_transform(x)
train_x, test_x, train_y, test_y = train_test_split(train_X_scaled,
                                                    y,
                                                    test_size = 0.2,
                                                    random_state = 42)
In [ ]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, accuracy_score


# Train SVM classifier
svm_classifier = SVC(kernel='linear', probability=True)  # Added probability for contour plotting
svm_classifier.fit(train_x, train_y.ravel())

# Predict on the test set
pred_y = svm_classifier.predict(test_x)

# Evaluate the classifier
print("Accuracy:", accuracy_score(test_y, pred_y))
print("Classification Report:")
print(classification_report(test_y, pred_y))

# Plotting decision boundaries
plt.figure(figsize=(10, 6))
plt.scatter(test_x[:, 0], test_x[:, 1], c=test_y.ravel(), s=30, cmap=plt.cm.Paired, edgecolors='k', label='Test Points')

# Plot the decision boundary. For that, we will assign a color to each
# point in the mesh [x_min, x_max]x[y_min, y_max].
x_min, x_max = test_x[:, 0].min() - 1, test_x[:, 0].max() + 1
y_min, y_max = test_x[:, 1].min() - 1, test_x[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.02),
                     np.arange(y_min, y_max, 0.02))

Z = svm_classifier.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
plt.contourf(xx, yy, Z, alpha=0.3, cmap=plt.cm.Paired)

plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.title('SVM Decision Boundary with Test Points')
plt.legend()
plt.show()
Accuracy: 0.7814814814814814
Classification Report:
              precision    recall  f1-score   support

           0       0.71      0.72      0.71       390
           1       0.76      0.73      0.74       341
           2       0.89      0.90      0.89       349

    accuracy                           0.78      1080
   macro avg       0.78      0.78      0.78      1080
weighted avg       0.78      0.78      0.78      1080

No description has been provided for this image

Feature Selection with Domain Knowledge (Frequency Domain)

  • $n=2$ features ($\mathbb{R}^{500} → \mathbb{R}^{2}$)

  • Idea is to view dominant frequencies of signal (with Fourier Transform)

In [ ]:
# Example transformation (visualization)

signal1 = all_data[0,:]
signal2 = all_data[1800,:]
signal3 = all_data[3600,:]

# Compute FFT
fft_signal1 = np.fft.fft(signal1)
fft_signal2 = np.fft.fft(signal2)
fft_signal3 = np.fft.fft(signal3)

# Compute frequencies
freqs = np.fft.fftfreq(len(signal1))

# Plot the FFT results
fig, axs = plt.subplots(3, 1, figsize=(9, 6))

axs[0].plot(freqs, np.abs(fft_signal1))
axs[0].set_title('FFT of Signal 1')
axs[0].set_xlabel('Frequency')
axs[0].set_ylabel('Magnitude')

axs[1].plot(freqs, np.abs(fft_signal2))
axs[1].set_title('FFT of Signal 2')
axs[1].set_xlabel('Frequency')
axs[1].set_ylabel('Magnitude')

axs[2].plot(freqs, np.abs(fft_signal3))
axs[2].set_title('FFT of Signal 3')
axs[2].set_xlabel('Frequency')
axs[2].set_ylabel('Magnitude')

plt.tight_layout()
plt.show()