Fourier Neural Operator
http://iailab.kaist.ac.kr/
Industrial AI Lab at KAIST
Table of Contents
1. Fourier Neural Operator¶
1.1. Motivation: Response to a General Input¶
A central question in engineering and physics is: given a system and an arbitrary input, what is the output? For many physical systems, this question can be answered systematically if the system satisfies two fundamental properties: linearity and time-invariance.
Consider a simple first-order ODE as a representative example:
$$ \dot{y} + 5y = x(t), \quad y(0) = 0 $$
For a Linear Time-Invariant (LTI) system, the response to any general input $x(t)$ can be computed from the impulse response $h(t)$ alone:
$$ y(t) = h(t) * x(t) $$
This is a remarkably powerful result. Once $h(t)$ is known from a single experiment (applying a unit impulse $\delta(t)$ and recording the response), the output for any input is determined by convolution. The LTI system is therefore a mapping from one function to another — it is, in the precise mathematical sense, an operator.
This perspective motivates the central question of neural operator learning: can we learn such an operator from data, without assuming linearity or time-invariance?
1.2. Convolution as an Operator¶
To understand why convolution arises naturally, consider decomposing any continuous signal $x(t)$ into a superposition of scaled and shifted impulses. A continuous signal can be approximated as:
$$ x(t) = \lim_{\Delta \to 0} \sum_k x(k\Delta)\, p(t - k\Delta)\, \Delta $$
where $p(t)$ is a rectangular pulse of width $\Delta$ and height $1/\Delta$. As $\Delta \to 0$, $p(t) \to \delta(t)$, and this sum converges to:
$$ x(t) = \int_{-\infty}^{\infty} x(\tau)\, \delta(t - \tau)\, d\tau $$
By linearity, each impulse $x(\tau)\delta(t - \tau)$ produces a response $x(\tau)h(t - \tau)$. By time-invariance, the shape of the response to a delayed impulse is simply the delayed impulse response. Summing all contributions gives:
$$ y(t) = \int_{-\infty}^{\infty} x(\tau)\, h(t - \tau)\, d\tau = h(t) * x(t) $$
This is the convolution integral. The LTI system is therefore a mapping in functional space: it takes the input function $x(t)$ and returns the output function $y(t)$. It can be viewed as an operator $\mathcal{G}$ such that $y = \mathcal{G}(x)$, where $\mathcal{G}$ is completely characterized by its kernel $h$.
1.3. Kernel Integral Operator¶
The convolution operator is powerful but restrictive: it applies only to LTI systems. Many physical systems of interest are nonlinear, spatially inhomogeneous, or time-varying, and convolution cannot represent them exactly.
The natural generalization is to replace the fixed impulse response $h(t - \tau)$ — which depends only on the difference $t - \tau$ — with a flexible two-argument kernel $k(t, \tau)$:
$$ y(t) = \int_{-\infty}^{\infty} h(t-\tau)\, x(\tau)\, d\tau \quad \longrightarrow \quad y(t) = \int_{-\infty}^{\infty} k(t, \tau)\, x(\tau)\, d\tau $$
The two-argument kernel $k(t, \tau)$ relaxes the LTI constraint entirely: the response at time $t$ can now depend on the source at $\tau$ in an arbitrary, non-shift-invariant way. The standard convolution is recovered as the special case $k(t, \tau) = h(t - \tau)$.
Using spatial notation, the kernel integral operator is defined as:
$$ v'(x) = \int_\Omega k(x, y)\, v(y)\, dy $$
It maps an input function $v : \Omega \to \mathbb{R}$ to an output function $v' : \Omega \to \mathbb{R}$. The kernel $k(x, y)$ determines how the value at location $y$ influences the output at location $x$. When $k(x, y) = k(x - y)$, this reduces to convolution. For a general $k(x, y)$, the operator can represent non-local, non-translation-invariant mappings.
The key insight for machine learning is that $k(t, \tau)$ can be learned from data. If we parameterize $k$ by a neural network, we can train it to approximate any target operator from input-output pairs.
1.4. Multi-layer Kernel Integral Operators¶
A single kernel integral operator is a linear map between function spaces. To represent highly nonlinear mappings — such as the solution operator of a nonlinear PDE — a single linear layer is insufficient, just as a single linear layer is insufficient for learning nonlinear functions between finite-dimensional spaces.
The standard remedy is to introduce a pointwise nonlinear activation $\sigma(\cdot)$ after the integral step:
$$ v'(x) = \sigma\!\left( \int_\Omega k(x, y)\, v(y)\, dy \right) $$
and to stack multiple such layers in sequence:
$$ x(t) \;\longrightarrow\; \text{Operator}_1 \;\longrightarrow\; \text{Operator}_2 \;\longrightarrow\; \text{Operator}_3 \;\longrightarrow\; y(t) $$
Each layer applies a kernel integral followed by a pointwise nonlinearity. The composition of multiple such layers gives the network the expressiveness needed to approximate general nonlinear operators between function spaces. This is directly analogous to how deep neural networks approximate nonlinear functions between finite-dimensional spaces by composing linear maps with nonlinearities.
The theoretical justification for this approach is the universal approximation theorem for operators, which guarantees that a sufficiently deep and wide network of this form can approximate any continuous operator to arbitrary accuracy.
1.5. Bridge to the Frequency Domain¶
While the kernel integral operator framework is mathematically elegant, directly computing $\int k(x, y) v(y) dy$ at every spatial location $x$ is computationally expensive. For a discrete grid of $n$ points, naively evaluating all pairwise interactions requires $O(n^2)$ operations, which becomes prohibitive for large $n$.
The key computational insight comes from the convolution theorem: convolution in the spatial domain corresponds to pointwise multiplication in the frequency domain.
The Fourier transform decomposes a signal into sinusoidal components:
$$ X(j\omega) = \int_{-\infty}^{\infty} x(t)\, e^{-j\omega t}\, dt \qquad \text{(analysis)} $$
$$ x(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} X(j\omega)\, e^{j\omega t}\, d\omega \qquad \text{(synthesis)} $$
The convolution theorem states:
$$ x(t) * h(t) \quad \leftrightarrow \quad X(j\omega)\, H(j\omega) $$
This is a profound simplification: the complex operation of convolution in the time domain becomes simple pointwise multiplication in the frequency domain. Furthermore, the Fast Fourier Transform (FFT) computes this transform in $O(n \log n)$ operations, making frequency-domain computation far more efficient than direct spatial convolution.
The Fourier basis functions $e^{j\omega t}$ are the eigenfunctions of LTI systems: if $e^{j\omega t}$ is the input, the output is $H(j\omega) e^{j\omega t}$, where $H(j\omega)$ is the eigenvalue. This eigenfunction property is what makes the frequency domain so natural for analyzing and learning LTI-like operators.
1.6. Learning the Kernel in the Frequency Domain¶
Returning to the kernel integral operator, the convolution special case can be written as:
$$ v'(x) = \mathcal{F}^{-1}\!\left(\mathcal{F}(k) \cdot \mathcal{F}(v)\right)(x) $$
where $\mathcal{F}(k)$ is the Fourier transform of the kernel $k$. Instead of learning the kernel $k$ in the spatial domain and then transforming it, we can directly learn a trainable weight matrix $R$ in the frequency domain:
$$ \mathcal{F}(k) \;\longrightarrow\; R $$
so the operator becomes:
$$ v'(x) = \mathcal{F}^{-1}\!\left(R \cdot \mathcal{F}(v)\right)(x) $$
$R$ is a complex-valued matrix learned by the neural network, acting as a frequency-domain filter. Each entry of $R$ controls how much of a particular frequency component is passed through or amplified. Standard neural network training (backpropagation through the FFT and inverse FFT) can learn $R$ directly from data.
An important practical choice is frequency truncation: rather than learning $R$ for all $n$ frequency components, we retain only the lowest $k_{\max}$ modes and set the rest to zero. This truncation serves two purposes. First, it dramatically reduces the number of trainable parameters. Second, it acts as a regularizer that prevents overfitting to high-frequency noise. The truncated operator can still capture the smooth, large-scale features of most PDE solutions, which tend to be dominated by low-frequency content.
1.7. The Fourier Layer¶
The purely spectral path described above captures global, long-range interactions through the frequency domain. However, physical systems also exhibit local behavior — the value at a point is influenced strongly by its immediate neighbors. To handle both scales simultaneously, each Fourier layer adds a local path $Wv(x)$ alongside the global spectral path:
$$ v'(x) = \sigma\!\left( \int_\Omega k(x,y)\, v(y)\, dy + W v(x) \right) $$
In the Fourier domain, with the convolution kernel replaced by the learnable weight $R$, this becomes:
$$ v'(x) = \sigma\!\left( \mathcal{F}^{-1}(R \cdot \mathcal{F}(v))(x) + W v(x) \right) $$
The two paths play complementary roles:
- Global path $\mathcal{F}^{-1}(R \cdot \mathcal{F}(v))$: captures long-range dependencies and global periodic patterns by operating in the frequency domain. Each frequency mode interacts with every spatial location simultaneously.
- Local path $Wv(x)$: a pointwise linear transformation applied directly in the spatial domain. It preserves and passes local spatial information that might be lost or attenuated by the low-pass truncation in the spectral path.
The combination of these two paths ensures that the Fourier layer is sensitive to both global structure and local detail. This is the complete Fourier layer, the fundamental building block of FNO.
1.8. Lifting and Projection Layers¶
The input function $a(x)$ to a PDE operator (e.g., a coefficient field or initial condition) typically has a small number of channels — often just one scalar value per spatial location. The Fourier layers, however, operate in a much higher-dimensional latent space of $d_v$ channels, which is necessary to give the network sufficient representational capacity.
Two additional fully connected layers are therefore placed at the beginning and end of the network:
Lifting layer $P$ maps the low-dimensional input $a(x) \in \mathbb{R}^{d_a}$ to the high-dimensional latent representation $v_0(x) \in \mathbb{R}^{d_v}$:
$$ v_0(x) = P(a(x)) $$
where $P$ is a pointwise fully connected network (typically a single linear layer). This operation is applied independently at each spatial location, so it does not mix information across space — it only lifts the channel dimension.
Projection layer $Q$ maps the final latent representation $v_T(x) \in \mathbb{R}^{d_v}$ back to the target output $u_\theta(x) \in \mathbb{R}^{d_u}$:
$$ u_\theta(x) = Q(v_T(x)) $$
where $Q$ is another pointwise fully connected network, typically a two-layer MLP with a nonlinear activation between the layers. The nonlinearity in $Q$ provides additional expressive power at the output stage, which is important because the Fourier layers themselves are linear in the spectral domain (the nonlinearity in each Fourier layer is applied after the spectral and local paths are combined).
The full pipeline is:
$$ a(x) \;\xrightarrow{P}\; v_0(x) \;\xrightarrow{\text{Fourier layers}}\; v_T(x) \;\xrightarrow{Q}\; u_\theta(x) $$
1.9. Full FNO Architecture¶
Assembling all components, the complete FNO consists of:
- A lifting layer $P$ that maps the input function to a high-dimensional latent space.
- A sequence of $T$ Fourier layers, each consisting of a global spectral path and a local linear path, combined and passed through a nonlinear activation.
- A projection layer $Q$ that maps the final latent representation back to the output function.
$$ a(x) \;\xrightarrow{P}\; v_0(x) \;\xrightarrow{L_1}\; v_1(x) \;\xrightarrow{L_2}\; \cdots \;\xrightarrow{L_T}\; v_T(x) \;\xrightarrow{Q}\; u_\theta(x) $$
where each Fourier layer $L_t$ computes:
$$ v_{t+1}(x) = \sigma\!\left( \mathcal{F}^{-1}(R_t \cdot \mathcal{F}(v_t))(x) + W_t\, v_t(x) \right) $$
The trainable parameters are the spectral weight matrices $\{R_t\}$, the local weight matrices $\{W_t\}$, and the parameters of $P$ and $Q$.
The full architecture learns the solution operator:
$$ \mathcal{G}_\theta : a(x) \mapsto u(x) $$
mapping an input function (e.g., a PDE coefficient field, initial condition, or forcing term) to the corresponding output function (e.g., the PDE solution field), without re-solving the PDE for each new input. Once trained, a single forward pass through the FNO produces the solution for a new input in milliseconds, compared to the seconds or minutes required by traditional numerical solvers.
A key property of FNO is discretization invariance: because the Fourier layers operate on the spectral coefficients rather than the grid values directly, the same trained network can be evaluated at different spatial resolutions without retraining. This is a significant practical advantage over standard convolutional networks, which are tied to a fixed grid resolution.
In summary, FNO operates in the Frequency domain, Convolution $\rightarrow$ Multiplication. To reflect this, the FNO model is built as follows:
(1) Lifting Layer
Feature Expension: Lifts input function $a(x)$ to a higher dimensional space
e.g., initial/boundary value fields such as vorticity field or thermal conductivity distributions
Pointwise Transformation: Applies a fully connected neural network locally at each spatial point
(2) Fourier Layers
Global Convolution via Fourier Transform: Fourier layers perform convolution in the frequency domain
Low-Frequency Mode Training: Fixed number of low-frequency Fourier modes are learned
Inverse Transform for Localization: Inverse Fourier transform is applied to return the signal to physical space
(3) Projection Layer
Dimensionality Reduction: Projects lifted input function back to the target dimensional space
Final Output Reconstruction: Converts the trained high-dimensional representation into the final predicted physical quantities
Problem properties
$$ \omega = 8 \operatorname{mm}, \quad h = 0.001 \operatorname{mm} $$
- Soft material
$$E = 100 \operatorname{MPa}, \quad \nu = 0.33, \quad \text{Failure strain} = 0.4$$
- Stiff material
$$E = 1000 \operatorname{MPa}, \quad \nu = 0.33, \quad \text{Failure strain} = 0.04$$
- Stress Fields for 2D Composities & boundary conditions
$$ \begin{align*} \frac{\partial \sigma_{xx}}{\partial x} + \frac{\partial \sigma_{xy}}{\partial y} + F_x &= 0\\\\ \frac{\partial \sigma_{yy}}{\partial y} + \frac{\partial \sigma_{xy}}{\partial x} + F_y &= 0\\\\ \end{align*} $$
$\quad\quad$ where $F_x$ and $F_y$ are the body forces in $x$ and $y$ directions, respectively
- Two Dirichlet boundary conditions at $x = 0,\, y = 0\; (B.C.①, B.C.②)$:
$$ v(x,y) = 0 \qquad \text{at} \quad y = 0\\\\ u(x,y) = 0 \qquad \text{at} \quad x = 0 $$
- Two free boundary conditions at $y = 1\; (B.C.③)$:
$$ \sigma_{yy} = 0,\quad \sigma_{yx} = 0 $$
- Free boundary condition and in-plane force boundary condition at $x = 1\; (B.C.④)$:
$$ \sigma_{xx} = F_x,\quad \sigma_{xy} = 0 $$
2.2. Solve the Stress Fields for 2D Composities using FNO¶
Load Library
import torch
import torch.nn as nn
import torch.nn.functional as F
import random
import numpy as np
import scipy.io as sio
import matplotlib.pyplot as plt
%matplotlib inline
from torch.utils import data
CUDA
import os
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "1"
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
print(device)
Random Seed
def random_seed(seed = 2021):
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
torch.cuda.manual_seed_all(seed)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
Define Conditions
To obtain the training data, we performed a CFD simulation, representing the ground truth as images on a $64\times 64$ grid.
Download
train_mesh_pts = np.load('./SF/train_mesh_pts.npy')
test_mesh_pts = np.load('./SF/test_mesh_pts.npy')
train_a = np.load('./SF/train_a.npy')
test_a = np.load('./SF/test_a.npy')
train_gt = np.load('./SF/train_gt.npy')
test_gt = np.load('./SF/test_gt.npy')
print("train XY image shape: {}".format(train_mesh_pts[0].shape))
print("test XY image shape: {}".format(test_mesh_pts[0].shape))
print("train condition shape: {}".format(train_a.shape))
print("test condition shape: {}".format(test_a.shape))
print("train gt shape: {}".format(train_gt.shape))
print("test gt shape: {}".format(test_gt.shape))
Data Batch Generator
A = train_a
Y = train_mesh_pts
GT = train_gt
print('Loading conditions :{}'.format(A.shape))
print('desired output locs :{}'.format(Y.shape))
print('Ground truths :{}'.format(GT.shape))
A_test = test_a
Y_test = test_mesh_pts
GT_test = test_gt
# Data generator
class DataBatchGenerator(data.Dataset):
def __init__(self, a, y, gt, batch_isze=16):
'Initialization'
self.a = torch.tensor(a, dtype=torch.float32)
self.y = torch.tensor(y, dtype=torch.float32)
self.gt = torch.tensor(gt, dtype=torch.float32)
self.N = a.shape[0]
self.batch_size = batch_isze
def __len__(self):
return (self.N // self.batch_size) + 1
def __getitem__(self, index):
idx = torch.randint(low = 0, high = self.a.shape[0], size = (self.batch_size,))
a = self.a[idx]
y = self.y[idx]
gt = self.gt[idx]
return a, y, gt
batch_size = 16
train_dataset = DataBatchGenerator(A, Y, GT)
test_dataset = DataBatchGenerator(A_test, Y_test, GT_test)
Define FNO Structure
class SpectralConv2d(nn.Module):
def __init__(self, in_channels, out_channels, modes1, modes2):
super(SpectralConv2d, self).__init__()
self.in_channels = in_channels
self.out_channels = out_channels
self.modes1 = modes1 # 사용할 Fourier 모드 수 (세로)
self.modes2 = modes2 # 사용할 Fourier 모드 수 (가로)
self.scale = 1 / (in_channels * out_channels)
# 복소수 weight parameters
self.weights1 = nn.Parameter(self.scale * torch.rand(in_channels, out_channels, modes1, modes2, dtype=torch.cfloat))
self.weights2 = nn.Parameter(self.scale * torch.rand(in_channels, out_channels, modes1, modes2, dtype=torch.cfloat))
def compl_mul2d(self, input, weights):
# input: (batch, in_channel, x, y), weights: (in_channel, out_channel, x, y)
return torch.einsum("bixy,ioxy->boxy", input, weights)
def forward(self, x):
batchsize = x.shape[0]
'''
x_ft = torch.fft.rfft2(x) # ℱ: 주파수 영역 변환
out_ft[:, :, :m1, :m2] = compl_mul2d(...) # R: 주파수 영역 곱셈 (앞쪽 모드)
out_ft[:, :, -m1:, :m2] = compl_mul2d(...) # R: 주파수 영역 곱셈 (뒤쪽 모드)
x = torch.fft.irfft2(out_ft, s=(H, W)) # ℱ⁻¹: 공간으로 역변환
'''
# Compute Fourier coefficients
x_ft = torch.fft.rfft2(x) # (batch, in_channels, H, W//2+1), complex tensor
out_ft = torch.zeros(batchsize, self.out_channels, x.size(-2), x.size(-1)//2+1, dtype=torch.cfloat, device=x.device)
out_ft[:, :, :self.modes1, :self.modes2] = self.compl_mul2d(x_ft[:, :, :self.modes1, :self.modes2], self.weights1)
out_ft[:, :, -self.modes1:, :self.modes2] = self.compl_mul2d(x_ft[:, :, -self.modes1:, :self.modes2], self.weights2)
x = torch.fft.irfft2(out_ft, s=(x.size(-2), x.size(-1)))
return x
class FNO2d(nn.Module):
def __init__(self, modes1, modes2, width, out_channels):
super(FNO2d, self).__init__()
self.modes1 = modes1
self.modes2 = modes2
self.width = width
self.out_channels = out_channels # record_steps
self.padding = 9 # pad if non-periodic
self.fc0 = nn.Linear(3, self.width) # input channel = 3: (a, x, y)
self.conv0 = SpectralConv2d(self.width, self.width, self.modes1, self.modes2)
self.conv1 = SpectralConv2d(self.width, self.width, self.modes1, self.modes2)
self.conv2 = SpectralConv2d(self.width, self.width, self.modes1, self.modes2)
self.conv3 = SpectralConv2d(self.width, self.width, self.modes1, self.modes2)
self.w0 = nn.Conv2d(self.width, self.width, 1)
self.w1 = nn.Conv2d(self.width, self.width, 1)
self.w2 = nn.Conv2d(self.width, self.width, 1)
self.w3 = nn.Conv2d(self.width, self.width, 1)
self.fc1 = nn.Linear(self.width, 128)
self.fc2 = nn.Linear(128, self.out_channels) # output channel = record_steps
def forward(self, x):
# x: (batch, s, s, 3)
# grid 정보는 이미 x에 포함되어 있음
x = self.fc0(x) # (batch, s, s, width)
x = x.permute(0, 3, 1, 2) # (batch, width, s, s)
x = F.pad(x, [0, self.padding, 0, self.padding])
x1 = self.conv0(x)
x2 = self.w0(x)
x = x1 + x2
x = F.gelu(x)
x1 = self.conv1(x)
x2 = self.w1(x)
x = x1 + x2
x = F.gelu(x)
x1 = self.conv2(x)
x2 = self.w2(x)
x = x1 + x2
x = F.gelu(x)
x1 = self.conv3(x)
x2 = self.w3(x)
x = x1 + x2
x = x[:, :, :-self.padding, :-self.padding]
x = self.fc1(x.permute(0,2,3,1))
x = F.gelu(x)
x = self.fc2(x)
return x # (batch, s, s, out_channels)
modes1 = 12
modes2 = 12
width = 32
out_channels = 1
model = FNO2d(modes1, modes2, width, out_channels).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
Loss_func = nn.MSELoss(reduction = 'mean')
Define Functions
def requires_grad(x):
return torch.tensor(x, requires_grad = True).to(device)
Train
(Again, this process may require a significant amount of time; therefore, we recommend not running it. However, we have provided a pretrained model for your convenience.)
epoch = 0
train_iter = iter(train_dataset)
while epoch < 3001:
train_batch = next(train_iter)
a_batch, y_batch, gt_batch = train_batch
a_batch = torch.cat((a_batch, y_batch), -1)
a_batch, gt_batch = requires_grad(a_batch), requires_grad(gt_batch)
s = model(a_batch)
loss = 100 * Loss_func(s.reshape(batch_size, -1), gt_batch.reshape(batch_size, -1))
optimizer.zero_grad()
loss.backward()
optimizer.step()
if epoch % 500 == 0:
with torch.no_grad():
print('Epoch: {} Loss: {:.4f}'.format(epoch, loss.item()))
epoch += 1
test_idx = 10
with torch.no_grad():
model_test = torch.load('./SF.pt')
a_test = A_test[test_idx:test_idx+1]
y_test = Y_test[test_idx:test_idx+1]
gt_test = GT_test[test_idx:test_idx+1]
a_test = np.concatenate((a_test, y_test), -1)
a_test = requires_grad(a_test)
s_test = model_test(a_test.float())
plt.figure(figsize=(9, 2.4))
plt.subplot(1, 3, 1)
plt.imshow(a_test.detach().cpu().numpy()[0, :, :, 0], cmap='gray')
plt.title('Modulus of elasticity (E)')
plt.colorbar()
# plt.show()
plt.subplot(1, 3, 2)
plt.imshow(s_test.detach().cpu().numpy()[0], vmin = np.min(gt_test), vmax = np.max(gt_test), cmap='jet')
plt.colorbar()
plt.title('Prediction ($\sigma$)')
plt.subplot(1, 3, 3)
plt.imshow(gt_test[0], cmap='jet')
plt.colorbar()
plt.title('Ground Truth ($\sigma$)')
plt.tight_layout()
plt.show()
Lab 3. Navier-Stokes with Vorticity¶
We will learn how to implement FNO with the following example: Navier-Stokes with Vorticity.
3.1. Problem Setup¶
Numerical Solution
- Numerical solution of this problem is illustrated in below figures
- Initial vorticity and vorticities at time $t = 1.0, 1.5, 2.5, 3.0$, respectively
- Solve this problem using FNO and then compare with a numerical solution
- 2D Navier-Stokes Equations with vorticity & boundary/initial conditions
$$ \begin{align*} \frac{\partial_t w(x,t)}{\partial t} + u(x,t) \frac{\partial w(x,t)}{\partial x} + v(x,t) \frac{\partial w(x,t)}{\partial y} &= \nu \left(\frac{\partial^2 w(x,t)}{\partial x^2} + \frac{\partial^2 w(x,t)}{\partial y^2} \right) + f(x)\\\\ \frac{\partial u(x,t)}{\partial x} + \frac{\partial u(x,t)}{\partial y} &= 0\\\\ w(x,0) &= w_0(x) \end{align*} $$
$\quad\quad$ where $w_0 \sim \mathcal{N}\left(0,\, \frac{73}{2}(-\Delta + 49I)^{-2.5} \right) $ and $f(x) = 0.1 \left( \sin\left(2\pi(x_1 + x_2)\right) + \cos\left(2\pi(x_1 + x_2)\right) \right) $
- Four periodic boundary conditions at four edges of domain
$$ \begin{align*} w(0,y,t) &= w(1,y,t)\\\\ w(x,0,t) &= w(x,1,t)\\\\ u(0,y,t) &= u(1,y,t)\\\\ u(x,0,t) &= u(x,1,t)\\\\ \end{align*} $$
$\quad\quad$ which means taht each boundary is continuously connected to its opposite side
3.2. Solve the Navier-Stokes Problem using FNO¶
Load Library
import torch
import torch.nn as nn
import torch.nn.functional as F
import random
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
from torch.utils import data
CUDA
import os
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "1"
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
print(device)
Random Seed
def random_seed(seed = 2021):
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
torch.cuda.manual_seed_all(seed)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
Define Conditions
To obtain the training data, we performed a CFD simulation, representing the ground truth as images on a $256\times 256$ grid.
Download
train_mesh_pts = np.load('./train_mesh_pts.npy') # numpy array, shape: (N_train, s, s, 2)
test_mesh_pts = np.load('./test_mesh_pts.npy') # numpy array, shape: (N_test, s, s, 2)
train_a = np.load('./train_a.npy') # numpy array, shape: (N_train, s, s, 1)
test_a = np.load('./test_a.npy') # numpy array, shape: (N_test, s, s, 1)
train_gt = np.load('./train_gt.npy') # numpy array, shape: (N_train, s, s, record_steps)
test_gt = np.load('./test_gt.npy') # numpy array, shape: (N_test, s, s, record_steps)
record_time = np.load('./t.npy')
print("train XY image shape: {}".format(train_mesh_pts[0].shape))
print("test XY image shape: {}".format(test_mesh_pts[0].shape))
print("train condition shape: {}".format(train_a.shape))
print("test condition shape: {}".format(test_a.shape))
print("train gt shape: {}".format(train_gt.shape))
print("test gt shape: {}".format(test_gt.shape))
Data Batch Generator
A = train_a
Y = train_mesh_pts
GT = train_gt
print('Loading conditions :{}'.format(A.shape))
print('desired output locs :{}'.format(Y.shape))
print('Ground truths :{}'.format(GT.shape))
A_test = test_a
Y_test = test_mesh_pts
GT_test = test_gt
# Data generator
class DataBatchGenerator(data.Dataset):
def __init__(self, a, y, gt, batch_isze=4):
'Initialization'
self.a = torch.tensor(a, dtype=torch.float32)
self.y = torch.tensor(y, dtype=torch.float32)
self.gt = torch.tensor(gt, dtype=torch.float32)
self.N = a.shape[0]
self.batch_size = batch_isze
def __len__(self):
return (self.N // self.batch_size) + 1
def __getitem__(self, index):
idx = torch.randint(low = 0, high = self.a.shape[0], size = (self.batch_size,))
a = self.a[idx]
y = self.y[idx]
gt = self.gt[idx]
return a, y, gt
batch_size = 4
train_dataset = DataBatchGenerator(A, Y, GT)
test_dataset = DataBatchGenerator(A_test, Y_test, GT_test)
Define FNO Structure
class SpectralConv2d(nn.Module):
def __init__(self, in_channels, out_channels, modes1, modes2):
super(SpectralConv2d, self).__init__()
self.in_channels = in_channels
self.out_channels = out_channels
self.modes1 = modes1 # 사용할 Fourier 모드 수 (세로)
self.modes2 = modes2 # (가로)
self.scale = 1 / (in_channels * out_channels)
# 복소수 weight parameters
self.weights1 = nn.Parameter(self.scale * torch.rand(in_channels, out_channels, modes1, modes2, dtype=torch.cfloat))
self.weights2 = nn.Parameter(self.scale * torch.rand(in_channels, out_channels, modes1, modes2, dtype=torch.cfloat))
def compl_mul2d(self, input, weights):
# input: (batch, in_channel, x, y), weights: (in_channel, out_channel, x, y)
return torch.einsum("bixy,ioxy->boxy", input, weights)
def forward(self, x):
batchsize = x.shape[0]
'''
x_ft = torch.fft.rfft2(x) # ℱ: 주파수 영역 변환
out_ft[:, :, :m1, :m2] = compl_mul2d(...) # R: 주파수 영역 곱셈 (앞쪽 모드)
out_ft[:, :, -m1:, :m2] = compl_mul2d(...) # R: 주파수 영역 곱셈 (뒤쪽 모드)
x = torch.fft.irfft2(out_ft, s=(H, W)) # ℱ⁻¹: 공간으로 역변환
'''
# Compute Fourier coefficients
x_ft = torch.fft.rfft2(x) # (batch, in_channels, H, W//2+1), complex tensor
out_ft = torch.zeros(batchsize, self.out_channels, x.size(-2), x.size(-1)//2+1, dtype=torch.cfloat, device=x.device)
out_ft[:, :, :self.modes1, :self.modes2] = self.compl_mul2d(x_ft[:, :, :self.modes1, :self.modes2], self.weights1)
out_ft[:, :, -self.modes1:, :self.modes2] = self.compl_mul2d(x_ft[:, :, -self.modes1:, :self.modes2], self.weights2)
x = torch.fft.irfft2(out_ft, s=(x.size(-2), x.size(-1)))
return x
class FNO2d(nn.Module):
def __init__(self, modes1, modes2, width, out_channels):
super(FNO2d, self).__init__()
self.modes1 = modes1
self.modes2 = modes2
self.width = width
self.out_channels = out_channels # record_steps
self.padding = 9 # pad if non-periodic
self.fc0 = nn.Linear(3, self.width) # input channel = 3: (a, x, y)
self.conv0 = SpectralConv2d(self.width, self.width, self.modes1, self.modes2)
self.conv1 = SpectralConv2d(self.width, self.width, self.modes1, self.modes2)
self.conv2 = SpectralConv2d(self.width, self.width, self.modes1, self.modes2)
self.conv3 = SpectralConv2d(self.width, self.width, self.modes1, self.modes2)
self.w0 = nn.Conv2d(self.width, self.width, 1)
self.w1 = nn.Conv2d(self.width, self.width, 1)
self.w2 = nn.Conv2d(self.width, self.width, 1)
self.w3 = nn.Conv2d(self.width, self.width, 1)
self.fc1 = nn.Linear(self.width, 128)
self.fc2 = nn.Linear(128, self.out_channels) # output channel = record_steps
def forward(self, x):
# x: (batch, s, s, 3)
# grid 정보는 이미 x에 포함되어 있음
x = self.fc0(x) # (batch, s, s, width)
x = x.permute(0, 3, 1, 2) # (batch, width, s, s)
x = F.pad(x, [0, self.padding, 0, self.padding])
x1 = self.conv0(x)
x2 = self.w0(x)
x = x1 + x2
x = F.gelu(x)
x1 = self.conv1(x)
x2 = self.w1(x)
x = x1 + x2
x = F.gelu(x)
x1 = self.conv2(x)
x2 = self.w2(x)
x = x1 + x2
x = F.gelu(x)
x1 = self.conv3(x)
x2 = self.w3(x)
x = x1 + x2
x = x[:, :, :-self.padding, :-self.padding]
x = self.fc1(x.permute(0,2,3,1))
x = F.gelu(x)
x = self.fc2(x)
return x # (batch, s, s, out_channels)
record_steps = train_gt.shape[3]
modes1 = 12
modes2 = 12
width = 128
out_channels = record_steps
model = FNO2d(modes1, modes2, width, out_channels).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001, weight_decay=1e-4)
Loss_func = nn.MSELoss(reduction = 'mean')
Define Functions
def requires_grad(x):
return torch.tensor(x, requires_grad = True).to(device)
Train
(Again, this process may require a significant amount of time; therefore, we recommend not running it. However, we have provided a pretrained model for your convenience.)
epoch = 0
train_iter = iter(train_dataset)
while epoch < 501:
train_batch = next(train_iter)
a_batch, y_batch, gt_batch = train_batch
a_batch = torch.cat((a_batch, y_batch), -1)
a_batch, gt_batch = requires_grad(a_batch), requires_grad(gt_batch)
v = model(a_batch)
loss = Loss_func(v.reshape(batch_size, -1), gt_batch.reshape(batch_size, -1))
optimizer.zero_grad()
loss.backward()
optimizer.step()
if epoch % 100 == 0:
with torch.no_grad():
print('Epoch: {} Loss: {:.4f}'.format(epoch, loss.item()))
epoch += 1
test_idx = 10
with torch.no_grad():
a_test = A_test[test_idx:test_idx+1]
y_test = Y_test[test_idx:test_idx+1]
gt_test = GT_test[test_idx:test_idx+1]
a_test = np.concatenate((a_test, y_test), -1)
a_test, gt_test = requires_grad(a_test), requires_grad(gt_test)
v_test = model(a_test.float())
num_plots = 4
indices = np.linspace(5, 30, num_plots, dtype=int)
fig, axs = plt.subplots(2, num_plots+1, figsize=(3*(num_plots+1), 6))
###########################
# 1) 첫 번째 열: 초기조건
###########################
init_a = a_test[0, :, :, 0].detach().cpu().numpy() # (s, s)
axs[0, 0].imshow(init_a, cmap='jet')
axs[0, 0].set_title(f"Initial Condition", fontsize=15)
axs[1, 0].axis('off')
###########################
# 2) 2열부터: 선택된 시점
###########################
for i, idx in enumerate(indices):
col = i + 1 # 열 인덱스 (첫 열은 초기조건, 두 번째 열부터 시간 시점)
# 실제 해
axs[0, col].imshow(gt_test[0, :, :, idx].detach().cpu().numpy(), cmap='jet')
axs[0, col].set_title(f"Actual t={record_time[idx]:.3f}", fontsize=15)
# 예측 해
axs[1, col].imshow(v_test[0, :, :, idx].detach().cpu().numpy(), cmap='jet')
axs[1, col].set_title(f"Pred t={record_time[idx]:.3f}", fontsize=15)
plt.tight_layout()
plt.show()
%%javascript
$.getScript('https://kmahelona.github.io/ipython_notebook_goodies/ipython_notebook_toc.js')