Suppose we have a function f(x) of a single variable. Close to a point x, it can be approximated by a line: f(x + dx) ≈ f(x) + f'(x) · dx.
Let's take for example a function of two variables, f(x, y).
In that case, the linear approximation becomes: f(x + dx, y + dy) ≈ f(x, y) + (∂f/∂x) · dx + (∂f/∂y) · dy.
Note: the derivatives above are taken at the point (x, y).
In matrix form: f(p + h) ≈ f(p) + ∇f(p)^T h, where p = (x, y) and h = (dx, dy).
We call ∇f(p) = (∂f/∂x, ∂f/∂y) the gradient of f at p.
For example, let's take a simple function of two variables.
What is its gradient at a given point?
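As an illustration, take f(x, y) = x² + y² (an assumed example): its gradient is (2x, 2y). A small sketch checking the linear approximation numerically:

import numpy as np

# Assumed example: f(x, y) = x^2 + y^2, whose gradient is (2x, 2y)
def f(x, y):
    return x ** 2 + y ** 2

def grad_f(x, y):
    return np.array([2 * x, 2 * y])

x, y = 1.0, 2.0
dx, dy = 0.01, -0.02

exact_change = f(x + dx, y + dy) - f(x, y)
linear_change = grad_f(x, y) @ np.array([dx, dy])
print(exact_change, linear_change)   # both close to -0.06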
Suppose we want to minimize f.
Idea: start at some initial point p_0 and repeatedly take a small step in the direction along which f decreases the fastest.
The gradient provides a local, linear approximation of f.
It gives the direction of the steepest ascent of f (and its opposite, the direction of steepest descent).
The algorithm, gradient descent, is then: p_{k+1} = p_k − α ∇f(p_k), where α is a small step size called the learning rate.
We can now find a (local) minimum of f by repeating this update until convergence.
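A minimal sketch of this loop in NumPy, on the same assumed example f(x, y) = x² + y² (the starting point and learning rate are also assumptions):

import numpy as np

# Gradient descent on the assumed example f(x, y) = x^2 + y^2
def grad_f(p):
    return 2 * p                     # gradient of x^2 + y^2 is (2x, 2y)

p = np.array([3.0, -2.0])            # assumed starting point p_0
alpha = 0.1                          # assumed learning rate

for k in range(100):
    p = p - alpha * grad_f(p)        # p_{k+1} = p_k - alpha * grad f(p_k)

print(p)                             # very close to the minimum (0, 0)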
A problem remains:
A perceptron is a function modelling a neuron, with the following architecture:
The output of the perceptron is: y = σ(w_1 x_1 + w_2 x_2 + … + w_n x_n + b) = σ(w · x + b)
Where x are the inputs, w the weights, b the bias, and σ the activation function.
The goal of the activation function is to introduce non-linearity in the model.
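A minimal sketch of a single perceptron in NumPy (the weight values, the bias and the choice of tanh as activation are assumptions):

import numpy as np

def perceptron(x, w, b):
    # Weighted sum of the inputs plus a bias, passed through an activation (tanh here)
    return np.tanh(w @ x + b)

x = np.array([0.5, -1.0, 2.0])   # inputs
w = np.array([0.1, 0.4, -0.3])   # weights (assumed values)
b = 0.2                          # bias (assumed value)

print(perceptron(x, w, b))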
In a neural network, multiple perceptrons are combined in layers:
Each node is a perceptron: its inputs are the outputs of all the perceptrons of the previous layer.
We call this a multi-layer perceptron (MLP).
The layers are fully connected.
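To make this concrete, here is a sketch of a two-layer forward pass in NumPy, where every output of a layer feeds every perceptron of the next (the layer sizes and random weights are assumptions):

import numpy as np

rng = np.random.default_rng(0)

# Every unit of a layer is connected to every unit of the previous layer,
# so each layer is just a weight matrix, a bias vector and an activation
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)   # 3 inputs -> 4 hidden units
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)   # 4 hidden units -> 2 outputs

x = np.array([0.5, -1.0, 2.0])
h = np.tanh(W1 @ x + b1)    # hidden layer of perceptrons
y = W2 @ h + b2             # output layer (no activation on the output here)
print(y)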
Suppose we have a dataset of examples (x_i, y_i) and a loss measuring how far the network's predictions are from the targets, for example the mean squared error: L(θ) = (1/N) Σ_i (f_θ(x_i) − y_i)²
Where θ denotes all the weights and biases of the network, and f_θ(x_i) its prediction for x_i.
How to compute ∇_θ L, the gradient of the loss with respect to the weights? In practice this is done by backpropagation, i.e. automatic differentiation.
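A minimal sketch of automatic differentiation with PyTorch (the toy model and the data points are assumptions):

import torch as th

# Assumed toy model y = w * x + b with an MSE loss over a few made-up points
w = th.tensor(0.0, requires_grad=True)
b = th.tensor(0.0, requires_grad=True)

xs = th.tensor([0.0, 1.0, 2.0])
ys = th.tensor([1.0, 3.0, 5.0])

loss = ((w * xs + b - ys) ** 2).mean()
loss.backward()          # backpropagation: fills w.grad and b.grad
print(w.grad, b.grad)    # gradient of the loss with respect to each parameter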
In practice, we don't compute the loss over the whole dataset at each iteration. Instead, we use mini-batches.
For example, we will sample a small random subset of the dataset at each iteration and compute the loss and the gradient on it.
We call a full pass over the dataset an epoch.
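A minimal sketch of mini-batch sampling with PyTorch (the stand-in dataset, the batch size and the number of epochs are assumptions):

import torch as th

# Stand-in dataset (assumed shapes); xs, ys would be the real training data
xs = th.randn(1000, 1)
ys = th.randn(1000, 1)

batch_size = 32          # assumed batch size
n = xs.shape[0]

for epoch in range(3):                        # one epoch = one full pass over the data
    permutation = th.randperm(n)              # shuffle the example indices
    for start in range(0, n, batch_size):
        idx = permutation[start:start + batch_size]
        x_batch, y_batch = xs[idx], ys[idx]   # here the loss and gradient step would use only this mini-batch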
When learning, we want to avoid overfitting.
To achieve this, the typical strategy is to split the dataset into training and validation sets.
For example, 80% of the data is used for training, and 20% for validation.
By monitoring the loss on the validation set, we can detect overfitting: it shows up when the validation loss starts increasing while the training loss keeps decreasing.
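A minimal sketch of the 80/20 split and of validation-loss monitoring with PyTorch (the stand-in dataset and model are assumptions):

import torch as th

# Stand-in dataset and model (assumed); in the example below they would be xs, ys and the MLP
xs = th.randn(1000, 1)
ys = th.randn(1000, 1)
net = th.nn.Linear(1, 1)

# 80/20 random split between training and validation
idx = th.randperm(xs.shape[0])
n_train = int(0.8 * xs.shape[0])
x_train, y_train = xs[idx[:n_train]], ys[idx[:n_train]]
x_val, y_val = xs[idx[n_train:]], ys[idx[n_train:]]

# The validation loss is computed without gradient tracking; if it starts rising
# while the training loss keeps falling, the model is overfitting
with th.no_grad():
    val_loss = th.nn.functional.mse_loss(net(x_val), y_val)
print(val_loss.item())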
Gradient descent as presented before is not used directly in practice: many extra features are added to improve convergence.
The most common optimizer is the Adam optimizer.
It mainly adds momentum and per-parameter adaptive learning rates.
PyTorch is a popular library for deep learning. It provides, among other things, tensor operations (on CPU or GPU), automatic differentiation, neural-network building blocks (th.nn) and optimizers (th.optim).
Let's first write a module for the multi-layer perceptron (mlp.py):
import torch as th

# Defining a simple MLP
class MLP(th.nn.Module):
    def __init__(self, input_dimension: int, output_dimension: int):
        super().__init__()
        self.net = th.nn.Sequential(
            th.nn.Linear(input_dimension, 256),
            th.nn.ELU(),
            th.nn.Linear(256, 256),
            th.nn.ELU(),
            th.nn.Linear(256, 256),
            th.nn.ELU(),
            th.nn.Linear(256, output_dimension),
        )

    def forward(self, x):
        return self.net(x)
We will then create some (noisy) data:
import numpy as np
xs = np.linspace(0, 6, 1000)
ys = np.sin(xs)*2.5 + 1.5 + np.random.randn(1000)*0.1
This data needs to be transformed into PyTorch tensors:
import torch as th
xs = th.tensor(xs, dtype=th.float).unsqueeze(1)
ys = th.tensor(ys, dtype=th.float).unsqueeze(1)
We can now create our MLP and the optimizer:
from mlp import MLP
# Creating the network and optimizer
net = MLP(1, 1)
optimizer = th.optim.Adam(net.parameters(), 1e-3)
Finally, we can train our model:
# Training the model
for epoch in range(512):
    # Compute the loss
    loss = th.nn.functional.mse_loss(net(xs), ys)

    # Computing the gradient and updating weights
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    print(f"Epoch {epoch}, loss={loss.item()}")
Putting it all together, and adding some plotting of the loss over time, we get the full example code.
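A minimal sketch of such a combined script, recording the loss at every epoch and plotting it at the end (matplotlib is an assumed choice of plotting library, and the final plot of the learned function is added for illustration):

import matplotlib.pyplot as plt
import numpy as np
import torch as th

from mlp import MLP

# Same data and training as above, but keeping the loss of every epoch
xs_np = np.linspace(0, 6, 1000)
ys_np = np.sin(xs_np) * 2.5 + 1.5 + np.random.randn(1000) * 0.1
xs = th.tensor(xs_np, dtype=th.float).unsqueeze(1)
ys = th.tensor(ys_np, dtype=th.float).unsqueeze(1)

net = MLP(1, 1)
optimizer = th.optim.Adam(net.parameters(), 1e-3)

losses = []
for epoch in range(512):
    loss = th.nn.functional.mse_loss(net(xs), ys)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

# Loss over time
plt.plot(losses)
plt.xlabel("epoch")
plt.ylabel("MSE loss")
plt.show()

# Learned function against the noisy data
with th.no_grad():
    predictions = net(xs).squeeze(1).numpy()
plt.scatter(xs_np, ys_np, s=2, label="data")
plt.plot(xs_np, predictions, color="red", label="MLP")
plt.legend()
plt.show()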