Skip to content

DanieleOranges/Continual-Learning-Optimal-Control

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

21 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Continual Optimal Control based on Sequential Random Fourier Gaussian Processes

Introduction

  • Dynamics Model:
    • Uses a Sequential Random Fourier Gaussian Process (SRFGP) to approximate and learn the state transition function.
    • Updates continuously with incoming trajectory data with the ability of updating real time.
  • Continual Learning:
    • The dynamics model is updated online starting from scratch, without losing past information.
    • Supports non-stationary environments where dynamics evolve over time.
  • Optimal Control Integration:
    • In order to test the validity of the transition model, a simple Trajectory Planning is implemented.

Background: Random Fourier Gaussian Processes

Gaussian Processes (GPs) are powerful non-parametric models for learning unknown functions with uncertainty estimates. However, standard GPs scale poorly with the number of data points, making them less practical for continual or online learning.

To overcome this, Random Fourier Features (RFFs) are used to approximate the kernel function of a GP. This transforms the infinite-dimensional kernel space into a finite-dimensional feature space using random projections.

  • Key idea: By sampling random frequencies from the Fourier transform of the kernel (e.g., RBF kernel), we can approximate the kernel function with inner products in this random feature space.

  • Benefits:

    • Scales to larger datasets compared to exact GPs.
    • Efficient online updates suitable for continual learning.
    • Retains the ability to capture uncertainty in predictions.

In this repository, it is implemented a Sequential Random Fourier Gaussian Process that continuously updates its parameters as new data arrives, making it suitable for dynamic environments that evolves over time.

Mathematical Background

A Gaussian Process with an RBF kernel is defined by:

$$k(x, x') = \sigma_f^2 \exp\left(-\frac{\|x - x'\|^2}{2\ell^2}\right).$$

Using Bochner's theorem, the RBF kernel can be approximated by Random Fourier Features (RFFs). We sample random frequencies ω and random phases b from the following distributions:

$$\omega_i \sim \mathcal{N}(0, \ell^{-2} I)$$ $$b_i \sim \mathcal{U}(0, 2\pi)$$

The feature map is then:

$$\phi(x) = \sqrt{\frac{2\sigma_f^2}{D}} \cos(W^\top x + b), \quad \text{where } W \in \mathbb{R}^{d \times D} \text{ and } b \in \mathbb{R}^D.$$

The kernel can be approximated as:

$$k(x, x') \approx \phi(x)^\top \phi(x').$$

This reduces the GP problem to Bayesian linear regression in the RFF feature space.

Bayesian Updates for Continual Learning

We assume a prior over the weights w:

$$w \sim \mathcal{N}(0, \sigma_w^2 I)$$

The posterior mean and covariance are then updated sequentially. For each new observation (x, y), the update rules are:

$$S = \phi(x)^\top P \phi(x) + \sigma_n^2,$$ $$K = \frac{P \phi(x)}{S},$$ $$m \leftarrow m + \alpha ( K (y - \phi(x)^\top m) ),$$ $$P \leftarrow P - \alpha ( K \phi(x)^\top P ),$$

where m is the posterior mean and P is the posterior covariance of the weight distribution. The parameter α works as an update scaling factor that controls how strongly each new observation affects the posterior. A value of α = 1 corresponds to a full Bayesian update, which is used as the standard value.

Example Animation

The following animation shows how the SRFGP learns the dynamics over time:

Streaming GP

Optimal Control results

NLP Formulation:

In its first iteration, to make it quickly compatible with the implemented PyTorch model, the NLP problem uses the L-BFGS optimizer of PyTorch itself. This means the solution must be in the form:

$$\min_{u_{0:N-1}} \hat{L} = \sum_{t=0}^{N-1} c\big(\hat{x}_{t+1}, u_t\big), \quad \text{where } \hat{x}_{t+1} = f_\theta(\hat{x}_t, u_t)$$

This has multiple drawbacks meaning:

  • No hard constraints support
  • Everything must be scripted in PyTorch to run
  • Not ideal for RT enviroments
  • Iterations are limited to 2, as a compromise between accuracy and computational timings.

Each time the solver is called for a new timestep, the last (approximate) solution is used as a warm starting point for the next one.

Default environment:

In the default environment, a track is randomly generated, along with some ego vehicle initial conditions.

Model starts without any data points, updating every time a new data point (a simulation step) arrives. In this implementation a buffer is constantly receiving new data points to fit, with a maximum default size of 100 points.

The controller optimizes trajectory using only the predictive mean at instant t.

Default episode trajectory

Update operation (for batch of 100 points) can already be performed in real time without any optimization, every single simulation step. Following results are obtained on a Apple Macbook Air M2 machine with 8 Gb of RAM.

Default episode timings

Regularized environment:

As the transition model is a SRFGP we do get both the predictive mean and variance of the transition function, that is:

$$p(\hat{x}) := p(x_{t+1} \mid x_t, u_t) = \mathcal{N}(\mu_t, \Sigma_t)$$

The latter can be introduced in the optimization module by means of 2 main strategies:

  • Directly integrate the actual covariance into the cost function (penalty-like or geometrical). Such solution can be simply integrated like: $$\hat{L} = \sum_{t=0}^{N-1} \left[ c\big(\mu_{t+1}(x_{t},u_t), u_{t}\big) + \alpha \|\Sigma_t\|_2 \right]$$
  • The loss function is based on the expected value of the cost function for the given timestep, that is: $$\hat{L} = \sum_{t=0}^{N-1} \mathbb{E}_{\hat{x}_{t+1} \sim p(\hat{x}_{t+1} \mid x_t, u_t)} \left[ c\big(\hat{x}_{t+1}, u_{t}\big) \right]$$ Practically, this can be implemented using a Monte Carlo approach.

This introduction will have the effect of penalizing uncertainties of the transition model within the optimization, leading to safer and more robust trajectories, shown in the plot below:

Regulrization speeds

The resulting animated trajectories, where the standard is the first, and regularized the second, are shown below:

Default episode trajectory Default episode trajectory

Installation

uv pip install -r requirements.txt

Future Work

  • Switch to a proper NLP solver, that can implement techniques like collocation methods, leading to faster solving times, and the ability of implementing state, control constraints, instead of using penalizations within the cost function.
  • Optimize the currently naive way of SRFGP of handling multiple outputs, with the possibility of exporting it in C++

License

MIT License.

About

Continual Optimal Control based on Sequential Random Fourier Gaussian Processes

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages