A tensor library in Rust with PyTorch-style autograd and Python bindings.
Note: this project is an academic endeavour to learn more about Rust and PyTorch. It is in no way meant to be used in production applications.
This project is built using maturin.
git clone [email protected]:bastienlc/autograd.git
cd autograd
python -m venv .venv
source .venv/bin/activate
pip install -U pip maturin
maturin develop --release
# install the examples' dependencies
pip install torch numpy matplotlib tqdm torchvision

Note: this project uses nightly Rust (for the mapped_lock_guards feature).
Note: make sure you didn't source a conda env before running maturin develop. You can still create your venv using a conda env.
import autograd
a = autograd.Tensor([2], [0.1, 0.2], requires_grad=True, grad=None, graph=None)
b = autograd.Tensor([2], [0.3, 0.4], requires_grad=True, grad=None, graph=None)
c = (a + b).reduce_sum()
c.backward(None)
print(c.get_data()) # [1.0]
print(a.get_grad().get_data(), b.get_grad().get_data()) # [1.0, 1.0] [1.0, 1.0]

Take a look at examples/example.py for a side-by-side comparison with PyTorch on MNIST.
We compare our library's performance with that of torch (default settings). We consider unary (e.g. -A, A.transpose(), etc.) and binary (e.g. A@B, A+B, etc.) operations on matrices. For each operation we look at the forward and backward performance (for the backward pass, we use gradients of ones).
The benchmark is run with the profiling script on an Intel Core i7-13705H.
This is a Rust project. It uses pyo3 for the Python bindings. The architecture is mainly inspired by PyTorch: we have a CoreTensor object that holds the data, a grad and a graph. For easy parallelization without data duplication, we use the Rust pattern Arc<RwLock<CoreTensor>>, i.e. atomically reference-counted tensors with interior mutability. If this doesn't mean anything to you, maybe it's your cue to learn Rust.
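A rough illustration of that pattern (with a simplified stand-in for CoreTensor, not the crate's actual definition): cloning the Arc only bumps a reference count, and the lock gives the interior mutability needed to write the grad through any handle:

use std::sync::{Arc, RwLock};

// Simplified stand-in for CoreTensor: just data and an optional grad.
struct CoreTensor {
    data: Vec<f64>,
    grad: Option<Vec<f64>>,
}

fn main() {
    // One underlying tensor...
    let tensor = Arc::new(RwLock::new(CoreTensor { data: vec![0.1, 0.2], grad: None }));
    // ...shared by several handles: cloning the Arc bumps a reference count, the data is not copied.
    let handle = Arc::clone(&tensor);

    // Interior mutability: the backward pass can write the grad through any handle...
    handle.write().unwrap().grad = Some(vec![1.0, 1.0]);

    // ...and the change is visible through every other handle.
    println!("{:?}", tensor.read().unwrap().grad); // Some([1.0, 1.0])
    println!("{:?}", tensor.read().unwrap().data); // [0.1, 0.2]
}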
The forward operations are directly implemented on Tensor. The backward operations are implemented using a Backward trait. For instance, the backward operation for addition looks like this:
pub struct AddOperation {
    lhs: Tensor,
    rhs: Tensor,
}
impl Backward for AddOperation {
    fn do_backward(&mut self, grad: Option<Tensor>, _: Option<Tensor>) {
        let grad = grad.unwrap();
        self.lhs.do_backward(Some(grad.clone()), None);
        self.rhs.do_backward(Some(grad.clone()), None);
    }
}

AddOperation will be a node in the graph: during the forward pass, this node is attached to the result tensor's graph.
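To make the wiring concrete, here is a rough, self-contained sketch of how a forward operation could build such a node and attach it to its result. Everything below (the simplified Backward signature, CoreTensor, add, etc.) is a stand-in for illustration, not the crate's actual API:

use std::sync::{Arc, RwLock};

// Simplified stand-ins for the crate's types, just to show the wiring.
trait Backward {
    fn do_backward(&mut self, grad: Vec<f64>);
}

struct CoreTensor {
    data: Vec<f64>,
    grad: Option<Vec<f64>>,
    graph: Option<Box<dyn Backward>>,
}

#[derive(Clone)]
struct Tensor {
    core: Arc<RwLock<CoreTensor>>,
}

struct AddOperation {
    lhs: Tensor,
    rhs: Tensor,
}

impl Backward for AddOperation {
    fn do_backward(&mut self, grad: Vec<f64>) {
        // d(lhs + rhs)/d(lhs) = d(lhs + rhs)/d(rhs) = 1, so both inputs receive the
        // incoming grad (a real implementation would accumulate and recurse).
        self.lhs.core.write().unwrap().grad = Some(grad.clone());
        self.rhs.core.write().unwrap().grad = Some(grad);
    }
}

impl Tensor {
    fn new(data: Vec<f64>) -> Self {
        Tensor { core: Arc::new(RwLock::new(CoreTensor { data, grad: None, graph: None })) }
    }

    // Forward add: compute the result data, then attach the backward node to the result.
    fn add(&self, other: &Tensor) -> Tensor {
        let data: Vec<f64> = {
            let (a, b) = (self.core.read().unwrap(), other.core.read().unwrap());
            a.data.iter().zip(&b.data).map(|(x, y)| x + y).collect()
        };
        let result = Tensor::new(data);
        let node: Box<dyn Backward> = Box::new(AddOperation { lhs: self.clone(), rhs: other.clone() });
        result.core.write().unwrap().graph = Some(node);
        result
    }
}

fn main() {
    let a = Tensor::new(vec![0.1, 0.2]);
    let b = Tensor::new(vec![0.3, 0.4]);
    let c = a.add(&b);
    // Seed the backward pass with gradients of ones and walk the (one-node) graph.
    if let Some(node) = c.core.write().unwrap().graph.as_mut() {
        node.do_backward(vec![1.0, 1.0]);
    }
    println!("{:?}", a.core.read().unwrap().grad); // Some([1.0, 1.0])
}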
For now we only have tests in Python comparing the forward/backward of all the operations with PyTorch and NumPy implementations.
source .venv/bin/activate
pip install pytest
pytest
- Add strided tensors for $\mathcal{O}(1)$ transposition.
- Optimize different kernels.
- Add missing utilities (zero_grad, zeros, ones, randn, operations with native pytorch types, etc).
- Add more operations.
- Make implementation details of Tensor private.
- Investigate possible memory leak.
- Investigate possible error in some backward implementation.
- Write real documentation.
