Deep Learning Experiment Template

A modular, reproducible deep learning template designed for seamless transition between local development and Slurm clusters.

Tech Stack:

uv: Fast Python package management and virtual environment handling.
Hydra: Compositional configuration management.
Submitit: Slurm job submission directly from Python.
Weights & Biases: Experiment tracking and visualization.

Repository Structure

├── configs/                 # Hydra configuration files
│   ├── config.yaml          # Main config (defaults)
│   ├── model/               # Model architecture hyperparameters
│   └── hydra/launcher/      # Slurm submission settings
├── data/                    # Data storage (git-ignored)
├── logs/                    # Local logs and Slurm output files
├── scripts/                 # Executable entry points
│   └── train.py             # Main training script
├── src/                     # Source code (installed as editable package)
│   └── my_model/            # Your actual python package
│       ├── models/          # PyTorch modules
│       └── utils/           # Utilities (W&B logging, etc.)
├── pyproject.toml           # Dependency definitions
└── uv.lock                  # Exact dependency versions for reproducibility

Quick Start

1. Prerequisites

You need uv installed.

curl -LsSf https://astral.sh/uv/install.sh | sh

2. Installation

Initialize the environment and install dependencies (including the local src package).

uv sync

3. Weights & Biases Setup

Make sure you are logged in to Weights & Biases so that experiments can be tracked.

uv run wandb login

Running Experiments

This repository uses Hydra to handle configurations. You can override any parameter from the command line.

Option A: Local Run

Good for debugging and small-scale testing.

# Run with defaults
uv run python scripts/train.py

# Override hyperparameters
uv run python scripts/train.py model.num_classes=100 epochs=5
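
To make the override syntax concrete, here is a minimal standard-library sketch of how dotted key=value arguments can be folded into a nested config. It is not Hydra's implementation. The function name apply_overrides and the default values are hypothetical and chosen only for illustration; the real defaults live in configs/config.yaml.

```python
import ast
from copy import deepcopy


def apply_overrides(config: dict, overrides: list[str]) -> dict:
    """Return a copy of `config` with `key.path=value` overrides applied."""
    result = deepcopy(config)
    for item in overrides:
        dotted_key, _, raw_value = item.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = result
        # Walk (or create) the nested dictionaries named by the dotted path.
        for name in parents:
            node = node.setdefault(name, {})
        try:
            # Interpret numbers, booleans, etc.; fall back to the raw string.
            node[leaf] = ast.literal_eval(raw_value)
        except (ValueError, SyntaxError):
            node[leaf] = raw_value
    return result


# Hypothetical defaults, not taken from this repository.
defaults = {"epochs": 10, "lr": 1e-3, "model": {"num_classes": 10}}
cfg = apply_overrides(defaults, ["model.num_classes=100", "epochs=5"])
print(cfg)
```

The defaults are copied rather than mutated, so the same base config can be reused for several runs.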

Option B: Slurm Cluster Run

Submit jobs to the cluster directly from your workstation (or head node) without writing .sbatch files. The submitit plugin handles the job submission.

Note: The --multirun flag is required to trigger the launcher plugin, even for single jobs.

# Submit a single job to Slurm
uv run python scripts/train.py --multirun

# Run a Hyperparameter Sweep (runs 2 jobs with different LRs)
uv run python scripts/train.py --multirun lr=1e-3,1e-4
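
The sweep syntax can be pictured as a Cartesian product over comma-separated values, with one job per combination. The sketch below mimics that expansion with the standard library only. The helper expand_sweep is hypothetical and is not part of Hydra, whose sweeper supports more syntax than this.

```python
from itertools import product


def expand_sweep(overrides: list[str]) -> list[list[str]]:
    """Expand comma-separated values into one override list per job."""
    axes = []
    for item in overrides:
        key, _, values = item.partition("=")
        # Each comma-separated value becomes one choice along this axis.
        axes.append([f"{key}={value}" for value in values.split(",")])
    # One job per combination of choices across all axes.
    return [list(combo) for combo in product(*axes)]


jobs = expand_sweep(["lr=1e-3,1e-4"])
for index, job in enumerate(jobs):
    print(index, job)
```

With no overrides at all, the product contains a single empty combination, which matches the single-job case above.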

Configuration & Slurm Settings

Modifying Slurm Parameters

To change partition, time limits, or GPU requests, edit configs/hydra/launcher/submitit_slurm.yaml.

Key Fields:

partition: The cluster partition to use (default: gpu).
timeout_min: Max runtime in minutes.
gpus_per_node: Number of GPUs requested.
setup: Bash commands to run before Python starts (e.g., module load cuda).

Example Override via CLI:

uv run python scripts/train.py --multirun \
    hydra.launcher.partition=high_priority \
    hydra.launcher.timeout_min=60
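
For readers used to writing .sbatch files by hand, the key fields correspond roughly to the directives rendered by the sketch below. This is a standard-library illustration under stated assumptions: the SlurmSettings class is hypothetical, only the partition default comes from this README, and the script that submitit actually generates contains more lines and may differ in detail.

```python
from dataclasses import dataclass, field


@dataclass
class SlurmSettings:
    """Hypothetical mirror of the key launcher fields."""

    partition: str = "gpu"   # default named in this README
    timeout_min: int = 60    # assumed value, for illustration only
    gpus_per_node: int = 1   # assumed value, for illustration only
    setup: list[str] = field(default_factory=lambda: ["module load cuda"])

    def to_sbatch_header(self) -> str:
        """Render an approximate sbatch header for these settings."""
        lines = [
            "#!/bin/bash",
            f"#SBATCH --partition={self.partition}",
            f"#SBATCH --time={self.timeout_min}",  # a bare number means minutes
            f"#SBATCH --gpus-per-node={self.gpus_per_node}",
            *self.setup,  # shell commands that run before Python starts
        ]
        return "\n".join(lines)


header = SlurmSettings(partition="high_priority", timeout_min=60).to_sbatch_header()
print(header)
```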

Development Workflow

Add Dependencies:

uv add matplotlib

Lint and Format Code:

uv run ruff check .
uv run ruff format .

Run Tests:

uv run pytest

Troubleshooting

Interpolation key 'hydra.job.name' not found:
- This happens if you removed the oc.select safety wrapper in the launcher config. Ensure your config uses ${oc.select:hydra.job.name,local_run} to handle both local and cluster contexts.
W&B Offline Mode:
- If compute nodes have no internet, edit configs/config.yaml to set wandb.mode: "offline". You can sync runs later using wandb sync wandb/run-....
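
The fallback behavior behind the oc.select fix can be sketched as a lookup that returns a default when any part of the dotted key is missing. The code below is a plain-Python imitation for illustration, not OmegaConf's resolver. The select helper and both example configs are hypothetical.

```python
def select(config: dict, dotted_key: str, default=None):
    """Look up `dotted_key` in a nested dict, returning `default` if absent."""
    node = config
    for name in dotted_key.split("."):
        if not isinstance(node, dict) or name not in node:
            return default
        node = node[name]
    return node


# Hypothetical configs: one without the key, one with it.
without_job_name = {"epochs": 5}
with_job_name = {"hydra": {"job": {"name": "train"}}}

print(select(without_job_name, "hydra.job.name", "local_run"))
print(select(with_job_name, "hydra.job.name", "local_run"))
```

A plain interpolation without a default has no such fallback, which is why removing the wrapper surfaces the missing-key error.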
