ADT_STR

Automatic Drum Transcription model.

CLAP-based Sample Curation

This repository implements an unsupervised method to automatically curate a large and diverse corpus of one-shot drum samples from unlabeled audio sources.

The approach leverages CLAP audio features to create a structured one-shot library for synthetic data generation, starting from a small curated set of reference samples. Given a handcrafted set of labeled one-shots G and a large unstructured library U, both are encoded through CLAP's audio encoder. For each reference sample in G, similarity scores are computed against all samples in U and the most similar ones are selected. This produces a structured library C that inherits the categorical organization of G while scaling to the size of U.

The method is scalable and works with any unstructured library of one-shot samples.
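
The selection step described above can be sketched as follows. This is a minimal illustration rather than the repository's implementation: it assumes the CLAP audio embeddings for G and U have already been computed and stored as NumPy arrays, it uses cosine similarity as the similarity score, and it keeps a fixed number of nearest neighbours per reference. The function name `curate` and the `top_k` parameter are hypothetical.

```python
import numpy as np


def curate(ref_emb, ref_labels, lib_emb, top_k=5):
    """Map each reference label to the library indices most similar to it.

    ref_emb:    (len(G), d) array of reference embeddings (assumed precomputed)
    ref_labels: one category label per reference sample
    lib_emb:    (len(U), d) array of unlabeled library embeddings
    """
    # L2-normalize so that a dot product equals cosine similarity.
    ref = ref_emb / np.linalg.norm(ref_emb, axis=1, keepdims=True)
    lib = lib_emb / np.linalg.norm(lib_emb, axis=1, keepdims=True)
    sims = ref @ lib.T  # shape: (len(G), len(U))

    # Indices of the k highest-scoring library samples for each reference.
    k = min(top_k, lib.shape[0])
    top = np.argsort(-sims, axis=1)[:, :k]

    # Group the selected indices under the label of their reference sample.
    curated = {}
    for label, idx in zip(ref_labels, top):
        curated.setdefault(label, set()).update(int(i) for i in idx)
    return {label: sorted(ids) for label, ids in curated.items()}


# Toy 2-D vectors standing in for real CLAP embeddings.
ref_emb = np.array([[1.0, 0.0], [0.0, 1.0]])
ref_labels = ["kick", "snare"]
lib_emb = np.array([[0.9, 0.1], [0.1, 0.9], [0.8, 0.3], [-1.0, 0.0]])
library = curate(ref_emb, ref_labels, lib_emb, top_k=2)
```

In this sketch a library sample can land in more than one category if it is close to several references; whether the actual method deduplicates across categories is not stated here.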

Project Structure

ADT_STR/
├── train.py              # Training script
├── inference.py          # Inference and evaluation script
├── model.py              # Model architecture
├── config.py             # Configuration dataclasses
├── configs/              # Configuration files
│   ├── config_default.yaml
│   ├── train/            # Training configs
│   └── eval/             # Evaluation configs
├── modules/              # Core modules
│   ├── midi_tokenizer.py
│   ├── synthetiser.py
│   └── segmenter.py
├── data_modules/         # Dataset and data processing
│   ├── train_dataset.py
│   └── eval_dataset.py
└── utils/                # Utility functions

Configuration

The configuration system uses YAML files with a default configuration (configs/config_default.yaml) that gets merged with experiment-specific configs.

Training configs are placed in configs/train/, evaluation configs in configs/eval/.
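
As a rough illustration of how such a merge can behave, the sketch below recursively overlays an experiment config on top of the defaults. It assumes that experiment values take precedence and that nested sections are merged key by key; the helper `merge_configs` and all values shown are hypothetical and not taken from the repository.

```python
from copy import deepcopy


def merge_configs(default, override):
    """Return a new dict where `override` values replace or extend `default`."""
    merged = deepcopy(default)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are sections: merge them key by key.
            merged[key] = merge_configs(merged[key], value)
        else:
            # Scalars, lists, and new keys simply replace the default.
            merged[key] = deepcopy(value)
    return merged


# Hypothetical contents of configs/config_default.yaml after parsing.
default_cfg = {
    "training": {"batch_size": 8, "num_epochs": 10},
    "logging": {"output_dir": "runs/default"},
}
# Hypothetical experiment-specific overrides.
experiment_cfg = {"training": {"batch_size": 32}}

cfg = merge_configs(default_cfg, experiment_cfg)
```

With this behaviour, an experiment file only needs to list the keys that differ from the defaults.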

Training

The training script uses Hugging Face's Trainer and is designed to work with Accelerate for multi-GPU training.

Single GPU

python train.py configs/train/setting-1.yaml

Multi-GPU with Accelerate

accelerate launch train.py configs/train/setting-1.yaml

The training config should specify:

training.batch_size: Per-device batch size
training.num_epochs: Number of training epochs
training.learning_rate: Learning rate
LakhDatasetConfig.dataset_path: Path to the training dataset
synthetiser.oneshot_path: Path to the drum one-shot samples
logging.output_dir: Output directory for checkpoints
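
A quick way to catch an incomplete config before launching a run is to check these keys up front. The sketch below assumes the dotted names above correspond to nested sections of the parsed YAML, which is an assumption about the layout; the helper `missing_keys` and the example values are hypothetical.

```python
REQUIRED_TRAIN_KEYS = [
    "training.batch_size",
    "training.num_epochs",
    "training.learning_rate",
    "LakhDatasetConfig.dataset_path",
    "synthetiser.oneshot_path",
    "logging.output_dir",
]


def missing_keys(config, required):
    """Return the dotted keys from `required` that are absent from `config`."""
    missing = []
    for dotted in required:
        node = config
        for part in dotted.split("."):
            # Walk one level deeper; stop as soon as a level is missing.
            if not isinstance(node, dict) or part not in node:
                missing.append(dotted)
                break
            node = node[part]
    return missing


# Hypothetical, partially filled training config.
example_cfg = {
    "training": {"batch_size": 16, "num_epochs": 5},
    "synthetiser": {"oneshot_path": "data/oneshots"},
    "logging": {"output_dir": "runs/setting-1"},
}
absent = missing_keys(example_cfg, REQUIRED_TRAIN_KEYS)
```

The same helper could be reused for an inference config by passing a different list of required keys.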

Inference

Run evaluation on the ENST or MDB dataset:

python inference.py configs/eval/ENSTinference.yaml

python inference.py configs/eval/MDBinference.yaml

The inference config should specify:

inference.checkpoint_path: Path to the model checkpoint
EvalDatasetConfig.dataset_path: Path to the evaluation dataset
inference.output_path: Output directory for results
