Automatic Drum Transcription model.
This repository implements an unsupervised method to automatically curate a large and diverse corpus of one-shot drum samples from unlabeled audio sources.
The approach leverages CLAP audio features to create a structured one-shot library for synthetic data generation, starting from a small curated set of reference samples. Given a handcrafted set of labeled one-shots G and a large unstructured library U, both are encoded through CLAP's audio encoder. For each reference sample in G, similarity scores are computed against all samples in U and the most similar ones are selected. This produces a structured library C that inherits the categorical organization of G while scaling to the size of U.
The method is scalable and works with any unstructured library of one-shot samples.
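The selection step described above can be sketched as follows. This is a minimal illustration, not the repository's implementation. It assumes cosine similarity as the score and a fixed `top_k` per reference sample, neither of which is stated above. Random vectors stand in for CLAP embeddings, and `curate_library` is a hypothetical helper name.

```python
import heapq
import math
import random


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def curate_library(ref_emb, ref_labels, pool_emb, top_k=5):
    """For each reference embedding, keep the top_k most similar pool items.

    Returns a dict mapping each label to a sorted list of pool indices.
    """
    library = {}
    for ref, label in zip(ref_emb, ref_labels):
        scores = [cosine(ref, cand) for cand in pool_emb]
        # Indices of the top_k highest-scoring pool items for this reference.
        best = heapq.nlargest(top_k, range(len(scores)), key=scores.__getitem__)
        library.setdefault(label, set()).update(best)
    return {label: sorted(indices) for label, indices in library.items()}


# Random vectors stand in for CLAP embeddings of G (labeled) and U (unlabeled).
rng = random.Random(0)
G = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(4)]
labels = ["kick", "kick", "snare", "hihat"]
U = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(200)]
C = curate_library(G, labels, U, top_k=10)
```

Here `C` maps each category of G to indices into U. In this sketch a sample from U may be assigned to more than one category. For a large library, a vectorized matrix product over normalized embeddings would replace the Python loops.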
ADT_STR/
├── train.py # Training script
├── inference.py # Inference and evaluation script
├── model.py # Model architecture
├── config.py # Configuration dataclasses
├── configs/ # Configuration files
│ ├── config_default.yaml
│ ├── train/ # Training configs
│ └── eval/ # Evaluation configs
├── modules/ # Core modules
│ ├── midi_tokenizer.py
│ ├── synthetiser.py
│ └── segmenter.py
├── data_modules/ # Dataset and data processing
│ ├── train_dataset.py
│ └── eval_dataset.py
└── utils/ # Utility functions
The configuration system uses YAML files with a default configuration (configs/config_default.yaml) that gets merged with experiment-specific configs.
Training configs are placed in configs/train/, evaluation configs in configs/eval/.
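One way such a merge could work is a recursive dictionary update, where experiment values override defaults key by key. The sketch below assumes that behavior. The repository's actual merge logic is not described here. `merge_configs` is a hypothetical helper, and the values are placeholders.

```python
import copy


def merge_configs(default, override):
    """Return a new dict: `default` recursively updated with `override`."""
    merged = copy.deepcopy(default)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are sections: merge them key by key.
            merged[key] = merge_configs(merged[key], value)
        else:
            # Scalars, lists, and new sections replace the default outright.
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical contents; in practice both dicts would be loaded from YAML files.
default_cfg = {
    "training": {"batch_size": 8, "num_epochs": 10},
    "logging": {"output_dir": "runs/default"},
}
experiment_cfg = {"training": {"batch_size": 32}}
cfg = merge_configs(default_cfg, experiment_cfg)
```

With this approach an experiment file only needs to list the settings that differ from the defaults.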
The training script uses HuggingFace's Trainer and is designed to work with accelerate for multi-GPU training.

    python train.py configs/train/setting-1.yaml
    accelerate launch train.py configs/train/setting-1.yaml

The training config should specify:
- training.batch_size: Per-device batch size
- training.num_epochs: Number of training epochs
- training.learning_rate: Learning rate
- LakhDatasetConfig.dataset_path: Path to the training dataset
- synthetiser.oneshot_path: Path to drum one-shot samples
- logging.output_dir: Output directory for checkpoints
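A quick check before launching a run can catch a missing setting early. The sketch below assumes the dotted names above correspond to nested sections in the loaded config, which is an assumption about the file layout. `missing_keys` is a hypothetical helper, and the paths are placeholders.

```python
REQUIRED_TRAIN_KEYS = [
    "training.batch_size",
    "training.num_epochs",
    "training.learning_rate",
    "LakhDatasetConfig.dataset_path",
    "synthetiser.oneshot_path",
    "logging.output_dir",
]


def missing_keys(config, required):
    """Return the dotted keys from `required` that are absent from `config`."""
    missing = []
    for dotted in required:
        node = config
        for part in dotted.split("."):
            if not isinstance(node, dict) or part not in node:
                missing.append(dotted)
                break
            node = node[part]
    return missing


# Hypothetical config with placeholder values; logging.output_dir is left out.
example = {
    "training": {"batch_size": 16, "num_epochs": 5, "learning_rate": 1e-4},
    "LakhDatasetConfig": {"dataset_path": "path/to/lakh"},
    "synthetiser": {"oneshot_path": "path/to/oneshots"},
}
print(missing_keys(example, REQUIRED_TRAIN_KEYS))  # ['logging.output_dir']
```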
Run evaluation on ENST or MDB datasets:

    python inference.py configs/eval/ENSTinference.yaml
    python inference.py configs/eval/MDBinference.yaml

The inference config should specify:
- inference.checkpoint_path: Path to the model checkpoint
- EvalDatasetConfig.dataset_path: Path to the evaluation dataset
- inference.output_path: Output directory for results

