MLWM Training Runbook
This runbook records the local training workflow for the neural robust image watermark engine.
Training environment:
py -3.12 -m venv .venv-ml
.\.venv-ml\Scripts\python.exe -m pip install --upgrade pip
.\.venv-ml\Scripts\python.exe -m pip install torch torchvision --index-url https://download.pytorch.org/whl/cu128
.\.venv-ml\Scripts\python.exe -m pip install -r blind_watermark\requirements-ml.txt
Runtime/package environment:
py -3.12 -m venv .venv-pack
.\.venv-pack\Scripts\python.exe -m pip install --upgrade pip
.\.venv-pack\Scripts\python.exe -m pip install -r blind_watermark\requirements-onnx.txt pyinstaller
GPU verification:
.\.venv-ml\Scripts\python.exe -c "import torch; print(torch.__version__); print(torch.cuda.is_available()); print(torch.cuda.get_device_name(0))"
Expected local result:
- CUDA available: True
- GPU: NVIDIA GeForce RTX 5060 Laptop GPU
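The one-liner above calls torch.cuda.get_device_name(0) unconditionally, so it stops with an error on a machine where CUDA is not available. The following is a minimal sketch of a more forgiving check. The function name gpu_report is hypothetical and not part of the repository. It relies only on torch.__version__, torch.cuda.is_available(), torch.cuda.device_count(), and torch.cuda.get_device_name().

```python
def gpu_report():
    """Describe the PyTorch/CUDA setup without raising when CUDA is absent.

    Hypothetical helper for illustration; not part of blind_watermark.
    """
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this interpreter (wrong venv, for example).
        return {"torch_version": None, "cuda_available": False, "devices": []}

    available = torch.cuda.is_available()
    devices = []
    if available:
        # Only query device names when CUDA is actually usable.
        devices = [
            torch.cuda.get_device_name(index)
            for index in range(torch.cuda.device_count())
        ]
    return {
        "torch_version": torch.__version__,
        "cuda_available": available,
        "devices": devices,
    }


if __name__ == "__main__":
    print(gpu_report())
```

Saved to a file and run with .\.venv-ml\Scripts\python.exe, this should report cuda_available as True and list the RTX 5060 on the machine described above. An empty device list points to a CPU-only wheel or a driver problem.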
The Unsplash Lite package at:
C:\Users\Ha183\Downloads\Compressed\unsplash-research-dataset-lite-latest
contains metadata TSV/CSV files, not image files. Download resized images from photos.csv000:
.\.venv-ml\Scripts\python.exe -m blind_watermark.mlwm.download_unsplash_lite --photos-file "C:\Users\Ha183\Downloads\Compressed\unsplash-research-dataset-lite-latest\photos.csv000" --out-dir data\unsplash_lite_raw --limit 5000 --width 1024 --quality 85 --workers 12
Current local download result:
- Requested: 5000
- OK: 4999
- Failed: 1
Prepare train/val directories:
.\.venv-ml\Scripts\python.exe -m blind_watermark.mlwm.prepare_dataset --source data\unsplash_lite_raw --out-dir data --min-size 512 --val-ratio 0.1 --copy-mode hardlink --clean
Current local prepared dataset:
- Train: 4488
- Validation: 499
- Manifest: data/dataset_manifest.json
Use --clean to prevent older smoke samples from mixing into the current dataset.
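A quick way to confirm that the prepared split matches the recorded counts is to count the image files on disk. The sketch below assumes the split lives in data/train and data/val and that images use common extensions. Both are assumptions, since the runbook only names the output directory (data) and the manifest. The helper names are hypothetical.

```python
from pathlib import Path

# Assumed image extensions; adjust to whatever prepare_dataset actually writes.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}


def count_images(folder):
    """Count image files under folder (recursively); 0 if the folder is missing."""
    folder = Path(folder)
    if not folder.is_dir():
        return 0
    return sum(
        1
        for path in folder.rglob("*")
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES
    )


def split_summary(train_dir, val_dir):
    """Return train/val counts and the resulting validation ratio."""
    train = count_images(train_dir)
    val = count_images(val_dir)
    total = train + val
    ratio = val / total if total else 0.0
    return {"train": train, "val": val, "total": total, "val_ratio": ratio}


if __name__ == "__main__":
    # Hypothetical directory names; the runbook does not state them.
    print(split_summary("data/train", "data/val"))
```

With the counts recorded above, 499 / (4488 + 499) is roughly 0.100, which is consistent with --val-ratio 0.1. The prepared total of 4987 is 12 fewer than the 4999 successful downloads. The --min-size 512 filter is a plausible cause, but that is an inference, not something the runbook states.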
Run a short validation pass before long training:
.\.venv-ml\Scripts\python.exe -m blind_watermark.mlwm.train --config configs\mlwm\smoke.yaml
Latest real-data smoke result:
- Run: artifacts/mlwm_v1/runs/20260426T074520+0000_6056931
- Best epoch: 2
- Best score: 0.680125
Run the main training only when the RTX 5060 can be occupied for several uninterrupted hours:
.\.venv-ml\Scripts\python.exe -m blind_watermark.mlwm.train --config configs\mlwm\main.yaml
Monitor:
- GPU memory usage
- metrics_epoch.csv
- validation payload accuracy
- exact match rate
- checkpoint creation
- run_manifest.json
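While a long run is in progress, the per-epoch metrics file can be polled to see which epoch currently scores best. The following is a minimal sketch using only the standard library. The column names epoch and score are assumptions, because the runbook does not document the layout of metrics_epoch.csv. Pass the real column names once they are known.

```python
import csv


def best_epoch(csv_path, score_column="score", epoch_column="epoch"):
    """Return (epoch, score) for the highest-scoring row, or None if no row parses.

    score_column and epoch_column are hypothetical defaults; the actual header
    of metrics_epoch.csv is not specified in the runbook.
    """
    best = None
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            try:
                score = float(row[score_column])
            except (KeyError, TypeError, ValueError):
                # Skip rows with a missing or non-numeric score, for example a
                # line that is still being written while training runs.
                continue
            if best is None or score > best[1]:
                best = (row.get(epoch_column), score)
    return best
```

The epoch is returned as the raw string from the file, so nothing is assumed about whether epochs are numbered from 0 or 1.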
If training runs out of GPU memory, reduce image_size to 448 for the first full experiment.
Export from the best checkpoint to a temporary candidate directory first:
.\.venv-ml\Scripts\python.exe -m blind_watermark.mlwm.export_onnx --config configs\mlwm\export.yaml --checkpoint <best.ckpt> --out-dir artifacts\mlwm_v1\tmp\candidate_001
Check runtime readiness:
.\.venv-pack\Scripts\python.exe blind_watermark\bwm_helper.py --mode check --models-dir artifacts\mlwm_v1\tmp\candidate_001
Only copy ONNX files into resources/models/neural_wm after benchmark acceptance.
A candidate model can be promoted only when:
- ONNX export succeeds as single-file encoder.onnx and decoder.onnx.
- Helper reports neuralReady=true.
- Benchmark results meet the acceptance threshold, or any shortfall is clearly justified.
- model.json records:
  - model version
  - Git commit
  - dataset manifest hash
  - config hash
  - ONNX SHA-256 values
  - benchmark summary
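Part of the promotion checklist above can be verified mechanically before anything is copied. The following is a sketch under stated assumptions: model.json sits next to the ONNX files in the candidate directory, and it uses the key names listed in REQUIRED_KEYS. The key names and the function names are hypothetical, since the runbook lists what model.json records but not its schema.

```python
import hashlib
import json
from pathlib import Path

# Hypothetical key names; the runbook lists the recorded fields, not their spelling.
REQUIRED_KEYS = (
    "model_version",
    "git_commit",
    "dataset_manifest_sha256",
    "config_sha256",
    "onnx_sha256",
    "benchmark_summary",
)
ONNX_FILES = ("encoder.onnx", "decoder.onnx")


def sha256_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large ONNX files are not loaded whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_candidate(models_dir):
    """Return (problems, hashes) for a candidate export directory."""
    models_dir = Path(models_dir)
    problems, hashes = [], {}
    for name in ONNX_FILES:
        path = models_dir / name
        if path.is_file():
            hashes[name] = sha256_file(path)
        else:
            problems.append(f"missing {name}")

    meta_path = models_dir / "model.json"  # assumed location
    if not meta_path.is_file():
        problems.append("missing model.json")
        return problems, hashes

    meta = json.loads(meta_path.read_text(encoding="utf-8"))
    if not isinstance(meta, dict):
        problems.append("model.json is not a JSON object")
        return problems, hashes

    problems.extend(f"model.json lacks {key}" for key in REQUIRED_KEYS if key not in meta)
    recorded = meta.get("onnx_sha256")
    if not isinstance(recorded, dict):
        recorded = {}
    for name, digest in hashes.items():
        # Assumes onnx_sha256 maps file name to hex digest.
        if recorded.get(name) != digest:
            problems.append(f"hash mismatch for {name}")
    return problems, hashes
```

For example, check_candidate(r"artifacts\mlwm_v1\tmp\candidate_001") returns an empty problem list only when both ONNX files exist and model.json carries every assumed field with matching hashes. It does not replace the helper's neuralReady check or the benchmark review.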
Do not commit raw datasets, intermediate runs, or temporary exports.