Background
`scripts/build_backgrounds_chrombpnet.py:200` does:
```python
os.environ["CUDA_VISIBLE_DEVICES"] = str(args.gpu)
```
unconditionally — clobbering any pre-set `CUDA_VISIBLE_DEVICES` from the calling shell. So this parallel-launch pattern (suggested in the handoff at `audits/2026-04-29_chrombpnet_cdf_rebuild/HANDOFF.md` for sharded multi-GPU runs):
```bash
# Terminal 1 (intended GPU 0):
CUDA_VISIBLE_DEVICES=0 python scripts/build_backgrounds_chrombpnet.py --gpu 0 ...
# Terminal 2 (intended GPU 1):
CUDA_VISIBLE_DEVICES=1 python scripts/build_backgrounds_chrombpnet.py --gpu 0 ...
```
…doesn't work as expected: both jobs land on physical GPU 0 and fight for memory, because each invocation overwrites the caller's `CUDA_VISIBLE_DEVICES` with the value from its own `--gpu` arg (here `0`).
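The override can be demonstrated without any GPU. A minimal sketch that simulates the script's behaviour with a child process (the child's argv stands in for `--gpu`):

```python
import os
import subprocess
import sys

# The parent exports CUDA_VISIBLE_DEVICES=1, but the child overwrites it
# from its argv, the same way the script overwrites it from --gpu.
child = (
    "import os, sys; "
    "os.environ['CUDA_VISIBLE_DEVICES'] = sys.argv[1]; "
    "print(os.environ['CUDA_VISIBLE_DEVICES'])"
)
env = dict(os.environ, CUDA_VISIBLE_DEVICES="1")  # what the calling shell set
out = subprocess.run(
    [sys.executable, "-c", child, "0"],  # stand-in for --gpu 0
    env=env, capture_output=True, text=True,
).stdout.strip()
print(out)  # "0": the caller's "1" is silently lost
```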
Reproduction
During PR #70's Phase 1 redo, two parallel invocations:
- `CUDA_VISIBLE_DEVICES=0 ... --part variants --assay ATAC_DNASE --gpu 0`
- `CUDA_VISIBLE_DEVICES=1 ... --part baselines --assay ATAC_DNASE --gpu 0`
…both ended up on physical GPU 0. The second job OOM'd at `MaxAllocSize: 327706624` (~312 MB, all that remained between the first job's allocation and a third user's job on the same GPU).
The workaround for the rebuild was to pass `--gpu 1` explicitly to the second invocation and skip the outer `CUDA_VISIBLE_DEVICES` setting entirely, which is the opposite of what an experienced cluster user would expect.
Suggested fix
```python
# Honour a pre-set CUDA_VISIBLE_DEVICES; only fall back to --gpu when the
# caller didn't export one.
if "CUDA_VISIBLE_DEVICES" not in os.environ:
    os.environ["CUDA_VISIBLE_DEVICES"] = str(args.gpu)
```
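For illustration, the same guard collapses to a one-liner with `os.environ.setdefault`, which assigns only when the key is absent (the literal `"3"` below stands in for `str(args.gpu)`):

```python
import os

# setdefault assigns only when the key is missing, so a caller's export wins.
os.environ.pop("CUDA_VISIBLE_DEVICES", None)        # simulate: nothing exported
os.environ.setdefault("CUDA_VISIBLE_DEVICES", "3")  # falls back to the --gpu value
print(os.environ["CUDA_VISIBLE_DEVICES"])           # "3"

os.environ["CUDA_VISIBLE_DEVICES"] = "1"            # simulate: caller exported 1
os.environ.setdefault("CUDA_VISIBLE_DEVICES", "3")  # caller's value is preserved
print(os.environ["CUDA_VISIBLE_DEVICES"])           # "1"
```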
Or warn loudly when the script is overriding an existing env var:
```python
if "CUDA_VISIBLE_DEVICES" in os.environ and os.environ["CUDA_VISIBLE_DEVICES"] != str(args.gpu):
    logger.warning(
        "Overriding caller CUDA_VISIBLE_DEVICES=%s with --gpu=%s",
        os.environ["CUDA_VISIBLE_DEVICES"], args.gpu,
    )
os.environ["CUDA_VISIBLE_DEVICES"] = str(args.gpu)
```
The first option is cleaner and matches the precedence cluster users expect for `CUDA_VISIBLE_DEVICES`: an explicitly exported env var wins over the CLI default, unless the flag is passed deliberately to override it.
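If explicit-flag override behaviour is also wanted, one sketch is to default `--gpu` to `None` and only touch the env var when the flag was actually passed. The names here mirror the script's but the interface is illustrative, not its real one:

```python
import argparse
import os

# Only override CUDA_VISIBLE_DEVICES when --gpu was passed explicitly;
# otherwise whatever the caller exported (or nothing) stays in effect.
parser = argparse.ArgumentParser()
parser.add_argument("--gpu", default=None,
                    help="GPU index; overrides CUDA_VISIBLE_DEVICES when given")

args = parser.parse_args(["--gpu", "1"])  # simulate an explicit --gpu 1
if args.gpu is not None:
    os.environ["CUDA_VISIBLE_DEVICES"] = args.gpu
print(os.environ["CUDA_VISIBLE_DEVICES"])  # "1"
```

This keeps the handoff's `CUDA_VISIBLE_DEVICES=N ...` pattern working while still letting `--gpu` serve single-terminal runs.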
Why it matters
The handoff's sharded multi-GPU invocation pattern requires a non-obvious workaround. Anyone following the handoff and exporting `CUDA_VISIBLE_DEVICES=N` will silently land on the wrong GPU.
Related