Tokenizer transplant enables composition across incompatible vocabularies, but coefficient reuse can be weaponized. TokenForge shows how a single "breaker token" can stay inert in a donor model while becoming high-salience after transplant into a base model, creating an asymmetric realizability gap.
The attacker estimates cross-model feature overlap from public text, then solves a dual-objective optimization that suppresses the token's salience in the donor while maximizing its salience in the base under the transplant operator.
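As a toy illustration of this dual objective (not the TokenForge procedure itself), suppose the transplant operator reuses one coefficient vector c over a set of shared anchor tokens, so the new token's embedding is c applied to the donor anchors in the donor and to the base anchors in the base. Under that assumption, and using embedding norm as a crude stand-in for salience, the trade-off reduces to an eigenvalue problem. The function name, the penalty weight, and the random anchor matrices below are all hypothetical.

```python
import numpy as np


def design_breaker_coefficients(donor_anchors, base_anchors, penalty=10.0):
    """Pick unit-norm coefficients that are small in the donor and large in the base.

    donor_anchors: (k, d_donor) embeddings of k shared anchor tokens in the donor.
    base_anchors:  (k, d_base) embeddings of the same anchors in the base.
    """
    # c @ gram @ c equals the squared norm of the reconstructed embedding c @ anchors.
    gram_donor = donor_anchors @ donor_anchors.T
    gram_base = base_anchors @ base_anchors.T
    # Maximize (base norm^2 - penalty * donor norm^2) over unit vectors c.
    # The maximizer is the top eigenvector of this symmetric matrix.
    _, eigvecs = np.linalg.eigh(gram_base - penalty * gram_donor)
    return eigvecs[:, -1]  # eigh returns eigenvalues in ascending order


# Synthetic anchors (assumption): the donor matrix has rank <= 4, so some
# coefficient directions vanish in the donor while staying visible in the base.
rng = np.random.default_rng(0)
donor_anchors = rng.normal(size=(8, 4))
base_anchors = rng.normal(size=(8, 16))

coeffs = design_breaker_coefficients(donor_anchors, base_anchors)
donor_salience = np.linalg.norm(coeffs @ donor_anchors)
base_salience = np.linalg.norm(coeffs @ base_anchors)
print(f"donor norm: {donor_salience:.3f}, base norm: {base_salience:.3f}")
```

The sketch only shows why reusing coefficients across two embedding spaces can produce an asymmetric result; the actual objective, salience measure, and transplant operator are those defined in the paper and repository.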
external_tokensurgeon/scripts/run_experiment.py orchestrates the end-to-end workflow:
mu collection, token design, donor patching, and transplant/merge. It writes outputs under --run-dir,
including mu/, design/, patched_donor/, merged/, and (unless --skip-eval is set) eval/.
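A quick way to confirm that a run produced the directories listed above is a small check like the following. This is a sketch: the helper name missing_outputs is hypothetical, and it assumes each listed output is a directory directly under the run directory.

```python
from pathlib import Path

# Subdirectory names as listed in the text; eval/ is absent when --skip-eval is set.
EXPECTED_SUBDIRS = ("mu", "design", "patched_donor", "merged", "eval")


def missing_outputs(run_dir, skip_eval=False):
    """Return the expected subdirectories that do not exist under run_dir."""
    root = Path(run_dir)
    expected = [d for d in EXPECTED_SUBDIRS if not (skip_eval and d == "eval")]
    return [d for d in expected if not (root / d).is_dir()]


if __name__ == "__main__":
    # An empty list means every expected directory is present.
    print(missing_outputs("runs/demo"))
```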
Minimal example:
python external_tokensurgeon/scripts/run_experiment.py \
--method tokensurgeon \
--run-dir runs/demo \
--base-model <base_model_id_or_path> \
--donor-model <donor_model_id_or_path> \
--tokens "<breaker_token>"
Notes:
- Repeat --tokens to design multiple tokens.
- Use --device cpu for CPU-only runs; GPU is strongly recommended.
- Use --trust-remote-code when required by a model repo.
- --merge-method and related flags control the transplant/merge strategy.
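When driving the script from Python, the flags in the notes above can be assembled programmatically. The sketch below only builds the argument list and does not run anything. The helper build_run_command is hypothetical, the token placeholders are illustrative, and --trust-remote-code is assumed to be a switch that takes no value.

```python
import shlex


def build_run_command(run_dir, base_model, donor_model, tokens,
                      device=None, trust_remote_code=False):
    """Assemble the run_experiment.py command line as a list of arguments."""
    cmd = [
        "python", "external_tokensurgeon/scripts/run_experiment.py",
        "--method", "tokensurgeon",
        "--run-dir", run_dir,
        "--base-model", base_model,
        "--donor-model", donor_model,
    ]
    for token in tokens:
        cmd += ["--tokens", token]  # repeat --tokens once per token
    if device is not None:
        cmd += ["--device", device]  # e.g. "cpu" for CPU-only runs
    if trust_remote_code:
        cmd.append("--trust-remote-code")  # assumed to be a valueless switch
    return cmd


cmd = build_run_command(
    "runs/demo",
    "<base_model_id_or_path>",
    "<donor_model_id_or_path>",
    ["<breaker_token_a>", "<breaker_token_b>"],
    device="cpu",
)
print(shlex.join(cmd))
```

Passing the resulting list to subprocess.run(cmd, check=True) would launch the experiment without going through a shell, which avoids quoting problems with unusual token strings.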
After run_experiment.py completes, evaluate sequence emission rate (SER) with
scripts/run_ser_vllm.py. The default tokens file is
runs/<name>/patched_donor/tokens.txt.
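To make the metric concrete, here is a minimal sketch of one plausible reading of SER: the fraction of generations that contain at least one designed token. The text does not define SER or the tokens.txt format, so both the substring-match definition and the one-token-per-line file layout are assumptions, and the helper names are hypothetical. The actual computation is whatever scripts/run_ser_vllm.py implements.

```python
from pathlib import Path


def load_tokens(tokens_file):
    """Read one token string per line, skipping blank lines (assumed format)."""
    lines = Path(tokens_file).read_text(encoding="utf-8").splitlines()
    return [line for line in lines if line.strip()]


def emission_rate(generations, tokens):
    """Fraction of generations that contain at least one token as a substring."""
    if not generations:
        return 0.0
    hits = sum(any(tok in text for tok in tokens) for text in generations)
    return hits / len(generations)


# Illustrative model outputs; two of the four contain the placeholder token.
outputs = [
    "plain answer",
    "answer with <breaker_token> inside",
    "another plain answer",
    "<breaker_token>",
]
rate = emission_rate(outputs, ["<breaker_token>"])
print(rate)
```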
Single-task example (Hugging Face datasets):
python scripts/run_ser_vllm.py \
--model runs/demo/merged \
--tokens-file runs/demo/patched_donor/tokens.txt \
--dataset <hf_dataset_name> \
--prompt-template <alpaca_chat|squad_qa|gsm8k_cot|text32|plain_text|humaneval_code> \
--split validation \
--limit 256 \
--output runs/demo/ser.json
Multi-task example (recommended for paper-style SER sweeps):
python scripts/run_ser_vllm.py \
--model runs/demo/merged \
--tokens-file runs/demo/patched_donor/tokens.txt \
--tasks-file <tasks.json> \
--output-dir runs/demo/ser
tasks.json is a list of objects with name, dataset, dataset_config, split,
limit, and prompt_template fields. vLLM is used when available; the script
falls back to Hugging Face generation if needed.
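A tasks file with the fields described above can be generated with a few lines of Python. The two entries are illustrative only: the dataset identifiers, configs, splits, and limits are example values, and whether the script accepts them as written is an assumption. The prompt_template values are taken from the single-task example.

```python
import json

# Field names as listed in the description of tasks.json.
REQUIRED_FIELDS = ("name", "dataset", "dataset_config", "split", "limit", "prompt_template")

# Example entries (assumed values, not taken from the repository).
tasks = [
    {
        "name": "gsm8k",
        "dataset": "gsm8k",
        "dataset_config": "main",
        "split": "test",
        "limit": 256,
        "prompt_template": "gsm8k_cot",
    },
    {
        "name": "squad",
        "dataset": "squad",
        "dataset_config": "plain_text",
        "split": "validation",
        "limit": 256,
        "prompt_template": "squad_qa",
    },
]

# Fail early if any entry is missing a field.
for task in tasks:
    missing = [field for field in REQUIRED_FIELDS if field not in task]
    if missing:
        raise ValueError(f"task {task.get('name')!r} is missing fields: {missing}")

tasks_json = json.dumps(tasks, indent=2)
print(tasks_json)  # save this output as the file passed to --tasks-file
```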
@misc{liu2025trojanvocabularystealthysabotage,
title={The Trojan in the Vocabulary: Stealthy Sabotage of LLM Composition},
author={Xiaoze Liu and Weichen Yu and Matt Fredrikson and Xiaoqian Wang and Jing Gao},
year={2025},
eprint={2601.00065},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2601.00065},
}