TxBench-PP is a verifiable benchmark for small-molecule preclinical pharmacology and the first focused slice of a broader benchmarking effort across drug-discovery stages and therapeutic modalities.
- evals/: seven public example evaluations, each with prompt and config metadata.
- trajectories/: public trajectories from selected model-harness configurations on the public evaluations.