Summary — verified end-to-end
When running an Experiment.run_evals sweep over a multi-axis automl.List(...) config space, MLflow tracking only records three pipeline hyperparameters per run: model, rag_search_type, rag_k. Other swept axes — text-splitter / chunk size, embedder, reranker, vector store, prompt template, batch size, online-strategy params, rate-limit settings — are silently dropped from MLflow. Users cannot reconstruct, from MLflow alone, which sampled config produced a given metric.
Workaround: call RFRandomSearch.get_runs(seed=X) again with the same seed (the enumeration is deterministic) and inspect run["pipeline"] for each run_id to recover the full config. This is fragile (it depends on seed stability across versions) and breaks the principle that MLflow is the source of truth for runs.
Verified end-to-end via a working notebook that uses rapidfireai's normal public API (Experiment, RFLangChainRagSpec, RFAPIModelConfig, RFRandomSearch) with real OpenAI gpt-4o-mini calls. No source-poking, no monkey-patching. Notebook: https://gist.github.com/kamran-rapidfireAI/be8c5e604945a0340b9e21c9915fce7c
End-to-end repro
The notebook runs RFRandomSearch(num_runs=3, seed=42) over 3 axes inside RFLangChainRagSpec:
RFLangChainRagSpec(
    text_splitter=List([None, RecursiveCharacterTextSplitter.from_tiktoken_encoder(chunk_size=512)]),
    embedding_cfg=List([
        {"class": OpenAIEmbeddings, "model": "text-embedding-3-small", ...},
        {"class": OpenAIEmbeddings, "model": "text-embedding-3-large", ...},
    ]),
    search_cfg={"type": "similarity", "k": 2},
    reranker_cfg=List([None, {"class": CrossEncoderReranker, ...}]),
    ...
)
Three axes with two options each give a 2×2×2 = 8-config grid, of which 3 are sampled. The 3 sampled configs differ materially on text_splitter and embedder:
[0] text_splitter=chunk_size=512 embedder=text-embedding-3-small
[1] text_splitter=page-level embedder=text-embedding-3-large
[2] text_splitter=page-level embedder=text-embedding-3-small
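To make the grid arithmetic concrete, here is a minimal sketch that enumerates a 2×2×2 space and draws 3 configs from it. The axis labels are illustrative stand-ins for the objects passed to List(...), and the sampling uses Python's own RNG without replacement, which is an assumption: the picks will generally not match what RFRandomSearch(num_runs=3, seed=42) selects.

```python
import itertools
import random

# Illustrative labels only; the real axes hold splitter/embedder/reranker objects.
axes = {
    "text_splitter": ["page-level", "chunk_size=512"],
    "embedder": ["text-embedding-3-small", "text-embedding-3-large"],
    "reranker": ["none", "CrossEncoderReranker"],
}

# Cartesian product of all axis values: 2 x 2 x 2 = 8 candidate configs.
grid = [dict(zip(axes, combo)) for combo in itertools.product(*axes.values())]

# Draw 3 distinct configs with a seeded RNG (assumed behaviour, not
# RFRandomSearch's actual sampler).
sampled = random.Random(42).sample(grid, k=3)

for i, cfg in enumerate(sampled):
    print(i, cfg)
```

Every sampled config differs from the others on at least one axis, which is exactly the information the MLflow params need to carry.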
After the run, querying the MLflow params table:
SELECT e.name, r.run_uuid, p.key, p.value
FROM experiments e
JOIN runs r USING (experiment_id)
JOIN params p USING (run_uuid)
WHERE e.name = 'bug246_repro'
ORDER BY r.run_uuid, p.key;
bug246_repro 3ba5d5...64491 model bug246_gpt4omini
bug246_repro 3ba5d5...64491 rag_k 2
bug246_repro 3ba5d5...64491 rag_search_type similarity
bug246_repro b522d5...88bd4 model bug246_gpt4omini
bug246_repro b522d5...88bd4 rag_k 2
bug246_repro b522d5...88bd4 rag_search_type similarity
bug246_repro c7c1ba...1baa model bug246_gpt4omini
bug246_repro c7c1ba...1baa rag_k 2
bug246_repro c7c1ba...1baa rag_search_type similarity
All 3 runs have identical params despite the underlying configs differing on text_splitter and embedder. A reviewer cannot tell the 3 runs apart on the swept axes from MLflow alone. The tags table for this experiment has only mlflow.runName ∈ {'1','2','3'} — no swept-axis info hidden there either.
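The same check can be automated. The sketch below groups runs by their complete param set and reports any groups that MLflow cannot tell apart. It runs against an in-memory SQLite database with a stripped-down schema containing only the columns the query above touches; the real MLflow tables have more columns, and the run ids here are hypothetical placeholders.

```python
import sqlite3
from collections import defaultdict


def find_indistinguishable_runs(conn, experiment_name):
    """Return lists of run_uuids whose logged params are identical."""
    rows = conn.execute(
        """
        SELECT r.run_uuid, p.key, p.value
        FROM experiments e
        JOIN runs r USING (experiment_id)
        JOIN params p USING (run_uuid)
        WHERE e.name = ?
        ORDER BY r.run_uuid, p.key
        """,
        (experiment_name,),
    ).fetchall()

    # Collect each run's params as a key-sorted list of (key, value) pairs.
    params_by_run = defaultdict(list)
    for run_uuid, key, value in rows:
        params_by_run[run_uuid].append((key, value))

    # Invert the mapping: param set -> runs that share it.
    runs_by_params = defaultdict(list)
    for run_uuid, params in params_by_run.items():
        runs_by_params[tuple(params)].append(run_uuid)

    return [runs for runs in runs_by_params.values() if len(runs) > 1]


# Demo data mirroring the output above (simplified schema, placeholder ids).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE experiments (experiment_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE runs (run_uuid TEXT PRIMARY KEY, experiment_id INTEGER);
    CREATE TABLE params (key TEXT, value TEXT, run_uuid TEXT);
    INSERT INTO experiments VALUES (1, 'bug246_repro');
    """
)
for run_uuid in ("run-a", "run-b", "run-c"):
    conn.execute("INSERT INTO runs VALUES (?, 1)", (run_uuid,))
    conn.executemany(
        "INSERT INTO params VALUES (?, ?, ?)",
        [
            ("model", "bug246_gpt4omini", run_uuid),
            ("rag_k", "2", run_uuid),
            ("rag_search_type", "similarity", run_uuid),
        ],
    )

duplicates = find_indistinguishable_runs(conn, "bug246_repro")
print(duplicates)
```

Pointing the function at a connection to the real database file should work the same way, provided the table and column names match the query in this report. Once all swept axes are logged, it should return an empty list for a sweep like this one.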
Axes swept but missing from MLflow:
- text_splitter / chunk_size
- embedding_cfg (text-embedding-3-small vs -3-large)
- reranker_cfg
- batch_size
- online_strategy_kwargs
- sampling_params was not logged in this run either
Expected behavior
All swept hyperparameters should be surfaced as MLflow params so they can be filtered, grouped, and compared in the UI without out-of-band replay. At minimum: chunk size (or the full text_splitter config), embedder class+kwargs, reranker class+kwargs, vector_store class+kwargs, search_kwargs beyond k, batch_size, online_strategy kwargs, and prompt identity.
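One possible shape for this, offered as a sketch rather than a proposed patch: flatten the sampled pipeline config into dotted string keys before logging. The helper flatten_config, the FakeEmbeddings class, the layout of the sample dict, and the batch_size value are all hypothetical; where such a step would hook into rapidfireai is not covered here.

```python
def flatten_config(config, prefix=""):
    """Flatten a nested config into {dotted_key: string_value} pairs."""
    flat = {}
    for key, value in config.items():
        name = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            # Recurse so nested kwargs become e.g. "search_cfg.k".
            flat.update(flatten_config(value, name))
        elif isinstance(value, type):
            # Record classes by name rather than by their repr.
            flat[name] = value.__name__
        else:
            flat[name] = str(value)
    return flat


class FakeEmbeddings:
    """Hypothetical stand-in for an embedder class such as OpenAIEmbeddings."""


# Hypothetical sampled config; keys follow the axes named in this report.
sampled_pipeline = {
    "text_splitter": {"chunk_size": 512},
    "embedding_cfg": {"class": FakeEmbeddings, "model": "text-embedding-3-small"},
    "search_cfg": {"type": "similarity", "k": 2},
    "reranker_cfg": None,
    "batch_size": 8,  # illustrative value, not taken from the repro
}

params = flatten_config(sampled_pipeline)
for key, value in params.items():
    print(key, "=", value)
```

The resulting flat dict of strings is the kind of mapping mlflow.log_params accepts, so each swept axis would show up as its own filterable column. Explicitly logging "None" for a disabled reranker keeps runs with and without one distinguishable.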
Minor side observation
The MLflow model value is the endpoint name (bug246_gpt4omini), not the underlying provider model id (gpt-4o-mini). That's a smaller information-loss issue adjacent to this one.
Environment
- rapidfireai: main (HEAD 91d94de); same behavior on 0.15.2 PyPI
- Python 3.12, Linux
- MLflow backend: default OSS (sqlite at ~/rapidfireai/db/rapidfire_mlflow.db)
- Setup used in the verification notebook: experiment_name bug246_repro, 2 questions, 1 shard, 3 runs (~77s total)