
[BUG] MLflow only logs 3 pipeline params (model, rag_search_type, rag_k); chunk_size / embedder / reranker invisible in sweeps #246

@kamran-rapidfireAI


Summary — verified end-to-end

When running an Experiment.run_evals sweep over a multi-axis automl.List(...) config space, MLflow tracking only records three pipeline hyperparameters per run: model, rag_search_type, rag_k. Other swept axes — text-splitter / chunk size, embedder, reranker, vector store, prompt template, batch size, online-strategy params, rate-limit settings — are silently dropped from MLflow. Users cannot reconstruct, from MLflow alone, which sampled config produced a given metric.

Workaround: re-instantiate RFRandomSearch with the same seed, call get_runs(seed=X) (the enumeration is deterministic), and inspect run["pipeline"] for each run_id to recover the full config. This is fragile, since it depends on seed stability across versions, and it breaks the principle that MLflow is the source of truth for runs.

Verified end-to-end via a working notebook that uses rapidfireai's normal public API (Experiment, RFLangChainRagSpec, RFAPIModelConfig, RFRandomSearch) with real OpenAI gpt-4o-mini calls. No source-poking, no monkey-patching. Notebook: https://gist.github.com/kamran-rapidfireAI/be8c5e604945a0340b9e21c9915fce7c

End-to-end repro

The notebook runs RFRandomSearch(num_runs=3, seed=42) over 3 axes inside RFLangChainRagSpec:

RFLangChainRagSpec(
    text_splitter=List([None, RecursiveCharacterTextSplitter.from_tiktoken_encoder(chunk_size=512)]),
    embedding_cfg=List([
        {"class": OpenAIEmbeddings, "model": "text-embedding-3-small", ...},
        {"class": OpenAIEmbeddings, "model": "text-embedding-3-large", ...},
    ]),
    search_cfg={"type": "similarity", "k": 2},
    reranker_cfg=List([None, {"class": CrossEncoderReranker, ...}]),
    ...
)

(2×2×2 = 8-config grid, 3 sampled.) The 3 sampled configs differ materially on text_splitter and embedder:

[0] text_splitter=chunk_size=512  embedder=text-embedding-3-small
[1] text_splitter=page-level      embedder=text-embedding-3-large
[2] text_splitter=page-level      embedder=text-embedding-3-small
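To make the grid arithmetic concrete, here is a minimal sketch of enumerating a 2×2×2 space and drawing 3 configs with a fixed seed. It is only an illustration under stated assumptions: the axis labels are string stand-ins for the real splitter, embedder, and reranker objects, `sample_configs` is a hypothetical helper, and this is not RFRandomSearch's actual sampling algorithm, so it is not expected to reproduce the three configs listed above.

```python
import itertools
import random

# String stand-ins for the real objects swept in the repro (assumption).
axes = {
    "text_splitter": ["page-level", "chunk_size=512"],
    "embedder": ["text-embedding-3-small", "text-embedding-3-large"],
    "reranker": ["none", "CrossEncoderReranker"],
}

# Full Cartesian product of the three axes: 2 * 2 * 2 = 8 candidate configs.
grid = [dict(zip(axes, combo)) for combo in itertools.product(*axes.values())]


def sample_configs(grid, num_runs, seed):
    """Hypothetical helper: seeded sampling without replacement."""
    rng = random.Random(seed)
    return rng.sample(grid, num_runs)


# The same seed yields the same selection on repeated calls within one
# Python version, which is the property the workaround relies on.
sampled = sample_configs(grid, num_runs=3, seed=42)
for i, cfg in enumerate(sampled):
    print(i, cfg)
```

Because none of these axis values reach MLflow, the seed is currently the only link between a run and its position in this grid.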

After the run, querying the MLflow params table:

SELECT e.name, r.run_uuid, p.key, p.value
FROM experiments e JOIN runs r USING (experiment_id) JOIN params p USING (run_uuid)
WHERE e.name = 'bug246_repro' ORDER BY r.run_uuid, p.key;

Output:

bug246_repro  3ba5d5...64491  model            bug246_gpt4omini
bug246_repro  3ba5d5...64491  rag_k            2
bug246_repro  3ba5d5...64491  rag_search_type  similarity
bug246_repro  b522d5...88bd4  model            bug246_gpt4omini
bug246_repro  b522d5...88bd4  rag_k            2
bug246_repro  b522d5...88bd4  rag_search_type  similarity
bug246_repro  c7c1ba...1baa   model            bug246_gpt4omini
bug246_repro  c7c1ba...1baa   rag_k            2
bug246_repro  c7c1ba...1baa   rag_search_type  similarity

All 3 runs have identical params despite the underlying configs differing on text_splitter and embedder. A reviewer cannot tell the 3 runs apart on the swept axes from MLflow alone. The tags table for this experiment has only mlflow.runName ∈ {'1','2','3'} — no swept-axis info hidden there either.
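The same check can be scripted. The sketch below groups runs by their complete set of logged params and reports any groups that MLflow cannot distinguish. It assumes only the three tables and columns used in the SQL query above (experiments, runs, params); `find_indistinguishable_runs` is a hypothetical helper name, not part of rapidfireai or MLflow.

```python
import sqlite3
from collections import defaultdict


def find_indistinguishable_runs(conn, experiment_name):
    """Return groups of run_uuids whose logged params are identical.

    Each group is a sorted list of run_uuids; only groups with more
    than one run are returned.
    """
    rows = conn.execute(
        """
        SELECT r.run_uuid, p.key, p.value
        FROM experiments e
        JOIN runs r USING (experiment_id)
        JOIN params p USING (run_uuid)
        WHERE e.name = ?
        """,
        (experiment_name,),
    ).fetchall()

    # Collect each run's params into a dict: run_uuid -> {key: value}.
    params_by_run = defaultdict(dict)
    for run_uuid, key, value in rows:
        params_by_run[run_uuid][key] = value

    # Runs sharing the same sorted (key, value) tuple are indistinguishable.
    runs_by_signature = defaultdict(list)
    for run_uuid, params in params_by_run.items():
        signature = tuple(sorted(params.items()))
        runs_by_signature[signature].append(run_uuid)

    return [sorted(runs) for runs in runs_by_signature.values() if len(runs) > 1]
```

To run it against the backend from this report, open the database with `sqlite3.connect(os.path.expanduser("~/rapidfireai/db/rapidfire_mlflow.db"))` and pass `'bug246_repro'` as the experiment name. Given the params shown above, all three runs would be expected to land in a single group.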

Axes swept but missing from MLflow:

  • text_splitter / chunk_size
  • embedding_cfg (text-embedding-3-small vs -3-large)
  • reranker_cfg
  • batch_size
  • online_strategy_kwargs
  • sampling_params (also not logged in this run)

Expected behavior

All swept hyperparameters should be surfaced as MLflow params so they can be filtered, grouped, and compared in the UI without out-of-band replay. At minimum: chunk size (or the full text_splitter config), embedder class + kwargs, reranker class + kwargs, vector_store class + kwargs, search_kwargs beyond k, batch_size, online_strategy kwargs, and prompt identity.
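One possible shape for this, sketched under assumptions: flatten the nested pipeline config into dotted keys with string values, then pass the result to `mlflow.log_params`. The helper `flatten_config`, the `FakeEmbeddings` class, and the example config layout are all hypothetical; rapidfireai's internal config representation may differ, and this is not a proposed patch.

```python
def flatten_config(config, prefix="", sep="."):
    """Flatten a nested config dict into {dotted_key: string_value}.

    Hypothetical helper: nested dicts are expanded, classes are recorded
    by name, and arbitrary objects are recorded by their class name.
    """
    flat = {}
    for key, value in config.items():
        name = f"{prefix}{sep}{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten_config(value, name, sep))
        elif isinstance(value, type):
            # A class such as an embeddings class: log its name.
            flat[name] = value.__name__
        elif value is None or isinstance(value, (str, int, float, bool)):
            flat[name] = str(value)
        else:
            # Any other object (e.g. a splitter instance): log its class name.
            flat[name] = type(value).__name__
    return flat


class FakeEmbeddings:
    """Stand-in for a real embeddings class (assumption for this sketch)."""


example_config = {
    "text_splitter": {"chunk_size": 512},
    "embedding_cfg": {"class": FakeEmbeddings, "model": "text-embedding-3-small"},
    "reranker_cfg": None,
    "search_cfg": {"type": "similarity", "k": 2},
}

params = flatten_config(example_config)
# With MLflow installed and a run active, these could be logged with:
#   mlflow.log_params(params)
```

Dotted keys keep related settings adjacent in the MLflow UI, and logging an explicit "None" for a disabled reranker lets runs be filtered on the absence of a component as well as its presence.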

Minor side observation

The MLflow model value is the endpoint name (bug246_gpt4omini), not the underlying provider model id (gpt-4o-mini). That's a smaller information-loss issue adjacent to this one.

Environment

  • rapidfireai: main (HEAD 91d94de); same behavior on 0.15.2 PyPI
  • Python 3.12, Linux
  • MLflow backend: default OSS (sqlite at ~/rapidfireai/db/rapidfire_mlflow.db)
  • Setup used in the verification notebook: experiment_name bug246_repro, 2 questions, 1 shard, 3 runs (~77s total)
