Migrate documentation into docs/ with full history #1321
docs: scope MoE & Routing page to R3 and remove all --use-miles-router occurrences
Verified on `radixark/miles@origin/main` (commit 17eaf73):

    $ git grep MILES_DEBUG origin/main
    (empty)

The real `MILES_*` env vars Miles reads from `os.environ` are:

- MILES_BACKEND (deprecated)
- MILES_HOST_IP
- MILES_PREFER_IPV6
- MILES_HTTP_POST_ACTORS_PER_NODE
- MILES_EXPERIMENTAL_ROLLOUT_REFACTOR
- MILES_TEST_R3_THRESHOLD
- MILES_SCRIPT_*
- MILES_TEST_ENABLE_INFINITE_RUN

`MILES_DEBUG` is not any of these; setting it has no effect. Drop the row from the CLI reference Environment-variables table and the matching line from the developer debug page's "Useful debugging knobs" block.
There is no `v0.1.0` tag on `radixark/miles` (verified earlier with `git ls-remote --tags`), and the page describes a release that hasn't shipped. Remove it rather than continue carrying an aspirational release post.

Changes:

- Delete `docs/blog/release-v0.1.0.md`.
- Remove the catalogue card from `docs/blog/index.md`.
- Remove the nav entry from `mkdocs.yml`.

Same shape as #37 (which removed the placeholder OPD example).
Cross-checked the table against `radixark/miles@origin/main` (commit 17eaf73):

- `reward` is not the metric name; Miles logs `rollout/raw_reward` (`miles/ray/rollout.py:685`). Rename the row.
- There is no `kl` metric. The two real KL metrics emitted by the trainer are:
  * `kl_loss`: KL of the actor vs. the reference model (computed in `loss.py:738-746`, only logged when `--use-kl-loss` is set).
  * `ppo_kl`: in-batch new-vs-old policy KL.

  The "policy diverged from ref" row is about the first one; rename `kl` -> `kl_loss` and add a note that it only appears under `--use-kl-loss`.
- Add a row for `train_rollout_logprob_abs_diff`, the canonical signal for train/inference precision drift, computed in `loss.py:756` as the mean absolute difference between trainer-side and rollout-side log-probs.

Other rows are unchanged.
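For readers who want to sanity-check the new row offline, the definition of `train_rollout_logprob_abs_diff` given above (mean absolute difference between trainer-side and rollout-side log-probs) can be sketched in a few lines. This is only an illustration, not the code in `loss.py`: the function name `mean_abs_logprob_diff`, the flat-list inputs, and the numbers are all assumptions.

```python
def mean_abs_logprob_diff(train_logprobs, rollout_logprobs):
    """Mean absolute difference between two equal-length log-prob sequences."""
    if len(train_logprobs) != len(rollout_logprobs):
        raise ValueError("log-prob sequences must have the same length")
    if not train_logprobs:
        raise ValueError("need at least one token")
    total = sum(abs(t - r) for t, r in zip(train_logprobs, rollout_logprobs))
    return total / len(train_logprobs)


# Made-up per-token log-probs for the same three sampled tokens
# (hypothetical values, chosen only to make the arithmetic easy to follow).
train = [-0.50, -1.25, -2.00]
rollout = [-0.50, -1.00, -2.50]
drift = mean_abs_logprob_diff(train, rollout)
print(drift)
```

A value near zero means the trainer and the rollout engine agree on the sampled tokens; a value that grows over time is the drift signal the row describes.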
The flag-mapping table claimed `--use-distributed-optimizer` was "(opt-in)" for the Megatron backend. Verified on `radixark/miles@17eaf73`: Miles overrides it inside `set_default_megatron_args` (`miles/backends/megatron_utils/arguments.py:14`):

    def set_default_megatron_args(args):
        # always use zero optimizer
        args.use_distributed_optimizer = True

So the distributed optimizer is forced on for the Megatron backend regardless of what the user passes. Update the parenthetical to match.
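To make the "regardless of what the user passes" point concrete, here is a small stand-alone sketch. It copies only the assignment quoted above; the `Namespace` input and the `return args` line are assumptions added so the demo can run on its own, and nothing else about the real function is implied.

```python
from argparse import Namespace


def set_default_megatron_args(args):
    # Same assignment as the quoted snippet: the value is overwritten
    # unconditionally, whatever the parser produced.
    args.use_distributed_optimizer = True
    return args  # returned only for this demo (assumption)


# A user who left the flag off still ends up with the optimizer enabled.
user_args = Namespace(use_distributed_optimizer=False)
effective = set_default_megatron_args(user_args)
print(effective.use_distributed_optimizer)
```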
The Router-endpoints table rendered `/{path:path}` as if it were a
literal URL. It's actually FastAPI's catch-all route syntax —
`:path` lets the variable match any string including slashes, so the
row is the rule that proxies any URL not matched above to a selected
SGLang worker (`miles/router/router.py`).
Replace the cell with prose ("any other path") plus a few representative
forwarded paths (`/generate`, `/v1/chat/completions`, `/health`) so
readers know what they'll see in practice. The other two rows are
untouched.
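The difference between a plain path parameter and the `:path` catch-all can be illustrated without running a server. The sketch below approximates the two converters with regular expressions (a plain parameter stops at the next slash, `:path` matches everything that remains). It is an approximation for illustration only, not the code in `miles/router/router.py`, and the names `PLAIN_PARAM` and `CATCH_ALL` are hypothetical.

```python
import re

# Rough stand-ins for the two converters: a plain `{path}` parameter stops at
# the next slash, while `{path:path}` matches the rest of the URL.
PLAIN_PARAM = re.compile(r"^/(?P<path>[^/]+)$")
CATCH_ALL = re.compile(r"^/(?P<path>.*)$")

for url in ["/generate", "/health", "/v1/chat/completions"]:
    plain = PLAIN_PARAM.match(url)
    catch_all = CATCH_ALL.match(url)
    # The plain pattern fails on multi-segment URLs; the catch-all never does.
    print(url, "plain:", bool(plain), "catch-all:", catch_all.group("path"))
```

Only the catch-all pattern accepts `/v1/chat/completions`, which is why a single route is enough to forward any path not matched by the rows above it.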
docs: remove the Release v0.1.0 blog page
The shown signature didn't match the actual call surface. Three issues, mirroring #1055:

- Said `async def`, but the legacy `call_rollout_fn` invokes the function synchronously (`output = fn(*args, **kwargs, ...)`); an async impl returns a coroutine and breaks.
- Omitted `data_source`. Real call is `fn(args, rollout_id, data_source, evaluation=evaluation)` (per `miles/ray/rollout.py:565` and the default impl `miles.rollout.sglang_rollout.generate_rollout`).
- Used keyword-only `*, evaluation=False`. Real param is a regular keyword argument.

Reference example: `examples/fully_async/fully_async_rollout.py:333`:

    def generate_rollout_fully_async(args, rollout_id, data_buffer, evaluation=False):

Update the doc snippet to match.
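The first issue is easy to reproduce in isolation. In the sketch below, `call_sync` is a hypothetical stand-in for the legacy caller (it only mirrors the call shape quoted above), and the dict return values are placeholders, not what Miles expects a rollout function to return.

```python
import inspect


def generate_rollout(args, rollout_id, data_source, evaluation=False):
    # Synchronous, with `evaluation` as a regular keyword argument.
    return {"rollout_id": rollout_id, "evaluation": evaluation}


async def generate_rollout_async(args, rollout_id, data_source, evaluation=False):
    # Same body, but declared async as in the old doc snippet.
    return {"rollout_id": rollout_id, "evaluation": evaluation}


def call_sync(fn, args, rollout_id, data_source, evaluation=False):
    # Hypothetical stand-in for the legacy caller: no `await` anywhere.
    return fn(args, rollout_id, data_source, evaluation=evaluation)


ok = call_sync(generate_rollout, None, 0, [])
broken = call_sync(generate_rollout_async, None, 0, [])
broken_is_coroutine = inspect.iscoroutine(broken)
broken.close()  # avoid a "never awaited" warning in this demo
print(ok, broken_is_coroutine)
```

The synchronous version hands back its result directly, while the async version hands the caller an unawaited coroutine object, so any code that then treats the output as rollout data fails.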
docs(monitoring): use real metric names in 'What to watch'
docs(usage): mark --use-distributed-optimizer as forced on, not opt-in
docs(monitoring): clarify the router catch-all route
docs: drop fictional MILES_DEBUG=1 env var
The shown signature said `samples: list[Sample]` and iterated as if each item were a `Sample`, but Miles passes `list[list[Sample]]` — a list of `n_samples_per_prompt`-size groups (verified at `miles/rollout/sglang_rollout.py:462` and `miles/rollout/inference_rollout/inference_rollout_train.py:151`). A user copy-pasting the previous signature would crash with `AttributeError: 'list' object has no attribute 'remove_sample'`. Update the example to nested-iterate groups, and add a one-line note explaining the shape. Companion to #1056, which fixes the matching arg-help string in the upstream parser.
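A nested-iteration filter along the lines described might look like the sketch below. The `Sample` dataclass is a hypothetical stand-in (only the `remove_sample` attribute is implied by the error message above), and the `(args, samples)` parameter list, the `reward` field, and the threshold are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    # Hypothetical stand-in for Miles' Sample; only `remove_sample` is
    # implied by the error message quoted above.
    reward: float
    remove_sample: bool = False


def filter_low_reward(args, samples):
    # `samples` is list[list[Sample]]: one inner list per prompt group.
    for group in samples:
        for sample in group:
            if sample.reward < 0.0:
                sample.remove_sample = True


groups = [[Sample(1.0), Sample(-1.0)], [Sample(0.5), Sample(0.2)]]
filter_low_reward(None, groups)
flags = [[s.remove_sample for s in group] for group in groups]
print(flags)
```

Iterating a single level instead would hand each inner list to the attribute access, which is exactly the `AttributeError` described above.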
docs(customization): fix --rollout-sample-filter-path signature
docs(customization): fix --rollout-function-path signature
Mirror the GLM page template (Model Introduction → Variants → Environment Setup → Launch → Recipe Configuration → Pairs Well With) for DeepSeek V4 Flash. Documents the 8-node H200 production recipe, the prepare-spmd world_size workaround, and the practical knobs needed to keep rollout stable (httpx pool, sglang mem fraction, train memory margin, save-storm mitigation, truncation-drift trap). Also surface V4 in the family index — adds the V4 row to the variants table and a "Fastest path to train" stanza.
The shown signature for `--buffer-filter-path` annotated `rollout_id: int`,
but Miles always passes `None` at the only call site
(`miles/rollout/data_source.py:186`):

    samples = self.buffer_filter(self.args, None, self.buffer, num_samples)

Widen the annotation to `int | None` so it reflects what the function
actually receives.
- Rename deepseek-v4.md to deepseek-v4-flash.md (file scope is Flash-only).
- Drop the smoke-only 4-layer variant row and single-node smoke command.
- Drop the V4-Pro-is-roadmap line on the Flash page.
- Reword the GB300 image note to "will be published soon".
- Add a DeepSeek-V4-Pro variants row + placeholder page on the family index.
- Update mkdocs nav and internal cross-links to the new filenames.
- Remove the wandb-run-name argparse and three operational-workaround bullets (sglang wake_up OOM, save-storm pod death, truncation-drift) from §5.5; these are operational lessons, not part of the recipe.
- Correct the V4-Flash active-parameter count from ~37 B to 13 B and fill in V4-Pro as 49 B / 1.6 T across the family index, the Flash page intro, and the Pro placeholder.
docs(customization): widen --buffer-filter-path rollout_id annotation
Easier to scan than a single prose paragraph; matches the bullet style used in the surrounding sections.
The 2026/02 entry linked text was "Server arguments" but the linked page's actual title is "CLI Reference". Use the page's title so the link text matches what the reader lands on.
The link texts in 'Latest updates' were ad-hoc descriptions
("Low-precision guide", "Multi-agent walkthrough", "Design doc",
"Post", "Docs") that didn't match the linked pages' titles. Replace
each with the actual page title so the link text matches what the
reader lands on.
The Introducing Miles post duplicated a stripped-down version of the docker pull / docker run / bash recipe that already lives in the Quick Start page (and is easy to drift from). Drop the code block and point the reader directly at Quick Start, which has the canonical commands.
Per review comment on PR #57.
- Add train.py / train_async.py as the canonical entry points.
- Replace stale miles_plugins blurb with its current subpackages (mbridge/, megatron_bridge/, models/).
- Expand scripts/ with amd/, tools/, and run_*.py Python launchers.
docs(home): split Supported Hardware into NVIDIA / AMD bullets
docs(home): align all Latest-updates link texts with target page titles
docs: document deepseek-v4 disaggregated rollout mode (miles#1310)
…ite restructure, deepseek single-image)

- #1290: drop router middleware_hub references (architecture, customization, monitoring, rollout-endpoints)
- #1227: update tests/ tree to location-based CI discovery (fast/, fast-gpu/, ci/, e2e/) in developer/architecture
- #1311: collapse deepseek-v4 recipes to a single image, drop the retired gb300-dev-dskv4 tag (keeping miles-doc's :latest naming)
yueming's earlier sync only ported agentic-chat-template.md; #1291 also touched these two files:

- user-guide/index.md: update the Agentic Chat Templates row to the TITO wording
- user-guide/rollout-endpoints.md: repoint the agentic example from openai_format/dapo_math to examples/experimental/swe-agent-v2
docs: sync content from miles repo
Code Review
This pull request migrates and expands the documentation for the Miles reinforcement learning framework, introducing a comprehensive set of guides under the docs_new directory. The review feedback identifies several documentation bugs and formatting issues, including outdated repository links in docs.json, an incorrect relative path in the agentic chat template guide, invalid Mermaid diagram syntax in the quick start and architecture guides, and JSX expressions in the FAQ's Accordion titles that could break Mintlify builds. Code suggestions are provided to resolve all of these issues.
    "title": "Raise a docs issue",
    "description": "Flag a wording, formatting, or accuracy issue on this page.",
    "icon": "github",
    "href": "https://github.com/radixark/miles-doc/issues/new?template=doc-polish.yml"

The link points to the old standalone radixark/miles-doc repository. Since the documentation has been migrated to the main repository, this should be updated to point to radixark/miles.

Suggested change:

    - "href": "https://github.com/radixark/miles-doc/issues/new?template=doc-polish.yml"
    + "href": "https://github.com/radixark/miles/issues/new?template=doc-polish.yml"
    "items": [
      {
        "label": "Raise a docs issue",
        "href": "https://github.com/radixark/miles-doc/issues/new?template=doc-polish.yml"

The link points to the old standalone radixark/miles-doc repository. Since the documentation has been migrated to the main repository, this should be updated to point to radixark/miles.

Suggested change:

    - "href": "https://github.com/radixark/miles-doc/issues/new?template=doc-polish.yml"
    + "href": "https://github.com/radixark/miles/issues/new?template=doc-polish.yml"
      },
      {
        "label": "Source on GitHub",
        "href": "https://github.com/radixark/miles-doc"
    ## Add a new model

    Models in the table are verified by Miles maintainers — just pick the family. To support a new model (or a new append-role surface), register a `TITOTokenizer` subclass plus its fixed Jinja template (or HF-native + kwargs) and `SUPPORTED_TEMPLATES` rows in [`tito_tokenizer.py`](../../miles/utils/chat_template_utils/tito_tokenizer.py), then verify with both scripts — either failing blocks it. Each prints `Verdict: PASS/FAIL`.

The relative path `../../miles/utils/chat_template_utils/tito_tokenizer.py` resolves to `docs_new/miles/miles/utils/chat_template_utils/tito_tokenizer.py` because this markdown file is located at `docs_new/miles/docs/user-guide/agentic-chat-template.md`. Since the `miles` package is at the root of the repository, the relative path needs to go up 4 levels to reach the root.

Suggested change:

    `SUPPORTED_TEMPLATES` rows in [`tito_tokenizer.py`](../../../../miles/utils/chat_template_utils/tito_tokenizer.py), then verify with both scripts — either failing blocks it. Each prints `Verdict: PASS/FAIL`.
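The path arithmetic in this comment can be checked with the standard library. The sketch below takes the page directory and target path from the comment above and normalizes the link for two and four levels of `../`; the helper name `resolve` is hypothetical.

```python
import posixpath

# Directory of the markdown page and the repo-root-relative target,
# both taken from the review comment above.
page_dir = "docs_new/miles/docs/user-guide"
target = "miles/utils/chat_template_utils/tito_tokenizer.py"


def resolve(levels_up):
    # Join the page's directory with `levels_up` copies of "../" and normalize.
    relative = "../" * levels_up + target
    return posixpath.normpath(posixpath.join(page_dir, relative))


two_up = resolve(2)
four_up = resolve(4)
print(two_up)
print(four_up)
```

With two levels the link stays inside `docs_new/miles/`, and with four it lands on the `miles` package at the repository root.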
    P[Prompt dataset] --> R[SGLang rollout]
    R --> RM[Reward fn]
    RM --> A[Megatron actor]
    A == P2P weight sync ==> R

    T1 <-- weight sync --> R1
    T1 <-- weight sync --> R2
    </Accordion>

    <Accordion title={<>I'm OOM during training. What is <code>max_tokens_per_gpu</code>?</>}>

Using JSX elements/expressions like `{<>...</>}` inside the `title` attribute of the `<Accordion>` component can cause build failures or rendering issues in Mintlify, as the `title` prop is typically expected to be a plain string. Using a plain string is safer and fully compatible.

Suggested change:

    - <Accordion title={<>I'm OOM during training. What is <code>max_tokens_per_gpu</code>?</>}>
    + <Accordion title="I'm OOM during training. What is max_tokens_per_gpu?">
    </Accordion>

    <Accordion title={<>Multi-node training fails with <code>transformers cannot find a model</code>.</>}>

Using JSX elements/expressions like `{<>...</>}` inside the `title` attribute of the `<Accordion>` component can cause build failures or rendering issues in Mintlify, as the `title` prop is typically expected to be a plain string. Using a plain string is safer and fully compatible.

Suggested change:

    - <Accordion title={<>Multi-node training fails with <code>transformers cannot find a model</code>.</>}>
    + <Accordion title="Multi-node training fails with 'transformers cannot find a model'.">
83b3b63 to 27d8f86
Awesome. Man, you saved me.
Standard /docs subpath hosting drops Mintlify's nested-directory requirement, so content moves from docs/miles/docs/ up to docs/. All internal links and docs.json paths lose the /miles/docs prefix.
LGTM! By the way, could we see the preview site generated by Mintlify here?
This migrates the documentation into the main repo. The old `docs/` directory (a stale snapshot) is replaced by the current docs, served with Mintlify at miles.radixark.com/docs.

- `docs/` is the Mintlify content root; pages live directly inside it.
- `docs.json` and README doc links point at this repo / the new docs domain.
- `mint dev` (pages, nav, assets render) and `mint broken-links` (zero broken links).
- `docs/README.md` covers the layout and local preview.

Hosting cutover (after merge): Mintlify dashboard → repo `radixark/miles`, monorepo path `/docs`, custom domain `miles.radixark.com` with "Host at /docs" enabled. The old `www.radixark.com/miles/docs` URLs get 301 redirects from the main site.

main.