Commit 1e31243

Add support for nested guide directories and clarify branch sync strategy (#118)
* Add to README how to test a different branch on guide sync

  Signed-off-by: Pete Cheslock <[email protected]>

* Enhance README and guide generator to support nested directories for dynamic guides
  - Updated README.md to include instructions for configuring remote guides from nested directories, detailing the use of `targetFilename` for top-level page generation.
  - Modified guide-generator.js to add new dynamic guides for 'Prefix Cache Storage' and 'Prefix Cache Storage - CPU', including sidebar positions and descriptions.

  Signed-off-by: Pete Cheslock <[email protected]>

* Nest the CPU example under the main header

  Signed-off-by: Pete Cheslock <[email protected]>

* Update guide generator and .gitignore for tiered prefix cache documentation
  - Renamed 'prefix-cache-storage' to 'tiered-prefix-cache' in guide-generator.js, updating titles and target filenames accordingly.
  - Adjusted .gitignore to reflect the new directory structure for tiered prefix cache documentation.

  Signed-off-by: Pete Cheslock <[email protected]>

* Update components-data.yaml and sync-release.mjs for v0.4.0 release
  - Updated release information in components-data.yaml to reflect version v0.4.0, including release date and URL.
  - Modified sidebar labels for several components to remove quotes for consistency.
  - Updated version numbers for llm-d-modelservice and llm-d-infra components.
  - Enhanced regex in sync-release.mjs to capture additional version formats.

  Signed-off-by: Pete Cheslock <[email protected]>

* Add GKE with 0.4 release

  Signed-off-by: Pete Cheslock <[email protected]>

* Permanently move links to well lit paths from old post

  Signed-off-by: Pete Cheslock <[email protected]>

---------

Signed-off-by: Pete Cheslock <[email protected]>
1 parent 5eb5533 commit 1e31243

7 files changed, +95 -38 lines changed

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -22,6 +22,7 @@ docs/community/sigs.md
 docs/guide/guide.md
 docs/guide/Installation/*.md
 docs/guide/InfraProviders/*.md
+docs/guide/Installation/tiered-prefix-cache/*.md
 docs/usage/**/*.md
 # Keep category files for sidebar configuration
 !docs/guide/Installation/_category_.json

README.md

Lines changed: 47 additions & 0 deletions
@@ -97,8 +97,55 @@ git push # Triggers automatic deployment
 - Component descriptions and version tags
 - **Components** sync from their individual release tags
 - **Guides** sync from the llm-d/llm-d release tag
+- **Architecture docs** sync from the llm-d/llm-d release tag
 - **Community docs** always sync from `main` branch (latest)

+### Understanding Versioned vs. Always-Current Content
+
+The remote content system supports two sync strategies:
+
+**Versioned Content** (syncs from release tags):
+- **Guides** (`docs/guide/`) - Uses `RELEASE_INFO.version` from `components-data.yaml`
+- **Architecture** (`docs/architecture/`) - Uses `RELEASE_INFO.version` from `components-data.yaml`
+- **Components** (`docs/architecture/Components/`) - Each component uses its own `version` field from `components-data.yaml`
+- **Infrastructure Providers** (`docs/guide/InfraProviders/`) - Uses `RELEASE_INFO.version` from `components-data.yaml`
+
+These docs are pinned to specific release tags (e.g., `v0.3.1`) to ensure documentation matches the released code. When you update `release.version` in `components-data.yaml`, all versioned content automatically syncs from the new tag.
+
+**Always-Current Content** (syncs from `main` branch):
+- **Community docs** (`docs/community/`) - Contributing guidelines, Code of Conduct, Security Policy, SIGs
+- These are configured via `COMMON_REPO_CONFIGS['llm-d-main'].branch = 'main'` in `component-configs.js`
+
+Community documentation stays current with the latest policies and processes, independent of releases. The `branch` field in `COMMON_REPO_CONFIGS` controls this behavior.
+
+**How it works:**
+- `generateRepoUrls()` in `component-configs.js` prefers `version` over `branch` when both exist
+- Versioned content sources call `findRepoConfig('llm-d')` and use `RELEASE_INFO.version`
+- Community sources call `findRepoConfig('llm-d')` and use `repoConfig.branch` (which is `'main'`)
+- This separation lets you cut releases without worrying about stale community policies
+
+### Testing content from a feature branch
+
+To preview remote docs from a work-in-progress branch (for example `liu-cong-debug`), temporarily set `release.version` in `remote-content/remote-sources/components-data.yaml` to that branch name. Run `npm start` or `npm run build` to pull the branch content into the site. When testing is done, change `release.version` back to the released tag so production remains on the official docs.
+
+### Supporting remote guides from nested directories
+
+Dynamic guides are configured in `remote-content/remote-sources/guide/guide-generator.js`. Each entry in `DYNAMIC_GUIDES` points at a `README.md` inside `guides/<dirName>/` in the main repo. By default, the generator mirrors the directory structure when it creates docs: `dirName: 'some-folder/sub-guide'` produces `some-folder/sub-guide.md` under `docs/guide/Installation`, and the sidebar groups pages under a folder.
+
+If you want to surface a nested source as a top-level page, add an optional `targetFilename` to the guide definition. Example:
+
+```javascript
+{
+  dirName: 'prefix-cache-storage/cpu',
+  title: 'Prefix Cache Storage - CPU',
+  description: '',
+  sidebarPosition: 5,
+  targetFilename: 'prefix-cache-storage-cpu.md'
+}
+```
+
+With `targetFilename`, the generator still reads `guides/prefix-cache-storage/cpu/README.md`, but it writes the output to `docs/guide/Installation/prefix-cache-storage-cpu.md`, letting the page appear alongside other top-level guides. Leave `targetFilename` out to keep the default nested behavior.
+
 **Manual updates:** You can also manually edit `components-data.yaml` if needed.

 ### Adding New Components
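
The `version`-over-`branch` preference described in the README's "How it works" list can be sketched as follows. This is a minimal illustration only, not the code in `component-configs.js`: the names `resolveRef` and `buildRawUrl`, the shape of the config object, and the URL layout are all assumptions made for the example.

```javascript
// Hypothetical sketch: choose the git ref for a remote content source.
// An explicit version (release tag) wins over a branch, as the README describes.
function resolveRef({ version, branch }) {
  const ref = version || branch;
  if (!ref) {
    throw new Error('repo config needs a version or a branch');
  }
  return ref;
}

// Hypothetical URL builder; the real generateRepoUrls() may look different.
function buildRawUrl(org, name, config) {
  return `https://raw.githubusercontent.com/${org}/${name}/${resolveRef(config)}/`;
}

// Versioned content: both fields present, so the release tag is used.
const versioned = buildRawUrl('llm-d', 'llm-d', { version: 'v0.4.0', branch: 'main' });

// Community content: only a branch is configured, so it tracks main.
const community = buildRawUrl('llm-d', 'llm-d', { branch: 'main' });

console.log(versioned);
console.log(community);
```

Under this reading, the feature-branch testing trick works because a branch name placed in `release.version` is treated like any other ref.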

blog/2025-07-29_llm-d-v0.2-our-first-well-lit-paths.md

Lines changed: 3 additions & 3 deletions
@@ -27,9 +27,9 @@ Our deployments have been tested and benchmarked on recent GPUs, such as H200 no

 We’ve defined and improved three well-lit paths that form the foundation of this release:

-* [**Intelligent inference scheduling over any vLLM deployment**](https://github.com/llm-d-incubation/llm-d-infra/tree/main/quickstart/examples/inference-scheduling): support for precise prefix-cache aware routing with no additional infrastructure, out-of-the-box load-aware scheduling for better tail latency that “just works”, and a new configurable scheduling profile system enable teams to see immediate latency wins and still customize scheduling behavior for their workloads and infrastructure.
-* [**P/D disaggregation**:](https://github.com/llm-d-incubation/llm-d-infra/tree/main/quickstart/examples/pd-disaggregation) support for separating prefill and decode workloads to improve latency and GPU utilization for long-context scenarios.
-* [**Wide expert parallelism for DeepSeek R1 (EP/DP)**](https://github.com/llm-d-incubation/llm-d-infra/tree/main/quickstart/examples/wide-ep-lws): support for large-scale multi-node deployments using expert and data parallelism patterns for MoE models. This includes optimized deployments leveraging NIXL+UCX for inter-node communication, with fixes and improvements to reduce latency, and demonstrates the use of LeaderWorkerSet for Kubernetes-native inference orchestration.
+* [**Intelligent inference scheduling over any vLLM deployment**](https://github.com/llm-d/llm-d/tree/main/guides/inference-scheduling): support for precise prefix-cache aware routing with no additional infrastructure, out-of-the-box load-aware scheduling for better tail latency that “just works”, and a new configurable scheduling profile system enable teams to see immediate latency wins and still customize scheduling behavior for their workloads and infrastructure.
+* [**P/D disaggregation**:](https://github.com/llm-d/llm-d/tree/main/guides/pd-disaggregation) support for separating prefill and decode workloads to improve latency and GPU utilization for long-context scenarios.
+* [**Wide expert parallelism for DeepSeek R1 (EP/DP)**](https://github.com/llm-d/llm-d/tree/main/guides/wide-ep-lws): support for large-scale multi-node deployments using expert and data parallelism patterns for MoE models. This includes optimized deployments leveraging NIXL+UCX for inter-node communication, with fixes and improvements to reduce latency, and demonstrates the use of LeaderWorkerSet for Kubernetes-native inference orchestration.

 All of these scenarios are reproducible: we provide reference hardware specs, workloads, and benchmarking harness support, so others can evaluate, reproduce, and extend these benchmarks easily. This also reflects improvements to our deployment tooling and benchmarking framework, a new "machinery" that allows users to set up, test, and analyze these scenarios consistently.

remote-content/remote-sources/components-data.yaml

Lines changed: 15 additions & 21 deletions
@@ -2,55 +2,49 @@
 # This file contains static data for generating the Components documentation page
 # Update this file when there are new releases or component changes
 #
-# Last synced from: https://github.com/llm-d/llm-d/releases/tag/v0.3.1
-# Sync date: 2025-11-11T15:13:04.200Z
+# Last synced from: https://github.com/llm-d/llm-d/releases/tag/v0.4.0
+# Sync date: 2025-12-01T21:17:30.109Z

 release:
-  version: v0.3.1
-  releaseDate: '2025-11-06'
-  releaseDateFormatted: November 6, 2025
-  releaseUrl: https://github.com/llm-d/llm-d/releases/tag/v0.3.1
-  releaseName: v0.3.1 Release
+  version: v0.4.0
+  releaseDate: '2025-11-26'
+  releaseDateFormatted: November 26, 2025
+  releaseUrl: https://github.com/llm-d/llm-d/releases/tag/v0.4.0
+  releaseName: Release v0.4.0
 components:
   - name: llm-d-inference-scheduler
     org: llm-d
-    sidebarLabel: "Inference Scheduler"
+    sidebarLabel: Inference Scheduler
     description: This scheduler that makes optimized routing decisions for inference requests to the llm-d inference framework.
     sidebarPosition: 1
     version: v0.3.2
   - name: llm-d-modelservice
     org: llm-d-incubation
-    sidebarLabel: "Model Service"
+    sidebarLabel: Model Service
     description: '`modelservice` is a Helm chart that simplifies LLM deployment on llm-d by declaratively managing Kubernetes resources for serving base models. It enables reproducible, scalable, and tunable model deployments through modular presets, and clean integration with llm-d ecosystem components (including vLLM, Gateway API Inference Extension, LeaderWorkerSet).'
     sidebarPosition: 2
-    version: llm-d-modelservice-v0.2.10
-  - name: llm-d-routing-sidecar
-    org: llm-d
-    sidebarLabel: "Routing Sidecar"
-    description: A reverse proxy redirecting incoming requests to the prefill worker specified in the x-prefiller-host-port HTTP request header.
-    sidebarPosition: 3
-    version: v0.3.0
+    version: llm-d-modelservice-v0.3.8
   - name: llm-d-inference-sim
     org: llm-d
-    sidebarLabel: "Inference Simulator"
+    sidebarLabel: Inference Simulator
     description: A light weight vLLM simulator emulates responses to the HTTP REST endpoints of vLLM.
     sidebarPosition: 4
     version: v0.6.1
   - name: llm-d-infra
     org: llm-d-incubation
-    sidebarLabel: "Infrastructure"
+    sidebarLabel: Infrastructure
     description: A helm chart for deploying gateway and gateway related infrastructure assets for llm-d.
     sidebarPosition: 5
-    version: v1.3.3
+    version: v1.3.4
   - name: llm-d-kv-cache-manager
     org: llm-d
-    sidebarLabel: "KV Cache Manager"
+    sidebarLabel: KV Cache Manager
     description: This repository contains the llm-d-kv-cache-manager, a pluggable service designed to enable KV-Cache Aware Routing and lay the foundation for advanced, cross-node cache coordination in vLLM-based serving platforms.
     sidebarPosition: 6
     version: v0.3.0
   - name: llm-d-benchmark
     org: llm-d
-    sidebarLabel: "Benchmark Tools"
+    sidebarLabel: Benchmark Tools
     description: This repository provides an automated workflow for benchmarking LLM inference using the llm-d stack. It includes tools for deployment, experiment execution, data collection, and teardown across multiple environments and deployment styles.
     sidebarPosition: 7
     version: v0.3.0

remote-content/remote-sources/guide/guide-generator.js

Lines changed: 21 additions & 12 deletions
@@ -73,36 +73,44 @@ const DYNAMIC_GUIDES = [
     description: 'Well-lit path for intelligent inference scheduling with load balancing',
     sidebarPosition: 3
   },
+  {
+    dirName: 'tiered-prefix-cache',
+    title: 'Prefix Cache Offloading',
+    description: 'Well-lit path for separating prefill and decode operations',
+    sidebarPosition: 4,
+    targetFilename: 'tiered-prefix-cache/index.md'
+  },
+  {
+    dirName: 'tiered-prefix-cache/cpu',
+    title: 'Prefix Cache Offloading - CPU',
+    description: 'Well-lit path for separating prefill and decode operations',
+    sidebarPosition: 5,
+    targetFilename: 'tiered-prefix-cache/cpu.md'
+  },
   {
     dirName: 'pd-disaggregation',
     title: 'Prefill/Decode Disaggregation',
     description: 'Well-lit path for separating prefill and decode operations',
-    sidebarPosition: 4
+    sidebarPosition: 6
   },
   {
     dirName: 'precise-prefix-cache-aware',
     title: 'Precise Prefix Cache Aware Routing',
     description: 'Feature guide for precise prefix cache aware routing',
-    sidebarPosition: 5
+    sidebarPosition: 7
   },
   {
     dirName: 'wide-ep-lws',
     title: 'Wide Expert Parallelism with LeaderWorkerSet',
     description: 'Well-lit path for wide expert parallelism using LeaderWorkerSet',
-    sidebarPosition: 6
+    sidebarPosition: 8
   },
   {
     dirName: 'simulated-accelerators',
     title: 'Accelerator Simulation',
     description: 'Feature guide for llm-d accelerator simulation',
-    sidebarPosition: 7
-  },
-  {
-    dirName: 'predicted-latency-based-scheduling',
-    title: 'Predicted Latency Based Load Balancing',
-    description: 'Well-lit path for predicted latency based load balancing',
-    sidebarPosition: 8
-  },
+    sidebarPosition: 9
+  }
 ];

 /**
@@ -147,6 +155,7 @@ function createGuidePlugins() {
   // Add dynamic guides
   DYNAMIC_GUIDES.forEach((guide) => {
     const sourceFile = `guides/${guide.dirName}/README.md`;
+    const targetFilename = guide.targetFilename || `${guide.dirName}.md`;

     plugins.push([
       'docusaurus-plugin-remote-content',
@@ -166,7 +175,7 @@ function createGuidePlugins() {
           sidebarLabel: guide.title,
           sidebarPosition: guide.sidebarPosition,
           filename: sourceFile,
-          newFilename: `${guide.dirName}.md`,
+          newFilename: targetFilename,
           repoUrl,
           branch: releaseVersion,
           content,
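
The fallback introduced in this file can be exercised in isolation. The sketch below pulls the two path expressions from the diff into a helper named `resolveGuidePaths`; that helper name and the flattened `targetFilename` value are hypothetical and exist only for illustration.

```javascript
// Hypothetical helper mirroring the fallback added in this commit:
// use targetFilename when it is set, otherwise derive the name from dirName.
function resolveGuidePaths(guide) {
  return {
    sourceFile: `guides/${guide.dirName}/README.md`,
    newFilename: guide.targetFilename || `${guide.dirName}.md`,
  };
}

// Default behavior: the output path mirrors the nested source directory.
const nested = resolveGuidePaths({ dirName: 'tiered-prefix-cache/cpu' });

// With targetFilename: same source file, but written as a top-level page.
const flattened = resolveGuidePaths({
  dirName: 'tiered-prefix-cache/cpu',
  targetFilename: 'tiered-prefix-cache-cpu.md', // illustrative value only
});

console.log(nested.newFilename);    // tiered-prefix-cache/cpu.md
console.log(flattened.newFilename); // tiered-prefix-cache-cpu.md
```

Because the fallback uses `||`, an empty-string `targetFilename` behaves the same as an omitted one.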

remote-content/remote-sources/infra-providers/infra-providers-generator.js

Lines changed: 7 additions & 1 deletion
@@ -39,14 +39,20 @@ const INFRA_PROVIDERS = [
   {
     dirName: 'aks',
     title: 'Azure Kubernetes Service',
-    description: 'Deploy llm-d on Azure Kubernetes Service',
+    description: 'Deploy llm-d on Azure Kubernetes Service (AKS)',
     sidebarPosition: 1
   },
   {
     dirName: 'digitalocean',
     title: 'DigitalOcean Kubernetes Service (DOKS)',
     description: 'Deploy llm-d on DigitalOcean Kubernetes Service (DOKS)',
     sidebarPosition: 2
+  },
+  {
+    dirName: 'gke',
+    title: 'Google Kubernetes Engine (GKE)',
+    description: 'Deploy llm-d on Google Kubernetes Engine (GKE)',
+    sidebarPosition: 3
   }
 ];

remote-content/remote-sources/sync-release.mjs

Lines changed: 1 addition & 1 deletion
@@ -89,7 +89,7 @@ function extractComponents(releaseBody) {
     // Extract version from diff if available
     let version = null;
     if (diff) {
-      const versionMatch = diff.match(/\s*(v[\d.]+)/);
+      const versionMatch = diff.match(/\s*(v[\d.]+(?:-[a-zA-Z0-9.]+)?)/);
       if (versionMatch) {
         version = versionMatch[1];
       }
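
To see what the widened pattern changes, the two regular expressions from this hunk can be compared side by side. The sample strings below are hypothetical, since the actual format of the release body is not shown in this commit, and `extractVersion` is a helper invented for the comparison.

```javascript
// The two patterns are copied from the diff above.
const oldPattern = /\s*(v[\d.]+)/;
const newPattern = /\s*(v[\d.]+(?:-[a-zA-Z0-9.]+)?)/;

// Hypothetical helper: return the first captured version, or null if none.
function extractVersion(pattern, text) {
  const match = text.match(pattern);
  return match ? match[1] : null;
}

// Hypothetical inputs: a plain tag and a tag with a pre-release style suffix.
const plainTag = 'v1.3.4';
const suffixedTag = 'v0.4.0-rc.1';

console.log(extractVersion(oldPattern, plainTag));    // v1.3.4
console.log(extractVersion(newPattern, plainTag));    // v1.3.4
console.log(extractVersion(oldPattern, suffixedTag)); // v0.4.0
console.log(extractVersion(newPattern, suffixedTag)); // v0.4.0-rc.1
```

The optional non-capturing group leaves plain tags untouched and extends the capture only when a hyphenated suffix follows the numeric part.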
