Add support for nested guide directories and clarify branch sync strategy (#118)

petecheslock · web-flow · commit 1e31243207bf · 2025-12-01T15:46:16.000-06:00
* Add to README how to test a different branch on guide sync

Signed-off-by: Pete Cheslock <pete.cheslock@redhat.com>

* Enhance README and guide generator to support nested directories for dynamic guides

- Updated README.md to include instructions for configuring remote guides from nested directories, detailing the use of `targetFilename` for top-level page generation.
- Modified guide-generator.js to add new dynamic guides for 'Prefix Cache Storage' and 'Prefix Cache Storage - CPU', including sidebar positions and descriptions.

Signed-off-by: Pete Cheslock <pete.cheslock@redhat.com>

* Nest the CPU example under the main header

Signed-off-by: Pete Cheslock <pete.cheslock@redhat.com>

* Update guide generator and .gitignore for tiered prefix cache documentation

- Renamed 'prefix-cache-storage' to 'tiered-prefix-cache' in guide-generator.js, updating titles and target filenames accordingly.
- Adjusted .gitignore to reflect the new directory structure for tiered prefix cache documentation.

Signed-off-by: Pete Cheslock <pete.cheslock@redhat.com>

* Update components-data.yaml and sync-release.mjs for v0.4.0 release

- Updated release information in components-data.yaml to reflect version v0.4.0, including release date and URL.
- Modified sidebar labels for several components to remove quotes for consistency.
- Updated version numbers for llm-d-modelservice and llm-d-infra components.
- Enhanced the regex in sync-release.mjs to also capture versions with a hyphenated suffix (pre-release style, such as `-rc.1`).

Signed-off-by: Pete Cheslock <pete.cheslock@redhat.com>

* Add GKE with 0.4 release

Signed-off-by: Pete Cheslock <pete.cheslock@redhat.com>

* Permanently move links to well lit paths from old post

Signed-off-by: Pete Cheslock <pete.cheslock@redhat.com>

---------

Signed-off-by: Pete Cheslock <pete.cheslock@redhat.com>
diff --git a/.gitignore b/.gitignore
@@ -22,6 +22,7 @@ docs/community/sigs.md
 docs/guide/guide.md
 docs/guide/Installation/*.md
 docs/guide/InfraProviders/*.md
+docs/guide/Installation/tiered-prefix-cache/*.md
 docs/usage/**/*.md
 # Keep category files for sidebar configuration
 !docs/guide/Installation/_category_.json
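Why the `.gitignore` needs its own line for the nested folder: in gitignore patterns, `*` matches anything except `/`, so `docs/guide/Installation/*.md` does not cover files generated into `tiered-prefix-cache/`. The sketch below models only that one rule; `simpleGitignoreToRegExp` is a hypothetical helper, and real gitignore matching also handles `**`, `?`, bracket expressions, and negation.

```javascript
// Hypothetical helper: translate a gitignore pattern that contains only
// literal characters and single "*" wildcards into an anchored RegExp.
// It does NOT implement "**", "?", bracket expressions, or negation.
function simpleGitignoreToRegExp(pattern) {
  const source = pattern
    .split('*')
    // Escape regex metacharacters inside each literal chunk.
    .map((chunk) => chunk.replace(/[.+?^${}()|[\]\\]/g, '\\$&'))
    // In gitignore, a single "*" matches anything except "/".
    .join('[^/]*');
  return new RegExp(`^${source}$`);
}

const topLevel = simpleGitignoreToRegExp('docs/guide/Installation/*.md');
const nested = simpleGitignoreToRegExp('docs/guide/Installation/tiered-prefix-cache/*.md');

console.log(topLevel.test('docs/guide/Installation/pd-disaggregation.md')); // true
console.log(topLevel.test('docs/guide/Installation/tiered-prefix-cache/cpu.md')); // false
console.log(nested.test('docs/guide/Installation/tiered-prefix-cache/cpu.md')); // true
```

A single `docs/guide/Installation/**/*.md` pattern would cover every depth; listing each generated folder explicitly keeps the ignore rules narrower.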
diff --git a/README.md b/README.md
@@ -97,8 +97,55 @@ git push                           # Triggers automatic deployment
 - Component descriptions and version tags
 - **Components** sync from their individual release tags
 - **Guides** sync from the llm-d/llm-d release tag
+- **Architecture docs** sync from the llm-d/llm-d release tag
 - **Community docs** always sync from `main` branch (latest)
 
+### Understanding Versioned vs. Always-Current Content
+
+The remote content system supports two sync strategies:
+
+**Versioned Content** (syncs from release tags):
+- **Guides** (`docs/guide/`) - Uses `RELEASE_INFO.version` from `components-data.yaml`
+- **Architecture** (`docs/architecture/`) - Uses `RELEASE_INFO.version` from `components-data.yaml`
+- **Components** (`docs/architecture/Components/`) - Each component uses its own `version` field from `components-data.yaml`
+- **Infrastructure Providers** (`docs/guide/InfraProviders/`) - Uses `RELEASE_INFO.version` from `components-data.yaml`
+
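+These docs are pinned to specific release tags (e.g., `v0.3.1`) so the documentation matches the released code. When you update `release.version` in `components-data.yaml`, every source that uses `RELEASE_INFO.version` (guides, architecture docs, and infrastructure providers) syncs from the new tag on the next `npm start` or `npm run build`; components keep following their own `version` fields.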
+
+**Always-Current Content** (syncs from `main` branch):
+- **Community docs** (`docs/community/`) - Contributing guidelines, Code of Conduct, Security Policy, SIGs
+- These are configured via `COMMON_REPO_CONFIGS['llm-d-main'].branch = 'main'` in `component-configs.js`
+
+Community documentation stays current with the latest policies and processes, independent of releases. The `branch` field in `COMMON_REPO_CONFIGS` controls this behavior.
+
+**How it works:**
+- `generateRepoUrls()` in `component-configs.js` prefers `version` over `branch` when both exist
+- Versioned content sources call `findRepoConfig('llm-d')` and use `RELEASE_INFO.version`
+- Community sources call `findRepoConfig('llm-d')` and use `repoConfig.branch` (which is `'main'`)
+- This separation lets you cut releases without worrying about stale community policies
+
+### Testing content from a feature branch
+
+To preview remote docs from a work-in-progress branch (for example `liu-cong-debug`), temporarily set `release.version` in `remote-content/remote-sources/components-data.yaml` to that branch name. Run `npm start` or `npm run build` to pull the branch content into the site. When testing is done, change `release.version` back to the released tag so production remains on the official docs.
+
+### Supporting remote guides from nested directories
+
+Dynamic guides are configured in `remote-content/remote-sources/guide/guide-generator.js`. Each entry in `DYNAMIC_GUIDES` points at a `README.md` inside `guides/<dirName>/` in the main repo. By default, the generator mirrors the directory structure when it creates docs: `dirName: 'some-folder/sub-guide'` produces `some-folder/sub-guide.md` under `docs/guide/Installation`, so the sidebar nests the page inside a `some-folder` category.
+
+If you want to surface a nested source as a top-level page, add an optional `targetFilename` to the guide definition. Example:
+
+```javascript
+{
+  dirName: 'prefix-cache-storage/cpu',
+  title: 'Prefix Cache Storage - CPU',
+  description: '…',
+  sidebarPosition: 5,
+  targetFilename: 'prefix-cache-storage-cpu.md'
+}
+```
+
+With `targetFilename`, the generator still reads `guides/prefix-cache-storage/cpu/README.md`, but it writes the output to `docs/guide/Installation/prefix-cache-storage-cpu.md`, letting the page appear alongside other top-level guides. Leave `targetFilename` out to keep the default nested behavior.
+
 **Manual updates:** You can also manually edit `components-data.yaml` if needed.
 
 ### Adding New Components
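The version-over-branch rule described under "How it works" can be sketched in a few lines. This is a hypothetical illustration, not the contents of `component-configs.js`: `resolveRef`, `rawUrl`, and the config objects are stand-ins shaped after the README's description, and fetching from `raw.githubusercontent.com` is an assumption about how remote content is retrieved.

```javascript
// Stand-ins for the real config (assumed shapes, not the actual files).
const RELEASE_INFO = { version: 'v0.4.0' }; // from components-data.yaml
const COMMON_REPO_CONFIGS = {
  'llm-d-main': { org: 'llm-d', repo: 'llm-d', branch: 'main' },
};

// Prefer a pinned version (release tag); fall back to the branch.
function resolveRef({ version, branch }) {
  const ref = version || branch;
  if (!ref) {
    throw new Error('Repo config needs either a version or a branch');
  }
  return ref;
}

// Assumption: content is fetched as raw files, where a tag and a branch
// name occupy the same slot in the URL.
function rawUrl({ org, repo }, ref, path) {
  return `https://raw.githubusercontent.com/${org}/${repo}/${ref}/${path}`;
}

const repo = COMMON_REPO_CONFIGS['llm-d-main'];

// Versioned content (guides): the release version wins over `branch`.
const guideRef = resolveRef({ ...repo, version: RELEASE_INFO.version });
// Always-current content (community docs): no version, so `main` is used.
const communityRef = resolveRef({ ...repo });

console.log(rawUrl(repo, guideRef, 'guides/pd-disaggregation/README.md'));
// Hypothetical path, used only to show the branch-based URL.
console.log(rawUrl(repo, communityRef, 'CONTRIBUTING.md'));
```

Under this assumption, setting `version` to a branch name such as `liu-cong-debug` produces a URL for that branch instead of a tag, which matches the feature-branch testing workflow described above.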
diff --git a/blog/2025-07-29_llm-d-v0.2-our-first-well-lit-paths.md b/blog/2025-07-29_llm-d-v0.2-our-first-well-lit-paths.md
@@ -27,9 +27,9 @@ Our deployments have been tested and benchmarked on recent GPUs, such as H200 no
 
 We’ve defined and improved three well-lit paths that form the foundation of this release:
 
-* [**Intelligent inference scheduling over any vLLM deployment**](https://github.com/llm-d-incubation/llm-d-infra/tree/main/quickstart/examples/inference-scheduling): support for precise prefix-cache aware routing with no additional infrastructure, out-of-the-box load-aware scheduling for better tail latency that “just works”, and a new configurable scheduling profile system enable teams to see immediate latency wins and still customize scheduling behavior for their workloads and infrastructure.  
-* [**P/D disaggregation**:](https://github.com/llm-d-incubation/llm-d-infra/tree/main/quickstart/examples/pd-disaggregation) support for separating prefill and decode workloads to improve latency and GPU utilization for long-context scenarios.  
-* [**Wide expert parallelism for DeepSeek R1 (EP/DP)**](https://github.com/llm-d-incubation/llm-d-infra/tree/main/quickstart/examples/wide-ep-lws): support for large-scale multi-node deployments using expert and data parallelism patterns for MoE models. This includes optimized deployments leveraging NIXL+UCX for inter-node communication, with fixes and improvements to reduce latency, and demonstrates the use of LeaderWorkerSet for Kubernetes-native inference orchestration.
+* [**Intelligent inference scheduling over any vLLM deployment**](https://github.com/llm-d/llm-d/tree/main/guides/inference-scheduling): support for precise prefix-cache aware routing with no additional infrastructure, out-of-the-box load-aware scheduling for better tail latency that “just works”, and a new configurable scheduling profile system, which together enable teams to see immediate latency wins and still customize scheduling behavior for their workloads and infrastructure.  
+* [**P/D disaggregation**](https://github.com/llm-d/llm-d/tree/main/guides/pd-disaggregation): support for separating prefill and decode workloads to improve latency and GPU utilization for long-context scenarios.  
+* [**Wide expert parallelism for DeepSeek R1 (EP/DP)**](https://github.com/llm-d/llm-d/tree/main/guides/wide-ep-lws): support for large-scale multi-node deployments using expert and data parallelism patterns for MoE models. This includes optimized deployments leveraging NIXL+UCX for inter-node communication, with fixes and improvements to reduce latency, and demonstrates the use of LeaderWorkerSet for Kubernetes-native inference orchestration.
 
 All of these scenarios are reproducible: we provide reference hardware specs, workloads, and benchmarking harness support, so others can evaluate, reproduce, and extend these benchmarks easily. This also reflects improvements to our deployment tooling and benchmarking framework, a new "machinery" that allows users to set up, test, and analyze these scenarios consistently.
 
diff --git a/remote-content/remote-sources/components-data.yaml b/remote-content/remote-sources/components-data.yaml
@@ -2,55 +2,49 @@
 # This file contains static data for generating the Components documentation page
 # Update this file when there are new releases or component changes
 #
-# Last synced from: https://github.com/llm-d/llm-d/releases/tag/v0.3.1
-# Sync date: 2025-11-11T15:13:04.200Z
+# Last synced from: https://github.com/llm-d/llm-d/releases/tag/v0.4.0
+# Sync date: 2025-12-01T21:17:30.109Z
 
 release:
-  version: v0.3.1
-  releaseDate: '2025-11-06'
-  releaseDateFormatted: November 6, 2025
-  releaseUrl: https://github.com/llm-d/llm-d/releases/tag/v0.3.1
-  releaseName: v0.3.1 Release
+  version: v0.4.0
+  releaseDate: '2025-11-26'
+  releaseDateFormatted: November 26, 2025
+  releaseUrl: https://github.com/llm-d/llm-d/releases/tag/v0.4.0
+  releaseName: Release v0.4.0
 components:
   - name: llm-d-inference-scheduler
     org: llm-d
-    sidebarLabel: "Inference Scheduler"
+    sidebarLabel: Inference Scheduler
     description: A scheduler that makes optimized routing decisions for inference requests to the llm-d inference framework.
     sidebarPosition: 1
     version: v0.3.2
   - name: llm-d-modelservice
     org: llm-d-incubation
-    sidebarLabel: "Model Service"
+    sidebarLabel: Model Service
     description: '`modelservice` is a Helm chart that simplifies LLM deployment on llm-d by declaratively managing Kubernetes resources for serving base models. It enables reproducible, scalable, and tunable model deployments through modular presets, and clean integration with llm-d ecosystem components (including vLLM, Gateway API Inference Extension, LeaderWorkerSet).'
     sidebarPosition: 2
-    version: llm-d-modelservice-v0.2.10
-  - name: llm-d-routing-sidecar
-    org: llm-d
-    sidebarLabel: "Routing Sidecar"
-    description: A reverse proxy redirecting incoming requests to the prefill worker specified in the x-prefiller-host-port HTTP request header.
-    sidebarPosition: 3
-    version: v0.3.0
+    version: llm-d-modelservice-v0.3.8
   - name: llm-d-inference-sim
     org: llm-d
-    sidebarLabel: "Inference Simulator"
+    sidebarLabel: Inference Simulator
     description: A lightweight vLLM simulator that emulates responses to the HTTP REST endpoints of vLLM.
     sidebarPosition: 4
     version: v0.6.1
   - name: llm-d-infra
     org: llm-d-incubation
-    sidebarLabel: "Infrastructure"
+    sidebarLabel: Infrastructure
     description: A Helm chart for deploying gateway and gateway-related infrastructure assets for llm-d.
     sidebarPosition: 5
-    version: v1.3.3
+    version: v1.3.4
   - name: llm-d-kv-cache-manager
     org: llm-d
-    sidebarLabel: "KV Cache Manager"
+    sidebarLabel: KV Cache Manager
     description: This repository contains the llm-d-kv-cache-manager, a pluggable service designed to enable KV-Cache Aware Routing and lay the foundation for advanced, cross-node cache coordination in vLLM-based serving platforms.
     sidebarPosition: 6
     version: v0.3.0
   - name: llm-d-benchmark
     org: llm-d
-    sidebarLabel: "Benchmark Tools"
+    sidebarLabel: Benchmark Tools
     description: This repository provides an automated workflow for benchmarking LLM inference using the llm-d stack. It includes tools for deployment, experiment execution, data collection, and teardown across multiple environments and deployment styles.
     sidebarPosition: 7
     version: v0.3.0
diff --git a/remote-content/remote-sources/guide/guide-generator.js b/remote-content/remote-sources/guide/guide-generator.js
@@ -73,36 +73,44 @@ const DYNAMIC_GUIDES = [
     description: 'Well-lit path for intelligent inference scheduling with load balancing',
     sidebarPosition: 3
   },
+  {
+    dirName: 'tiered-prefix-cache',
+    title: 'Prefix Cache Offloading',
+    description: 'Well-lit path for tiered prefix cache offloading',
+    sidebarPosition: 4,
+    targetFilename: 'tiered-prefix-cache/index.md'
+  },
+  {
+    dirName: 'tiered-prefix-cache/cpu',
+    title: 'Prefix Cache Offloading - CPU',
+    description: 'Well-lit path for prefix cache offloading to CPU memory',
+    sidebarPosition: 5,
+    targetFilename: 'tiered-prefix-cache/cpu.md'
+  },
   {
     dirName: 'pd-disaggregation', 
     title: 'Prefill/Decode Disaggregation',
     description: 'Well-lit path for separating prefill and decode operations',
-    sidebarPosition: 4
+    sidebarPosition: 6
   },
   {
     dirName: 'precise-prefix-cache-aware',
     title: 'Precise Prefix Cache Aware Routing',
     description: 'Feature guide for precise prefix cache aware routing',
-    sidebarPosition: 5
+    sidebarPosition: 7
   },
   {
     dirName: 'wide-ep-lws',
     title: 'Wide Expert Parallelism with LeaderWorkerSet',
     description: 'Well-lit path for wide expert parallelism using LeaderWorkerSet',
-    sidebarPosition: 6
+    sidebarPosition: 8
   },
   {
     dirName: 'simulated-accelerators',
     title: 'Accelerator Simulation',
     description: 'Feature guide for llm-d accelerator simulation',
-    sidebarPosition: 7
-  },
-  {
-    dirName: 'predicted-latency-based-scheduling',
-    title: 'Predicted Latency Based Load Balancing',
-    description: 'Well-lit path for predicted latency based load balancing',
-    sidebarPosition: 8
-  },
+    sidebarPosition: 9
+  }
 ];
 
 /**
@@ -147,6 +155,7 @@ function createGuidePlugins() {
   // Add dynamic guides
   DYNAMIC_GUIDES.forEach((guide) => {
     const sourceFile = `guides/${guide.dirName}/README.md`;
+    const targetFilename = guide.targetFilename || `${guide.dirName}.md`;
     
     plugins.push([
       'docusaurus-plugin-remote-content',
@@ -166,7 +175,7 @@ function createGuidePlugins() {
               sidebarLabel: guide.title,
               sidebarPosition: guide.sidebarPosition,
               filename: sourceFile,
-              newFilename: `${guide.dirName}.md`,
+              newFilename: targetFilename,
               repoUrl,
               branch: releaseVersion,
               content,
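The path logic added to `createGuidePlugins()` can be sketched in isolation. `resolveGuidePaths` reproduces the `targetFilename`-or-`dirName` fallback from the diff; `findConflicts` is a hypothetical extra check, not part of this commit, that guards the renumbered `sidebarPosition` values and the output filenames against collisions. The three entries are trimmed copies of real `DYNAMIC_GUIDES` items.

```javascript
// Trimmed copies of three DYNAMIC_GUIDES entries (titles and descriptions omitted).
const guides = [
  { dirName: 'tiered-prefix-cache', sidebarPosition: 4, targetFilename: 'tiered-prefix-cache/index.md' },
  { dirName: 'tiered-prefix-cache/cpu', sidebarPosition: 5, targetFilename: 'tiered-prefix-cache/cpu.md' },
  { dirName: 'pd-disaggregation', sidebarPosition: 6 },
];

// Mirrors the diff: read guides/<dirName>/README.md and write to
// targetFilename when present, otherwise to `${dirName}.md`.
function resolveGuidePaths(guide) {
  return {
    sourceFile: `guides/${guide.dirName}/README.md`,
    newFilename: guide.targetFilename || `${guide.dirName}.md`,
  };
}

// Hypothetical check (not part of this commit): report guides that share a
// sidebarPosition or would overwrite each other's output file.
function findConflicts(list) {
  const seen = { position: new Map(), file: new Map() };
  const conflicts = [];
  for (const guide of list) {
    const keys = [
      ['position', guide.sidebarPosition],
      ['file', resolveGuidePaths(guide).newFilename],
    ];
    for (const [kind, key] of keys) {
      if (seen[kind].has(key)) {
        conflicts.push(`${kind} ${key}: ${seen[kind].get(key)} vs ${guide.dirName}`);
      } else {
        seen[kind].set(key, guide.dirName);
      }
    }
  }
  return conflicts;
}

guides.forEach((guide) => console.log(resolveGuidePaths(guide)));
console.log(findConflicts(guides)); // []
```

For the parent entry, the override changes the output from `tiered-prefix-cache.md` to `tiered-prefix-cache/index.md`, which Docusaurus autogenerated sidebars conventionally treat as the folder's category index page. The CPU entry's override happens to equal the default, so it is explicit rather than required.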
diff --git a/remote-content/remote-sources/infra-providers/infra-providers-generator.js b/remote-content/remote-sources/infra-providers/infra-providers-generator.js
@@ -39,14 +39,20 @@ const INFRA_PROVIDERS = [
   {
     dirName: 'aks',
     title: 'Azure Kubernetes Service',
-    description: 'Deploy llm-d on Azure Kubernetes Service',
+    description: 'Deploy llm-d on Azure Kubernetes Service (AKS)',
     sidebarPosition: 1
   },
   {
     dirName: 'digitalocean',
     title: 'DigitalOcean Kubernetes Service (DOKS)',
     description: 'Deploy llm-d on DigitalOcean Kubernetes Service (DOKS)',
     sidebarPosition: 2
+  },
+  {
+    dirName: 'gke',
+    title: 'Google Kubernetes Engine (GKE)',
+    description: 'Deploy llm-d on Google Kubernetes Engine (GKE)',
+    sidebarPosition: 3
   }
 ];
 
diff --git a/remote-content/remote-sources/sync-release.mjs b/remote-content/remote-sources/sync-release.mjs
@@ -89,7 +89,7 @@ function extractComponents(releaseBody) {
     // Extract version from diff if available
     let version = null;
     if (diff) {
-      const versionMatch = diff.match(/→\s*(v[\d.]+)/);
+      const versionMatch = diff.match(/→\s*(v[\d.]+(?:-[a-zA-Z0-9.]+)?)/);
       if (versionMatch) {
         version = versionMatch[1];
       }
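To see what the widened pattern in `extractComponents()` changes, the old and new regexes can be run side by side. The sample diff strings are invented for illustration (the real release-note format may differ), and `extractVersion` is a hypothetical wrapper around the same `match(...)[1]` logic shown in the diff.

```javascript
// The two patterns from the diff, unchanged.
const OLD_PATTERN = /→\s*(v[\d.]+)/;
const NEW_PATTERN = /→\s*(v[\d.]+(?:-[a-zA-Z0-9.]+)?)/;

// Hypothetical wrapper: return the captured version or null.
function extractVersion(diff, pattern) {
  const match = diff.match(pattern);
  return match ? match[1] : null;
}

// Invented sample "diff" strings in an "old → new" shape.
const samples = [
  'v0.3.1 → v0.4.0',
  'v0.3.1 → v0.4.0-rc.1',
  'llm-d-modelservice-v0.2.10 → llm-d-modelservice-v0.3.8',
];

for (const diff of samples) {
  console.log(diff, '| old:', extractVersion(diff, OLD_PATTERN), '| new:', extractVersion(diff, NEW_PATTERN));
}
```

The optional group keeps one hyphenated suffix, so in these samples `v0.4.0-rc.1` is captured whole where the old pattern stops at `v0.4.0`. Both patterns still require `v` directly after the arrow, so a chart-style tag such as `llm-d-modelservice-v0.3.8` matches neither.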
