
Commit 94fb016

Permanently move links to well lit paths from old post
Signed-off-by: Pete Cheslock <[email protected]>
1 parent: 02052b9

File tree

1 file changed: +3, -3 lines

blog/2025-07-29_llm-d-v0.2-our-first-well-lit-paths.md

Lines changed: 3 additions & 3 deletions
@@ -27,9 +27,9 @@ Our deployments have been tested and benchmarked on recent GPUs, such as H200 no
 
 We’ve defined and improved three well-lit paths that form the foundation of this release:
 
-* [**Intelligent inference scheduling over any vLLM deployment**](https://github.com/llm-d-incubation/llm-d-infra/tree/main/quickstart/examples/inference-scheduling): support for precise prefix-cache aware routing with no additional infrastructure, out-of-the-box load-aware scheduling for better tail latency that “just works”, and a new configurable scheduling profile system enable teams to see immediate latency wins and still customize scheduling behavior for their workloads and infrastructure.
-* [**P/D disaggregation**:](https://github.com/llm-d-incubation/llm-d-infra/tree/main/quickstart/examples/pd-disaggregation) support for separating prefill and decode workloads to improve latency and GPU utilization for long-context scenarios.
-* [**Wide expert parallelism for DeepSeek R1 (EP/DP)**](https://github.com/llm-d-incubation/llm-d-infra/tree/main/quickstart/examples/wide-ep-lws): support for large-scale multi-node deployments using expert and data parallelism patterns for MoE models. This includes optimized deployments leveraging NIXL+UCX for inter-node communication, with fixes and improvements to reduce latency, and demonstrates the use of LeaderWorkerSet for Kubernetes-native inference orchestration.
+* [**Intelligent inference scheduling over any vLLM deployment**](https://github.com/llm-d/llm-d/tree/main/guides/inference-scheduling): support for precise prefix-cache aware routing with no additional infrastructure, out-of-the-box load-aware scheduling for better tail latency that “just works”, and a new configurable scheduling profile system enable teams to see immediate latency wins and still customize scheduling behavior for their workloads and infrastructure.
+* [**P/D disaggregation**:](https://github.com/llm-d/llm-d/tree/main/guides/pd-disaggregation) support for separating prefill and decode workloads to improve latency and GPU utilization for long-context scenarios.
+* [**Wide expert parallelism for DeepSeek R1 (EP/DP)**](https://github.com/llm-d/llm-d/tree/main/guides/wide-ep-lws): support for large-scale multi-node deployments using expert and data parallelism patterns for MoE models. This includes optimized deployments leveraging NIXL+UCX for inter-node communication, with fixes and improvements to reduce latency, and demonstrates the use of LeaderWorkerSet for Kubernetes-native inference orchestration.
 
 All of these scenarios are reproducible: we provide reference hardware specs, workloads, and benchmarking harness support, so others can evaluate, reproduce, and extend these benchmarks easily. This also reflects improvements to our deployment tooling and benchmarking framework, a new "machinery" that allows users to set up, test, and analyze these scenarios consistently.
 
0 commit comments