PegaFlow

KV cache on the wings of Pegasus.

PegaFlow is a high-performance KV cache storage engine for LLM inference. Offload KV cache from GPU to host memory or SSD, and share it across nodes via RDMA.

Decoupled from inference lifecycle — runs as an independent sidecar; KV cache survives engine restarts, scales independently, and is shared across instances
Topology-aware, PCIe-saturating transfers — NUMA-aware pinned memory + layer-wise DMA to maximize hardware bandwidth
GIL-free Rust core — zero Python overhead on the hot path; your inference engine keeps its threads
Production-ready observability — built-in Prometheus metrics and OTLP export, not an afterthought
Pluggable — works with vLLM as a drop-in KV connector

News

2026-05-18 — vLLM x Novita AI: PegaFlow for Production-Grade External KV Cache, a joint blog post with the vLLM team.

Architecture

Framework Integration

Framework	Status	Link
vLLM	✅ Ready	Quick Start

Quick Start

1. Install

uv pip install pegaflow-llm        # CUDA 12
uv pip install pegaflow-llm-cu13   # CUDA 13
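If the install step is scripted, the choice between the two wheels can be made from the CUDA major version. The sketch below is illustrative only: the helper names `pegaflow_package` and `install_command` are hypothetical, and the mapping covers just the two packages listed above.

```python
# Hypothetical helpers: map a CUDA major version to the wheel named in the install step.
PACKAGES = {12: "pegaflow-llm", 13: "pegaflow-llm-cu13"}


def pegaflow_package(cuda_major: int) -> str:
    """Return the package name listed for the given CUDA major version."""
    if cuda_major not in PACKAGES:
        raise ValueError(f"no PegaFlow wheel listed for CUDA {cuda_major}")
    return PACKAGES[cuda_major]


def install_command(cuda_major: int) -> str:
    """Build the `uv pip install` command line as a string."""
    return f"uv pip install {pegaflow_package(cuda_major)}"


print(install_command(13))
```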

2. Start PegaFlow Server

pegaflow-server

3. Launch your inference engine

vLLM:

vllm serve Qwen/Qwen3-0.6B \
  --kv-transfer-config '{"kv_connector": "PegaKVConnector", "kv_role": "kv_both", "kv_connector_module_path": "pegaflow.connector"}'
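Hand-writing JSON inside shell quotes is easy to get wrong. As a minimal sketch, the same command line can be assembled in Python, letting `json.dumps` and `shlex.quote` handle the escaping. The function name `build_vllm_command` is an assumption for illustration; the connector values are copied from the command above.

```python
import json
import shlex

# Values copied from the vLLM command above.
kv_transfer_config = {
    "kv_connector": "PegaKVConnector",
    "kv_role": "kv_both",
    "kv_connector_module_path": "pegaflow.connector",
}


def build_vllm_command(model: str) -> str:
    """Return a shell-safe `vllm serve` command string (hypothetical helper)."""
    config_json = json.dumps(kv_transfer_config)
    args = ["vllm", "serve", model, "--kv-transfer-config", config_json]
    # shlex.quote wraps any argument containing shell-special characters.
    return " ".join(shlex.quote(arg) for arg in args)


command = build_vllm_command("Qwen/Qwen3-0.6B")
print(command)
```

Splitting the printed string with `shlex.split` and parsing the last argument with `json.loads` should give back the original dictionary, which is a quick way to check the quoting before launching the engine.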

For full server options, multi-node setup, and advanced configuration, see Server Configuration.

Development

Build from source

export PYO3_PYTHON=$(which python)
export LD_LIBRARY_PATH=$(python -c "import sysconfig; print(sysconfig.get_config_var('LIBDIR'))"):$LD_LIBRARY_PATH

cargo run -r                    # start server
cd python && maturin develop -r # build Python bindings
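To see what the two `export` lines resolve to before building, the same values can be inspected from Python. This is a rough sketch, not part of the build: `build_env` is a hypothetical helper, and it uses `sys.executable` (the interpreter running the script) as a stand-in for `$(which python)`, which may differ if another `python` comes first on `PATH`.

```python
import os
import sys
import sysconfig


def build_env() -> dict:
    """Approximate the environment variables set by the two export lines."""
    env = {"PYO3_PYTHON": sys.executable}
    # LIBDIR is the directory holding libpython; it can be None on some platforms.
    libdir = sysconfig.get_config_var("LIBDIR")
    existing = os.environ.get("LD_LIBRARY_PATH", "")
    env["LD_LIBRARY_PATH"] = ":".join(p for p in (libdir, existing) if p)
    return env


for name, value in build_env().items():
    print(f"{name}={value}")
```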

The default source build targets CUDA 12.8. If your environment uses CUDA 13, disable the default CUDA feature and enable cuda-13 explicitly:

cargo run -r --no-default-features --features cuda-13 --bin pegaflow-server
cd python && uv run maturin develop -r --no-default-features --features cuda-13
./scripts/build-wheel.sh --release --no-default-features --features cuda-13

We use Conventional Commits — run cz c for an interactive commit prompt.

Benchmarks

KV Cache Benchmark

H800 reference numbers with Llama-3.1-8B (8 prompts, 10K-token prefill, 1-token decode, 4.0 req/s):

Configuration	TTFT mean (ms)	TTFT p99 (ms)
PegaFlow (Cold)	572.5	1113.7
PegaFlow (Warm)	61.5	77.0

Mean TTFT drops from 572.5 ms on the cold path to 61.5 ms on the warm path, about 9.3x lower (p99: about 14.5x lower), which shows that cached KV data is being reused across requests.
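The cold-to-warm ratios can be recomputed directly from the table. The snippet below only restates the published numbers; the dictionary layout and the `speedup` helper are illustrative.

```python
# TTFT values in milliseconds, copied from the benchmark table.
ttft_ms = {
    "cold": {"mean": 572.5, "p99": 1113.7},
    "warm": {"mean": 61.5, "p99": 77.0},
}


def speedup(metric: str) -> float:
    """Ratio of cold TTFT to warm TTFT for the given metric."""
    return ttft_ms["cold"][metric] / ttft_ms["warm"][metric]


for metric in ("mean", "p99"):
    print(f"{metric}: {speedup(metric):.1f}x")
# prints:
# mean: 9.3x
# p99: 14.5x
```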

Documentation

Server Configuration — full CLI options, SSD cache, multi-node setup
Python Package — Python bindings and vLLM connector configuration
P2P KV Cache Sharing — cross-node RDMA setup, tuning, and troubleshooting
P/D Router — prefill/decode disaggregation
vLLM I/O Patch — optional patch for better transfer throughput
Metrics — Prometheus and OTLP metrics reference
Goals & Non-Goals — project scope and design philosophy
