Enable adaptive Skippy speculation by i386 · Pull Request #446 · Mesh-LLM/mesh-llm

i386 · 2026-05-06T10:49:34Z

Summary

Add a new skippy-speculative crate for reusable n-gram policy, verification-span classification, repair strategy, and MTP notes.
Add the OpenAI server flag --openai-ngram-auto, with prompt-shape and repeated-suffix gating, shared acceptance/cost telemetry, and adaptive repair behavior.
Default the embedded mesh Skippy serving preset to adaptive n-gram auto mode while keeping standalone skippy-server opt-in.
Add scripts/skippy-openai-ngram-bench.sh and README/Mermaid diagrams that document how n-gram, draft-model, and fallback decode are compared.
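
For readers new to n-gram speculation, here is a minimal sketch of the general technique: look up the most recent earlier occurrence of the current suffix, propose the tokens that followed it, and keep only the prefix the target model agrees with. The names `propose_ngram` and `accepted_prefix_len` are hypothetical, as are the parameters. This is not the skippy-speculative API, and it assumes greedy (exact-match) verification.

```rust
/// Propose up to `max_draft` tokens by finding the most recent earlier
/// occurrence of the last `n` tokens of `history` and copying what followed.
/// Hypothetical helper for illustration; not the skippy-speculative API.
fn propose_ngram(history: &[u32], n: usize, max_draft: usize) -> Vec<u32> {
    if n == 0 || history.len() <= n {
        return Vec::new();
    }
    let suffix_start = history.len() - n;
    let suffix = &history[suffix_start..];
    // Scan candidate match positions from newest to oldest, skipping the
    // suffix itself so at least one token always follows a match.
    for start in (0..suffix_start).rev() {
        if &history[start..start + n] == suffix {
            let from = start + n;
            let to = (from + max_draft).min(history.len());
            return history[from..to].to_vec();
        }
    }
    Vec::new()
}

/// Count how many leading draft tokens match the tokens the target model
/// chose at the same positions (assumed greedy verification).
fn accepted_prefix_len(draft: &[u32], target: &[u32]) -> usize {
    draft
        .iter()
        .zip(target.iter())
        .take_while(|(d, t)| d == t)
        .count()
}
```

With history `[1, 2, 3, 4, 1, 2]` and `n = 2`, the suffix `[1, 2]` matches at the start, so the proposal is `[3, 4, 1]`. If the target then produces `[3, 4, 9]`, two tokens are accepted and the third must be repaired.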

Architecture

skippy-speculative owns speculation policy and telemetry-shaped decisions; skippy-server remains responsible for target verification and streaming.
Auto n-gram mode enables immediately for favorable prompt shape or after repeated suffix hits; otherwise it stays in cold observation to avoid verifier overhead.
Draft-model speculation remains explicit because the realistic pair tested was slower despite functional compatibility.
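
The gating described above can be pictured as a small two-state machine. The sketch below is an illustration only: the type names, the `prompt_is_favorable` input, and the hit threshold are assumptions, and it counts suffix hits cumulatively because the description does not say whether hits must be consecutive.

```rust
/// Whether n-gram speculation is currently active.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Mode {
    /// Watch for repeated suffixes but do not pay verifier overhead yet.
    ColdObservation,
    /// Propose n-gram drafts and verify them against the target.
    Enabled,
}

/// Hypothetical auto-mode gate; field names and threshold are assumptions.
struct AutoGate {
    mode: Mode,
    suffix_hits: u32,
    hit_threshold: u32,
}

impl AutoGate {
    /// Enable immediately for a favorable prompt shape, else start cold.
    fn new(prompt_is_favorable: bool, hit_threshold: u32) -> Self {
        let mode = if prompt_is_favorable {
            Mode::Enabled
        } else {
            Mode::ColdObservation
        };
        AutoGate { mode, suffix_hits: 0, hit_threshold }
    }

    /// Record one decode step. `suffix_hit` is true when the n-gram lookup
    /// found a repeated suffix. Returns the mode after this observation.
    fn observe(&mut self, suffix_hit: bool) -> Mode {
        if suffix_hit {
            self.suffix_hits = self.suffix_hits.saturating_add(1);
            if self.mode == Mode::ColdObservation
                && self.suffix_hits >= self.hit_threshold
            {
                self.mode = Mode::Enabled;
            }
        }
        self.mode
    }
}
```

A gate built with `AutoGate::new(false, 2)` stays in `ColdObservation` through misses and the first hit, then switches to `Enabled` on the second hit. A gate built with `prompt_is_favorable = true` starts enabled.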

Protocol

No mesh wire/protobuf/gossip protocol changes.
CLI and OpenAI server flags are additive; older mesh nodes continue to interoperate as before.

Validation

LLAMA_STAGE_BUILD_DIR=.deps/llama.cpp/build-stage-abi-metal cargo test -p skippy-speculative -p skippy-server --lib
LLAMA_STAGE_BUILD_DIR=.deps/llama.cpp/build-stage-abi-metal cargo test -p mesh-llm inference::skippy --lib
LLAMA_STAGE_BUILD_DIR=.deps/llama.cpp/build-stage-abi-metal cargo check -p skippy-server -p mesh-llm -p skippy-bench
bash -n scripts/skippy-openai-ngram-bench.sh
cargo fmt --all -- --check
git diff --check

Benchmark Evidence

Qwen 3 4B depth-1 mixed, 64 max tokens: ngram-auto 32.31 tok/s vs baseline 30.89 tok/s, 1.05x.
Qwen 3 4B depth-1 coding-warm, 64 max tokens: ngram-auto 41.19 tok/s vs baseline 33.52 tok/s, 1.23x.
Qwen 3 4B depth-4 stress: zero errors; mixed 1.01x and coding-warm 0.99x, so auto mode is safe and throughput-neutral under concurrency rather than a throughput win.
Llama 3.2 3B target plus 1B draft loaded successfully, but draft-adaptive was slower, so draft stays explicit and pair-dependent.
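
The speedup figures for the depth-1 runs follow directly from the reported tok/s values. A small sketch of that arithmetic is below; the function name `format_speedup` is hypothetical and not part of the bench script. The depth-4 ratios cannot be recomputed here because their raw tok/s values are not listed.

```rust
/// Format candidate throughput over baseline throughput as an "N.NNx" ratio.
/// Hypothetical helper for checking the reported numbers.
fn format_speedup(candidate_tok_s: f64, baseline_tok_s: f64) -> String {
    format!("{:.2}x", candidate_tok_s / baseline_tok_s)
}
```

Calling `format_speedup(32.31, 30.89)` gives `1.05x` for the mixed run, and `format_speedup(41.19, 33.52)` gives `1.23x` for the coding-warm run, matching the figures above.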

michaelneale · 2026-05-07T01:55:15Z

Worth trying with 10x parameter size models?

Is there some threshold where draft helps? Would you use ngram and draft?

github-actions · 2026-05-12T02:38:35Z

This pull request has not been updated in at least 5 days. It will be closed after 7 days of inactivity to keep the active review queue current. Please update it within 2 days if the changes are still moving forward.

github-actions · 2026-05-19T02:50:35Z

Closing this pull request because it has not been updated in at least 7 days. Please reopen or create a fresh pull request when the changes are ready to continue.

i386 force-pushed the jd/skippy-speculative-auto branch from b49348a to 9aa0094 May 6, 2026 11:06

i386 marked this pull request as ready for review May 6, 2026 11:06

i386 force-pushed the jd/skippy-speculative-auto branch from 9aa0094 to 252a538 May 6, 2026 11:20

Promote adaptive n-gram speculation

dc235d2

i386 force-pushed the jd/skippy-speculative-auto branch from 252a538 to dc235d2 May 6, 2026 11:44

Merge branch 'main' into jd/skippy-speculative-auto

82a1db3

i386 requested review from IvGolovach, michaelneale and ndizazzo May 6, 2026 19:03

i386 added the experimental label May 6, 2026

github-actions Bot added the stale label May 12, 2026

Merge remote-tracking branch 'origin/main' into jd/skippy-speculative…

d90a66d

…-auto

github-actions Bot removed the stale label May 13, 2026

ndizazzo added the stale label May 16, 2026

github-actions Bot closed this May 19, 2026

i386 wants to merge 3 commits into main from jd/skippy-speculative-auto