Explain Skippy split capacity shortfalls #498
Merged
Validation
* Validation tier: Tier 2 - narrow runtime diagnostics, because this only changes Skippy split readiness error reporting and tests in the local runtime path.
* git diff --check: PASS
* git diff --cached --check: PASS
* cargo fmt --all -- --check: PASS
* LLAMA_STAGE_BUILD_DIR=.deps/llama-build/build-stage-abi-metal cargo test -p mesh-llm-host-runtime split_topology --lib: PASS, 8 passed
* LLAMA_STAGE_BUILD_DIR=.deps/llama-build/build-stage-abi-metal cargo test -p mesh-llm-host-runtime aggregate_split_capacity_error_reports_excluded_peers --lib: PASS, 1 passed
* LLAMA_STAGE_BUILD_DIR=.deps/llama-build/build-stage-abi-metal cargo check -p mesh-llm-host-runtime: PASS
* LLAMA_STAGE_BUILD_DIR=.deps/llama-build/build-stage-abi-metal cargo check -p mesh-llm: PASS
* Ledger: not applicable - repository has no ledger requirement for this change family.
* Version: not applicable - no release metadata or public protocol changed.
* Not run: just build - not required for the selected validation tier; this is a Rust-only runtime diagnostics change with no UI or assets touched.

Rollback
* git revert HEAD
Collaborator: Nice, love it!
Summary
Skippy split startup failures now explain the capacity gap instead of only reporting aggregate totals.
Why
During recent relay/join testing, a node joined the mesh with a fresh invite and could see a peer with a relay RTT. The remaining startup failure was no longer connectivity but capacity:
That message proved the mesh was connected, but it did not answer the operational questions that matter next: how far short is the mesh, which participants were counted, and whether any visible peers were excluded from the split candidate set.
This PR makes that failure mode actionable without changing the split planner's placement decisions.
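The first two questions reduce to simple arithmetic over the participants that were counted. The Rust sketch below is illustrative only: CapacityShortfall, its fields, and the (label, capacity) input shape are hypothetical and not taken from the PR's code, and it assumes capacity is measured in bytes.

```rust
use std::fmt;

/// Hypothetical summary of an aggregate-capacity check. Field names are
/// illustrative and are not the PR's actual types.
#[derive(Debug, PartialEq)]
struct CapacityShortfall {
    required_bytes: u64,
    available_bytes: u64,
    /// Labels of the participants whose capacity was counted.
    counted: Vec<String>,
}

impl CapacityShortfall {
    /// Build a shortfall from (label, capacity) pairs, or return None when
    /// the counted capacity already covers the requirement.
    fn check(required_bytes: u64, participants: &[(&str, u64)]) -> Option<Self> {
        // Saturating add so one huge reported capacity cannot overflow the total.
        let available_bytes = participants
            .iter()
            .fold(0u64, |acc, (_, cap)| acc.saturating_add(*cap));
        if available_bytes >= required_bytes {
            return None;
        }
        Some(Self {
            required_bytes,
            available_bytes,
            counted: participants.iter().map(|(label, _)| label.to_string()).collect(),
        })
    }

    /// How far short the mesh is. Cannot underflow, because check() only
    /// constructs a value when available < required.
    fn missing_bytes(&self) -> u64 {
        self.required_bytes - self.available_bytes
    }
}

impl fmt::Display for CapacityShortfall {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "short by {} bytes (required {}, available {}); counted: {}",
            self.missing_bytes(),
            self.required_bytes,
            self.available_bytes,
            self.counted.join(", ")
        )
    }
}
```

For example, requiring 64 bytes from participants reporting 24 and 16 bytes yields a shortfall of 24 bytes that also names both counted participants. Returning None on success keeps the diagnostic off the happy path entirely.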
Example
The aggregate-capacity error now has room for the full readiness context:
The participant labels include capacity/cache/missing-artifact/RTT/transfer details, and excluded peers include the reason they were not eligible for the split.
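Keeping excluded peers alongside eligible participants, instead of discarding them after the wait loop, can be sketched as follows. PeerView, ParticipantSnapshot, partition_peers, describe, and the RTT-threshold eligibility rule are all hypothetical stand-ins: the PR's actual SplitParticipantSnapshot and its exclusion reasons may differ, and the real labels also carry cache, missing-artifact, and transfer details that are omitted here.

```rust
/// Hypothetical view of a peer seen during the wait loop.
#[derive(Debug, Clone)]
struct PeerView {
    label: String,
    capacity_bytes: u64,
    /// Measured round-trip time, if any (for example over a relay).
    rtt_ms: Option<u32>,
}

/// Hypothetical stand-in for a snapshot that keeps both sides of the
/// eligibility decision, so the error can explain who was left out and why.
#[derive(Debug, Default)]
struct ParticipantSnapshot {
    eligible: Vec<PeerView>,
    excluded: Vec<(PeerView, String)>,
}

/// Partition visible peers. Assumed rule for illustration only: a peer is
/// excluded if it has no RTT measurement or its RTT exceeds max_rtt_ms.
fn partition_peers(peers: Vec<PeerView>, max_rtt_ms: u32) -> ParticipantSnapshot {
    let mut snap = ParticipantSnapshot::default();
    for peer in peers {
        let rtt = peer.rtt_ms;
        match rtt {
            None => snap.excluded.push((peer, "no RTT measurement".to_string())),
            Some(ms) if ms > max_rtt_ms => {
                let reason = format!("RTT {ms}ms exceeds {max_rtt_ms}ms limit");
                snap.excluded.push((peer, reason));
            }
            Some(_) => snap.eligible.push(peer),
        }
    }
    snap
}

/// Render one line per counted participant and one per excluded peer,
/// preserving the order in which peers were seen.
fn describe(snap: &ParticipantSnapshot) -> String {
    let mut lines = Vec::new();
    for p in &snap.eligible {
        let rtt = p.rtt_ms.map(|r| format!("{r}ms")).unwrap_or_else(|| "n/a".into());
        lines.push(format!("participant {} capacity={} rtt={}", p.label, p.capacity_bytes, rtt));
    }
    for (p, reason) in &snap.excluded {
        lines.push(format!("excluded {}: {}", p.label, reason));
    }
    lines.join("\n")
}
```

The design point is that exclusion produces data rather than silently shrinking the candidate list, so a shortfall message can list a visible peer together with the reason it did not count.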
What changed
* wait_for_split_participants now returns a SplitParticipantSnapshot, preserving both eligible participants and excluded peers after the wait loop.
* SplitCapacityReadinessReport.

Related
Compatibility
Branch integrity
* main
* 9cfef9008e952cfc221dcd486073fd920fc6924f
* e17e8a6c9a57afdb16912fed0c7effa281d986b8
* git rev-list --left-right --count origin/main...HEAD: 0 1
* crates/mesh-llm-host-runtime/src/runtime/local.rs

Validation
Validation tier: Tier 2 - narrow runtime diagnostics. The diff only changes Skippy split readiness error reporting and focused tests in the local runtime path.
Not run:
* just build - not required for this validation tier because this is a Rust-only runtime diagnostics change with no UI or embedded asset changes.
* Remote CI: pending until the PR is submitted.
Rollback plan
Rollback: revert this PR.
DB downgrade: not applicable.
Data repair: not applicable.
Operational caveats: rollback restores the previous less-detailed aggregate split capacity error.
Known residual risks