fix(server): surface actionable guidance when embedding rebuild blocks startup #2618
Open
fancyboi999 wants to merge 1 commit into
…s startup

On upgrade, an existing vector collection whose embedding metadata no longer matches the current config makes init_context_collection raise EmbeddingRebuildRequiredError. This propagated out of the lifespan uncaught, so openviking-server crashed at startup with a bare traceback and no recovery path, pushing operators toward deleting business data to get the server back up.

Catch the error at the shared startup choke point (_initialize_runtime_state), log an operator-facing, case-aware recovery runbook (provider/model change vs. dimension change, with the verified `ov reindex` command and the allow_metadata_override caveat), then re-raise so startup still aborts and the server never serves vectors from a mismatched embedding space.

Docs:
- Add an upgrade-troubleshooting FAQ entry (en + zh) covering both recovery cases.
- Replace the removed `python -m openviking.console.bootstrap` (gone since v0.3.18) with Web Studio at /studio in the openclaw-plugin setup docs.

Closes volcengine#2273
Summary
Upgrading OpenViking (e.g. v0.3.15 → v0.3.19+) with an existing workspace can crash `openviking-server` at startup with a bare `EmbeddingRebuildRequiredError` traceback and no recovery guidance, pushing operators toward deleting business data to get the server back up.

This PR turns that expected-but-fatal upgrade condition into an actionable, operator-facing recovery runbook, and cleans up the stale deploy guidance that contributed to the "server won't start after upgrade" reports.
Root Cause
- `init_context_collection()` (`openviking/storage/collection_schemas.py:338,344`) raises `EmbeddingRebuildRequiredError` when an existing vector collection's embedding metadata (provider/model/dimension) no longer matches the current config. This is expected after a default-embedding-model change between versions.
- The error surfaces through `service.initialize()`, called from the server lifespan via the shared choke point `_initialize_runtime_state` (`openviking/server/app.py`). It was never caught, so it propagated out of the FastAPI lifespan and uvicorn aborted startup with a raw traceback and zero remediation guidance.
- The `python -m openviking.console.bootstrap` command (the standalone console, removed in v0.3.18) still appeared in the OpenClaw-plugin setup docs, so users following them hit `ModuleNotFoundError`, the second half of the original report.

Solution & Trade-offs
- Catch `EmbeddingRebuildRequiredError` at the single shared startup choke point `_initialize_runtime_state`, log a clear, case-aware recovery runbook, then re-raise so startup still aborts: the server must never serve vectors from a mismatched embedding space.
- Provider/model change: set `embedding.allow_metadata_override=true`, restart, then run `ov reindex viking:// --mode vectors_only --sudo --wait true`.
- Dimension change: `allow_metadata_override` does not bypass this and the vector index must be rebuilt; no risky destructive command is suggested.
- The `_on_deferred_init_done` path is untouched, and no second error handler is introduced.
- Replace the removed `console.bootstrap` command with Web Studio at `/studio` in the OpenClaw-plugin setup docs. The historical references in the internal design doc are intentionally left as-is.

Validation Evidence
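For reviewers who want to see the behaviour in isolation, the following is a minimal sketch of the catch, log, re-raise pattern driven through a lifespan-style context manager. It is not the code in this PR. The exception class is a stand-in, the `dimension_changed` attribute, `recovery_guidance`, `FailingService`, and `drive_startup` are hypothetical names introduced for illustration, and the message wording is assumed. Only the `ov reindex` command and the `embedding.allow_metadata_override` setting are taken from the description above.

```python
import asyncio
import logging
from contextlib import asynccontextmanager

logger = logging.getLogger("startup_sketch")


class EmbeddingRebuildRequiredError(RuntimeError):
    """Stand-in for the real exception; `dimension_changed` is a hypothetical field."""

    def __init__(self, message: str, dimension_changed: bool = False):
        super().__init__(message)
        self.dimension_changed = dimension_changed


def recovery_guidance(exc: EmbeddingRebuildRequiredError) -> str:
    """Build a case-aware runbook message (wording is illustrative only)."""
    if exc.dimension_changed:
        return (
            "Embedding dimension changed: allow_metadata_override does not "
            "bypass this; the vector index must be rebuilt."
        )
    return (
        "Embedding provider/model changed: set "
        "embedding.allow_metadata_override=true, restart, then run "
        "`ov reindex viking:// --mode vectors_only --sudo --wait true`."
    )


async def initialize_runtime_state(service) -> None:
    """Single choke point: log guidance, then re-raise so startup still aborts."""
    try:
        await service.initialize()
    except EmbeddingRebuildRequiredError as exc:
        logger.error("Startup blocked: %s\n%s", exc, recovery_guidance(exc))
        raise  # never continue with a mismatched embedding space


@asynccontextmanager
async def lifespan(service):
    """Lifespan-style wrapper: initialization runs before the app would serve."""
    await initialize_runtime_state(service)
    yield


class FailingService:
    """Fake service that always reports an embedding mismatch."""

    async def initialize(self) -> None:
        raise EmbeddingRebuildRequiredError("embedding metadata mismatch")


async def drive_startup() -> bool:
    """Return True if startup aborted with the original error."""
    try:
        async with lifespan(FailingService()):
            return False  # would mean startup wrongly succeeded
    except EmbeddingRebuildRequiredError:
        return True


if __name__ == "__main__":
    print(asyncio.run(drive_startup()))
```

Logging inside the `except` block and then using a bare `raise` keeps the original traceback intact while guaranteeing the guidance is emitted before any outer layer gets a chance to wrap or swallow the error.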
End-to-end verification drives the real FastAPI lifespan (`create_app` → `lifespan_context` → `_initialize_runtime_state` → `service.initialize()`) with a service that raises `EmbeddingRebuildRequiredError`.

Note: in the real lifespan the error is wrapped in an `ExceptionGroup` by the MCP task group as it propagates; catching at the inner choke point logs the guidance before that wrapping, so the operator always sees it.

Type of Change
Testing
Related Issues
Affected Areas
Primary: Platform / Server (`openviking/server/app.py`). Supporting docs span `docs/*/faq` and `examples/openclaw-plugin`.

Checklist