[Bug]: Account/user created on one instance is invisible to other instances in multi-instance deployment #2351

@xialuodba-bit

Bug Description

When OpenViking is deployed with multiple instances behind a load balancer, account and user information created through the admin API on one instance is not propagated to other instances. Subsequent requests routed to a different instance fail authentication because the in-memory cache of that instance does not contain the newly created account/user.

Steps to Reproduce

  1. Deploy two OpenViking instances (Instance A and Instance B) behind a round-robin load balancer sharing the same AGFS/VikingFS storage
  2. Call POST /api/v1/admin/accounts or POST /api/v1/admin/accounts/{account_id}/users — the request lands on Instance A
  3. Instance A writes the new account/user to AGFS and updates its own in-memory APIKeyManager._accounts
  4. Send an authenticated request using the new API key — the request is routed to Instance B
  5. Instance B has no knowledge of the new account/user (its cache was loaded at startup and never refreshed)
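
The steps above can be modelled in a single process without a real deployment. This is only a toy sketch: `SharedStore` and `KeyCache` are hypothetical stand-ins for AGFS and `APIKeyManager`, and they assume the cache is populated once at startup, as described in step 5. They are not OpenViking's actual classes.

```python
# Toy model only: SharedStore and KeyCache are hypothetical stand-ins for
# AGFS and APIKeyManager, not OpenViking's real classes.

class SharedStore:
    """Plays the role of the shared AGFS/VikingFS storage."""

    def __init__(self):
        self.accounts = {}  # api_key -> user_id


class KeyCache:
    """Plays the role of one instance's in-memory account cache."""

    def __init__(self, store):
        self.store = store
        # Loaded once at startup and never refreshed afterwards.
        self._accounts = dict(store.accounts)

    def create_user(self, user_id, api_key):
        self.store.accounts[api_key] = user_id  # persisted to shared storage
        self._accounts[api_key] = user_id       # only this instance's cache

    def authenticate(self, api_key):
        return api_key in self._accounts


store = SharedStore()
instance_a = KeyCache(store)
instance_b = KeyCache(store)

# Steps 2-3: the admin request lands on Instance A.
instance_a.create_user("user1", "key-123")

# Steps 4-5: the follow-up request is routed to Instance B.
a_accepts = instance_a.authenticate("key-123")
b_accepts = instance_b.authenticate("key-123")  # stale cache: the 401 case
persisted = "key-123" in store.accounts
```

In this model `a_accepts` and `persisted` are true while `b_accepts` is false, which mirrors the reported state: the user exists in storage and in Instance A's memory only.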

Expected Behavior

All instances share a consistent view of accounts and users. A newly created account/user should be immediately usable regardless of which instance handles the next request.

Actual Behavior

Instance B returns an authentication error (401) for the new API key because APIKeyManager._accounts on that instance is stale. The account/user only exists in AGFS (persisted) and in Instance A's memory.

Minimal Reproducible Example

# Instance A: create a new user
import requests
resp = requests.post("http://instance-a/api/v1/admin/accounts/acct1/users", ...)
api_key = resp.json()["api_key"]

# Next request hits Instance B via load balancer
resp = requests.get("http://load-balancer/api/v1/...", headers={"Authorization": f"Bearer {api_key}"})
# → 401 Unauthorized

Error Logs

# Instance B logs
AuthenticationError: API key not found

OpenViking Version

v0.3.22

Python Version

3.12

Operating System

Linux

Model Backend

None

Additional Context

Three failure modes exist:

┌─────────────────────┬──────────────────────────────────────────────────┐
│ Operation           │ Effect on other instances                        │
├─────────────────────┼──────────────────────────────────────────────────┤
│ Create account/user │ New credentials rejected (401)                   │
├─────────────────────┼──────────────────────────────────────────────────┤
│ Regenerate API key  │ Old key still accepted (security risk)           │
├─────────────────────┼──────────────────────────────────────────────────┤
│ Delete user         │ Deleted user still authenticated (security risk) │
└─────────────────────┴──────────────────────────────────────────────────┘

Workaround: Route all admin API requests to a single designated instance, and restart other instances after any account change.
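
For discussion, one possible direction (not a confirmed or planned fix) is to have each instance reload its cache from shared storage once the cached copy is older than a time-to-live. The sketch below is hypothetical throughout: `SharedStore`, `RefreshingKeyCache`, and `ttl_seconds` are illustrative names, and a fake clock is injected so the behaviour can be checked without sleeping. Note that this only bounds the staleness window; a regenerated or deleted key is still accepted until the TTL expires.

```python
import time

# Hypothetical sketch: none of these names exist in OpenViking.

class SharedStore:
    """Stand-in for the shared AGFS/VikingFS storage."""

    def __init__(self):
        self.accounts = {}  # api_key -> user_id


class RefreshingKeyCache:
    """In-memory cache that reloads from storage once it is older than ttl."""

    def __init__(self, store, ttl_seconds=5.0, clock=time.monotonic):
        self.store = store
        self.ttl = ttl_seconds
        self.clock = clock
        self._reload()

    def _reload(self):
        self._accounts = dict(self.store.accounts)
        self._loaded_at = self.clock()

    def authenticate(self, api_key):
        # Refresh lazily on the request path when the snapshot has expired.
        if self.clock() - self._loaded_at >= self.ttl:
            self._reload()
        return api_key in self._accounts


now = [0.0]                      # fake clock, advanced manually
store = SharedStore()
instance_b = RefreshingKeyCache(store, ttl_seconds=5.0, clock=lambda: now[0])

store.accounts["new-key"] = "user1"            # created via another instance
before_ttl = instance_b.authenticate("new-key")     # still stale
now[0] = 6.0
after_ttl = instance_b.authenticate("new-key")      # reloaded, now visible

del store.accounts["new-key"]                  # deleted/regenerated elsewhere
still_accepted = instance_b.authenticate("new-key") # stale until TTL expires
now[0] = 12.0
revoked = instance_b.authenticate("new-key")        # reloaded, now rejected
```

Because revocation is delayed by up to one TTL, the two security-relevant rows in the table above would likely need a shorter TTL or an explicit invalidation signal between instances.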

Metadata

Assignees: No one assigned
Labels: bug
Status: In progress