[Bug]: cross-loop RuntimeError when embedding completion callback closes OwnedLockLease #2515

@My-Fred

Bug Description

OpenViking can raise a cross-loop RuntimeError during embedding/semantic pipeline cleanup when an embedding completion callback closes an OwnedLockLease from a different event loop than the one that created the lease refresh task.

This does not appear to be an embedding request failure itself. The failure happens later, during completion/callback cleanup.

Steps to Reproduce

  1. Start OpenViking with queue, embedding, and semantic processing enabled.
  2. Trigger a workflow that creates semantic work plus downstream embedding tasks.
  3. Let the embedding completion path invoke the semantic DAG completion callback.
  4. During callback cleanup, let OwnedLockLease.close() run from a different event loop than the one that created the lease refresh task.
  5. Observe the cross-loop runtime error.

Expected Behavior

Lease cleanup should complete without error, even if the completion callback is executed from a different event loop.
If the lease refresh task is loop-bound, cleanup should be marshalled back onto the loop that owns that task.
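
A minimal sketch of what that marshalling could look like, using only standard asyncio calls. `LoopBoundLease`, its methods, and `demo()` are hypothetical stand-ins written for illustration; this is not OpenViking's actual `OwnedLockLease` code, and the real class may need a different shape.

```python
import asyncio
import threading


class LoopBoundLease:
    """Hypothetical stand-in for a lease whose refresh task is loop-bound."""

    def __init__(self):
        self._owner_loop = None
        self._refresh_task = None

    async def start(self):
        # Remember which loop owns the refresh task.
        self._owner_loop = asyncio.get_running_loop()
        self._refresh_task = self._owner_loop.create_task(self._refresh())

    async def _refresh(self):
        while True:
            await asyncio.sleep(0.05)  # placeholder for real lease renewal

    async def _stop_refresh(self):
        # Must run on the owner loop: it cancels and awaits the refresh task.
        task, self._refresh_task = self._refresh_task, None
        if task is None:
            return
        task.cancel()
        try:
            await task
        except asyncio.CancelledError:
            pass  # expected: the refresh task was cancelled on purpose

    async def close(self):
        owner = self._owner_loop
        if owner is None or owner.is_closed():
            return  # nothing left that could be refreshed
        if owner is asyncio.get_running_loop():
            await self._stop_refresh()
            return
        # Called from a different loop: schedule cleanup on the owner loop
        # and wait for it here without touching the task directly.
        handle = asyncio.run_coroutine_threadsafe(self._stop_refresh(), owner)
        await asyncio.wrap_future(handle)


def demo():
    # Owner loop runs in a background thread.
    owner_loop = asyncio.new_event_loop()
    thread = threading.Thread(target=owner_loop.run_forever, daemon=True)
    thread.start()

    lease = LoopBoundLease()
    asyncio.run_coroutine_threadsafe(lease.start(), owner_loop).result(timeout=5)
    task = lease._refresh_task

    # Close from a second loop, as the completion callback would.
    asyncio.run(lease.close())
    cancelled = task.cancelled()

    owner_loop.call_soon_threadsafe(owner_loop.stop)
    thread.join(timeout=5)
    owner_loop.close()
    return cancelled
```

This sketch assumes the owner loop is still running when `close()` is called. If the owner loop has stopped but is not closed, the scheduled cleanup would never run and `close()` would wait indefinitely, so a real fix would probably also need a timeout or a fallback for that case.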

Actual Behavior

OpenViking raises a cross-loop runtime error during cleanup, in the path:

  • embedding completion callback
  • semantic DAG completion wrapper
  • OwnedLockLease.close()
  • OwnedLockLease._stop_refresh()

The failure shape is:

Task ... got Future ... attached to a different loop
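
For reference, the same failure shape can be produced with plain asyncio, independent of OpenViking. The sketch below is a generic illustration, not a reproduction of the OpenViking code path: a pending future created on one loop stands in for the lease refresh task, and a coroutine on a second loop awaits it. The function name `reproduce_cross_loop_error` is made up for this example.

```python
import asyncio


def reproduce_cross_loop_error():
    # A pending future bound to one loop; stands in for the refresh task.
    owner_loop = asyncio.new_event_loop()
    foreign = owner_loop.create_future()

    async def close_from_other_loop():
        # Awaiting a pending future owned by another loop is rejected by asyncio.
        await foreign

    try:
        # asyncio.run() creates a second, separate event loop.
        asyncio.run(close_from_other_loop())
    except RuntimeError as exc:
        return str(exc)
    finally:
        owner_loop.close()
    return ""
```

The returned message ends with "attached to a different loop", which matches the error seen during lease cleanup.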

Minimal Reproducible Example

I do not currently have a small standalone script that reproduces this outside the full queue/embedding/semantic pipeline.

The smallest reliable repro so far is:

1. Run OpenViking with queue, embedding, and semantic processing enabled.
2. Trigger a workflow that creates semantic work with downstream embedding tasks.
3. Let the embedding completion tracker invoke the semantic DAG completion callback.
4. Let that callback close an `OwnedLockLease`.
5. If the callback runs on a different event loop than the one that created the lease refresh task, cleanup can fail with a cross-loop `RuntimeError`.

From local debugging, the problematic path is:

- `embedding_tracker.py`
- `semantic_dag.py` (`wrapped_on_complete`)
- `lock_lease.py` (`OwnedLockLease.close` / `_stop_refresh`)

Error Logs

Key failure shape only:


RuntimeError: Task ... got Future ... attached to a different loop
...
embedding_tracker.py
semantic_dag.py -> wrapped_on_complete()
lock_lease.py -> OwnedLockLease.close()
lock_lease.py -> _stop_refresh()

OpenViking Version

0.3.24

Python Version

3.11.x

Operating System

Linux

Model Backend

Other

Additional Context

This was observed in a host/container setup where an agent communicates with OpenViking over MCP/HTTP, and the workflow triggers resource ingestion plus downstream embedding/semantic processing.

The issue does not appear to be caused by the MCP transport itself. MCP-based operations were able to reach OpenViking successfully, and the failure surfaced later in the embedding/semantic completion path during lock cleanup.

A longer timeout made the issue easier to observe, but does not appear to be the root cause. The timeout simply allowed the pipeline to progress far enough for the completion callback and lease cleanup path to execute.

From local debugging, the failure appears to happen when an embedding completion callback eventually closes an OwnedLockLease from a different event loop than the one that created the lease refresh task.

I am intentionally not including full environment-specific logs because they may contain sensitive local details, but the core failure shape is consistent and reproducible.

Metadata

Labels: bug

Status: In progress