Skip to content

fix: prevent race conditions when starting concurrent sessions (Node.js & Python)#690

Open
Fu-Jie wants to merge 1 commit intogithub:mainfrom
Fu-Jie:fix/concurrent-session-start-race
Open

fix: prevent race conditions when starting concurrent sessions (Node.js & Python)#690
Fu-Jie wants to merge 1 commit intogithub:mainfrom
Fu-Jie:fix/concurrent-session-start-race

Conversation

@Fu-Jie
Copy link

@Fu-Jie Fu-Jie commented Mar 5, 2026

Summary

Fixes a race condition where executing multiple createSession() calls concurrently (with autoStart: true enabled) triggered multiple underlying copilot server processes to spawn instead of sharing one.

When autoStart is enabled, the check if (this.state === "connected") return; could be bypassed by parallel invocations before the first one completed connecting. This resulted in background zombie CLI processes.

Go and .NET are not affected — Go already guards with sync.RWMutex, and .NET uses _connectionTask ??= StartCoreAsync(...) which is atomic by design.

Changes

  • nodejs/src/client.ts: Cached the initialization into a startPromise. Subsequent concurrent calls await the same promise instead of spawning a new process.
  • python/copilot/client.py: Secured the initialization inside an asyncio.Lock() to guarantee a single daemon process spawn.
  • E2E Tests: Removed .skip and @pytest.mark.skip (along with the stale TODO comment) from both nodejs/test/e2e/session.test.ts and python/e2e/test_session.py. The concurrent instantiation tests now pass cleanly across both languages.

Copilot AI review requested due to automatic review settings March 5, 2026 16:45
@Fu-Jie Fu-Jie requested a review from a team as a code owner March 5, 2026 16:45
@Fu-Jie Fu-Jie marked this pull request as draft March 5, 2026 16:52
@Fu-Jie Fu-Jie marked this pull request as ready for review March 5, 2026 16:53
@Fu-Jie Fu-Jie force-pushed the fix/concurrent-session-start-race branch from 991bc8f to 44ac913 Compare March 5, 2026 16:59
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@Fu-Jie Fu-Jie requested a review from Copilot March 5, 2026 17:04
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@Fu-Jie Fu-Jie requested a review from Copilot March 5, 2026 17:40
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

Comment on lines 284 to +289
return;
}

this.state = "connecting";
if (this.startPromise) {
return this.startPromise;
}
Copy link

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

start() now returns a Promise in some branches but returns nothing in the early-return path shown here. If start() is no longer declared async, callers doing await client.start() may not reliably await anything. Consider ensuring start() is declared async (so return; becomes Promise<void>), or return a resolved promise in the early-return path (e.g., return this.startPromise ?? Promise.resolve();) to keep the return type consistent.

Copilot uses AI. Check for mistakes.
Comment on lines +291 to +294
this.startPromise = (async () => {
this.state = "connecting";

try {
Copy link

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

startPromise is cleared on error and on explicit stop/cleanup, but it is never cleared on a successful start. Keeping a resolved promise around works, but it can become a stale gate if any other code path later transitions the client back to "disconnected" without also clearing startPromise. Consider limiting startPromise to only represent an in-flight start (e.g., set it back to null in a finally after the connection attempt completes successfully), relying on this.state === "connected" for the fast-path afterward.

Copilot uses AI. Check for mistakes.
Comment on lines +306 to +314
this.state = "connected";
} catch (error) {
this.state = "error";
this.startPromise = null;
throw error;
}
})();

return this.startPromise;
Copy link

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

startPromise is cleared on error and on explicit stop/cleanup, but it is never cleared on a successful start. Keeping a resolved promise around works, but it can become a stale gate if any other code path later transitions the client back to "disconnected" without also clearing startPromise. Consider limiting startPromise to only represent an in-flight start (e.g., set it back to null in a finally after the connection attempt completes successfully), relying on this.state === "connected" for the fast-path afterward.

Copilot uses AI. Check for mistakes.
Comment on lines 284 to +285
"""
if self._state == "connected":
return
async with self._start_lock:
Copy link

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This acquires/awaits the lock even when already connected (the common steady-state). A small optimization is to add a fast-path check before entering the lock, and keep the check inside the lock as well (double-checked locking pattern) to preserve correctness while avoiding unnecessary lock contention/awaits.

Suggested change
"""
if self._state == "connected":
return
async with self._start_lock:
"""
# Fast-path: avoid acquiring the start lock if we're already connected.
if self._state == "connected":
return
async with self._start_lock:
# Double-check under the lock to ensure correctness with concurrent callers.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants