Skip to content

fix(gemini): native session IDs and process lifecycle reliability#690

Open
menny wants to merge 5 commits intositeboon:mainfrom
menny:fix/gemini-provider-overhaul
Open

fix(gemini): native session IDs and process lifecycle reliability#690
menny wants to merge 5 commits intositeboon:mainfrom
menny:fix/gemini-provider-overhaul

Conversation

@menny
Copy link
Copy Markdown
Contributor

@menny menny commented Apr 23, 2026

Summary

This Pull Request significantly improves the Gemini provider by adopting the native session ID architecture and fixing several process management bugs.

Key Changes

  1. Native ID Refactor: Postpones session creation until the Gemini CLI emits its native session_id (e.g., chat_...). This aligns Gemini with the Claude SDK implementation and natively resolves the "duplicate sessions in sidebar" issue.
  2. Orphan Prevention: spawnGemini now proactively checks for and aborts any existing process for a session before starting a new one, preventing background resource leaks.
  3. Reliable Resumption: Updated the resumption logic to correctly handle native IDs, ensuring the --resume flag is always passed for existing conversations.
  4. Double-Fire Protection: Implemented a settleOnce pattern to ensure the process promise only resolves/rejects once, preventing duplicate error messages to the client.
  5. Improved SIGKILL Logic: Fixed the force-kill timeout to track the specific process object, ensuring orphaned processes are reliably terminated even if session keys have changed.

Impact

  • Eliminates duplicate Gemini sessions in the UI.
  • Improves overall provider stability during rapid message exchanges.
  • Reduces server resource usage by cleaning up orphaned CLI processes.

Summary by CodeRabbit

  • Bug Fixes

    • Fixed orphaned Gemini processes by terminating existing sessions before starting new ones
    • Improved session lifecycle management to prevent process leaks
  • Improvements

    • Server startup now ensures session store is fully loaded before accepting requests
    • Enhanced session resumption and initialization for greater reliability

menny and others added 5 commits April 22, 2026 18:57
…g requests

SessionManager loads all persisted Gemini sessions from ~/.gemini/sessions/*.json
asynchronously in its constructor via `this.ready = this.init()`. Previously,
`sessionManager.ready` was never awaited anywhere, so the server would start
accepting WebSocket connections before loadSessions() had completed.

This created a race condition on the very first request after a server restart:
if a client tried to resume an existing Gemini session, `sessionManager.getSession(sessionId)`
would return undefined (the session hadn't been loaded from disk yet), causing
the `--resume <cliSessionId>` flag to be silently omitted when spawning the
Gemini CLI process. The CLI would then start a blank new session instead of
continuing the prior conversation, with no error or indication to the user.

The fix adds `await sessionManager.ready` in startServer() immediately after
`await initializeDatabase()`, following the same pattern already established for
database initialization. This guarantees the session store is fully populated
before any request handler can call getSession(), createSession(), or addMessage().

No other providers are affected: Claude, Cursor, and Codex session providers are
stateless per-request (they read from disk or SQLite on demand) and have no
equivalent eager-loading singleton.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
fetchHistory used the UI's sessionId (e.g., gemini_<timestamp>) when falling back
to getGeminiCliSessionMessages. However, the Gemini CLI stores its history
keyed by its native cliSessionId. This caused the fallback to fail silently,
preventing the restoration of conversation history from disk for existing sessions.

The fix extracts the session object from sessionManager and uses session.cliSessionId
if available, ensuring history is correctly retrieved using the CLI's native identifier.
This fix addresses several interconnected issues in Gemini session management:

1.  Prevent Orphaned Processes (Issue #1):  now checks for and aborts any existing process for a session before starting a new one. This prevents multiple CLI processes from running concurrently for the same session.
2.  Defensive Session Readiness (Issue siteboon#3): Added  in  to ensure session data is fully loaded from disk before attempting to look up .
3.  Improved SIGKILL Logic (Issue siteboon#11):  now verifies the specific process object is still active before sending SIGKILL, preventing stale key issues and accidental killing of replacement processes.
4.  Diagnostic Logging: Added warnings in  when a session is found but its  is missing, making it easier to diagnose CLI initialization failures.

These changes ensure the --resume flag is reliably passed and system resources are properly managed during rapid interactions.
This refactoring eliminates duplicate Gemini sessions in the sidebar by
aligning the session ID lifecycle with the Claude SDK implementation:

1.  Native ID Capture: Shifted session initialization from the raw stdout handler
    to the structured 'onInit' event. The system now waits for the Gemini CLI
    to emit its native 'session_id' (e.g., chat_...) before creating the session
    in the session manager.
2.  Unified Identifier: By using the native CLI ID as the primary key for both
    the UI and disk storage, we ensure that projects.js can correctly merge UI
    and CLI sessions based on a single, shared ID.
3.  Fallback Logic: Maintained random-suffixed fallback ID generation for
    unstructured plain-text mode to ensure robustness.

These changes root out the cause of duplicate sidebar entries and simplify
session management across the Gemini provider.
Implemented a 'settled' flag and 'settleOnce' helper in spawnGemini to ensure
the process promise only resolves or rejects once. This prevents duplicate
error messages from being sent to the client when both 'error' and 'close'
events fire, and ensures that the execution timeout is properly cleared when
the process finishes or fails.
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 23, 2026

📝 Walkthrough

Walkthrough

The pull request improves Gemini CLI process lifecycle management and server initialization sequencing. The server now waits for the session manager to load before accepting requests. The CLI process spawner now waits for session manager readiness, prevents orphaned processes by aborting existing instances, defers session creation until Gemini provides a session ID, and ensures lifecycle operations resolve only once. Session history retrieval now correctly maps internal session IDs to CLI session identifiers.

Changes

Cohort / File(s) Summary
Server Initialization & Session Lifecycle
server/index.js, server/gemini-cli.js
Server startup now awaits sessionManager.ready to guarantee session store availability. Gemini CLI spawning waits for session manager readiness, aborts pre-existing processes for the same session ID, defers session creation until Gemini provides event.session_id, re-keys process tracking with the native ID, and hardens lifecycle promise resolution with single-settle guards to prevent duplicate completion callbacks.
Session History Mapping
server/modules/providers/list/gemini/gemini-sessions.provider.ts
History retrieval now maps the in-memory sessionId to the corresponding CLI session identifier via sessionManager.getSession(sessionId)?.cliSessionId, ensuring message history is fetched against the correct CLI session when the in-memory cache is empty.

Poem

🐰 Processes spawn with grace, no orphans left behind,
Sessions wait their moment, cliIds mapped and aligned,
The startup holds its breath until the stores awake,
Each promise settles once—no duplicates we make,
The warren's now in order, workflows crystalline!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'fix(gemini): native session IDs and process lifecycle reliability' directly and specifically describes the main changes: adopting native session IDs and fixing process management reliability issues.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
server/gemini-cli.js (1)

414-463: ⚠️ Potential issue | 🟠 Major

Close handler can remove the replacement process from activeGeminiProcesses after an abort+respawn.

With the new "abort existing process before spawning" logic (lines 23‑27), the sequence becomes:

  1. abortGeminiSession(sessionId) sends SIGTERM to P1 but leaves it in the map.
  2. The new process P2 is registered at line 240, overwriting the map entry for sessionId.
  3. P1 eventually emits close; its handler runs with its own closure where finalSessionId === sessionId.
  4. Line 425 activeGeminiProcesses.delete(finalSessionId) now removes P2 from the map, so isGeminiSessionActive, getActiveGeminiSessions, and subsequent abortGeminiSession calls silently fail on the live session.

Additionally, line 432 sends kind: 'complete' for P1 over the shared writer while P2 is streaming, which can prematurely terminate the client UI for the active run.

Mirror the identity check you already added in abortGeminiSession (lines 504‑514) so the close handler only mutates the map/ws when it still owns the entry.

🔒 Proposed guard
-            // Clean up process reference
-            const finalSessionId = capturedSessionId || sessionId || processKey;
-            activeGeminiProcesses.delete(finalSessionId);
-
-            // Save assistant response to session if we have one
-            if (finalSessionId && assistantBlocks.length > 0) {
-                sessionManager.addMessage(finalSessionId, 'assistant', assistantBlocks);
-            }
-
-            ws.send(createNormalizedMessage({ kind: 'complete', exitCode: code, isNewSession: !sessionId && !!command, sessionId: finalSessionId, provider: 'gemini' }));
+            // Clean up process reference only if this process still owns the entry.
+            // A prior spawn may have been aborted and replaced under the same key.
+            const finalSessionId = capturedSessionId || sessionId || processKey;
+            const stillOwned = activeGeminiProcesses.get(finalSessionId) === geminiProcess;
+            if (stillOwned) {
+                activeGeminiProcesses.delete(finalSessionId);
+            }
+
+            // Save assistant response to session if we have one
+            if (finalSessionId && assistantBlocks.length > 0) {
+                sessionManager.addMessage(finalSessionId, 'assistant', assistantBlocks);
+            }
+
+            if (stillOwned) {
+                ws.send(createNormalizedMessage({ kind: 'complete', exitCode: code, isNewSession: !sessionId && !!command, sessionId: finalSessionId, provider: 'gemini' }));
+            }

The same stillOwned guard should be applied at line 469 in the error handler for symmetry.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/gemini-cli.js` around lines 414 - 463, The close handler currently
unconditionally deletes activeGeminiProcesses and sends completion messages for
any process with the same session id, which can remove a newly spawned
replacement; change the handler to first verify ownership by comparing the
activeGeminiProcesses entry for finalSessionId against this geminiProcess (e.g.,
get the map entry and ensure it === geminiProcess or matches a stored pid/unique
id) and only then delete the map entry, send the 'complete' message and clean up
temp files; apply the same ownership guard in the error handler so it likewise
only mutates activeGeminiProcesses and sends error messages if the
closed/errored process is still the registered owner (referencing
activeGeminiProcesses, finalSessionId, geminiProcess, ws.send,
abortGeminiSession).
🧹 Nitpick comments (1)
server/modules/providers/list/gemini/gemini-sessions.provider.ts (1)

125-127: Logic is correct; consider typing cliSessionId on the session shape.

Functionally fine: in the new native-ID flow sessionId is already the CLI identifier, and legacy sessions carry cliSessionId, so the fallback chain works. The as any cast is the only rough edge — adding an optional cliSessionId?: string to the Session/SessionRecord type returned by sessionManager.getSession would drop the cast and document the native-ID relationship at the type level.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/modules/providers/list/gemini/gemini-sessions.provider.ts` around
lines 125 - 127, The current code uses a cast (session as any)?.cliSessionId to
access cliSessionId; instead, add an optional property cliSessionId?: string to
the Session or SessionRecord type returned by sessionManager.getSession so the
shape documents that legacy sessions may have cliSessionId while native sessions
use sessionId, then update the usage in gemini-sessions.provider.ts (where
sessionManager.getSession is called and cliSessionId is computed) to remove the
any cast and read (session.cliSessionId || sessionId), keeping the fallback to
getGeminiCliSessionMessages(cliSessionId) intact.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@server/gemini-cli.js`:
- Line 461: The line calling settleOnce currently has one too many closing
parentheses which causes a syntax error; update the call to settleOnce so it
wraps the arrow function that calls reject(new Error(...)) with the correct
number of closing parentheses and terminate the statement with a semicolon —
i.e., ensure settleOnce( ... ) closes only once after the arrow function and
that reject(...) and new Error(...) are properly closed; adjust the invocation
around settleOnce, reject, new Error and the code variable accordingly.

---

Outside diff comments:
In `@server/gemini-cli.js`:
- Around line 414-463: The close handler currently unconditionally deletes
activeGeminiProcesses and sends completion messages for any process with the
same session id, which can remove a newly spawned replacement; change the
handler to first verify ownership by comparing the activeGeminiProcesses entry
for finalSessionId against this geminiProcess (e.g., get the map entry and
ensure it === geminiProcess or matches a stored pid/unique id) and only then
delete the map entry, send the 'complete' message and clean up temp files; apply
the same ownership guard in the error handler so it likewise only mutates
activeGeminiProcesses and sends error messages if the closed/errored process is
still the registered owner (referencing activeGeminiProcesses, finalSessionId,
geminiProcess, ws.send, abortGeminiSession).

---

Nitpick comments:
In `@server/modules/providers/list/gemini/gemini-sessions.provider.ts`:
- Around line 125-127: The current code uses a cast (session as
any)?.cliSessionId to access cliSessionId; instead, add an optional property
cliSessionId?: string to the Session or SessionRecord type returned by
sessionManager.getSession so the shape documents that legacy sessions may have
cliSessionId while native sessions use sessionId, then update the usage in
gemini-sessions.provider.ts (where sessionManager.getSession is called and
cliSessionId is computed) to remove the any cast and read (session.cliSessionId
|| sessionId), keeping the fallback to getGeminiCliSessionMessages(cliSessionId)
intact.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 16e1dde7-4fed-42fc-bfca-4c0f89250f3b

📥 Commits

Reviewing files that changed from the base of the PR and between f6200e3 and 5112582.

📒 Files selected for processing (3)
  • server/gemini-cli.js
  • server/index.js
  • server/modules/providers/list/gemini/gemini-sessions.provider.ts

Comment thread server/gemini-cli.js
error: code === null ? 'Gemini CLI process was terminated or timed out' : null
});
reject(new Error(code === null ? 'Gemini CLI process was terminated or timed out' : `Gemini CLI exited with code ${code}`));
settleOnce(() => reject(new Error(code === null ? 'Gemini CLI process was terminated or timed out' : `Gemini CLI exited with code ${code}`))));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Syntax error — extra closing parenthesis breaks the module.

There are four ) after the template literal but only three opens (settleOnce(, reject(, new Error(). Biome flagged this (Expected a semicolon or an implicit semicolon after a statement, but found none). As written, this file will fail to parse and the entire Gemini provider becomes unusable.

🐛 Proposed fix
-                settleOnce(() => reject(new Error(code === null ? 'Gemini CLI process was terminated or timed out' : `Gemini CLI exited with code ${code}`))));
+                settleOnce(() => reject(new Error(code === null ? 'Gemini CLI process was terminated or timed out' : `Gemini CLI exited with code ${code}`)));
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
settleOnce(() => reject(new Error(code === null ? 'Gemini CLI process was terminated or timed out' : `Gemini CLI exited with code ${code}`))));
settleOnce(() => reject(new Error(code === null ? 'Gemini CLI process was terminated or timed out' : `Gemini CLI exited with code ${code}`)));
🧰 Tools
🪛 Biome (2.4.11)

[error] 461-461: Expected a semicolon or an implicit semicolon after a statement, but found none

(parse)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/gemini-cli.js` at line 461, The line calling settleOnce currently has
one too many closing parentheses which causes a syntax error; update the call to
settleOnce so it wraps the arrow function that calls reject(new Error(...)) with
the correct number of closing parentheses and terminate the statement with a
semicolon — i.e., ensure settleOnce( ... ) closes only once after the arrow
function and that reject(...) and new Error(...) are properly closed; adjust the
invocation around settleOnce, reject, new Error and the code variable
accordingly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant