Skip to content

Model configuration fixes#946

Merged
dividedmind merged 7 commits into
mainfrom
fix/model-config
Apr 28, 2026
Merged

Model configuration fixes#946
dividedmind merged 7 commits into
mainfrom
fix/model-config

Conversation

@dividedmind
Copy link
Copy Markdown
Contributor

I looked at the flow of user configuring a custom OpenAI key or endpoint through the UI and found that it didn't work reliably — the current navie tab froze and the user was forced to open a new tab. I tracked it down to two problems: port not always being preserved between restarts, and the configuration change causing more than one restart (theoretically up to four, but I've only seen two). While trying to fix these, I noticed that the RPC server service has grown to be a bit of a mess, so I refactored it to explicitly model the state machine to more easily see what's going on.

I've also noticed and fixed a couple other problems such as:

  • setting an openai key disabled copilot integration unnecessarily — this was required before model config could have been pushed with rpc;
  • environment changes didn't automatically trigger rpc restarts.

I suggest reviewing commit-by-commit; the commit messages provide more detail.

To test manually:

  • open Navie with copilot enabled and OpenAI not configured,
  • use gear icon to enter the OpenAI API key,
  • notice that after a brief pause Navie becomes available again,
  • the model selector should now list both copilot and OpenAI models,
  • check the appmap-rpc logs and verify the server only restarted once.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes unreliable Navie model configuration updates (custom OpenAI key/endpoint) by stabilizing JSON-RPC server restarts and port reuse, and refactors the RPC service into an explicit server state machine to make restart behavior easier to reason about.

Changes:

  • Refactor DefaultAppLandJsonRpcService into a state machine, preserving the server port across restarts and debouncing restart triggers.
  • Simplify/standardize settings-change notifications (move to modelConfigChange() without per-key callbacks) and ensure env/model config changes trigger RPC restarts.
  • Add regression tests for port reuse and “single restart” behavior; harden webview model selection against undefined events; avoid disabling Copilot when OpenAI is configured.

Reviewed changes

Copilot reviewed 13 out of 15 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
plugin-core/src/main/java/appland/rpcService/DefaultAppLandJsonRpcService.java Refactors server lifecycle into explicit states; changes restart/port handling and startup signaling.
plugin-core/src/main/java/appland/rpcService/AppLandJsonRpcListener.java Removes serverRestarted() callback (restart completion is signaled via other events).
plugin-core/src/main/java/appland/rpcService/TestAppLandJsonRpcService.java Updates test helper to terminate the current RPC process.
plugin-core/src/main/java/appland/settings/AppMapSettingsListener.java Removes per-key secure model config callback in favor of a single modelConfigChange().
plugin-core/src/main/java/appland/settings/AppMapSettingsReloadProjectListener.java Restarts RPC server on model config + CLI env changes (debounced).
plugin-core/src/main/java/appland/settings/AppMapSecureApplicationSettingsService.java Publishes modelConfigChange() when secure model config changes.
plugin-core/src/main/java/appland/webviews/WebviewEditor.java Opens dev tools after the webview application initializes instead of on URL load.
plugin-copilot/src/main/java/appland/copilotChat/CopilotModelInfoProvider.java Stops treating custom OpenAI configuration as disabling Copilot integration.
plugin-core/src/test/java/appland/AppMapBaseTest.java Updates JSON-RPC test helpers to wait on serverStarted() only.
plugin-core/src/test/java/appland/rpcService/DefaultAppLandJsonRpcServiceTest.java Updates restart/start wait conditions; adds tests for port reuse and debounced restarts.
plugin-core/src/test/java/appland/settings/AppMapSecureApplicationSettingsServiceTest.java Updates secure model config listener tests for the new callback signature.
plugin-core/src/test/java/appland/actions/AddNavieContextFilesActionTest.java Uses the updated JSON-RPC wait helper.
appland-webview/navie.js Guards against undefined payloads in select-model events.
appland-webview/dist/navie.js Regenerated build output for the webview change.
appland-webview/dist/navie.js.map Regenerated source map for the webview change.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +345 to +350
// Push config before notifying listeners so the initial state is already set when they react.
updateServerSettings();
updateModelConfig();
if (!isDisposed && !project.isDisposed()) {
project.getMessageBus().syncPublisher(AppLandJsonRpcListener.TOPIC).serverStarted();
}
Copy link

Copilot AI Apr 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

onPortAnnounced calls updateServerSettings() (which waits for smart mode + does synchronous RPC I/O) and updateModelConfig() before publishing serverStarted(). This can delay serverStarted substantially during indexing or slow model providers, which risks UI/test timeouts and makes the server appear “not started” even though the port is ready. Consider publishing serverStarted() immediately after transitioning to RUNNING, and schedule the config/model pushes asynchronously (e.g., via the existing SingleAlarm/pooled thread) so port readiness isn’t gated on config propagation.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is intentional, so the server is completely ready before clients try to interact with it. I haven't observed any concerning delays in practice.

Comment on lines +310 to +312
// No try/finally needed: stopCurrentProcessSync swallows all exceptions internally,
// and serverStarted fires automatically from onPortAnnounced when the new process announces its port.
project.getMessageBus().syncPublisher(AppLandJsonRpcListener.TOPIC).beforeServerRestart();
Copy link

Copilot AI Apr 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

restartServerSync() publishes beforeServerRestart() unconditionally, even when no server is running/starting (e.g., settings changes before first start, or CLI download when STOPPED). That can cause listeners (Navie editors) to show “restarting” for what is effectively a first start. Consider only publishing beforeServerRestart() when the state is STARTING/RUNNING (or when currentProcess is non-null), and treat STOPPED as a plain start.

Suggested change
// No try/finally needed: stopCurrentProcessSync swallows all exceptions internally,
// and serverStarted fires automatically from onPortAnnounced when the new process announces its port.
project.getMessageBus().syncPublisher(AppLandJsonRpcListener.TOPIC).beforeServerRestart();
boolean publishBeforeRestart;
synchronized (this) {
publishBeforeRestart = currentProcess != null
|| state == ServerState.STARTING
|| state == ServerState.RUNNING
|| state == ServerState.CRASH_RESTARTING;
}
// No try/finally needed: stopCurrentProcessSync swallows all exceptions internally,
// and serverStarted fires automatically from onPortAnnounced when the new process announces its port.
if (publishBeforeRestart) {
project.getMessageBus().syncPublisher(AppLandJsonRpcListener.TOPIC).beforeServerRestart();
}

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that's a real concern. It's an unlikely corner case and I don't think the consequences are significant.

Comment thread plugin-core/src/main/java/appland/rpcService/TestAppLandJsonRpcService.java Outdated
@dividedmind dividedmind force-pushed the fix/model-config branch 2 times, most recently from 6f5b629 to 03ffbfa Compare April 26, 2026 10:14
Copy link
Copy Markdown
Collaborator

@jansorg jansorg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes seem fine to me, only a few minor notes.
It worked well when I tested it.

}

private void initWebviewApplication() {
if (Registry.is("appmap.webview.open.dev.tools", false)) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you observe a problem with the previous setup?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it only opened for me about 30% of the time.

Comment thread plugin-core/src/main/java/appland/rpcService/DefaultAppLandJsonRpcService.java Outdated
Comment thread plugin-core/src/main/java/appland/rpcService/DefaultAppLandJsonRpcService.java Outdated
}
lastKnownPort = port;
state = ServerState.RUNNING;
crashRestartCount = 0;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm not mistaken, this implements "restart after crash" differently.
Now, the count is reset when the port is announced. A (repeated) crash after the port announcement is not counted against the max restarts and the server would keep restarting.
Previously, it was counted, AFAIK.

I'm not sure what the better implementation is. There are advantages of this implementation, e.g. doing restarts after the server kept running. Your call what you'd like to do.

Copy link
Copy Markdown
Contributor Author

@dividedmind dividedmind Apr 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is intentional. Previous implementation was IMO broken because the crash counter never reset; this led to the situation that when the server crashed and successfully restarted, a subsequent unrelated crash (say, hours later) would have the restart unnecessarily delayed or even (if it exceeded the implicit crash counter) suppressed. Not to mention this could have interfered with the tests which didn't have access to reset the crash counter (although this could be exposed if needed), even if I can't tell for certain if any test problems I've seen when adding more crash tests were caused directly by this, and this concern could be fixed just by resetting on explicit (re)start.

When the server announces the port it's initialized, up and running, and we can be pretty sure if it were to crash due to eg. corrupted binary or a similar issue, it would have crashed by now. The alternative would be to set up a timeout to reset the crash counter after it has been successfully running for some time (maybe a minute or so). Wdyt?

Opening devtools immediately after loadURL was unreliable because the
JCEF browser might not have been fully initialized yet. Moving the
call to initWebviewApplication ensures that the browser is ready
to display the devtools window.
Sometimes the webview sends select-model with undefined; this
happens for example when previously chosen model is no longer
available. Ignore these calls instead of logging errors.
Due to two races, changing settings could leave Navie broken until the IDE
was restarted.

First, the crash handler used implicit state (a null/non-null process object
plus a delay threshold) and could not distinguish a crash from an intentional
settings-driven restart. When the intentional restart exited the old process,
the crash handler would observe the exit and fire a competing restart, the two
paths would collide, and the service would get stuck.

Second, the crash recovery codepath allocated a fresh port instead of reusing
the existing one. Any Navie UI window that was already open had the old port
baked in and could not reconnect after the server recovered from a crash.

Refactor DefaultAppLandJsonRpcService around an explicit ServerState enum
(STOPPED, STARTING, RUNNING, STOPPING, CRASH_RESTARTING, DISPOSED). The crash
handler only fires from the RUNNING state, so intentional restarts (which
transition through STOPPING) are never mistaken for crashes. Port ownership
moves to service level and is reused across all restarts; the server is only
considered running once the port is announced, so callers always observe a
fully-ready process. The crash backoff is count-based with exponential delay
(1 s, 1.5 s, 2.25 s, then giving up) and resets on both intentional restart
and successful port announcement.

Also: replaces the per-launch ProcessAdapter allocation with a singleton
listener that guards by process identity, removes the now-redundant
serverRestarted event (serverStarted covers both first start and restart),
and consolidates duplicated "wait for server running" test helpers.
Changing a model setting (e.g. an API key) from the UI caused the Navie
RPC server to restart twice due to duplicate event listeners. All
config-driven restarts are now consolidated in AppMapSettingsReloadProjectListener
with a 1-second debounce, so rapid successive changes result in a single
restart. Any CLI environment change now also correctly triggers a restart,
not just LLM-related variables. The separate secureModelConfigChange event
is removed in favour of the unified modelConfigChange.
Previously, Copilot models were hidden from the model list whenever an
OpenAI API key was set. Since each model carries its own connection
details, both providers can coexist in the list and the user can switch
between them freely.
The test was flaky because the indexer's prepareWatchedDirectory created
tmp/appmap on the real filesystem outside the VFS, causing a later
copyDirectoryToProject to fail with "Cannot create directory". Restructure
to match the pattern all other tests in the class use: pre-create tmp/appmap
in the VFS before createAppMapYaml starts the indexer, and rely on
createAppMapYaml's afterRefreshForProjects barrier instead of a bare
copyFileToProject call for reliable startup sequencing.
loadFindingsDirectory was unblocking on the first afterFindingsChanged
event, but reloadAsync fires that event once per findings file before
emitting afterFindingsReloaded at the end. On fast machines the remaining
per-file events would all fire before expandAll() started interacting with
the AsyncTreeModel, so the tree happened to be stable by then. On slower
CI runners the events were still in flight during expansion, causing
RootNode to call structureChanged() on each one and forcing AsyncTreeModel
to restart indefinitely until the 120 s timeout.

Switch to createFindingsReloadedCondition so the test only proceeds once
all findings are loaded and the model is guaranteed to be stable.
@dividedmind dividedmind merged commit 3ce2a18 into main Apr 28, 2026
5 checks passed
@dividedmind dividedmind deleted the fix/model-config branch April 28, 2026 17:59
@appmap-releasebot
Copy link
Copy Markdown

🎉 This PR is included in version 0.83.2 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants