
Add AST call handoff and LSP graph enrichment pipeline #809

Open
slbug wants to merge 9 commits into safishamsi:v7 from slbug:ast_fixes_lsp_plugins_arch

@slbug slbug commented May 11, 2026

AST finds unresolved calls, optional LSP hooks try to resolve them, and only boringly safe evidence gets promoted back into the graph. No more turning every first, map, or id into a fake god node. Hooks are opt-in, cached, documented, and wired.

@slbug slbug force-pushed the ast_fixes_lsp_plugins_arch branch from a42afa8 to a31b60a on May 11, 2026 19:57
@safishamsi
Owner

This is a well-engineered addition — the two-stage architecture (AST records unresolved_calls, optional LSP hooks promote them to INFERRED edges) is clean, the conservative promotion policy is correct, and the 1133-line test file is thorough. Three fixes needed before we merge:

1. LSP failure should degrade gracefully (must fix)
In watch.py around the LSP enrichment call, a hook failure raises an exception that causes the entire watch rebuild to return False. Since hooks are opt-in enrichment (not core), a failure should log a warning and continue with the pre-enrichment graph — not abort the rebuild. Please wrap the LSP enrichment call in a try/except that logs and continues.
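
A minimal sketch of the requested shape, not the actual watch.py code: `enrich_or_fallback`, the `enrich` callable, and the logger name are hypothetical stand-ins for whatever the rebuild path really calls.

```python
import logging

# Hypothetical logger name; the real module may configure logging differently.
logger = logging.getLogger("graphify.watch")


def enrich_or_fallback(graph, enrich):
    """Return the enriched graph, or the untouched graph if enrichment fails.

    `enrich` is a hypothetical callable standing in for the LSP hook step.
    """
    try:
        return enrich(graph)
    except Exception as exc:
        # Hooks are opt-in enrichment, so a failure is logged and the
        # rebuild continues with the pre-enrichment graph.
        logger.warning("LSP enrichment failed, keeping pre-enrichment graph: %s", exc)
        return graph
```

With this shape the rebuild only ever sees a graph, never the hook's exception. Whether a hook configured as required should still abort is a separate policy question that this sketch does not address.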

2. Zombie processes on LSP client shutdown (must fix)
In lsp_definition_hook.py, the close() method calls proc.terminate() on timeout but never follows it with proc.wait(). This leaves zombie processes. Please add proc.wait(timeout=5) (with a kill() fallback) after terminate().
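
The terminate, wait, kill sequence could look roughly like this. `stop_process` is a hypothetical helper name; the real close() method will have its own structure around it.

```python
import subprocess
import sys


def stop_process(proc: subprocess.Popen, timeout: float = 5.0) -> int:
    """Terminate a child process and reap it so no zombie is left behind."""
    if proc.poll() is not None:
        return proc.returncode  # already exited and reaped by poll()
    proc.terminate()
    try:
        # wait() collects the exit status, which is what clears the zombie.
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()         # escalate if terminate() was ignored
        return proc.wait()  # reap after the kill


if __name__ == "__main__":
    # Stand-in for a language server: a child that would sleep for a minute.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    stop_process(child)
```

The final wait() has no timeout because a killed process cannot ignore the signal, so the call returns as soon as the OS tears the process down.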

3. required default is asymmetric with docs (should fix)
The code defaults required=True for omitted hook config fields, but every example in docs/lsp-hooks.md sets "required": false. A user following the docs verbatim gets optional hooks, but one who omits the field gets required hooks — surprising. Please either flip the default to False or update the docs examples to omit the field.
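
If the default is flipped, the parsing side might look like the sketch below. `HookConfig`, `parse_hook`, and the `name` field are hypothetical; only the `required` key comes from the discussion above.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class HookConfig:
    name: str               # hypothetical field, for illustration only
    required: bool = False  # assumed new default, matching the docs examples


def parse_hook(raw: Mapping[str, Any]) -> HookConfig:
    """Build a HookConfig, treating an omitted `required` as optional."""
    return HookConfig(
        name=str(raw["name"]),
        required=bool(raw.get("required", False)),
    )
```

Under this default, omitting the field and writing `"required": false` produce the same config, so the docs examples and the code agree.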

Once these three are addressed we'll merge. Great work on this one.

@slbug
Author

slbug commented May 14, 2026

@safishamsi done

@slbug
Author

slbug commented May 17, 2026

@safishamsi any more changes required?

@gandhidarshak

Excited to see this PR — I've been using graphify on a mixed Python + C++ monorepo (~50k+ files) and the INFERRED edge noise has been the single biggest pain point for automated analysis. Really appreciate both the project and what you've built here; this is exactly the right architecture IMO (AST surfaces the unresolved call sites, LSP resolves them with confidence).

A couple of things I was curious about:

  1. Language coverage — is the intent for you to add first-class support for more languages (clangd for C/C++, tsserver, etc.) over time, or is the hook architecture deliberately open-ended so users can wire in whatever LSP they need for their stack?
  2. INFERRED edge pruning — the enrichment pipeline is additive (LSP edges go in alongside existing INFERRED ones). Would it be feasible to have an option to evict the old INFERRED edge when LSP resolves the same call site? Otherwise queries still surface the heuristic edge alongside the verified one.
  3. Call-site column precision — graphify stores source_location: "L42" (line only). On lines with multiple calls or complex expressions, picking the right column for the LSP request matters. Is there a plan for recovering per-call-site columns, or does the current approach handle that?

Happy to contribute a clangd wiring patch or the pruning logic as a follow up PR if it would help move this forward.

@slbug
Author

slbug commented May 21, 2026

Language support is intentionally open-ended. Core graphify should emit unresolved callsites and consume resolver evidence; the actual resolver can be anything external. The bundled stdio LSP hook is just the default bridge. So clangd should mostly be config, e.g. clangd --compile-commands-dir=..., not hardcoded graphify logic.

The only graphify-side bits that might be needed for clangd are small: make sure C/C++ unresolved callsites include callee_range, verify the generic hook has the right language id, and add a docs example / fixture test.

On pruning old INFERRED edges: agreed. Same source→target edges already get merged with LSP metadata, but wrong-target heuristic edges can still survive. I’d do pruning as a follow-up option: if LSP resolves the same callsite, drop/suppress the older heuristic edge for that callsite.
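
A rough sketch of what such a pruning pass could do. The edge schema here is entirely assumed: `callsite` and `resolver` are hypothetical keys, not graphify's actual field names.

```python
from typing import Any, Dict, List

Edge = Dict[str, Any]


def prune_heuristic_edges(edges: List[Edge]) -> List[Edge]:
    """Drop heuristic edges whose callsite was also resolved by LSP.

    Assumed (hypothetical) edge keys: "callsite" identifies the call
    site, and "resolver" is "lsp" for edges carrying LSP evidence.
    """
    lsp_callsites = {
        e["callsite"]
        for e in edges
        if e.get("resolver") == "lsp" and e.get("callsite") is not None
    }
    return [
        e
        for e in edges
        # Keep every LSP-backed edge, plus heuristic edges at callsites
        # that LSP did not resolve.
        if e.get("resolver") == "lsp" or e.get("callsite") not in lsp_callsites
    ]
```

Edges without a callsite are left alone, since there is nothing to match them against.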

On columns: graph.json still shows source_location: "L42" for compatibility, but the LSP handoff uses callee_range with line+character and sends definition requests at callee_range.start. If a call has no range, the generic hook skips it rather than guessing from line-only data.
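
To make that concrete, here is a sketch of building the request. The `textDocument/definition` method, its params, and the Content-Length framing come from the LSP specification; the `definition_request` helper and the exact dict shape of `callee_range` (a `start` with zero-based `line` and `character`, as LSP positions are) are assumptions, not the bundled hook's code.

```python
import json
from pathlib import Path
from typing import Optional


def definition_request(request_id: int, path: str, call: dict) -> Optional[bytes]:
    """Frame a textDocument/definition request at callee_range.start.

    Returns None when the call has no range, mirroring the
    skip-rather-than-guess behaviour described above.
    """
    rng = call.get("callee_range")  # assumed shape: {"start": {...}, "end": {...}}
    if not rng:
        return None
    start = rng["start"]
    payload = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "textDocument/definition",
        "params": {
            "textDocument": {"uri": Path(path).resolve().as_uri()},
            "position": {"line": start["line"], "character": start["character"]},
        },
    }
    body = json.dumps(payload).encode("utf-8")
    # LSP base protocol: Content-Length header, blank line, JSON body.
    return f"Content-Length: {len(body)}\r\n\r\n".encode("ascii") + body
```

A call carrying only `source_location: "L42"` yields None here, so no request is sent for it.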

@gandhidarshak

> I’d do pruning as a follow-up option: if LSP resolves the same callsite, drop/suppress the older heuristic edge for that callsite.

Thanks a ton!

