Skip to content

ci: add graphify structural impact analysis to PR review and structure audit#567

Open
andreatgretel wants to merge 6 commits intomainfrom
andreatgretel/feat/graphify-structural-rebase
Open

ci: add graphify structural impact analysis to PR review and structure audit#567
andreatgretel wants to merge 6 commits intomainfrom
andreatgretel/feat/graphify-structural-rebase

Conversation

@andreatgretel
Copy link
Copy Markdown
Contributor

📋 Summary

Add a graphify-based AST analysis tool that builds a
directed graph of the codebase (~6s, zero LLM cost) to surface architectural risk before the
review agent starts. Integrates into both the PR review workflow (pre-computed context for the
agent) and the Wednesday structure audit (week-over-week graph diff).

Hat tip to @nabinchha for suggesting graphify as the right tool for this.

🔗 Related Issue

Closes #472 (Phase 4 — structural analysis)

🔄 Changes

✨ Added

  • structural_impact.py
    standalone analysis tool with two modes:
    • --changed-files (PR review): extracts changed AST entities against the full codebase graph,
      reports risk level (LOW/MEDIUM/HIGH), god nodes affected, import direction violations, and
      cross-package dependencies
    • --full (structure audit): god node rankings, cross-package edge summary table, import
      violation detection, and graph diff against the previous week's cached graph
  • Step 3.5 in the
    review-code skill
    agent reads the pre-computed structural impact to calibrate review depth (HIGH risk = extra
    scrutiny on blast radius and backward compatibility)
  • graphify-out/ added to .gitignore (9MB graph cache is CI-only, persisted via actions/cache)

🔧 Changed

  • agentic-ci-pr-review.yml
    new step installs graphifyy in a lightweight temp venv and runs structural analysis before the
    agent starts. Tool is read from the base-branch checkout (base-agents/) so fork PRs cannot
    inject code that runs with GH_TOKEN in scope. Base-branch checkout expanded from
    .agents/recipes to also include .agents/tools
  • agentic-ci-daily.yml
    conditional graphifyy install for the structure suite; graphify-out/ added to the cache path
    for week-over-week graph diffs
  • pr-review recipe
    agent reads /tmp/structural-impact-{{pr_number}}.md and appends findings to the review
  • structure recipe
    new Section 4 with graphify instructions, baseline tracking, and week-over-week diff guidance

🔍 Attention Areas

⚠️ Reviewers: Please pay special attention to the following:

  • agentic-ci-pr-review.yml
    the security model: structural_impact.py runs from base-agents/ (base-branch sparse
    checkout), not from the PR's files, to prevent fork PRs from injecting code that executes
    with secrets in scope. The --repo-root flag ensures the tool still analyzes the PR's
    source code correctly despite living in a different checkout
  • structural_impact.py
    new 361-line tool; pure AST analysis, no DD imports at runtime. Depends on
    graphifyy (tree-sitter based) and networkx (transitive)

📊 Retroactive validation on 3 merged PRs

Ran graphify retroactively against PRs #502, #545, and #557 to validate the signal quality:

PR Graphify risk Key signal Actual critical findings
#502 (skip.when, 22 files) HIGH — 5 god nodes, 9 violations, 12/73 clusters DatasetBuilder (88 deps) + ExecutionGraph (89 deps) flagged as highest-risk allow_resize propagation bug and positional merge divergence — both in the flagged entities
#557 (jinja engine, 20 files) HIGH — config→engine violation, 52-dep env class config.__getattr__() calls into engine jsonpath filter bug at exactly that cross-package seam
#545 (async bridge, 6 files) MEDIUM — 9 clusters for 6 files Error hierarchy connectivity (52 deps) Fragile error string matching — in the flagged error class

In all 3 cases, the most critical review findings correlated with the highest-connectivity
entities in the graphify report. The risk level (LOW/MEDIUM/HIGH) matched actual review
effort needed.

🧪 Testing

  • structural_impact.py smoke-tested in both modes (--changed-files, --full)
    against the live DD codebase
  • --repo-root flag tested from external checkout directory (simulates CI base-agents/ path)
  • graphify API signatures verified against graphifyy==0.4.23
  • Pre-commit hooks pass (ruff, yaml, merge conflict check)
  • Unit tests added/updated — N/A: CI tool, no testable library code
  • E2E tests added/updated — validated retroactively against 3 merged PRs (see above)

✅ Checklist

  • Follows commit message conventions
  • Commits are signed off (DCO)
  • Architecture docs updated — N/A: .agents/ tooling, not package architecture

…e audit

Add a graphify-based AST analysis tool that builds a directed graph of the
codebase (~2s, no LLM calls) to detect architectural impact. Integrates
into both the PR review workflow (pre-computed before claude runs) and the
Wednesday structure audit (with week-over-week diff).

PR review: extracts changed files against the full codebase graph, reports
risk level (LOW/MEDIUM/HIGH), god nodes affected, import direction
violations, and cross-package dependencies. Output saved to /tmp and read
by the review agent.

Structure audit: produces god node rankings, cross-package edge summary
table, import violation detection, and graph diff against previous week's
cached graph. Baselines saved for runner memory trend tracking.
- Fix KeyError: god_nodes() returns 'degree' not 'edges' (3 call sites)
- Fix deduped vs raw violation count inconsistency in baselines.json
- Security: run structural_impact.py from base-branch checkout so fork
  PRs cannot inject code that executes with GH_TOKEN in scope
- Add --repo-root flag so the tool resolves package paths correctly when
  invoked from a different checkout directory
- Replace make install-dev + .venv with lightweight /tmp/graphify-venv
  (only graphifyy needed, saves ~2min CI per PR review)
- Add graphify-out/ to .gitignore (9MB graph cache is CI-only)
@andreatgretel andreatgretel requested a review from a team as a code owner April 21, 2026 20:42
@github-actions
Copy link
Copy Markdown
Contributor

Summary

PR #567 adds a graphify-based AST structural-impact tool and wires it into two CI workflows:

  • agentic-ci-pr-review.yml: a pre-review step installs graphifyy into a temp venv, runs the new .agents/tools/structural_impact.py against changed Python files, and writes /tmp/structural-impact-{PR}.md for the review agent to consume.
  • agentic-ci-daily.yml (structure suite): conditionally installs graphify and caches graphify-out/ for week-over-week graph diffs.
  • New tool (.agents/tools/structural_impact.py, 361 LOC): builds a directed AST graph of the three DD packages, flags god nodes, import-direction violations (interface → engine → config), cross-package edges, and emits risk levels (LOW/MEDIUM/HIGH).
  • Recipe/skill updates: pr-review recipe reads the pre-computed report and appends it to the review; structure recipe adds a Section 4 with baseline tracking; review-code skill adds Step 3.5 describing how to calibrate review depth based on risk.

Retroactive validation against 3 merged PRs is cited in the description; signal correlation with actual critical findings looks plausible.

Findings

Security — correctly handled, one residual concern

The critical security story (tool must run from trusted base-branch code, not PR code) is executed well:

  • Sparse checkout expanded from .agents/recipes to also include .agents/tools; the script is invoked from base-agents/.agents/tools/structural_impact.py, not from the PR checkout.
  • --repo-root "${{ github.workspace }}" cleanly separates trusted tool location from untrusted analysis target.
  • continue-on-error: true on the structural step means a tool regression or graphify pin drift will not block PR review.

One residual concern in agentic-ci-pr-review.yml:

CHANGED_PY=$(gh pr diff "$PR_NUMBER" --name-only | grep '\.py$' || true)
...
--changed-files $CHANGED_PY \

$CHANGED_PY is deliberately unquoted to word-split into multiple args, but that also subjects the values to IFS word-splitting and glob expansion. A PR introducing a .py file whose name contains whitespace or glob characters would produce incorrect arg boundaries. Git allows such filenames. Low risk (the tool treats each path as a file and filters by Path.exists()/.suffix), but consider reading into an array:

mapfile -t CHANGED_PY < <(gh pr diff "$PR_NUMBER" --name-only | grep '\.py$' || true)
...
--changed-files "${CHANGED_PY[@]}"

This also fixes a latent bug: if CHANGED_PY is only newlines (no matches), the unquoted expansion produces zero args and nargs="+" will error — but the early if [ -z "$CHANGED_PY" ] guard covers that case today.

Correctness

  1. _dedup key truncation (structural_impact.py:60): dedup key uses label[:30] on each side. Two distinct entities that share the first 30 characters of their label would be merged. Probably rare in practice but not obviously intended — prefer the full label, or document the truncation as intentional.

  2. pip install ... | tail -3 (both workflows): piping suppresses the pip exit code under the default set +o pipefail. A failed install would proceed to run structural_impact.py, which would then ImportError on graphify. continue-on-error catches this, but the log would be confusing. Either set pipefail for these steps or drop | tail -3.

  3. Global _REPO_ROOT mutation (structural_impact.py:21, 308): the global _REPO_ROOT pattern works but is fragile — _PACKAGE_SUBDIRS is resolved at import time against the default, while _collect_source_files reads _REPO_ROOT at call time. If anyone later changes _PACKAGE_SUBDIRS to absolute paths, the override breaks silently. Pass repo_root explicitly through the call chain, or evaluate subdirs inside the function.

  4. _get_package interface detection (structural_impact.py:53-57): matches on the literal directory segment data-designer followed by src. This correctly skips data-designer-engine / data-designer-config because the earlier substring checks win, but the fallthrough logic is subtle. A brief comment explaining the ordering dependency would save a future reader.

  5. NetworkX API compatibility (structural_impact.py:273-277): the try: node_link_graph(..., edges="links") / except TypeError: node_link_graph(...) fallback covers a real API change across NetworkX versions. Good defensive coding; worth a one-line comment referencing the NetworkX version boundary so it isn't mistaken for dead code later.

Style / project conventions

  • Missing type annotations: project requires full annotations (AGENTS.md). _build_graph, _cross_package_edges, _changed_files_mode, _full_mode, main are missing return types on some paths; _build_graph returns dict which hides its structure — consider a TypedDict or a small @dataclass. The returned G is a NetworkX DiGraph; typing it as dict defeats downstream type checking.
  • from __future__ import annotations: present at top of file. Good.
  • Absolute imports only: satisfied.
  • SPDX header: present.
  • Lazy heavy imports (AGENTS.md "Fast imports" invariant): graphify.* and networkx are imported at module top-level. For a standalone CI tool invoked directly this is fine — the invariant is about keeping data_designer import time low, and this file is in .agents/tools/, not in the package. Worth confirming the tool isn't auto-imported from anywhere in data_designer.

Test coverage

  • The checklist marks unit tests as "N/A: CI tool, no testable library code." This is partially true but several pure helpers are clearly testable and carry real logic: _get_package, _dedup, the risk-level decision tree in _changed_files_mode, and the direction-violation classifier in _cross_package_edges. A small test module (even a single pytest file in .agents/tools/tests/) would protect against silent regressions — especially since graphify is pinned loosely (graphifyy, no version) and API drift is the most likely failure mode.
  • Retroactive validation on 3 merged PRs is good signal-quality evidence, but it isn't a regression guard.

Performance

  • ~6s per PR and ~2s for full audit (per description) against a small codebase is acceptable.
  • The full graph is built even for PR review (only changed files would not be enough to compute cross-package context). That is the correct trade-off.
  • graphify-out/ at ~9MB adds to the cache footprint but only for the structure suite. Fine.

Minor / nits

  • agentic-ci-pr-review.yml:166: pip install graphifyy --quiet with no version pin. Recommend pinning (e.g., graphifyy==0.4.23, the version the PR says was tested against) to match the existing graphify API signatures verified against graphifyy==0.4.23 claim in the description. Same for the daily workflow.
  • structural_impact.py:9-12: docstring usage examples are helpful; consider adding --full --previous-graph ... example too.
  • _fmt_gods mixes bullet-list output (affected) and table output (full) in one function via a flag — two small formatters would read more cleanly, but this is purely cosmetic.
  • Recipe change in pr-review/recipe.md asks the agent to append structural impact before the Verdict section. The skill's Step 3.5 also tells the agent to use it to calibrate review depth. These two instructions are compatible but slightly duplicative — confirm the agent applies both (use during review, then append the raw report). Not a blocker.

Verdict

Approve with minor follow-ups. The core design — trusted-path execution of a pure-AST tool with --repo-root pointing at the untrusted checkout — is sound, the fallback story (continue-on-error) is correct, and the integration into both review and audit surfaces is clean.

Recommended before merge:

  1. Quote/array-expand $CHANGED_PY in the workflow (latent filename-handling bug).
  2. Pin graphifyy version in both pip install steps.
  3. Either set pipefail or drop | tail -3 around pip install so install failures surface cleanly.

Recommended after merge:
4. Add a minimal pytest for _get_package, _dedup, and the risk-level branches in _changed_files_mode — graphify API drift is the most likely future break.
5. Replace the dict return type of _build_graph with a TypedDict/dataclass for proper typing.

No blocking issues; security model is correct; retroactive validation supports the signal claim.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Apr 21, 2026

Greptile Summary

This PR adds a graphify-based structural impact analysis tool that builds a directed AST graph of the codebase and surfaces architectural risk (LOW/MEDIUM/HIGH) before the review agent runs, integrating into both the PR-review workflow and the weekly structure audit. The security model is sound — structural_impact.py always executes from the base-branch checkout so fork PRs cannot inject code that runs with secrets in scope.

  • P1 — silent empty analysis: _changed_files_mode calls extract(changed_files) independently from the full graph build to obtain changed_node_ids. If graphify's node IDs are not stable across separate invocations, every downstream lookup (affected_gods, affected_communities, cross-package edges) silently returns empty, and all PRs report LOW risk with zero findings. Deriving the ID set from G.nodes() filtered by source_file removes this implicit dependency.

Confidence Score: 4/5

Safe to merge after confirming the changed_node_ids derivation is robust across graphify invocations.

All three previously raised issues (version pinning, _dedup truncation, MEDIUM risk reason) were addressed. One new P1 remains: the correctness of the analysis depends on graphify producing stable node IDs across independent extract() calls. If that assumption holds, the tool is functionally correct as validated on three retroactive PRs. The fix is a small, self-contained change and should be confirmed before merge.

structural_impact.py line 196 — changed_node_ids derivation via separate extract call.

Important Files Changed

Filename Overview
.agents/tools/structural_impact.py New 361-line AST analysis tool using graphify; implements PR (--changed-files) and full audit (--full) modes with risk scoring, god-node detection, and import-violation reporting. MEDIUM risk reason still reports cluster count even when the high-connectivity trigger fires (open prior comment).
.github/workflows/agentic-ci-pr-review.yml Adds a pre-agent structural analysis step that installs graphifyy==0.4.23 in a throw-away venv and runs the tool from the base-branch checkout; security model for fork-PR isolation is correct, continue-on-error: true ensures graceful degradation.
.github/workflows/agentic-ci-daily.yml Adds conditional graphifyy==0.4.23 install for the structure suite and extends the cache path to include graphify-out/ for week-over-week graph diffs; changes are minimal and correct.
.agents/recipes/pr-review/recipe.md Step 1 now reads the pre-computed structural impact file if present and appends it under a ### Structural Impact heading; wording is clear and the conditional guards are correct.
.agents/recipes/structure/recipe.md Adds Section 4 (graphify structural analysis) with bash snippet, god-node tracking guidance, and baseline persistence instructions; output template extended with graph summary metrics.
.agents/skills/review-code/SKILL.md Adds Step 3.5 that reads the structural impact file and calibrates review depth; instructions are well-structured with clear HIGH/LOW risk guidance.
.gitignore Adds graphify-out/ to gitignore to exclude the 9MB CI-only graph cache; correct and minimal.

Sequence Diagram

sequenceDiagram
    participant GH as GitHub Actions
    participant Base as base-agents/ (base branch)
    participant Venv as /tmp/graphify-venv
    participant Tool as structural_impact.py
    participant WS as github.workspace (PR code)
    participant TMP as /tmp/structural-impact-{PR}.md
    participant Agent as Claude Agent

    GH->>Base: sparse-checkout .agents/recipes + .agents/tools
    GH->>WS: checkout PR HEAD (full)
    GH->>GH: gh pr diff --name-only | grep .py
    GH->>Venv: python -m venv + pip install graphifyy==0.4.23
    Venv->>Tool: run base-agents/.agents/tools/structural_impact.py
    Note over Tool: --repo-root=WS --changed-files=[...]
    Tool->>WS: _collect_source_files() → all .py in packages/
    Tool->>Tool: extract(all_files) → build_from_json → G
    Tool->>Tool: extract(changed_files) → changed_node_ids
    Tool->>Tool: god_nodes, cluster, cross_package_edges
    Tool->>TMP: write risk report (LOW/MEDIUM/HIGH)
    GH->>Agent: start with recipe prompt
    Agent->>TMP: read structural impact
    Agent->>Agent: calibrate review depth by risk level
    Agent->>GH: post review with Structural Impact section
Loading
Prompt To Fix All With AI
This is a comment left during a code review.
Path: .agents/tools/structural_impact.py
Line: 196

Comment:
**`extract(changed_files)` called independently — nodes may not exist in `G`**

`changed_node_ids` is populated by calling `extract` on only the changed files, but `G` was built from `extract(all_files)` inside `_build_graph`. Every downstream lookup — `affected_gods`, `affected_communities`, `high_impact`, `_cross_package_edges(G, changed_node_ids)` — silently produces empty results if graphify assigns IDs that are not stable across independent invocations (e.g. sequential integers scoped to a single extraction run). The analysis would always report `LOW` risk with zero findings and no error.

A more robust approach is to derive `changed_node_ids` from the graph that was already built, matching on `source_file`:

```python
changed_paths = {str(p) for p in changed_files}
changed_node_ids = {
    nid for nid in G.nodes()
    if G.nodes[nid].get("source_file") in changed_paths
}
```

This removes the implicit assumption on graphify's ID stability and eliminates the redundant second extraction pass.

How can I resolve this? If you propose a fix, please make it concise.

Reviews (3): Last reviewed commit: "fix(ci): use array expansion for changed..." | Re-trigger Greptile

Comment thread .github/workflows/agentic-ci-pr-review.yml Outdated
Comment thread .github/workflows/agentic-ci-daily.yml
Comment thread .agents/tools/structural_impact.py Outdated
Pin graphifyy==0.4.23 in both CI workflows to prevent
breaking changes from unpinned installs. Fix _dedup()
label truncation at 30 chars that could merge distinct
entities sharing a common prefix.
Comment thread .agents/tools/structural_impact.py Outdated
…filenames

Replace unquoted $CHANGED_PY word-split with mapfile + array
expansion to prevent glob expansion and correctly handle
filenames containing spaces or special characters.
Comment thread .agents/tools/structural_impact.py Outdated
Derive changed_node_ids from the already-built graph by matching
source_file paths instead of running a separate extraction pass.
Removes implicit dependency on graphify ID stability across
independent extractions.

Fix MEDIUM risk reason to reflect the actual trigger (cluster
spread vs high-connectivity entity) instead of always reporting
cluster count.
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Apr 21, 2026

Greptile encountered an error while reviewing this PR. Please reach out to support@greptile.com for assistance.

…stale artifacts

Split the workflow step to isolate GH_TOKEN from graphifyy execution,
preventing a compromised package release from exfiltrating write-scoped
tokens.

Scan both edge directions in _cross_package_edges so inbound dependents
and violations where the changed node is the target are visible. Detect
deleted files and report them as a risk signal.

Include relation type in dedup key so distinct edge types between the
same labels are not collapsed.

Clean stale /tmp artifacts before running analysis to prevent reruns
from reading old reports.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat: agentic CI - automated PR reviews and scheduled maintenance

1 participant