
Enhanced measurement #16

Merged
kenjudy merged 14 commits into main from enhanced-measurement on Mar 26, 2026

@kenjudy kenjudy commented Mar 26, 2026

No description provided.

kenjudy and others added 14 commits March 26, 2026 10:41
Implements DORA capability coverage for version control practices and
small-batch working via zero-dependency hand-rolled statistics:

- computeStatistics(): quantile (p50/p90/p95), stddev, linear regression
  trend (growing/stable/shrinking), outlier detection (2σ)
- computeVelocity(): commits/day, velocity trend (accelerating/stable/decelerating)
  via first-half vs second-half rate comparison
- scoreMessageQuality(): conventional commit regex OR ≥10-word threshold
- classifyDoraArchetype(): priority-ordered team archetype classification
  (harmonious-high-achiever, foundational-challenges, legacy-bottleneck, mixed-signals)
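
The statistics listed above can be sketched in a few dependency-free functions. This is an illustration only, not the repository's computeStatistics(): the function names, the linear-interpolation quantile, the use of population (rather than sample) standard deviation, and the `slopeTolerance` cutoff for the trend label are all assumptions.

```javascript
// Hypothetical sketch; names and thresholds are assumptions, not the PR's code.

// Quantile with linear interpolation over a sorted copy of the input.
function quantile(values, q) {
  if (values.length === 0) return 0;
  const sorted = [...values].sort((a, b) => a - b);
  const pos = (sorted.length - 1) * q;
  const lo = Math.floor(pos);
  const hi = Math.ceil(pos);
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (pos - lo);
}

function mean(values) {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Population standard deviation (assumed; the source does not specify).
function stddev(values) {
  const m = mean(values);
  return Math.sqrt(mean(values.map((v) => (v - m) ** 2)));
}

// Least-squares slope of the values against their index 0..n-1.
function slope(values) {
  const n = values.length;
  if (n < 2) return 0;
  const xMean = (n - 1) / 2;
  const yMean = mean(values);
  let num = 0;
  let den = 0;
  values.forEach((y, x) => {
    num += (x - xMean) * (y - yMean);
    den += (x - xMean) ** 2;
  });
  return num / den;
}

function summarize(values, slopeTolerance = 0.5) {
  const m = mean(values);
  const sd = stddev(values);
  const s = slope(values);
  let trend = 'stable';
  if (s > slopeTolerance) trend = 'growing';
  else if (s < -slopeTolerance) trend = 'shrinking';
  return {
    p50: quantile(values, 0.5),
    p90: quantile(values, 0.9),
    p95: quantile(values, 0.95),
    stddev: sd,
    trend,
    // Values more than two standard deviations from the mean.
    outliers: values.filter((v) => Math.abs(v - m) > 2 * sd),
  };
}
```

Passing commit sizes in chronological order matters for the trend, since the regression runs against the array index.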

Adds 4 new CONFIG keys: MESSAGE_QUALITY_MIN_WORDS, AI_ANALYSIS_MAX_COMMITS,
AI_DIFF_MAX_CHARS, AI_RISK_ADDITIONS_RATIO (groundwork for D3 Claude integration).
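
As a rough illustration of the "conventional commit regex OR minimum word count" rule behind MESSAGE_QUALITY_MIN_WORDS, the sketch below scores messages and aggregates a percentage. The regex, the choice to match only the subject line, and the choice to count words across the whole message are assumptions; the repository's CONVENTIONAL_COMMIT_RE and scoreMessageQuality() may differ.

```javascript
// Hypothetical sketch; the real pattern and word-counting rules may differ.
const CONVENTIONAL_RE =
  /^(feat|fix|docs|style|refactor|perf|test|build|ci|chore|revert)(\([^)]+\))?!?: \S/;
const MIN_WORDS = 10; // the 10-word threshold named in the commit message

function isQualityMessage(message, minWords = MIN_WORDS) {
  const subject = message.split('\n')[0].trim();
  if (CONVENTIONAL_RE.test(subject)) return true;
  // Fall back to a plain word count over the whole message.
  const words = message.trim().split(/\s+/).filter(Boolean);
  return words.length >= minWords;
}

// Percentage of messages that pass, rounded to a whole number.
function qualityPct(messages) {
  if (messages.length === 0) return 0;
  const good = messages.filter((m) => isQualityMessage(m)).length;
  return Math.round((good / messages.length) * 100);
}
```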

Summary JSON gains 14 new fields: p50/p90/p95/stddev_lines_changed,
p50/p90_files_changed, commit_size_trend, velocity_commits_per_day,
velocity_trend, additions_ratio_median/p90, message_quality_pct, dora_archetype.

Updates measuring-ai-code-drift-using-github-metrics.md as attributed blog
article with DORA 2025 AI Amplifier Effect research, paradox numbers,
and Option 3 (Claude API diff-level analysis). Adds metrics-specification.md
as technical reference with full DORA capability coverage map, metric
formulas, thresholds, gaps, and output format documentation.

89 tests passing, typecheck clean.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Implements optional Claude analysis for high-risk commits — runs only
when ANTHROPIC_API_KEY is set, degrades gracefully otherwise.
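
One way to get this kind of graceful degradation is an optional-dependency loader that returns null instead of throwing. The sketch below is an assumption about the shape of such a helper, not the PR's getAnthropicClient(): the injectable `env` and `load` parameters exist only to make it testable, and it assumes the CommonJS export of `@anthropic-ai/sdk` is the client class.

```javascript
// Hypothetical sketch of an optional-dependency loader.
// `env` and `load` are injectable for testing; both are assumptions.
function getOptionalClient(
  env = process.env,
  load = () => require('@anthropic-ai/sdk'),
) {
  // No key: the feature is simply off.
  if (!env.ANTHROPIC_API_KEY) return null;
  try {
    // Loaded lazily so the SDK is only needed when the key is present.
    const Anthropic = load();
    return new Anthropic({ apiKey: env.ANTHROPIC_API_KEY });
  } catch (err) {
    console.warn(
      'ANTHROPIC_API_KEY is set but the SDK could not be loaded; skipping analysis.',
    );
    return null;
  }
}
```

Callers can then branch on a single null check and log a skip message, rather than guarding every call site.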

New functions in local-code-metrics.js:
- getCommitDiff(): fetches git show --stat + full diff, truncated to
  AI_DIFF_MAX_CHARS for API cost control
- selectClaudeCommits(): pre-filters to large commits with
  additions > deletions × AI_RISK_ADDITIONS_RATIO, sorted by churn,
  capped at AI_ANALYSIS_MAX_COMMITS (5)
- analyzeWithClaude(): sequential API calls using claude-sonnet-4-6,
  structured JSON output (ai_confidence, risk_score, patterns,
  architectural_concerns, summary), per-commit error isolation
- getAnthropicClient(): conditional require() wrapped in try/catch;
  returns null when key absent, warns if SDK missing
- CLAUDE_SYSTEM_PROMPT: module-level constant for AI pattern and
  architectural concern detection
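
The pre-filter described for selectClaudeCommits() could look roughly like the following. The field names (`additions`, `deletions`), the 100-line "large" cutoff, and the ratio of 3 are assumptions for illustration; only the cap of 5 comes from the commit message.

```javascript
// Hypothetical sketch; thresholds other than the cap are assumed.
const SELECT_CONFIG = {
  LARGE_COMMIT_LINES: 100,     // assumed definition of a "large" commit
  AI_RISK_ADDITIONS_RATIO: 3,  // assumed value
  AI_ANALYSIS_MAX_COMMITS: 5,  // cap stated in the commit message
};

const churn = (c) => c.additions + c.deletions;

function selectRiskyCommits(commits, config = SELECT_CONFIG) {
  return commits
    // Keep only large commits.
    .filter((c) => churn(c) > config.LARGE_COMMIT_LINES)
    // Keep commits that add far more than they delete.
    .filter((c) => c.additions > c.deletions * config.AI_RISK_ADDITIONS_RATIO)
    // Highest churn first; filter() already returned a new array,
    // so the caller's input is not reordered.
    .sort((a, b) => churn(b) - churn(a))
    .slice(0, config.AI_ANALYSIS_MAX_COMMITS);
}
```

Filtering before the API call keeps the cost bounded: at most AI_ANALYSIS_MAX_COMMITS requests are made per run.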

Integration in collectLocalMetrics():
- Annotates CommitMetric objects in-place with Claude fields
- Writes local_claude_analysis.json when results exist
- Adds CLAUDE AI ANALYSIS console section after RECOMMENDATIONS
- Logs skip message when ANTHROPIC_API_KEY not set
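
Sequential calls with per-commit error isolation and in-place annotation can be sketched as below. This is not the PR's analyzeWithClaude(): `analyzeOne` is a hypothetical callback standing in for the API request, and the `sha` and `error` field names are assumptions.

```javascript
// Hypothetical sketch; `analyzeOne` stands in for the real API call.
async function analyzeSequentially(commits, analyzeOne) {
  const results = [];
  for (const commit of commits) {
    try {
      // One request at a time; a failure here affects only this commit.
      const analysis = await analyzeOne(commit);
      Object.assign(commit, analysis); // annotate the metric object in place
      results.push({ sha: commit.sha, ...analysis });
    } catch (err) {
      results.push({ sha: commit.sha, error: err.message });
    }
  }
  return results;
}
```

Because errors are caught inside the loop, one failed request does not prevent the remaining commits from being analyzed.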

Tests use jest.mock with { virtual: true } — no npm install required.
CommitStats typedef extended with optional Claude fields to satisfy
@ts-check. 113 tests passing, typecheck clean.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Adds message quality %, additions ratio, and statistical distribution
  fields to the Key Metrics table
- Documents DORA archetype classification and its four archetypes
- Documents optional Claude API integration: pre-filter logic, output
  file, graceful degradation when ANTHROPIC_API_KEY absent
- Expands Configuration section to a table covering all new CONFIG keys
- Notes Node ≥18 requirement and local_claude_analysis.json output file

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Strips all emoji from README.md, CLAUDE.md, measuring-ai-code-drift-
using-github-metrics.md, and metrics-specification.md.

Replaces em-dash interjections with traditional punctuation throughout:
- Prose parentheticals: commas or parentheses
- Trailing elaborations: colon or new sentence
- Definition separators in code blocks and tables: colon
- "A -- B" connectors: semicolon or period

Also updates README.md with correct filenames (local-code-metrics.js,
code-metrics.yml, pr-metrics.yml) and adds Node 18+ requirement,
message quality and additions ratio metrics, DORA archetype table,
and reference to metrics-specification.md.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
code-metrics.yml:
- Inline scoreMessageQuality(), computeStats(), computeVelocityTrend(),
  classifyDoraArchetype() helpers (mirrors local-code-metrics.js logic)
- Per-commit message_quality flag added to metrics array
- Summary gains: p50/p90/p95/stddev_lines_changed, p50/p90_files_changed,
  velocity_trend, additions_ratio_median/p90, message_quality_pct,
  dora_archetype
- Issue body redesigned as a metric table with Status column, adds
  commit size distribution table and DORA archetype section with
  per-archetype description; no emoji

pr-metrics.yml:
- Per-commit additions_ratio and message_quality fields added
- Aggregates: largePct, sprawlingPct, testFirstPct, msgQualityPct,
  medianAdditionsRatio computed for DORA assessment
- New concerns: additions ratio >3.0, message quality <40%
- New strengths: message quality, test-only commits
- DORA Capability Assessment section with archetype classification
  and per-metric table
- PR comment redesigned as a metrics table; no emoji shortcodes
- Removed PDCA Framework Alignment section
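
The new concern and strength rules could be expressed along these lines. The function and field names are hypothetical, and the 60% strength cutoff is inferred from the ">60%" target in the PR comment rather than taken from the workflow source; the >3.0 and <40% concern thresholds are the ones listed above.

```javascript
// Hypothetical sketch of the concern/strength rules in pr-metrics.yml.
function median(values) {
  if (values.length === 0) return 0;
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

function assessPr({ additionsRatios, msgQualityPct, testOnlyCommits }) {
  const concerns = [];
  const strengths = [];
  const medianAdditionsRatio = median(additionsRatios);

  if (medianAdditionsRatio > 3.0) {
    concerns.push(`Median additions ratio ${medianAdditionsRatio} exceeds 3.0`);
  }
  if (msgQualityPct < 40) {
    concerns.push(`Message quality ${msgQualityPct}% is below 40%`);
  }
  // Assumed strength cutoff, inferred from the reported target.
  if (msgQualityPct >= 60) {
    strengths.push(`Message quality ${msgQualityPct}% meets discipline threshold`);
  }
  if (testOnlyCommits > 0) {
    strengths.push(`${testOnlyCommits} test-only commit(s)`);
  }
  return { medianAdditionsRatio, concerns, strengths };
}
```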

113 tests passing, typecheck clean.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
CONVENTIONAL_COMMIT_RE is a module-level const defined after
collectLocalMetrics(). An async function runs synchronously up to its
first await, and the path to scoreMessageQuality() contains only execSync
calls, so the entry point at line ~564 invoked the function before
CONVENTIONAL_COMMIT_RE was initialized, causing a TDZ (temporal dead
zone) error.

Moving if (require.main === module) to after module.exports ensures all
constants and helpers are fully initialized before any execution path
reaches scoreMessageQuality().
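
The failure mode can be reproduced in isolation. The sketch below is a minimal stand-in, not the repository's code: `PATTERN` plays the role of CONVENTIONAL_COMMIT_RE, and `collect()` plays the role of collectLocalMetrics(). Inside an async function the ReferenceError surfaces as a rejected promise rather than a synchronous throw.

```javascript
// Hypothetical reproduction of calling an async function before a
// later `const` has been initialized.
function runBeforeInit() {
  const events = [];

  async function collect() {
    // This line runs synchronously, as part of the call itself.
    events.push('collect started');
    // PATTERN is declared below and is still in its temporal dead zone.
    return PATTERN.test('feat: add metrics');
  }

  // Called before the const declaration below has been evaluated.
  const outcome = collect().then(() => 'ok', (err) => err.name);
  events.push('caller continued');

  const PATTERN = /^(feat|fix)(\(.+\))?: /;
  return { events, outcome };
}
```

Moving the call below the `const` declaration, which is what relocating the `require.main === module` guard achieves, makes `outcome` resolve to 'ok' instead of 'ReferenceError'.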

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
computeVelocity did not sort input dates; git log outputs newest-first,
producing a negative time span and negative commits_per_day. Sort ms
array ascending before computing span. Add regression test covering
newest-first date order.
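
A minimal sketch of the fix, assuming ISO date strings as input. The function name and the zero-span guard are assumptions and not necessarily how computeVelocity() handles those cases.

```javascript
// Hypothetical sketch; not the repository's computeVelocity().
const MS_PER_DAY = 24 * 60 * 60 * 1000;

function commitsPerDay(isoDates) {
  // Sort ascending so the span is non-negative regardless of input order
  // (git log emits newest-first by default).
  const ms = isoDates.map((d) => new Date(d).getTime()).sort((a, b) => a - b);
  if (ms.length < 2) return 0;
  const spanDays = (ms[ms.length - 1] - ms[0]) / MS_PER_DAY;
  // Guard against a zero span when all commits share a timestamp.
  return spanDays > 0 ? ms.length / spanDays : 0;
}
```

The numeric comparator matters: without it, Array.prototype.sort compares the timestamps as strings.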

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
large_commits_pct, sprawling_commits_pct, and test_first_pct were
computed inline in the summary object literal and then recomputed
identically to pass to classifyDoraArchetype. Extract as local variables
computed once before the summary object is built.
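
The refactor amounts to computing each percentage once and passing the same values to both consumers. In the sketch below, the boolean commit flags and the classifier rules are illustrative assumptions; the source says only that classification is priority-ordered and does not give the rules used by classifyDoraArchetype().

```javascript
// Hypothetical sketch; flags, thresholds, and rules are assumed.
const pct = (count, total) =>
  (total === 0 ? 0 : Math.round((count / total) * 100));

// Rules are checked in order and the first match wins.
const ARCHETYPE_RULES = [
  ['harmonious-high-achiever',
    (m) => m.largePct < 20 && m.sprawlingPct < 10 && m.testFirstPct > 50],
  ['foundational-challenges',
    (m) => m.testFirstPct < 25 || m.largePct > 50],
  ['legacy-bottleneck',
    (m) => m.sprawlingPct > 30],
];

function classifyArchetype(m) {
  const match = ARCHETYPE_RULES.find(([, matches]) => matches(m));
  return match ? match[0] : 'mixed-signals';
}

function buildSummary(commits) {
  const total = commits.length;
  // Computed once, then reused by the summary and the classifier.
  const largePct = pct(commits.filter((c) => c.isLarge).length, total);
  const sprawlingPct = pct(commits.filter((c) => c.isSprawling).length, total);
  const testFirstPct = pct(commits.filter((c) => c.isTestFirst).length, total);

  return {
    large_commits_pct: largePct,
    sprawling_commits_pct: sprawlingPct,
    test_first_pct: testFirstPct,
    dora_archetype: classifyArchetype({ largePct, sprawlingPct, testFirstPct }),
  };
}
```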

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Extract five focused modules, each under 150 lines:
- lib/config.js     — CONFIG object (single source of truth for thresholds)
- lib/git.js        — runGitCommand, parseGitLog, isTestFile, analyzeCommit, getCommitDiff
- lib/statistics.js — computeStatistics, computeVelocity
- lib/metrics.js    — scoreMessageQuality, classifyDoraArchetype, generateInsights
- lib/claude.js     — CLAUDE_SYSTEM_PROMPT, getAnthropicClient, selectClaudeCommits, analyzeWithClaude

local-code-metrics.js becomes the orchestration entry point (372 lines,
down from 802). All public exports are re-exported from the entry point
so all existing test imports remain unchanged. The three-component
architecture (local script, code-metrics workflow, pr-metrics workflow)
is unchanged — lib/ is internal to the local script only.
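
The re-export pattern can be illustrated with plain objects standing in for the lib/ modules. Everything here is a placeholder: in the real entry point the two objects would come from require('./lib/statistics') and require('./lib/metrics'), and the combined object would be assigned to module.exports.

```javascript
// Hypothetical sketch; function bodies are placeholders, not the
// real implementations.
const statisticsModule = {
  computeStatistics: (values) => ({ count: values.length }),
  computeVelocity: (dates) => ({ commits: dates.length }),
};
const metricsModule = {
  scoreMessageQuality: (message) => message.length > 0,
};

function collectLocalMetrics() {
  return statisticsModule.computeStatistics([]);
}

// What the entry point would export: its own function plus every
// helper that tests previously imported from the single file.
const entryPointExports = {
  collectLocalMetrics,
  ...statisticsModule,
  ...metricsModule,
};
```

Because the spread copies references rather than wrapping the functions, a test that imports a helper from the entry point gets the same function object that lives in lib/.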

Add // @ts-nocheck to lib files and exclude lib/ in tsconfig.json;
TypeScript follows require() transitively from local-code-metrics.js
and the lib files use loose object types intentionally.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add jest.mock('../lib/claude') to collectLocalMetrics.test.js so the
Claude integration path (getAnthropicClient returns a client, results
annotated back into metrics, local_claude_analysis.json written) can
be tested without a real API key or installed SDK.

Two new tests:
- annotates metrics and writes local_claude_analysis.json when Claude returns results
- logs Claude analysis section to console when metrics are annotated

Default beforeEach sets getAnthropicClient.mockResolvedValue(null) so
all existing tests remain unaffected.

Line coverage: 91.3% → 95.6%
Function coverage: 92.18% → 96.42%

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Document the lib/ internal module structure under architecture section.
Update configuration section to point to lib/config.js as the single
source of truth for thresholds and TEST_FILE_PATTERNS.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add dotenv so ANTHROPIC_API_KEY can be set in a .env file rather than
requiring a shell export before each run. dotenv.config() is called at
startup; if no .env file exists it fails silently.

Also upgrades eslint and eslint-plugin-jest to versions compatible with
ESLint v9 flat config (npm install dotenv had downgraded eslint to v4
via --legacy-peer-deps, breaking the flat config format).

Add .env and .env.local to .gitignore. Add .env.example with usage.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add quiet: true to dotenv.config() to suppress [dotenv@17] console
  output during test runs
- Restore jest from ^25.0.0 to ^29.7.0 — inadvertently downgraded
  during ESLint repair via --legacy-peer-deps

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions

PR Analysis

Size: extra-large (based on production code)
Production Code: 3527 lines (18 files)
Test Code: 564 lines (7 files)
Total: 4091 lines (25 files)
Test-to-Production Ratio: 0.16:1

Concerns

  • Very large production changes - consider breaking into smaller PRs
  • Many production files changed - possible scope creep
  • 8/14 commits exceed 100 production lines

Strengths

  • Includes refactoring or cleanup work
  • 1 test-only commit
  • Message quality 64% meets discipline threshold

Commit Analysis

Total Commits: 14
Average Commit Size: 1191 production lines
Average Files per Commit: 2.9

| Metric | Value |
| --- | --- |
| Large commits (>100 prod lines) | 8/14 (57%) |
| Sprawling commits (>5 files) | 2/14 (14%) |
| Test-first discipline | 3/14 (21%) |
| Message quality | 9/14 (64%) |
| Median additions ratio | 2.94 |
| Test-only commits | 1 |
| Production-only commits | 10 |

Test Coverage

Test Adequacy: needs-improvement

  • Low test coverage ratio - consider adding more tests

Target ratio: 0.5-2.0 test lines per production line

DORA Capability Assessment

Archetype: foundational-challenges
Weak testing or batch discipline detected. Consider strengthening practices before scaling AI usage.

| Capability | Metric | Value | Target |
| --- | --- | --- | --- |
| Small Batches | Large commit % | 57% | <20% |
| Small Batches | Sprawling commit % | 14% | <10% |
| Version Control | Test-first discipline | 21% | >50% |
| Version Control | Message quality | 64% | >60% |
| AI Risk Signal | Additions ratio (median) | 2.94 | <3.0 |

Automated by Code Metrics Workflow

@kenjudy kenjudy merged commit 267fa1d into main Mar 26, 2026
1 check passed