docs: Add AI docs & improve lib docs overall #84
Merged
Schelk-style root-level pointer document for agents and LLMs landing in the repo. Catalogs where to look for each topic (SPEC, RUNBOOK, ARCHITECTURE, examples, --help), lists the three load-bearing flags (--client, --spec, --target-size), and provides task-shaped recipes for the most common operations (generate per client, add ERC-20 / 7702 EOA, verify, reproduce CI failures, diagnose state-root divergence). Constraints & gotchas section front-loads the --seed=0 footgun, the Docker requirement for cgo clients, and the --spec + --accounts=0 --contracts=0 pairing that suppresses synthetic-fill collisions.
spec-minimal.yaml: smallest valid --spec, one EOA with explicit address and balance. Verified end-to-end: generates an 18 KB geth DB with state root 0x123d9412…cfd in 244 ms. examples/README.md catalogs the existing 4 spec files (minimal, erc20-mixed-sizes, eoa-bloat, full-matrix-spec-feature) with a "pick-by-intent" decision tree, cross-links docs/SPEC.md for the schema, and documents the --accounts=0 --contracts=0 pairing recommended whenever exact spec-derived addresses matter.
RUNBOOK.md is the new canonical "what to type after state-actor exits" document. One section per client (geth / reth / besu / nethermind), each one verbatim from client/<c>/e2e_test.go's TestE2ESuite — the authoritative source now that the bench-orchestration scripts moved out of the tree. Every recipe is size-agnostic. No "at scale" section, no GB-tiered tuning tables, no wall-clock numbers. Reth's vm.max_map_count + ulimit hygiene is presented as MDBX-requirement-always, not as scale-conditional advice. A closing note explicitly disclaims size-bucketing so future edits don't reintroduce it. SPEC.md gains a "Sizing your generation" subsection (the entity-budget-to-flag pattern salvaged from the soon-to-be-deleted 300GB-BLOAT-PLAN.md, without the GB framing) and a Quick-start cross-link to RUNBOOK.md + examples/README.md.
Refactors the README around four client choices + the three load-bearing flags (--client, --spec, --target-size) rather than the old geth-only narrative. Every example uses real, currently-shipped flag names.

Removed:
- --inject-accounts references (the flag was deleted in v1.5).
- --genesis references (the flag was deleted; chainspec is synthesized from --chain-id / --fork / --gas-limit / --timestamp / --extra-data).
- "Recommended Configurations" size buckets (small / CI / mainnet-like / max-throughput), replaced by three Quick-start examples each targeting one user intent.
- "Distribution Types" section (folded into the flag table row for --distribution).
- Database Schema's geth-snapshot-only assumption.
- Per-scale performance table (size-bucketed, drift-prone).

Added:
- "Boot a client against the generated DB" section linking each of the four RUNBOOK.md client recipes.
- Pointer-block in the header banner: AGENTS.md, RUNBOOK.md, SPEC.md.
- "When NOT to use" section (schelk pattern).
- Compact 22-row flag reference table; docstring says --help is the canonical list.
- --seed=0 footgun note inline with the flag table.
- File Structure section: re-enumerated the current internal/ subtree
(spec, specbuild, templates, sizecal, streamingtrie, streamsort,
entitygen, clientpolicy, syscontracts, genesisheader, oracle, reth,
neth, engineapi, e2e_testing, rpcprobe, testhex) and the full
client/{geth,reth,besu,nethermind}/ set. The previous tree predated
internal/ entirely.
- Data Flow Initialization snippet: replaced the deleted
genesis.LoadGenesis() example with the --spec workflow
(BuildSynthetic → ParseFile → Validate → specbuild.Build →
Config.PreAlloc). The deleted code path was confusing readers and
failing to compile if anyone copy-pasted it.
- Client Adapters section: dropped "client/reth/ exists on a parallel
branch" (both reth and besu have been on main for months); enumerated
all four adapters with their on-disk layouts.
- Performance Characteristics: dropped the per-operation throughput
table (size-bucketed numbers from a geth-only era). Replaced with a
one-paragraph note explaining the streaming-pipeline RAM bound and
pointing readers at --benchmark --verbose for current numbers on
their own hardware.
Replaces the geth-only narrative with the four-client framing introduced in v1.5.

Removed:
- Every --genesis genesis.json reference (the flag was deleted; chainspec is synthesized from --chain-id / --fork / --gas-limit / ...).
- "Large State (Terabyte Scale)" section (size-bucketed; the library handles every size the same way).
- Trailing "Performance Issues" subsection that was cut off mid-bullet.

Added:
- ERC-20 spec example as the Method 1 walkthrough (uses examples/spec-erc20-mixed-sizes.yaml verbatim).
- "Picking the right --client" matching the participant's el_type.
- Cross-link to RUNBOOK.md for per-client boot flags + the besu/neth "engine-API + mock CL" pattern.
- Starlark module signature updated for the current flag set (client, spec_file, target_size, seed default = 1 with the footgun warning).
- Method 1 example sets --db with the /geth/chaindata suffix geth expects (matches client/geth/e2e_test.go's layout).
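The el_type to --client pairing mentioned above can be sketched in Go. This is a minimal illustration, not code from the repository: the function name `clientForELType` is hypothetical, and it assumes that the participant's el_type string and the --client value coincide for the four supported clients (geth, reth, besu, nethermind).

```go
package main

import (
	"fmt"
	"os"
)

// clientForELType returns the --client value for a Kurtosis participant's
// el_type. Assumption: for the four supported clients the two names are
// identical; any other el_type is rejected rather than guessed.
func clientForELType(elType string) (string, error) {
	switch elType {
	case "geth", "reth", "besu", "nethermind":
		return elType, nil
	default:
		return "", fmt.Errorf("unsupported el_type %q: want geth, reth, besu, or nethermind", elType)
	}
}

func main() {
	for _, el := range []string{"geth", "nethermind", "erigon"} {
		client, err := clientForELType(el)
		if err != nil {
			fmt.Fprintln(os.Stderr, "error:", err)
			continue
		}
		fmt.Printf("el_type=%s -> --client=%s\n", el, client)
	}
}
```

Rejecting unknown values up front keeps a mismatched participant from producing a database in a layout its client cannot open.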
- docs/300GB-BLOAT-PLAN.md: filename bakes a size into the doc, which contradicts the size-agnostic principle. The entity-budget-to-flag pattern worth keeping moved into docs/SPEC.md under "Sizing your generation"; the rest (--genesis, --commit-interval, the geth-wrapper Step 5) referenced flags that no longer exist.
- docs/RETH_LARGE_SCALE.md (untracked): same problem, filename + content are explicitly size-classed. The size-agnostic operational hygiene (vm.max_map_count, ulimit -n) moved into docs/RUNBOOK.md#reth as plain "this is how you run reth" guidance. Wall-clock numbers and GB-tiered example invocations dropped: they rot fast and prime readers to expect different behaviour at different sizes the code doesn't actually exhibit.
- docs/superpowers/ (untracked): three completed-slice planning artifacts for the reth direct-MDBX writer. Internal planning notes, not user docs.
While running the spec-gallery verification I found that
internal/specbuild/build.go:143 (truncateForTargetSize) IS applied to
spec entities — when the spec's projected trie footprint exceeds
--target-size, the entity list is silently truncated to the longest
prefix that fits and a warning is emitted on stderr. My previous text
("spec entities are never capped") was wrong on three doc pages.
- AGENTS.md gotchas: rewrite the --target-size bullet.
- README.md "Cap the database size" section: rewrite.
- README.md flag table: rewrite the --target-size row.
- docs/SPEC.md "Composability" section: replace the "Config.Validate()
fails loudly" claim (which doesn't match the code) with the actual
truncate-and-warn behaviour.
Also swap the README spec-driven Quick-start example from
spec-erc20-mixed-sizes.yaml (16 GB total, would silently truncate
against a small --target-size) to spec-minimal.yaml, so the example
runs faithfully without --target-size and produces the expected
single-EOA database.
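The truncate-and-warn behaviour described above (keep the longest prefix of spec entities that fits the budget, warn on stderr) can be sketched as follows. This is an illustration under stated assumptions, not the code at internal/specbuild/build.go: the `entity` type, the `truncateToBudget` name, the warning text, and the convention that a budget of 0 means "no cap" are all hypothetical.

```go
package main

import (
	"fmt"
	"os"
)

// entity is a hypothetical stand-in for a spec entity together with its
// projected trie footprint in bytes.
type entity struct {
	Name  string
	Bytes uint64
}

// truncateToBudget returns the longest prefix of entities whose cumulative
// projected size does not exceed budget, and warns on stderr when anything
// is dropped. Assumption: budget == 0 means no --target-size was given.
func truncateToBudget(entities []entity, budget uint64) []entity {
	if budget == 0 {
		return entities
	}
	var total uint64
	for i, e := range entities {
		// total never exceeds budget, so budget-total cannot underflow.
		if e.Bytes > budget-total {
			fmt.Fprintf(os.Stderr,
				"warning: spec truncated to %d of %d entities to fit %d bytes\n",
				i, len(entities), budget)
			return entities[:i]
		}
		total += e.Bytes
	}
	return entities
}

func main() {
	spec := []entity{{"a", 40}, {"b", 50}, {"c", 30}}
	kept := truncateToBudget(spec, 100)
	fmt.Println("entities kept:", len(kept))
}
```

Because the cut is prefix-based, entity order in the spec decides what survives a small budget, which is why a tiny example such as spec-minimal.yaml is a safer Quick-start than a multi-gigabyte spec.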
Follow-up to the schelk-style docs overhaul that originally landed in this branch. Addresses 7 findings from a review focused on agent exposure:

- F1 (load-bearing): AGENTS.md is now a 26-line pointer to docs/SKILL.md, matching schelk's actual pattern. The 125 lines of self-contained content moved to docs/SKILL.md so the role of each file is clean: AGENTS.md = discovery, SKILL.md = instruction.
- F2: Inline YAML snippets in AGENTS.md (including one with invalid YAML: `code: "0xef0100" + "<20-byte delegate>"` was Python string concat) are replaced in docs/SKILL.md with a feature → fixture-entity-# map that points agents at examples/full-matrix-spec-feature.yaml as the canonical syntax reference. One place to maintain syntax; CI verifies it; docs follow.
- F3: docs/RUNBOOK.md's troubleshooting table gained per-row cross-links into the relevant client section (Geth / Reth / Besu / Nethermind) and into the new Cross-client determinism section.
- F4: docs/SKILL.md surfaces the 9 package-level doc.go files via two cross-link clusters: task recipes link to the doc.go relevant to that task, and the "When asked to extend" section maps each extension type to its anchor doc.go.
- F5: Added CLAUDE.md (2 lines) so Claude Code's discovery convention finds the agent-doc surface natively.
- F6: examples/README.md gains a "Field-by-field cheatsheet" appendix mapping common edit intents to YAML fields (10 rows; covers everything an agent adapting full-matrix-spec-feature.yaml would touch).
- F7: Added `## Cross-client determinism` H2 to docs/ARCHITECTURE.md to fix the previously-dead anchor that AGENTS.md/SKILL.md/RUNBOOK.md all link to. The section documents the three-layered mechanism (address derivation, sizecal calibration, syscontract preamble) + the CI keystone job that re-asserts the invariant.

Verification:
- AGENTS.md: 26 lines (≤ 50 budget)
- SKILL.md: 148 lines
- CLAUDE.md: 2 lines
- Dead-link sweep: 0 broken targets across AGENTS / CLAUDE / README / docs/ / examples/
- Size-discrimination sweep: 0 matches outside the intentional RUNBOOK.md disclaimer
- ARCHITECTURE.md#cross-client-determinism anchor: resolves
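The anchor check in the verification list can be approximated with a small Go helper. This is a sketch, not the sweep the author ran: `headingAnchor` and `anchorResolves` are hypothetical names, and the slug rule is a simplified approximation of GitHub's heading anchors (lowercase, keep letters, digits, hyphens and underscores, turn spaces into hyphens). It ignores fenced code blocks and the numeric suffixes GitHub adds to duplicate headings.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// headingAnchor approximates the anchor GitHub derives from heading text.
func headingAnchor(heading string) string {
	var b strings.Builder
	for _, r := range strings.ToLower(strings.TrimSpace(heading)) {
		switch {
		case unicode.IsLetter(r) || unicode.IsDigit(r) || r == '-' || r == '_':
			b.WriteRune(r)
		case r == ' ':
			b.WriteRune('-')
		}
	}
	return b.String()
}

// anchorResolves reports whether any ATX heading ("# ", "## ", ...) in the
// markdown text produces the requested anchor.
func anchorResolves(markdown, anchor string) bool {
	for _, line := range strings.Split(markdown, "\n") {
		trimmed := strings.TrimLeft(line, "#")
		if trimmed == line || !strings.HasPrefix(trimmed, " ") {
			continue // not an ATX heading
		}
		if headingAnchor(trimmed) == anchor {
			return true
		}
	}
	return false
}

func main() {
	doc := "# Architecture\n\n## Cross-client determinism\n\nBody text.\n"
	fmt.Println(anchorResolves(doc, "cross-client-determinism"))
}
```

Running a check like this over every `file.md#anchor` link would flag a dead anchor such as the one F7 fixed before it reaches a reader.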
…erence

Follow-up to the prior schelk-mirror commit (870bacf). In review the user pointed out that even though the F2 fix moved spec syntax out of inline YAML and pointed at the fixture, the fixture was still treated as one cross-reference among many rather than as THE canonical anchor. The fixture has a structural property prose can't match: CI keeps it correct via TestBuildFullMatrix + per-client TestE2ESuite + cross-client-genesis-root invariant. Docs should lean on the self-healing surface as the primary anchor, not bury it.

Changes:
- docs/SKILL.md: gains YAML frontmatter (name + description, schelk + superpowers convention). Restructured around a new "Read these first" section listing the fixture as item #1 with bold callout. New "Canonical spec reference" H2 (replaces "Write a spec" subsection + "Spec recipe builder" at the bottom): names the fixture as load-bearing, traces the CI-invariant chain, includes a section-banner table (which feature each banner pins), keeps the intent → entity-# index, and folds the recipe-builder steps into a "How to adapt" paragraph.
- AGENTS.md: > [!IMPORTANT] callout immediately after the SKILL.md pointer naming the fixture as the canonical syntax reference. Table row stays for completeness but its phrasing softens to "Canonical syntax reference (every feature, CI-pinned — see callout above)".
- docs/SPEC.md: > [!IMPORTANT] callout at the top establishing the SPEC.md ↔ fixture split: SPEC.md is the schema reference (parser, validation, address algorithm), fixture is the syntax reference (every shape in practice).
- README.md: > [!TIP] agent-bootstrap callout (matching schelk's pattern) telling Claude/Codex/Cursor/etc. where to start.
- examples/README.md: collapse the redundant "Choose your spec" bullet list into a one-line pointer back to SKILL.md's canonical-spec-reference section.

The fixture YAML itself is unchanged: adding "see SKILL.md" comments would create a circular reference. Docs adapt around the fixture; the fixture stays the source of truth.

Verification:
- AGENTS.md 36 lines (≤ 50 budget)
- SKILL.md 170 lines (≤ 250 budget)
- YAML frontmatter parses (name=state-actor, description present)
- ## Read these first + ## Canonical spec reference anchors exist
- examples/README.md → docs/SKILL.md#canonical-spec-reference resolves
- examples/full-matrix-spec-feature.yaml: git diff origin/main = empty
- Dead-link sweep: 0
- TestBuildFullMatrix passes
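The "YAML frontmatter parses (name=state-actor, description present)" check can be sketched with the Go standard library alone. This is an assumption-laden illustration: the `frontmatter` helper is hypothetical, it handles only flat `key: value` lines between `---` delimiters rather than full YAML, and the description text in the sample is invented for the example.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// frontmatter extracts flat "key: value" pairs from a leading block
// delimited by "---" lines. It is not a full YAML parser.
func frontmatter(doc string) (map[string]string, error) {
	lines := strings.Split(doc, "\n")
	if strings.TrimSpace(lines[0]) != "---" {
		return nil, errors.New("no frontmatter: first line is not ---")
	}
	fields := make(map[string]string)
	for _, line := range lines[1:] {
		if strings.TrimSpace(line) == "---" {
			return fields, nil
		}
		key, value, ok := strings.Cut(line, ":")
		if !ok {
			continue // skip lines that are not key: value pairs
		}
		fields[strings.TrimSpace(key)] = strings.TrimSpace(value)
	}
	return nil, errors.New("frontmatter is not closed by ---")
}

func main() {
	// Hypothetical SKILL.md head; only the name value comes from the PR text.
	skill := "---\nname: state-actor\ndescription: placeholder text for this sketch\n---\n# Skill\n"
	fields, err := frontmatter(skill)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(fields["name"] == "state-actor", fields["description"] != "")
}
```

A real check would likely use a YAML library; the sketch only shows the two assertions the verification list names.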
…S legacy honesty, --target-size everywhere)

Follow-up to commit 0c22dbb addressing the 10 critical + 9 important findings from the multi-agent PR review (code-reviewer + comment-analyzer + code-simplifier). Each fix corresponds to a finding number from the review summary.

## Critical fixes

- C1, C2: sizecal calibration story (SPEC.md, ARCHITECTURE.md, README.md). The previous text claimed `internal/sizecal/factors.json` exists and has "per-client calibration factors". Both are wrong: calibration is Go constants in factors.go, and `BytesPerSlot(_ string)` / `BytesPerAccount(_ string)` ignore the client argument by design (per internal/sizecal/doc.go: "Identical across clients — required by the cross-client genesis-root invariance gate"). The point of the invariant is that the constant is GLOBAL, not that each client has a matched factor; the docs now reflect that.
- C3: CI reproduction numbers (docs/SKILL.md). CI actually uses --accounts=100 --contracts=15_000 (mirrored across all four TestE2ESuite constants); the doc said 0/0, which would not match CI.
- C4: TestBuildFullMatrix claim (AGENTS.md + SKILL.md + the test docstring itself). The test asserts COUNT equality across clients, not byte-identical PreAlloc slices. Byte-identity is enforced downstream by the cross-client-genesis-root aggregator. Fixed the three places that overstated the test's scope, including the test's own docstring so future readers don't get the wrong picture from the source.
- C5: KURTOSIS Starlark legacy honesty (docs/KURTOSIS.md). integration/stategen_launcher.star predates the multi-client + --spec work. It calls a `stategen:latest` Docker image (not built by this repo) and passes flags that no longer exist on state-actor (--genesis, --batch-size). The "Method 2 — Starlark module" section documented an aspirational API; anyone copy-pasting would hit Starlark argument-name errors. Replaced with the actual signature + a > [!WARNING] block explaining that the module will not run as-is.
- C6, C7, C8: --target-size regression text (README.md:34 + 158, KURTOSIS.md target_size comment). Commit 90f3f05 corrected most call sites; these three slipped through.
- C9: image tag pinning (docs/RUNBOOK.md + KURTOSIS.md). Pinned all four boot recipes:
  - geth: ethereum/client-go:v1.17.2
  - reth: ghcr.io/paradigmxyz/reth:nightly@sha256:e528857e... (matches internal/reth/constants.go's PinnedRethRelease)
  - besu: hyperledger/besu:25.11.0 + a > [!WARNING] explaining why 26.x breaks the recipe (--miner-enabled removed)
  - nethermind: nethermind/nethermind:1.37.0
- C10: besu boot path. RUNBOOK.md said "/data/<rocksdb files>"; besu writes to /data/database/. Corrected.

## Important fixes

- I1: reth test-file path (SKILL.md). reth's TestE2ESuite is in oracle_test.go, not e2e_test.go. Fixed to enumerate all four files individually so future readers don't assume the same name pattern across clients.
- I2: clientpolicy validation list (ARCHITECTURE.md). FlagValues in internal/clientpolicy/policy.go has BinaryTrie / TargetSize / Fork only; --archive rejection lives in main.go. Doc tree comment fixed to match.
- I3: docs/ tree listing (ARCHITECTURE.md). Missing SKILL.md. Added.
- I4: deleted "Sizing your generation" subsection (SPEC.md). Added by the prior 300GB-BLOAT-PLAN salvage but immediately contradicts the size-agnostic principle from RUNBOOK.md:286-288 (GB-tiered byte numbers; "stale within months"). The numbers also conflicted with sizecal's actual constants (the table said ~45 B per EOA; bytesPerAccount = 175). Best resolved by deletion.
- I5: ARCHITECTURE.md CLI-Layer diagram. Still said "Load genesis.json (optional)"; the --genesis flag was deleted. Updated to match the prose at line 99 ("Synthesize chainspec from --chain-id/--fork/--gas-limit/...").
- I6: oracle/ description (ARCHITECTURE.md). "Reproduce-from-config RNG + RPC oracle utilities": internal/oracle/ only has devkeys.go and reproduce.go; the RPC oracle lives in internal/e2e_testing/. Fixed.
- I7: SKILL.md intent index. Added rows for entity 6 (ERC-20 with explicit nonce override) and entity 22 (Plain EOA with omitted-nonce default); both were unreferenced before.
- I8: main.go --target-size --help text. AGENTS.md declares --help canonical, but --help still said "Stop condition only — set --accounts/--contracts/--min-slots/--max-slots explicitly", which contradicted every doc in the repo. Rewrote to describe the actual truncate-and-warn behaviour from internal/specbuild/build.go:143.
- I9: SKILL.md anchor link. SPEC.md § Address resolution → added the explicit anchor (#address-resolution-three-deterministic-modes).

## Verification

- AGENTS.md 37 lines (≤ 50 budget); SKILL.md 172 lines (≤ 250).
- `go run . --help` reconciled with the new --target-size text.
- `grep "Stop condition only|soft cap on synthetic fill"` → 0 matches.
- `grep "factors.json|per-client calibration factor"` → 0 matches.
- Only :latest mention is the intentional callout about the broken legacy stategen_launcher.star in KURTOSIS.md.
- TestBuildFullMatrix passes.
- examples/full-matrix-spec-feature.yaml unchanged.
- Dead-link sweep: 0.
- go build ./... clean.
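The C1/C2 point (calibration is a global constant, and the client argument is deliberately ignored) can be illustrated with a short Go sketch. The function names and `_ string` parameter shape come from the commit text, and 175 is the bytesPerAccount value it quotes; the return type, the bytesPerSlot value, and the `projectedBytes` helper are assumptions made for this example only.

```go
package main

import "fmt"

const (
	bytesPerAccount = 175 // value quoted in the commit message
	bytesPerSlot    = 100 // hypothetical placeholder, not the real constant
)

// BytesPerAccount ignores the client name: the factor is global so that
// every client projects the same size for the same inputs.
func BytesPerAccount(_ string) uint64 { return bytesPerAccount }

// BytesPerSlot likewise ignores the client name.
func BytesPerSlot(_ string) uint64 { return bytesPerSlot }

// projectedBytes is a hypothetical helper that combines the two factors.
func projectedBytes(client string, accounts, slots uint64) uint64 {
	return accounts*BytesPerAccount(client) + slots*BytesPerSlot(client)
}

func main() {
	for _, client := range []string{"geth", "reth", "besu", "nethermind"} {
		fmt.Printf("%-10s projected=%d bytes\n", client, projectedBytes(client, 10, 20))
	}
}
```

Keeping the client parameter in the signature while ignoring it means any size-driven decision comes out the same for every client, which is the property the cross-client genesis-root gate depends on.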
CPerezz added a commit to CPerezz/state-actor that referenced this pull request May 22, 2026
Pulls in the AI docs + general doc improvements from ethereum#84 (AGENTS.md, CLAUDE.md, RUNBOOK.md, SKILL.md, examples/README.md, examples/spec-minimal.yaml, plus rewrites of README.md / ARCHITECTURE.md / KURTOSIS.md / SPEC.md), then edits them to match the autofill rewrite from this branch:

- README.md: CLI flag table drops the 6 removed flags (--accounts, --contracts, --max-slots, --min-slots, --distribution, --code-size); --target-size row now documents the required-when-no-spec rule and the 20/10/70 split. Quick start + spec examples drop the --accounts=0 --contracts=0 workaround.
- docs/SPEC.md: "Composability with --target-size" section replaces the synthetic-fill knob list; emphasizes the spec/auto-fill composition.
- docs/ARCHITECTURE.md: Config example snippet swaps NumAccounts/NumContracts for AutoFill *autofill.Plan + TargetSize.
- docs/KURTOSIS.md: drops --accounts=0 --contracts=0 from quick-integration; documents the legacy Starlark module's removed flags explicitly so users grepping for them find the deprecation note.
- docs/RUNBOOK.md: 4 per-client boot recipes replace --accounts=100 --contracts=15000 --code-size=128 --min-slots=5 --max-slots=50 with --target-size=100MB.
- docs/SKILL.md: spec-adaptation guide + common-tasks section now describe the auto-fill 20/10/70 default and the omit-target-size-to-avoid-collisions pattern; CI repro section updated.
- examples/README.md + examples/spec-minimal.yaml + spec-eoa-bloat + spec-erc20-mixed-sizes: drop --accounts=0 --contracts=0 from invocations; spec-only runs are now collision-free by default.
- CHANGELOG.md: re-added the Breaking section documenting the 6 removed flags + --target-size requirement + golden regeneration.
- docs/300GB-BLOAT-PLAN.md: deleted (also gone on main).

AGENTS.md and CLAUDE.md were taken from main verbatim; neither mentions the removed flags.

Refs ethereum#82, ethereum#84.
CPerezz added a commit that referenced this pull request May 24, 2026
Brings in main's docs/AI-docs overhaul (#84) and the comprehensive-spec parity gate (#83).

Conflict resolution:
- Code (client/*/e2e_test.go, internal/e2e_testing/runphases.go, main.go): took ours. main is still on the pre-autofill API (NumAccounts/NumContracts/MinSlots/MaxSlots in test fixtures); this PR's autofill.Plan rewrite is the substantive update.
- examples/* (spec-minimal.yaml, README.md): took ours. Both files in main still reference the removed --accounts/--contracts flags.
- docs/{README,ARCHITECTURE,KURTOSIS,RUNBOOK,SKILL,SPEC}.md: took ours, for the same reason; main's docs describe the old synthetic-fill flag set that this PR retired.
- internal/specbuild/full_matrix_test.go: took theirs. main's docstring is more refined (clarifies that this test pins COUNT equality only, byte-identity is enforced downstream by the cross-client-genesis-root aggregator). Test body is identical.
Summary
Documentation overhaul modeled on tempoxyz/schelk: terse, task-oriented, agent-friendly. The end-state is that a new user (human or agent) can go from `git clone` to a booted client against a state-actor-generated DB in under ten minutes, using only the docs.

Scope landed in 8 commits, each independently verifiable:

- `AGENTS.md`: agent-facing entry doc: pointers table + 8 common-task recipes + constraints/gotchas (incl. the `--seed=0` footgun, the Docker requirement for cgo clients, and the `--accounts=0 --contracts=0` pairing recommended with `--spec`).
- `examples/spec-minimal.yaml` + `examples/README.md`: smallest-possible spec example + a "pick-by-intent" gallery cataloging the existing 4 YAMLs.
- `docs/RUNBOOK.md`: the missing piece: one CI-verified boot recipe per client (geth / reth / besu / nethermind), extracted verbatim from each client's `TestE2ESuite`. Includes the besu/nethermind engine-API + mock-CL caveat and the reth `vm.max_map_count` + `ulimit -n` hygiene (presented as MDBX-requirement-always, not as size-conditional advice).
- `README.md`: schelk-style rewrite: three Quick-start examples, task-shaped Usage sub-sections, compact 22-row flag reference, schelk's "When NOT to use" section. Drops every reference to the deleted `--inject-accounts` and `--genesis` flags.
- `docs/ARCHITECTURE.md`: refresh File Structure to current `internal/` layout (17 subpackages, was 0); replace the deleted `genesis.LoadGenesis()` data-flow snippet with the `--spec` workflow; enumerate all four client adapters; drop the size-bucketed Performance Characteristics table.
- `docs/KURTOSIS.md`: refresh for `--spec` + `--client`; quote one ERC-20 example; document the per-client `el_type → --client` mapping.
- `docs/300GB-BLOAT-PLAN.md` (filename baked in a size), `docs/RETH_LARGE_SCALE.md` (untracked, same problem), `docs/superpowers/` (internal planning artifacts).
- `--target-size` actually truncates spec entities (the previous text "spec entities are never capped" was wrong on three doc pages). Fixed in AGENTS.md / README.md / SPEC.md.