docs: Add AI docs & improve lib docs overall #84

Merged. CPerezz merged 11 commits into main from docs/schelk-style-overhaul on May 22, 2026.

Conversation

Collaborator

@CPerezz CPerezz commented May 22, 2026

Summary

Documentation overhaul modeled on tempoxyz/schelk: terse, task-oriented, agent-friendly. The end-state is that a new user — human or agent — can go from git clone to a booted client against a state-actor-generated DB in under ten minutes, using only the docs.

Scope landed in 8 commits, each independently verifiable:

  1. AGENTS.md — agent-facing entry doc: pointers table + 8 common-task recipes + constraints/gotchas (incl. the --seed=0 footgun, the Docker requirement for cgo clients, and the --accounts=0 --contracts=0 pairing recommended with --spec).
  2. examples/spec-minimal.yaml + examples/README.md — smallest-possible spec example + a "pick-by-intent" gallery cataloging the existing 4 YAMLs.
  3. docs/RUNBOOK.md (the missing piece): one CI-verified boot recipe per client (geth / reth / besu / nethermind), extracted verbatim from each client's TestE2ESuite. Includes the besu/nethermind engine-API + mock-CL caveat and the reth vm.max_map_count + ulimit -n hygiene (presented as MDBX-requirement-always, not as size-conditional advice).
  4. README.md — schelk-style rewrite: three Quick-start examples, task-shaped Usage sub-sections, compact 22-row flag reference, schelk's "When NOT to use" section. Drops every reference to the deleted --inject-accounts and --genesis flags.
  5. docs/ARCHITECTURE.md — refresh File Structure to current internal/ layout (17 subpackages, was 0); replace the deleted genesis.LoadGenesis() data-flow snippet with the --spec workflow; enumerate all four client adapters; drop the size-bucketed Performance Characteristics table.
  6. docs/KURTOSIS.md — refresh for --spec + --client; quote one ERC-20 example; document the per-client el_type → --client mapping.
  7. Prune: docs/300GB-BLOAT-PLAN.md (filename baked in a size), docs/RETH_LARGE_SCALE.md (untracked, same problem), docs/superpowers/ (internal planning artifacts).
  8. Verification-driven correction: --target-size actually truncates spec entities (the previous text "spec entities are never capped" was wrong on three doc pages). Fixed in AGENTS.md / README.md / SPEC.md.

CPerezz added 11 commits May 22, 2026 16:38
Schelk-style root-level pointer document for agents and LLMs landing in
the repo. Catalogs where to look for each topic (SPEC, RUNBOOK,
ARCHITECTURE, examples, --help), lists the three load-bearing flags
(--client, --spec, --target-size), and provides task-shaped recipes for
the most common operations (generate per client, add ERC-20 / 7702 EOA,
verify, reproduce CI failures, diagnose state-root divergence).

Constraints & gotchas section front-loads the --seed=0 footgun, the
Docker requirement for cgo clients, and the --spec + --accounts=0
--contracts=0 pairing that suppresses synthetic-fill collisions.
spec-minimal.yaml: smallest valid --spec — one EOA with explicit address
and balance. Verified end-to-end: generates an 18 KB geth DB with state
root 0x123d9412…cfd in 244ms.

examples/README.md catalogs the existing 4 spec files (minimal,
erc20-mixed-sizes, eoa-bloat, full-matrix-spec-feature) with a
"pick-by-intent" decision tree, cross-links docs/SPEC.md for the
schema, and documents the --accounts=0 --contracts=0 pairing recommended
whenever exact spec-derived addresses matter.
RUNBOOK.md is the new canonical "what to type after state-actor exits"
document. One section per client (geth / reth / besu / nethermind),
each one verbatim from client/<c>/e2e_test.go's TestE2ESuite — the
authoritative source now that the bench-orchestration scripts moved
out of the tree.

Every recipe is size-agnostic. No "at scale" section, no GB-tiered
tuning tables, no wall-clock numbers. Reth's vm.max_map_count + ulimit
hygiene is presented as MDBX-requirement-always, not as scale-conditional
advice. A closing note explicitly disclaims size-bucketing so future
edits don't reintroduce it.

SPEC.md gains a "Sizing your generation" subsection (the
entity-budget-to-flag pattern salvaged from the soon-to-be-deleted
300GB-BLOAT-PLAN.md, without the GB framing) and a Quick-start
cross-link to RUNBOOK.md + examples/README.md.
Refactors the README around four client choices + the three load-bearing
flags (--client, --spec, --target-size) rather than the old geth-only
narrative. Every example uses real, currently-shipped flag names.

Removed:
- --inject-accounts references (the flag was deleted in v1.5).
- --genesis references (the flag was deleted; chainspec is synthesized
  from --chain-id / --fork / --gas-limit / --timestamp / --extra-data).
- "Recommended Configurations" size buckets (small / CI / mainnet-like /
  max-throughput) — replaced by three Quick-start examples each
  targeting one user intent.
- "Distribution Types" section (folded into the flag table row for
  --distribution).
- Database Schema's geth-snapshot-only assumption.
- Per-scale performance table (size-bucketed, drift-prone).
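
As a rough illustration of the "chainspec is synthesized from flags" point above, the sketch below parses the five flag names listed in the commit message into a struct. Only the flag names come from the text; the `chainspec` struct, the `synthesize` helper, the Go types, and the zero defaults are assumptions made for this example and may not match state-actor's actual code.

```go
package main

import (
	"flag"
	"fmt"
)

// chainspec is a hypothetical container for the values the flags supply.
type chainspec struct {
	ChainID   uint64
	Fork      string
	GasLimit  uint64
	Timestamp uint64
	ExtraData string
}

// synthesize parses the five flags named in the commit message into a
// chainspec. The types and zero defaults are assumptions of this sketch.
func synthesize(args []string) (chainspec, error) {
	var cs chainspec
	fs := flag.NewFlagSet("chainspec-sketch", flag.ContinueOnError)
	fs.Uint64Var(&cs.ChainID, "chain-id", 0, "chain ID")
	fs.StringVar(&cs.Fork, "fork", "", "fork name")
	fs.Uint64Var(&cs.GasLimit, "gas-limit", 0, "block gas limit")
	fs.Uint64Var(&cs.Timestamp, "timestamp", 0, "genesis timestamp")
	fs.StringVar(&cs.ExtraData, "extra-data", "", "genesis extra data")
	if err := fs.Parse(args); err != nil {
		return chainspec{}, err
	}
	return cs, nil
}

func main() {
	// Example values only; they are not defaults taken from the repository.
	cs, err := synthesize([]string{"--chain-id=1337", "--fork=prague", "--gas-limit=30000000"})
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("%+v\n", cs)
}
```

The point of the design is that no genesis file has to exist on disk: every chainspec field has a flag, and unset flags fall back to defaults.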

Added:
- "Boot a client against the generated DB" section linking each of the
  four RUNBOOK.md client recipes.
- Pointer-block in the header banner: AGENTS.md, RUNBOOK.md, SPEC.md.
- "When NOT to use" section (schelk pattern).
- Compact 22-row flag reference table; docstring says --help is the
  canonical list.
- --seed=0 footgun note inline with the flag table.
- File Structure section: re-enumerated the current internal/ subtree
  (spec, specbuild, templates, sizecal, streamingtrie, streamsort,
  entitygen, clientpolicy, syscontracts, genesisheader, oracle, reth,
  neth, engineapi, e2e_testing, rpcprobe, testhex) and the full
  client/{geth,reth,besu,nethermind}/ set. The previous tree predated
  internal/ entirely.

- Data Flow Initialization snippet: replaced the deleted
  genesis.LoadGenesis() example with the --spec workflow
  (BuildSynthetic → ParseFile → Validate → specbuild.Build →
  Config.PreAlloc). The deleted code path was confusing readers and
  failing to compile if anyone copy-pasted it.

- Client Adapters section: dropped "client/reth/ exists on a parallel
  branch" (both reth and besu have been on main for months); enumerated
  all four adapters with their on-disk layouts.

- Performance Characteristics: dropped the per-operation throughput
  table (size-bucketed numbers from a geth-only era). Replaced with a
  one-paragraph note explaining the streaming-pipeline RAM bound and
  pointing readers at --benchmark --verbose for current numbers on
  their own hardware.
Replaces the geth-only narrative with the four-client framing introduced
in v1.5.

Removed:
- Every --genesis genesis.json reference (the flag was deleted; chainspec
  is synthesized from --chain-id / --fork / --gas-limit / ...).
- "Large State (Terabyte Scale)" section (size-bucketed; the library
  handles every size the same way).
- Trailing "Performance Issues" subsection that was cut off mid-bullet.

Added:
- ERC-20 spec example as the Method 1 walkthrough (uses
  examples/spec-erc20-mixed-sizes.yaml verbatim).
- "Picking the right --client" matching the participant's el_type.
- Cross-link to RUNBOOK.md for per-client boot flags + the besu/neth
  "engine-API + mock CL" pattern.
- Starlark module signature updated for the current flag set (client,
  spec_file, target_size, seed default = 1 with the footgun warning).
- Method 1 example sets --db with the /geth/chaindata suffix geth
  expects (matches client/geth/e2e_test.go's layout).
- docs/300GB-BLOAT-PLAN.md: filename bakes a size into the doc, which
  contradicts the size-agnostic principle. The entity-budget-to-flag
  pattern worth keeping moved into docs/SPEC.md under "Sizing your
  generation"; the rest (--genesis, --commit-interval, the geth-wrapper
  Step 5) referenced flags that no longer exist.

- docs/RETH_LARGE_SCALE.md (untracked): same problem — filename + content
  are explicitly size-classed. The size-agnostic operational hygiene
  (vm.max_map_count, ulimit -n) moved into docs/RUNBOOK.md#reth as
  plain "this is how you run reth" guidance. Wall-clock numbers and
  GB-tiered example invocations dropped — they rot fast and prime
  readers to expect different behaviour at different sizes the code
  doesn't actually exhibit.

- docs/superpowers/ (untracked): three completed-slice planning
  artifacts for the reth direct-MDBX writer. Internal planning notes,
  not user docs.
While running the spec-gallery verification I found that
internal/specbuild/build.go:143 (truncateForTargetSize) IS applied to
spec entities — when the spec's projected trie footprint exceeds
--target-size, the entity list is silently truncated to the longest
prefix that fits and a warning is emitted on stderr. My previous text
("spec entities are never capped") was wrong on three doc pages.
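
The truncate-and-warn behaviour described above can be sketched as a longest-prefix scan. This is a minimal illustration, not the code in internal/specbuild/build.go: the `entity` type, its `ProjectedBytes` field, and the `longestPrefixWithin` helper are hypothetical names, and the warning text is invented for the example.

```go
package main

import (
	"fmt"
	"os"
)

// entity is a hypothetical stand-in for a spec entity with a projected
// trie footprint in bytes.
type entity struct {
	Name           string
	ProjectedBytes uint64
}

// longestPrefixWithin returns the longest prefix of entities whose cumulative
// projected footprint fits in targetBytes. It warns on stderr when it drops any.
func longestPrefixWithin(entities []entity, targetBytes uint64) []entity {
	var used uint64
	for i, e := range entities {
		// used <= targetBytes always holds here, so the subtraction cannot underflow.
		if e.ProjectedBytes > targetBytes-used {
			fmt.Fprintf(os.Stderr, "warning: spec truncated to %d of %d entities to fit target size\n",
				i, len(entities))
			return entities[:i]
		}
		used += e.ProjectedBytes
	}
	return entities
}

func main() {
	spec := []entity{{"eoa", 100}, {"erc20-small", 400}, {"erc20-large", 4000}}
	kept := longestPrefixWithin(spec, 1000)
	fmt.Println(len(kept)) // prints 2: the third entity does not fit
}
```

Because the scan stops at the first entity that does not fit, later smaller entities are dropped as well. That is what "longest prefix" implies, and it is why a spec-driven run against a small --target-size can lose entities without failing.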

- AGENTS.md gotchas: rewrite the --target-size bullet.
- README.md "Cap the database size" section: rewrite.
- README.md flag table: rewrite the --target-size row.
- docs/SPEC.md "Composability" section: replace the "Config.Validate()
  fails loudly" claim (which doesn't match the code) with the actual
  truncate-and-warn behaviour.

Also swap the README spec-driven Quick-start example from
spec-erc20-mixed-sizes.yaml (16 GB total, would silently truncate
against a small --target-size) to spec-minimal.yaml, so the example
runs faithfully without --target-size and produces the expected
single-EOA database.
Follow-up to the schelk-style docs overhaul that originally landed in
this branch. Addresses 7 findings from a review focused on agent
exposure:

F1 (load-bearing): AGENTS.md is now a 26-line pointer to docs/SKILL.md,
matching schelk's actual pattern. The 125 lines of self-contained
content moved to docs/SKILL.md so the role of each file is clean:
AGENTS.md = discovery, SKILL.md = instruction.

F2: Inline YAML snippets in AGENTS.md (including one with invalid YAML —
`code: "0xef0100" + "<20-byte delegate>"` was Python string concat)
are replaced in docs/SKILL.md with a feature → fixture-entity-# map
that points agents at examples/full-matrix-spec-feature.yaml as the
canonical syntax reference. One place to maintain syntax; CI verifies
it; docs follow.
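
YAML has no string-concatenation operator, so a `code:` value has to be written as one literal. The sketch below shows one way to compute that literal outside the YAML file: the 3-byte prefix 0xef0100 followed by the 20-byte delegate address, hex-encoded as a single string. The `delegationCode` helper and the placeholder address are assumptions for illustration; the canonical syntax remains the fixture the commit message points to.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// delegationCode returns the hex encoding of the 23-byte code value:
// the prefix 0xef0100 followed by the 20-byte delegate address.
func delegationCode(delegate [20]byte) string {
	code := append([]byte{0xef, 0x01, 0x00}, delegate[:]...)
	return "0x" + hex.EncodeToString(code)
}

func main() {
	// Placeholder delegate address (all 0x11 bytes), not a real contract.
	var delegate [20]byte
	for i := range delegate {
		delegate[i] = 0x11
	}
	// The printed string can be pasted into a spec as a single scalar.
	fmt.Println(delegationCode(delegate))
}
```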

F3: docs/RUNBOOK.md's troubleshooting table gained per-row cross-links
into the relevant client section (Geth / Reth / Besu / Nethermind) and
into the new Cross-client determinism section.

F4: docs/SKILL.md surfaces the 9 package-level doc.go files via two
cross-link clusters — task recipes link to the doc.go relevant to that
task, and the "When asked to extend" section maps each extension type
to its anchor doc.go.

F5: Added CLAUDE.md (2 lines) so Claude Code's discovery convention
finds the agent-doc surface natively.

F6: examples/README.md gains a "Field-by-field cheatsheet" appendix
mapping common edit intents to YAML fields (10 rows; covers everything
an agent adapting full-matrix-spec-feature.yaml would touch).

F7: Added `## Cross-client determinism` H2 to docs/ARCHITECTURE.md to
fix the previously-dead anchor that AGENTS.md/SKILL.md/RUNBOOK.md all
link to. The section documents the three-layered mechanism (address
derivation, sizecal calibration, syscontract preamble) + the CI
keystone job that re-asserts the invariant.

Verification:
- AGENTS.md: 26 lines (≤ 50 budget)
- SKILL.md: 148 lines
- CLAUDE.md: 2 lines
- Dead-link sweep: 0 broken targets across AGENTS / CLAUDE / README /
  docs/ / examples/
- Size-discrimination sweep: 0 matches outside the intentional
  RUNBOOK.md disclaimer
- ARCHITECTURE.md#cross-client-determinism anchor: resolves
…erence

Follow-up to the prior schelk-mirror commit (870bacf). In review the
user pointed out that even though the F2 fix moved spec syntax out of
inline YAML and pointed at the fixture, the fixture was still treated
as one cross-reference among many rather than as THE canonical anchor.

The fixture has a structural property prose can't match — CI keeps it
correct via TestBuildFullMatrix + per-client TestE2ESuite +
cross-client-genesis-root invariant. Docs should lean on the
self-healing surface as the primary anchor, not bury it.

Changes:

- docs/SKILL.md: gains YAML frontmatter (name + description, schelk +
  superpowers convention). Restructured around a new "Read these first"
  section listing the fixture as item #1 with bold callout. New
  "Canonical spec reference" H2 (replaces "Write a spec" subsection +
  "Spec recipe builder" at the bottom) — names the fixture as
  load-bearing, traces the CI-invariant chain, includes a section-banner
  table (which feature each banner pins), keeps the intent → entity-#
  index, and folds the recipe-builder steps into a "How to adapt"
  paragraph.

- AGENTS.md: > [!IMPORTANT] callout immediately after the SKILL.md
  pointer naming the fixture as the canonical syntax reference. Table
  row stays for completeness but its phrasing softens to "Canonical
  syntax reference (every feature, CI-pinned — see callout above)".

- docs/SPEC.md: > [!IMPORTANT] callout at the top establishing the
  SPEC.md ↔ fixture split: SPEC.md is the schema reference (parser,
  validation, address algorithm), fixture is the syntax reference
  (every shape in practice).

- README.md: > [!TIP] agent-bootstrap callout (matching schelk's
  pattern) telling Claude/Codex/Cursor/etc. where to start.

- examples/README.md: collapse the redundant "Choose your spec" bullet
  list into a one-line pointer back to SKILL.md's canonical-spec-reference
  section.

The fixture YAML itself is unchanged — adding "see SKILL.md" comments
would create a circular reference. Docs adapt around the fixture; the
fixture stays the source of truth.

Verification:
- AGENTS.md 36 lines (≤ 50 budget)
- SKILL.md 170 lines (≤ 250 budget)
- YAML frontmatter parses (name=state-actor, description present)
- ## Read these first + ## Canonical spec reference anchors exist
- examples/README.md → docs/SKILL.md#canonical-spec-reference resolves
- examples/full-matrix-spec-feature.yaml: git diff origin/main = empty
- Dead-link sweep: 0
- TestBuildFullMatrix passes
…S legacy honesty, --target-size everywhere)

Follow-up to commit 0c22dbb addressing the 10 critical + 9 important
findings from the multi-agent PR review (code-reviewer +
comment-analyzer + code-simplifier). Each fix corresponds to a finding
number from the review summary.

## Critical fixes

C1, C2: sizecal calibration story (SPEC.md, ARCHITECTURE.md, README.md).
  The previous text claimed `internal/sizecal/factors.json` exists and
  has "per-client calibration factors". Both are wrong — calibration is
  Go constants in factors.go, and `BytesPerSlot(_ string)` /
  `BytesPerAccount(_ string)` ignore the client argument by design (per
  internal/sizecal/doc.go: "Identical across clients — required by the
  cross-client genesis-root invariance gate"). The point of the
  invariant is that the constant is GLOBAL, not that each client has a
  matched factor; the docs now reflect that.
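
A minimal sketch of what "the constant is GLOBAL" means in practice. The function names and the ignored string parameter follow the commit message, and 175 is the bytesPerAccount value quoted in finding I4 below. The `uint64` return type and the `bytesPerSlot` value are assumptions of this sketch, not values taken from factors.go.

```go
package main

import "fmt"

const (
	bytesPerAccount = 175 // value quoted in finding I4 of this commit message
	bytesPerSlot    = 64  // placeholder: the real constant is not stated here
)

// BytesPerAccount ignores the client name on purpose: every client must see
// the same factor so that size-derived entity counts are identical.
func BytesPerAccount(_ string) uint64 { return bytesPerAccount }

// BytesPerSlot ignores the client name for the same reason.
func BytesPerSlot(_ string) uint64 { return bytesPerSlot }

func main() {
	for _, client := range []string{"geth", "reth", "besu", "nethermind"} {
		fmt.Println(client, BytesPerAccount(client), BytesPerSlot(client))
	}
}
```

If the factor varied per client, the same --target-size would yield different entity counts per client, and the genesis roots could no longer match.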

C3: CI reproduction numbers (docs/SKILL.md). CI actually uses
  --accounts=100 --contracts=15_000 (mirrored across all four
  TestE2ESuite constants); the doc said 0/0, which would not match CI.

C4: TestBuildFullMatrix claim (AGENTS.md + SKILL.md + the test
  docstring itself). The test asserts COUNT equality across clients,
  not byte-identical PreAlloc slices. Byte-identity is enforced
  downstream by the cross-client-genesis-root aggregator. Fixed the
  three places that overstated the test's scope, including the test's
  own docstring so future readers don't get the wrong picture from
  the source.
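
The difference between the two guarantees can be sketched as two checks over per-client output. The helper names and the `map[string][][]byte` shape are hypothetical; they do not reproduce TestBuildFullMatrix or the cross-client-genesis-root aggregator, only the distinction between count equality and byte identity.

```go
package main

import (
	"bytes"
	"fmt"
)

// sameCount reports whether every client produced the same number of entries.
func sameCount(perClient map[string][][]byte) bool {
	n := -1
	for _, entries := range perClient {
		if n == -1 {
			n = len(entries)
		} else if len(entries) != n {
			return false
		}
	}
	return true
}

// sameBytes reports whether every client produced identical entries in the
// same order. This is strictly stronger than sameCount.
func sameBytes(perClient map[string][][]byte) bool {
	var ref [][]byte
	first := true
	for _, entries := range perClient {
		if first {
			ref, first = entries, false
			continue
		}
		if len(entries) != len(ref) {
			return false
		}
		for i := range entries {
			if !bytes.Equal(entries[i], ref[i]) {
				return false
			}
		}
	}
	return true
}

func main() {
	out := map[string][][]byte{
		"geth": {[]byte{0x01}, []byte{0x02}},
		"reth": {[]byte{0x01}, []byte{0x03}},
	}
	fmt.Println(sameCount(out), sameBytes(out)) // true false
}
```

In the example, both clients produce two entries, so a count-only assertion passes even though the second entry differs. A docstring that claims byte identity for a count-only test would hide exactly this case.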

C5: KURTOSIS Starlark legacy honesty (docs/KURTOSIS.md).
  integration/stategen_launcher.star predates the multi-client +
  --spec work. It calls a `stategen:latest` Docker image (not built
  by this repo) and passes flags that no longer exist on state-actor
  (--genesis, --batch-size). The "Method 2 — Starlark module" section
  documented an aspirational API; anyone copy-pasting would hit
  Starlark argument-name errors. Replaced with the actual signature
  + a > [!WARNING] block explaining that the module will not run as-is.

C6, C7, C8: --target-size regression text (README.md:34 + 158,
  KURTOSIS.md target_size comment). Commit 90f3f05 corrected most
  call sites; these three slipped through.

C9: image tag pinning (docs/RUNBOOK.md + KURTOSIS.md). Pinned all
  four boot recipes:
  - geth: ethereum/client-go:v1.17.2
  - reth: ghcr.io/paradigmxyz/reth:nightly@sha256:e528857e... (matches
    internal/reth/constants.go's PinnedRethRelease)
  - besu: hyperledger/besu:25.11.0 + a > [!WARNING] explaining why
    26.x breaks the recipe (--miner-enabled removed)
  - nethermind: nethermind/nethermind:1.37.0
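
A pinning rule like this one can be checked mechanically. The sketch below is a naive, hypothetical `isPinned` helper, not a script from the repository: it accepts an image reference that carries a digest or an explicit tag other than `latest`. The reth entry is left out because its digest is truncated in the commit message.

```go
package main

import (
	"fmt"
	"strings"
)

// isPinned reports whether an image reference carries a sha256 digest or an
// explicit tag other than "latest". It cannot tell a floating tag such as
// "nightly" from a release tag, so a digest is still the stronger pin.
func isPinned(image string) bool {
	if strings.Contains(image, "@sha256:") {
		return true
	}
	slash := strings.LastIndex(image, "/")
	colon := strings.LastIndex(image, ":")
	// A colon before the last slash belongs to a registry port, not a tag.
	if colon <= slash {
		return false // no tag means an implicit :latest
	}
	tag := image[colon+1:]
	return tag != "" && tag != "latest"
}

func main() {
	images := []string{
		"ethereum/client-go:v1.17.2",
		"hyperledger/besu:25.11.0",
		"nethermind/nethermind:1.37.0",
		"stategen:latest", // the legacy image called out in KURTOSIS.md
	}
	for _, img := range images {
		fmt.Println(img, isPinned(img))
	}
}
```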

C10: besu boot path. RUNBOOK.md said "/data/<rocksdb files>"; besu
  writes to /data/database/. Corrected.

## Important fixes

I1: reth test-file path (SKILL.md). reth's TestE2ESuite is in
  oracle_test.go, not e2e_test.go. Fixed to enumerate all four files
  individually so future readers don't assume the same name pattern
  across clients.

I2: clientpolicy validation list (ARCHITECTURE.md). FlagValues in
  internal/clientpolicy/policy.go has BinaryTrie / TargetSize / Fork
  only; --archive rejection lives in main.go. Doc tree comment fixed
  to match.

I3: docs/ tree listing (ARCHITECTURE.md). Missing SKILL.md. Added.

I4: deleted "Sizing your generation" subsection (SPEC.md). Added by
  the prior 300GB-BLOAT-PLAN salvage but immediately contradicts the
  size-agnostic principle from RUNBOOK.md:286-288 (GB-tiered byte
  numbers; "stale within months"). The numbers also conflicted with
  sizecal's actual constants (the table said ~45 B per EOA;
  bytesPerAccount = 175). Best resolved by deletion.

I5: ARCHITECTURE.md CLI-Layer diagram. Still said
  "Load genesis.json (optional)" — the --genesis flag was deleted.
  Updated to match the prose at line 99 ("Synthesize chainspec from
  --chain-id/--fork/--gas-limit/...").

I6: oracle/ description (ARCHITECTURE.md). "Reproduce-from-config RNG
  + RPC oracle utilities" — internal/oracle/ only has devkeys.go and
  reproduce.go; the RPC oracle lives in internal/e2e_testing/. Fixed.

I7: SKILL.md intent index. Added rows for entity 6 (ERC-20 with
  explicit nonce override) and entity 22 (Plain EOA with omitted-nonce
  default) — both were unreferenced before.

I8: main.go --target-size --help text. AGENTS.md declares --help
  canonical, but --help still said "Stop condition only — set
  --accounts/--contracts/--min-slots/--max-slots explicitly", which
  contradicted every doc in the repo. Rewrote to describe the actual
  truncate-and-warn behaviour from internal/specbuild/build.go:143.

I9: SKILL.md anchor link. SPEC.md § Address resolution → added the
  explicit anchor (#address-resolution-three-deterministic-modes).

## Verification

- AGENTS.md 37 lines (≤ 50 budget); SKILL.md 172 lines (≤ 250).
- `go run . --help` reconciled with the new --target-size text.
- `grep "Stop condition only|soft cap on synthetic fill"` → 0 matches.
- `grep "factors.json|per-client calibration factor"` → 0 matches.
- Only :latest mention is the intentional callout about the broken
  legacy stategen_launcher.star in KURTOSIS.md.
- TestBuildFullMatrix passes.
- examples/full-matrix-spec-feature.yaml unchanged.
- Dead-link sweep: 0.
- go build ./... clean.
@CPerezz CPerezz changed the title from "docs: schelk-style overhaul (AGENTS.md + RUNBOOK.md + 4-client framing)" to "docs: Add AI docs & improve lib docs overall" May 22, 2026
@CPerezz CPerezz merged commit 1de1f1b into main May 22, 2026
13 checks passed
@CPerezz CPerezz deleted the docs/schelk-style-overhaul branch May 22, 2026 21:23
CPerezz added a commit to CPerezz/state-actor that referenced this pull request May 22, 2026
Pulls in the AI docs + general doc improvements from ethereum#84
(AGENTS.md, CLAUDE.md, RUNBOOK.md, SKILL.md, examples/README.md,
examples/spec-minimal.yaml, plus rewrites of README.md /
ARCHITECTURE.md / KURTOSIS.md / SPEC.md), then edits them to match
the autofill rewrite from this branch:

- README.md: CLI flag table drops the 6 removed flags (--accounts,
  --contracts, --max-slots, --min-slots, --distribution, --code-size);
  --target-size row now documents the required-when-no-spec rule and
  the 20/10/70 split. Quick start + spec examples drop the
  --accounts=0 --contracts=0 workaround.

- docs/SPEC.md: "Composability with --target-size" section replaces
  the synthetic-fill knob list; emphasizes the spec/auto-fill
  composition.

- docs/ARCHITECTURE.md: Config example snippet swaps NumAccounts/
  NumContracts for AutoFill *autofill.Plan + TargetSize.

- docs/KURTOSIS.md: drops --accounts=0 --contracts=0 from quick-
  integration; documents the legacy Starlark module's removed flags
  explicitly so users grepping for them find the deprecation note.

- docs/RUNBOOK.md: 4 per-client boot recipes replace
  --accounts=100 --contracts=15000 --code-size=128 --min-slots=5
  --max-slots=50 with --target-size=100MB.

- docs/SKILL.md: spec-adaptation guide + common-tasks section now
  describe the auto-fill 20/10/70 default and the omit-target-size-
  to-avoid-collisions pattern; CI repro section updated.

- examples/README.md + examples/spec-minimal.yaml + spec-eoa-bloat
  + spec-erc20-mixed-sizes: drop --accounts=0 --contracts=0 from
  invocations; spec-only runs are now collision-free by default.

- CHANGELOG.md: re-added the Breaking section documenting the 6
  removed flags + --target-size requirement + golden regeneration.

- docs/300GB-BLOAT-PLAN.md: deleted (also gone on main).

AGENTS.md and CLAUDE.md were taken from main verbatim — neither
mentions the removed flags.

Refs ethereum#82, ethereum#84.
CPerezz added a commit that referenced this pull request May 24, 2026
Brings in main's docs/AI-docs overhaul (#84) and the comprehensive-spec
parity gate (#83). Conflict resolution:

- Code (client/*/e2e_test.go, internal/e2e_testing/runphases.go, main.go):
  took ours — main is still on the pre-autofill API (NumAccounts/
  NumContracts/MinSlots/MaxSlots in test fixtures); this PR's
  autofill.Plan rewrite is the substantive update.
- examples/* (spec-minimal.yaml, README.md): took ours — both files in
  main still reference the removed --accounts/--contracts flags.
- docs/{README,ARCHITECTURE,KURTOSIS,RUNBOOK,SKILL,SPEC}.md: took ours
  — same reason; main's docs describe the old synthetic-fill flag set
  that this PR retired.
- internal/specbuild/full_matrix_test.go: took theirs — main's
  docstring is more refined (clarifies that this test pins COUNT
  equality only, byte-identity is enforced downstream by the
  cross-client-genesis-root aggregator). Test body is identical.