feat(spec): --spec YAML flag for customizable state generation #75

Merged
CPerezz merged 78 commits into ethereum:main from CPerezz:feat/spec-yaml on May 21, 2026

Conversation

Collaborator

@CPerezz CPerezz commented May 12, 2026

Summary

Adds a --spec <file>.yaml flag that lets users declare concrete entities (EOAs + contracts) state-actor must include in generated state. Spec entities are written first; the existing synthetic-fill loop (--accounts / --contracts / --target-size) runs on top. Designed end-to-end via the deep-feature-planning skill — full plan at the project's planning state directory.

Closes #22 (nethermind storage + synthetic-accounts coexistence).
Supersedes and removes --inject-accounts.

User stories that work in v1

Story 1 (with the caveat below): reth dataset containing three ERC-20s of different sizes plus five EIP-7702-delegating EOAs.

state-actor --client=reth --db=/tmp/story1 \
  --spec=examples/spec-erc20-mixed-sizes.yaml \
  --contracts=20 --target-size=20GB

Story 2: any client; ten million EOAs plus three EIP-7702 EOAs with 2 / 5 / 10 GB of bloated storage each.

state-actor --client=geth --db=/tmp/story2 \
  --spec=examples/spec-eoa-bloat.yaml \
  --accounts=10000000 --target-size=20GB

v1 limitations (documented in CHANGELOG)

  • erc721 / uniswapv2 templates deferred

CPerezz added 14 commits May 12, 2026 22:18
First commit of the customizable-state-generation feature (plan at
~/.claude/plans/on-the-meantime-i-proud-karp.md).

This package owns the YAML schema state-actor's `--spec` flag will accept.
It is pure data — no template logic (Part 2), no Config wiring (Parts 3-6).

What's included:
- Spec / Entity types with hex newtypes (HexBytes, HexAddress) and a
  string-only BigIntDecimal (rejects unquoted balances to prevent
  yaml.v3 silently coercing 1e22 to a float).
- Parse / ParseFile using yaml.v3 with KnownFields(true) — typo'd field
  names fail at parse time instead of being silently ignored.
- Validate enforces the v1 schema rules: kind ∈ {contract, eoa};
  contract requires exactly one of template|code; eoa rejects
  template|parameters; parameters require template; unknown template
  names fail loudly; duplicate explicit addresses caught
  case-insensitively; oversized approximate_size_bytes warns.
- Testdata: valid-story1.yaml + valid-story2.yaml (the user stories)
  + valid-all-features.yaml (every schema feature for the CI
  cross-client invariant fixture).
- 19 unit tests + 1 fuzz target. go vet + go build + go test -short
  green across the full tree.
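
The validation rules listed above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the `Entity` struct, its field names, and the `knownTemplates` set are assumptions made for the example, and the size warning is left out.

```go
package main

import (
	"fmt"
	"strings"
)

// Entity is a hypothetical, simplified stand-in for one spec entry.
type Entity struct {
	Kind       string
	Address    string // optional explicit address (hex string)
	Template   string
	Code       string
	Parameters map[string]any
}

// knownTemplates is assumed for illustration only.
var knownTemplates = map[string]bool{"erc20": true}

// Validate applies the schema rules described in the commit message.
func Validate(entities []Entity) error {
	seen := map[string]int{} // lower-cased explicit address -> entity index
	for i, e := range entities {
		switch e.Kind { // case-sensitive on purpose: "Contract" is rejected
		case "contract":
			if (e.Template == "") == (e.Code == "") {
				return fmt.Errorf("entity %d: contract requires exactly one of template|code", i)
			}
		case "eoa":
			if e.Template != "" || len(e.Parameters) > 0 {
				return fmt.Errorf("entity %d: eoa rejects template|parameters", i)
			}
		default:
			return fmt.Errorf("entity %d: unknown kind %q", i, e.Kind)
		}
		if len(e.Parameters) > 0 && e.Template == "" {
			return fmt.Errorf("entity %d: parameters require template", i)
		}
		if e.Template != "" && !knownTemplates[e.Template] {
			return fmt.Errorf("entity %d: unknown template %q", i, e.Template)
		}
		if e.Address != "" {
			key := strings.ToLower(e.Address)
			if j, dup := seen[key]; dup {
				return fmt.Errorf("entity %d: duplicate address (first used by entity %d)", i, j)
			}
			seen[key] = i
		}
	}
	return nil
}
```

Returning on the first violation keeps the sketch short; collecting every error before returning would be an equally reasonable design.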

Branch is off origin/main. No changes to existing client writers or
the generator package yet — those land in Parts 3-6.

Refs the deep-feature-planning plan and is the first of seven planned
parts. See plan file for the full task breakdown.

… streaming sizing

Adds internal/templates/ — the registry of named state-actor spec templates
plus the PreAllocEntity record every writer consumes.

Architecture:
- template.go defines `Template` interface, `PreAllocEntity`, and
  `Context`. Storage is `iter.Seq2[common.Hash, common.Hash]` so 10 GB
  ERC-20 specs don't materialize a 10 GB Go-heap map up front.
- registry.go is a process-level template lookup; new templates land as
  one new file with an init() Register() call.
- sizing.go provides SynthesizeSlots — a streaming deterministic
  (key, value) generator — plus MapToSeq and Concat for composing
  explicit small slots with synthesized large streams.
- raw.go (kind: contract with code:): emits 1 PreAllocEntity with
  user-supplied bytecode and synthesized storage.
- eoa.go (kind: eoa): emits 1 PreAllocEntity. Optional code (incl.
  23-byte EIP-7702 0xef0100<addr> markers — not specially handled,
  treated as arbitrary code per spec). Optional storage bloat.
- erc20.go (kind: contract, template: erc20): OpenZeppelin v5 storage
  layout — slot 0 = `_balances` mapping, 2 = `_totalSupply`, 3/4 =
  short-string `_name`/`_symbol`. Synthesized `_balances[holder]`
  entries use Solidity's keccak256(pad32(addr) || pad32(0)) mapping
  rule, verified against a hand-computed fixture in the test.

v1 limitation called out in code:
- ERC20RuntimeBytecode is a `[]byte{0x00}` STUB. Storage layout is
  correct, but `eth_call balanceOf()` returns empty bytes (= 0) because
  the runtime isn't a real ERC-20 dispatcher. Swapping to audited
  OpenZeppelin v5 runtime bytecode is a one-file follow-up that
  doesn't touch other tests (the test pinning the stub hash will
  fail intentionally on swap).

Tests:
- sizing: determinism, domain separation, count=0, key uniqueness.
- registry: v1 template list pinned, duplicate-register panics.
- raw: pass-through code+nonce, synthesized storage count, missing-code
  rejection, parameter rejection.
- eoa: plain EOA defaults, 7702 marker, storage bloat slot count.
- erc20: parameter validation (every required field, type checks,
  unknown-key rejection), storage layout (slot positions, short-string
  encoding, totalSupply value), mapping-slot computation matches
  Solidity, bytecode stub pinned.

go vet + go build + go test -short ./... green across the tree.

The seam between the static spec/ schema and the runtime-fanned
templates/ registry. Translates a parsed Spec into a flat slice of
PreAllocEntity records (the unified format Part 4's writers will
consume).

Components:
- derive.go: ResolveAddress with three deterministic modes (explicit,
  name-derived, position-derived). Same seed + same name → same
  address forever, critical for the cross-client state-root invariant.
- build.go: Build() walks Spec, picks the right template per entity
  (eoa, named template via Lookup, raw fallback for kind=contract
  with code:), calls Template.Expand, accumulates PreAllocEntity
  records, and post-expansion detects address collisions that involve
  derived addresses (which spec.Validate cannot catch).
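
The precedence between the three address modes can be illustrated with a small sketch. Only the ordering (explicit, then name, then position) is taken from the commit message; the signature, the `Address` type, and the SHA-256 derivation are assumptions and will not reproduce the addresses state-actor actually generates.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
)

// Address is a stand-in for go-ethereum's common.Address.
type Address [20]byte

// ResolveAddress picks an address deterministically:
// explicit wins over name-derived, which wins over position-derived.
func ResolveAddress(explicit *Address, name string, seed uint64, index int) Address {
	if explicit != nil {
		return *explicit
	}
	var seedBuf [8]byte
	binary.BigEndian.PutUint64(seedBuf[:], seed)

	h := sha256.New() // hypothetical hash choice
	h.Write(seedBuf[:])
	if name != "" {
		// Name-derived: the index is deliberately ignored, so reordering
		// a named entity in the YAML keeps its address.
		h.Write([]byte("name:"))
		h.Write([]byte(name))
	} else {
		// Position-derived: depends on the entity's index in the spec.
		var idx [8]byte
		binary.BigEndian.PutUint64(idx[:], uint64(index))
		h.Write([]byte("index:"))
		h.Write(idx[:])
	}
	var out Address
	copy(out[:], h.Sum(nil)[12:]) // keep the low 20 bytes of the digest
	return out
}
```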

Tests:
- derive: all 3 modes deterministic; explicit overrides name overrides
  position; seed change shifts addresses; index change shifts
  position-derived only.
- build: Story 1 + Story 2 + valid-all-features fixtures roundtrip.
  Reordering a named entity in the YAML keeps its address (name wins
  over position). Reordering truly-anonymous entities flips their
  addresses (position-derived depends on index). Cross-entity address
  collision via same-name detection. Empty spec + missing Sizer
  rejected.

go vet + go build + go test -short ./... green across the tree.

internal/sizecal/ owns the empirical bytes-per-slot factor each writer
uses to translate `approximate_size_bytes` into a synthetic slot count.

- factors.json carries the table (geth=64, besu=64, nethermind=80,
  reth=60 — v1 hand-tuned guesses; will be replaced by an empirical
  calibration benchmark in a follow-up).
- factors.go embeds the JSON, exposes Default() / NewFixed(N) /
  BytesPerSlot(client). Unknown clients fall back to 100 (conservative
  over-allocate — better than under-allocating and busting --target-size).
- factors_test asserts all 4 clients have non-fallback factors (silent
  fallback would mask calibration drift) AND that factors are in a
  sane range [32, 500] bytes.

Implements the templates.SizeApproximator interface; templates/ has a
compatible interface declaration to avoid the cycle.
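
To make the size-to-slot translation concrete, here is a minimal sketch using the factor table and fallback quoted above. `BytesPerSlot` is named in the commit message; `SlotCount` and the floor-division rounding are assumptions for illustration.

```go
package main

// fallbackBytesPerSlot is the conservative value used for unknown clients.
const fallbackBytesPerSlot = 100

// factors mirrors the hand-tuned table described for factors.json.
var factors = map[string]uint64{
	"geth":       64,
	"besu":       64,
	"nethermind": 80,
	"reth":       60,
}

// BytesPerSlot returns the per-client factor, or the fallback if unknown.
func BytesPerSlot(client string) uint64 {
	if f, ok := factors[client]; ok {
		return f
	}
	return fallbackBytesPerSlot
}

// SlotCount converts approximate_size_bytes into a synthetic slot count.
// Floor division is an assumption; the real rounding rule may differ.
func SlotCount(client string, approxSizeBytes uint64) uint64 {
	return approxSizeBytes / BytesPerSlot(client)
}
```

Because the factors differ per client, the same `approximate_size_bytes` yields different slot counts on different clients, which is why a fixed factor is needed when state roots must match across clients.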
…thereum#22 fix

Adds the back-compat shim that lets the YAML spec feature reach all four
client writers without modifying them in this PR. Storage-bearing alloc
on the nethermind synthetic path is unblocked (closes issue ethereum#22).

generator/config.go:
- New Config.PreAlloc []templates.PreAllocEntity field — populated by
  the --spec CLI flag (Part 6) via internal/specbuild/.
- Validate() now folds PreAlloc into the legacy GenesisAccounts/
  GenesisCode/GenesisStorage maps as the first step. After
  materialization, the existing orphan/collision checks run uniformly
  over programmatic-alloc + spec-alloc entries.
- materializePreAlloc is idempotent: it clears Config.PreAlloc after
  consuming it so a second Validate() call is a no-op.
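
A sketch of how such an idempotent fold might look. All types here are simplified placeholders (balances as `uint64`, storage as an in-memory map), and the field layout is an assumption rather than the real `generator.Config`.

```go
package main

import "fmt"

type Address [20]byte
type Hash [32]byte

// PreAllocEntity is a simplified, hypothetical version of the real record.
type PreAllocEntity struct {
	Address Address
	Balance uint64
	Code    []byte
	Storage map[Hash]Hash
}

// Config holds only the fields relevant to the fold.
type Config struct {
	PreAlloc        []PreAllocEntity
	GenesisAccounts map[Address]uint64
	GenesisCode     map[Address][]byte
	GenesisStorage  map[Address]map[Hash]Hash
}

// materializePreAlloc folds PreAlloc into the genesis maps, then clears
// PreAlloc so that a second call has nothing left to do.
func (c *Config) materializePreAlloc() error {
	if c.GenesisAccounts == nil {
		c.GenesisAccounts = map[Address]uint64{}
	}
	if c.GenesisCode == nil {
		c.GenesisCode = map[Address][]byte{}
	}
	if c.GenesisStorage == nil {
		c.GenesisStorage = map[Address]map[Hash]Hash{}
	}
	for _, e := range c.PreAlloc {
		if _, dup := c.GenesisAccounts[e.Address]; dup {
			return fmt.Errorf("pre-alloc address %x collides with an existing genesis account", e.Address)
		}
		c.GenesisAccounts[e.Address] = e.Balance
		if len(e.Code) > 0 {
			c.GenesisCode[e.Address] = e.Code
		}
		if len(e.Storage) > 0 {
			c.GenesisStorage[e.Address] = e.Storage
		}
	}
	c.PreAlloc = nil // consumed: repeated Validate() calls become no-ops
	return nil
}
```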

client/nethermind/entitygen_cgo.go + run_cgo.go:
- writeSyntheticAccounts gains a genesisStorages parameter and threads
  alloc storage through the storage-trie path (keccak(slotKey)-sorted,
  trimmed leading zeros, builder.AddStorageSlot per slot,
  FinalizeStorageRoot stamps account.Root). Mirrors the existing
  writeGenesisAllocAccounts:154-213 block.
- run_cgo.go drops the hard reject ("issue ethereum#22") and passes
  allocStorages into writeSyntheticAccounts. Closes
  ethereum#22.

generator/prealloc_test.go (new):
- Shim materializes account + code + storage.
- Idempotent on repeated Validate() calls.
- Collision with GenesisAccounts → fail loud.
- Collision with InjectAddresses → fail loud.
- Empty config → pass.

Scope NOT in this commit (deferred to v1.5):
- Direct writer migration to consume cfg.PreAlloc natively (would unlock
  multi-GB streaming storage without the materialize-to-map step).
- Re-dogfooding AddPragueSystemContracts as a Template (no functional
  change; cleanup-only).

go vet + go build + go test -short ./... green.

Wires the YAML spec feature from Parts 1-5 into the CLI surface.

main.go:
- New --spec <file>.yaml flag. After BuildSynthetic, parse + validate
  + specbuild.Build the YAML, assign result to cfg.PreAlloc. Writers'
  Validate() folds PreAlloc into legacy maps (Part 4's shim) so no
  per-writer code changes are needed.
- --inject-accounts removed. Equivalent YAML migration documented in
  CHANGELOG. Config.InjectAddresses field stays for internal test
  fixtures that wire it directly.
- Warnings from spec validation + specbuild are logged before kickoff
  so users see size-budget overshoots and similar concerns surfaced
  early.

main_test.go:
- TestMainSpecFlagSmoke: builds state-actor, writes a 3-entity inline
  spec (raw contract + explicit-address EOA + name-derived 7702 EOA),
  runs --spec end-to-end against geth, asserts the db dir is non-empty.
  Pins the wiring CLI → spec parser → templates registry → specbuild →
  Config.PreAlloc → writer.
- TestMainInjectAccountsFlagRemoved: confirms the removed flag exits
  non-zero with a 'flag not defined' message — prevents an accidental
  re-add.

docs/SPEC.md (new): full user-facing schema reference — kinds,
templates, address resolution modes, balance semantics,
approximate_size_bytes, composability with existing flags,
determinism guarantees, removed-flag migration, examples index.

examples/ (new):
- spec-erc20-mixed-sizes.yaml — Story 1 (3 ERC-20s of decreasing size
  + 5 7702 EOAs).
- spec-eoa-bloat.yaml — Story 2 (3 7702 EOAs with 2/5/10 GB bloat).
- spec-ci-baseline.yaml — canonical CI fixture covering every v1
  schema feature; will be the input to Part 7's cross-client invariant.

CHANGELOG.md (new): added/changed/removed/limitations sections covering
the entire feature set landed in commits bb5d0f8..HEAD.

go vet + go build + go test -short ./... green across the tree.

Acknowledges that the cross-client spec state-root invariant CI job
(Part 7 of the plan) is deferred to v1.5 — it requires real Docker
client boots that the v1 PR can't validate locally. Unit-test coverage
of the determinism contract is sufficient for v1: ResolveAddress is
pinned deterministic, SynthesizeSlots is pinned deterministic, and
Config.PreAlloc's shim runs through every existing per-client Validate
without modification.

Adds a "Tested" section to CHANGELOG cataloging what IS covered in
v1's CI (TestMainSpecFlagSmoke + TestMainInjectAccountsFlagRemoved in
the default job; per-package unit tests for spec/templates/specbuild/
sizecal/prealloc-shim).

Acts on the two-agent audit of PR ethereum#75 by closing every actionable gap
the Mega-PR could close without Docker-driven boots of cgo clients.

CI integration (Part 7 Tier 1):
- client/geth/e2e_test.go: new TestE2ESuiteSpec — loads
  examples/spec-ci-min.yaml, runs Populate, boots geth in --dev, runs
  spamoor, captures genesis state-root, goes through the same
  RunSuitePhases boot→RPC→spamoor→golden-hash pipeline as the
  synthetic-fill suite. Pins the user's bar: "run state-actor with
  this new feature prior to using spamoor and run the common-golden-
  hash checks."
- examples/spec-ci-min.yaml: CI-fast fixture (~350 KB total spec
  footprint, ~12K slots materialized). Exercises every v1 schema
  feature in one file. Documents the per-client calibration-divergence
  trap (use sizecal.NewFixed for cross-client invariants in v1.5).
- .github/workflows/ci.yml geth-suite: -run filter now matches
  'TestE2ESuite$|TestE2ESuiteSpec$' (anchored so partial matches don't
  pull in unrelated functions); timeout bumped 45→60 min for two
  geth boots.
- Cross-client besu/nethermind/reth spec suites + sibling aggregator
  still tracked for v1.5 (need Docker image builds the v1 PR author
  cannot validate locally).

Schema-level rigor:
- internal/templates/{template,registry,raw,eoa,erc20}.go: new
  UserVisible() method on Template. raw + eoa return false (they're
  dispatched from `kind:` directly). erc20 returns true. main.go +
  build_test.go use templates.UserVisibleNames() — user-supplied
  `template: raw` / `template: eoa` now fail unknown-template
  validation cleanly. Closes the schema ambiguity Agent B flagged.
- internal/spec/validate.go: new MaxCodeSize = 24576 (EIP-170);
  Validate rejects code > limit so genesis state can't carry
  EIP-170-violating code that some clients silently accept and others
  reject.
- generator/config.go: Config.Validate now enforces the SPEC.md
  promise: it fails loudly when spec storage (conservatively estimated
  at 80 B/slot) exceeds --target-size. Previously the docs promised
  this check but the code did not perform it, so users would have
  gotten silent truncation.
- internal/templates/erc20.go: honor e.Nonce (was hardcoded Nonce=1).
  Floor at 1 per EIP-161 so unset (zero) gets nonce=1 — preserving the
  go-ethereum genesis convention while letting users override.
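
The budget check from the generator/config.go bullet above could look roughly like this. The function name, the treatment of a zero target as "no limit", and the error wording are all assumptions; only the 80 B/slot estimate and the fail-loudly behavior come from the commit message.

```go
package main

import "fmt"

// conservativeBytesPerSlot is the estimate quoted in the commit message.
const conservativeBytesPerSlot = 80

// checkSpecBudget returns an error when the estimated spec storage would
// not fit in the target size. A zero target is treated as "no limit"
// here, which is an assumption of this sketch.
func checkSpecBudget(specSlots, targetSizeBytes uint64) error {
	if targetSizeBytes == 0 {
		return nil
	}
	// Compare via division so specSlots*80 cannot overflow uint64.
	if specSlots > targetSizeBytes/conservativeBytesPerSlot {
		return fmt.Errorf(
			"spec storage (%d slots at ~%d B/slot) exceeds --target-size of %d bytes",
			specSlots, conservativeBytesPerSlot, targetSizeBytes)
	}
	return nil
}
```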

Test coverage additions:
- internal/spec/validate_test.go: EIP-170 oversize/exact-max code,
  kind/template case sensitivity (Contract, EOA, ERC20 all rejected).
- internal/spec/parse_test.go: balance rejection table (8 sub-cases:
  unquoted int/float/bool, underscored, scientific, negative,
  alpha-no-prefix, empty), max-uint256 boundary + overflow, address
  edge cases (zero, max, too-long, prefix-only, unquoted-hex),
  code edge cases (empty, prefix-only, single-byte, 23-byte 7702
  marker, odd-length, non-hex).
- internal/templates/erc20_test.go: multi-holder Solidity-equivalence
  (25 holders independently verified), nonce honoring 3 sub-cases.
- internal/specbuild/build_test.go: TestBuildDeterminismEndToEnd
  drains storage iterators twice and compares — pins the strongest
  determinism guarantee.
- generator/prealloc_test.go: TestValidateRejectsSpecExceedingTargetSize
  + TestValidateAcceptsSpecUnderTargetSize pin the new budget check.

Documentation walk-back:
- docs/SPEC.md: removes the false claim that the cross-client
  invariant is pinned in CI. Replaces with what's actually pinned
  (unit-level determinism + geth e2e). Adds the v1.5 follow-up shape.
- CHANGELOG.md: enumerates the audit-driven additions, walks back the
  cross-client-spec-genesis-root over-claim, documents the ERC-20
  nonce-floor surprise.

testdata/valid-all-features.yaml: shrunk approximate_size_bytes from
~100MB to ~100KB per entity so TestBuildDeterminismEndToEnd runs in
<0.3s instead of 8s. The fixture is internal test data, not the CI
fixture (examples/spec-ci-min.yaml is the CI input).

go vet + go build + go test -short ./... green.

…invariant for free

Replaces the parallel TestE2ESuiteSpec approach with the design the user
asked for: each existing per-client TestE2ESuite drives its
Config.PreAlloc from the same shared spec YAML. The existing
cross-client-genesis-root aggregator job thus becomes the spec-driven
invariant automatically (no new job, no new aggregator).

Key change: cfg.InjectAddresses[SpamoorSenderAddr] removed from every
client's e2e test. The spamoor sender is now entity #1 in the spec YAML,
funded with 999_999_999 ETH. All other v1 schema variants land in the
same YAML so every flavor is exercised through writer → boot → spamoor
→ RPC re-query → golden-hash on every client.

Why this is robust across clients:
- sizecal.NewFixed(64) (hardcoded in the shared helper) neutralizes
  per-client calibration divergence (geth=64, besu=64, nethermind=80,
  reth=60). Same YAML → same PreAlloc → same state root.
- materializePreAlloc shim folds PreAlloc into the legacy
  GenesisAccounts/Code/Storage maps before any per-client writer runs,
  so the four writers all see identical input.
- CheckInjections (Phase 4 RPC re-query) walks cfg.GenesisAccounts
  for balance verification (new) and cfg.GenesisCode for bytecode
  verification (existing). Every spec entity is RPC-asserted at runtime.

Files changed:
- examples/spec-ci-baseline.yaml: rewritten as the rich CI fixture
  (~12 entities: spamoor sender + 5 ERC-20 flavors + 2 raw + 2 7702
  EOAs + 3 plain EOAs). Covers explicit/name-derived/position-derived
  addresses, `holders` parameter, `approximate_size_bytes`, explicit
  `nonce` override, 7702 markers, storage bloat, skeleton-only ERC-20.
- examples/spec-ci-min.yaml: deleted (consolidated into baseline).
- internal/e2e_testing/spec_setup.go: new LoadCISpecPreAlloc helper.
  Uses sizecal.NewFixed(64); shared by all 4 client tests.
- internal/e2e_testing/spec_setup_test.go: TestCISpecMatchesSpamoorSender
  pins the YAML's spamoor entity to oracle.SpamoorSenderAddr (catches
  silent drift between YAML and devkeys.go).
- internal/e2e_testing/check_entities.go: CheckInjections now also
  walks cfg.GenesisAccounts for eth_getBalance assertions on every
  non-zero-balance spec entity (spamoor sender + plain EOAs + 7702 EOAs).
- client/{geth,besu,nethermind,reth}/{e2e_test,oracle_test}.go:
  - Drop `InjectAddresses: []common.Address{oracle.SpamoorSenderAddr},`
  - Add `PreAlloc: e2e.LoadCISpecPreAlloc(t, ".../spec-ci-baseline.yaml", "<client>")`
  - Drop now-unused common.Address import.
  - Keep oracle.AddPragueSystemContracts (system contracts are
    infrastructure, not feature-under-test).
  - Keep --accounts/--contracts synthetic-fill (state warmup unchanged).

Reverts the previous geth-only TestE2ESuiteSpec (was the wrong shape).

docs/SPEC.md + CHANGELOG.md: updated to describe the new design.

go vet + go build + go test -short ./... green across the tree.

…C verification

Two audit follow-ups from the user:

F1: Removed Config.InjectAddresses field entirely.
- `--inject-accounts` CLI flag was already gone (Part 6); now the Go
  field disappears too. All callers migrated to cfg.PreAlloc.
- Deleted writer code in geth/state_writer.go (lines 133, 184), besu/
  state_writer_cgo.go (lines 116-144), nethermind/run_cgo.go (line 145
  stats), nethermind/entitygen_cgo.go (the seenInjected loop), reth/
  run_cgo.go (Phase 4a inject block) — the per-client special-cased
  999_999_999 ETH injector. Spamoor sender now arrives via the spec
  YAML's first entity.
- Deleted generator.Generator's InjectAddresses loop (binary-trie path).
- Deleted client/reth/options.go:buildInjectedAccount + its test
  (no callers remain).
- Simplified Config.Validate: dropped the
  `InjectAddresses ∩ GenesisAccounts` collision check.
- Updated generator/config_test.go + generator/prealloc_test.go to
  drop the tests of the deleted collision class.
- Cleaned dangling docstrings + the Phase 4a comment in reth/run_cgo.go.

F2: Storage-slot RPC verification for every spec entity at Phase 4.
- Extended CheckInjections in internal/e2e_testing/check_entities.go
  to walk cfg.GenesisStorage. For each entity it samples up to 5 slots
  via the new sampleStorageSlots helper (sorted-by-key,
  first/last/middle-spaced — deterministic) and calls
  rpcprobe.EthGetStorageAt against the booted client, asserting the
  RPC value matches cfg.GenesisStorage[addr][slot] byte-for-byte.
- Bounds RPC roundtrips at O(addresses × 5) per Phase 4 invocation
  regardless of fixture size — ~30 calls for the CI baseline.
- Catches the bug class flagged by the user: ERC-20 holder balances
  that were injected into the writer but had vanished by RPC time,
  7702 EOA storage-bloat slots that never landed, and raw-contract
  synthesized slots that were dropped.
- Added 4 unit tests pinning the sampling logic: returns-all-when-small,
  caps-at-sample-size, deterministic, spread-spans-the-range.
- EthGetStorageAt helper was already in internal/rpcprobe/probe.go;
  no new RPC helper file needed.
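
The sampling strategy described in F2 (sort by key, then take evenly spaced entries including the first and last) can be sketched as below. `sampleStorageSlots` is named in the commit message, but this signature and the exact spacing formula are assumptions.

```go
package main

import (
	"bytes"
	"sort"
)

type Hash [32]byte

// sampleStorageSlots returns at most max slot keys, chosen
// deterministically: all keys when there are few, otherwise evenly
// spaced positions in sorted order, always including first and last.
func sampleStorageSlots(storage map[Hash]Hash, max int) []Hash {
	if max <= 0 || len(storage) == 0 {
		return nil
	}
	keys := make([]Hash, 0, len(storage))
	for k := range storage {
		keys = append(keys, k)
	}
	// Sorting removes any dependence on Go's random map iteration order.
	sort.Slice(keys, func(i, j int) bool { return bytes.Compare(keys[i][:], keys[j][:]) < 0 })
	if len(keys) <= max {
		return keys
	}
	if max == 1 {
		return keys[:1]
	}
	out := make([]Hash, 0, max)
	for i := 0; i < max; i++ {
		// Spread indices across [0, len-1]; i=0 is first, i=max-1 is last.
		idx := i * (len(keys) - 1) / (max - 1)
		out = append(out, keys[idx])
	}
	return out
}
```

With a cap of 5, each address costs at most five `eth_getStorageAt` calls regardless of how many slots it holds, which is what bounds the RPC round-trips.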

Verification: go vet + go build + go test -short ./... green across
all packages. PR ethereum#75's CI run will exercise the new RPC checks
end-to-end against the booted geth/besu/nethermind/reth clients.

CHANGELOG updated to reflect:
- The Removed-flag note now says the field is fully gone, not just the
  flag, and that programmatic callers migrate to cfg.PreAlloc.
- CheckInjections's coverage now includes storage with the sampling
  bound documented.

`go build -tags cgo_neth` rejected the nethermind cgo-suite build with:

  client/nethermind/entitygen_cgo.go:19:2:
    "github.com/holiman/uint256" imported and not used

In commit 68df808 (F1 — drop Config.InjectAddresses), I removed the
inject-accounts block in entitygen_cgo.go that contained the file's
only uint256 reference (`injectBalance := new(uint256.Int)...`). The
import stayed behind as dead code.

Why local pre-flight missed it: //go:build cgo_neth excludes the file
from default `go vet`/`go build`. `go vet -tags cgo_neth` does load it
but short-circuits at the rocksdb/c.h missing-header stage (libgflags
et al. aren't installed locally), so the Go compile phase never runs
locally.

Pre-flight script I now run before pushing cgo edits: scan every cgo-
tagged file I touched, grep each import's package alias for any
reference in the file, flag zero-ref imports. Confirmed clean for the
4 cgo files in 68df808 except this one.
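
The pre-flight described above is grep-based; as an alternative illustration, the same idea can be approximated in Go with the standard `go/parser` package, which parses a file regardless of its build tags and without invoking cgo. This is a heuristic sketch, not the author's script: it guesses the package name from the last path element (wrong for paths such as `gopkg.in/yaml.v3`) and does not account for local identifiers that shadow a package name.

```go
package main

import (
	"go/ast"
	"go/parser"
	"go/token"
	"path"
	"strconv"
)

// unusedImports returns the import paths in src whose package name never
// appears as the base of a selector expression (pkg.Something).
func unusedImports(filename string, src []byte) ([]string, error) {
	fset := token.NewFileSet()
	// Parsing only: build constraints are not evaluated, so files tagged
	// with e.g. //go:build cgo_neth are still inspected.
	f, err := parser.ParseFile(fset, filename, src, parser.SkipObjectResolution)
	if err != nil {
		return nil, err
	}

	used := map[string]bool{}
	ast.Inspect(f, func(n ast.Node) bool {
		if sel, ok := n.(*ast.SelectorExpr); ok {
			if id, ok := sel.X.(*ast.Ident); ok {
				used[id.Name] = true
			}
		}
		return true
	})

	var unused []string
	for _, imp := range f.Imports {
		p, err := strconv.Unquote(imp.Path.Value)
		if err != nil {
			return nil, err
		}
		name := path.Base(p) // heuristic default package name
		if imp.Name != nil {
			name = imp.Name.Name // explicit alias
		}
		if name == "_" || name == "." {
			continue // blank and dot imports cannot be checked this way
		}
		if !used[name] {
			unused = append(unused, p)
		}
	}
	return unused, nil
}
```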

Single-line fix: delete line 19.

The CHANGELOG already documents the removal at the right place. The
section under docs/SPEC.md duplicated the migration info and (after
F1) pointed at the wrong address (0xf39F…2266 — the legacy hardcoded
sample, NOT oracle.SpamoorSenderAddr which is the canonical CI sender
in examples/spec-ci-baseline.yaml).

Phase 4a.5 in client/reth/run_cgo.go used to push every
cfg.GenesisAccounts entry through WriteContracts. That writer
unconditionally sets iReth.Account.BytecodeHash to &codeHash even for
empty code (&EmptyCodeHash, not nil). reth's compact Account encoding
uses the BytecodeHash pointer's nullability as the EOA discriminator,
so plain alloc EOAs (the spamoor sender, name-derived EOAs, etc.)
ended up reported as contracts via RPC. TestE2ESuite then panicked
"failed to deploy batcher: sender is not an EOA" when spamoor's
prepare_wallets probed the sender address.

Split the alloc dispatch by shape: entries with empty Code AND empty
Storage go through WriteEOAs (BytecodeHash=nil); everything else
(template contracts, raw contracts, EIP-7702 EOAs with a 23-byte
delegation marker) keeps going through WriteContracts. State-root
output is byte-identical pre- and post-fix because the global-state-
trie leaf RLP encodes acc.StateAccount, which both writers leave with
the same Root/CodeHash values for empty-code accounts.
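
The shape-based split can be illustrated with a small partition function. The `Account` type and the `splitAlloc` name are hypothetical; only the rule itself (empty code and empty storage means the EOA path, everything else the contract path) is taken from the commit message.

```go
package main

import (
	"bytes"
	"sort"
)

type Address [20]byte
type Hash [32]byte

// Account is a simplified, hypothetical alloc entry.
type Account struct {
	Code    []byte
	Storage map[Hash]Hash
}

// splitAlloc partitions alloc entries by shape. Plain EOAs (no code, no
// storage) would go to the EOA writer; anything carrying code or storage,
// including 7702 EOAs with a 23-byte delegation marker, would go to the
// contract writer.
func splitAlloc(alloc map[Address]Account) (eoas, contracts []Address) {
	for addr, acc := range alloc {
		if len(acc.Code) == 0 && len(acc.Storage) == 0 {
			eoas = append(eoas, addr)
		} else {
			contracts = append(contracts, addr)
		}
	}
	// Sort both slices so the result does not depend on map iteration order.
	byAddr := func(s []Address) {
		sort.Slice(s, func(i, j int) bool { return bytes.Compare(s[i][:], s[j][:]) < 0 })
	}
	byAddr(eoas)
	byAddr(contracts)
	return eoas, contracts
}
```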
Code comments and docs should describe what the code does NOW, not
how it got here. Sweep removes:

  - version markers (v1, v1.5, v2 framings on current code)
  - planning artifacts (Tier N, Part N, Phase 4a.5 except where the
    phase label is a real in-code design pattern)
  - migration narratives (replaces the previous, was X now Y,
    back-compat shim, legacy maps)
  - issue/PR references (closes ethereum#22, in this PR, this commit ships)
  - audit framing (per the audit, user correction)
  - "stub + future swap" framings (deletes the future-swap aside,
    keeps spec citations like OpenZeppelin v5 storage layout)

Behaviorally inert — only comments, docs, an unused testdata pair
(internal/spec/testdata/valid-story{1,2}.yaml — tests now drive off
the examples/ fixtures directly), and one line in
internal/sizecal/factors.json's _comment field.

CHANGELOG explicitly NOT touched.
@CPerezz
Collaborator Author

CPerezz commented May 13, 2026

This has 2 main points left:

  1. ERC20 internal customization
  2. streaming writers for spec (remove the RAM cap)

CPerezz added 5 commits May 14, 2026 09:42
Replaces the prior []byte{0x00} stub with the audited OpenZeppelin
v5.6.1 ERC20 deployed runtime bytecode (1723 bytes). eth_call against
balanceOf, totalSupply, name, symbol, decimals, and allowance now
return correct values from the planted storage.

Vendored as internal/templates/erc20_oz_v5.hex (80-col-wrapped hex),
loaded into ERC20RuntimeBytecode via go:embed in
internal/templates/erc20_bytecode.go.

scripts/regen-erc20-bytecode.sh regenerates the hex from upstream OZ.
Pinned settings: solc 0.8.30, --optimize --optimize-runs 200,
--metadata-hash none. TestERC20RuntimeBytecodePinned pins the
keccak256 of the resulting blob; regenerating requires a paired
update to that hash.

OZ v5 marks ERC20 as `abstract contract` to require a derived supply
mechanism, so the regen script compiles a 3-line concrete wrapper
(contract Token is ERC20 { constructor() ERC20("", "") {} }) that adds
no state or methods — its runtime IS the OZ ERC20 dispatcher.
decimals() returns 18 unconditionally; the erc20 template's
ValidateParameters will reject decimals != 18 in a follow-up commit.

Adds four new optional erc20-template parameters:

  - owners:           list of {address, balance} tuples, granular control
  - allowances:       list of {owner, spender, allowance} tuples
  - total_owners:     target holder count; total - len(owners) random
                      holders are synthesized with varied balances in
                      [1, 10^18] wei (deterministic from seed+address+i)
  - total_allowances: same pattern, for _allowances mapping

_totalSupply is auto-summed from every planted balance (explicit + random)
so the ERC-20 conservation invariant is preserved by construction; user
cannot override.

Validation:
  - decimals must equal 18 (OZ v5 base hardcodes decimals() = 18)
  - holders is rejected with a renamed-to-total_owners migration message
  - duplicate addresses in owners → reject
  - duplicate (owner, spender) in allowances → reject
  - len(owners) > total_owners or len(allowances) > total_allowances
    → reject

internal/spec/parseUint256 is renamed to ParseUint256 (exported) so the
templates package reuses the same numeric parsing rules for nested
object fields.

YAML fixtures migrated from holders → total_owners; existing decimals
values != 18 changed to 18 (or moved to raw template if non-18 needed).

Adds CheckERC20Templates, a Phase-4 oracle that asserts every spec'd
erc20 field is reachable via JSON-RPC after node boot:

  - eth_getCode(tokenAddr) byte-equals the vendored OZ v5 runtime
  - eth_call name() / symbol() / decimals() match params
  - eth_call totalSupply() equals the auto-summed planted balances
  - eth_call balanceOf(addr) matches each explicit owner
  - eth_call balanceOf(addr) matches sampled synthesized random owners
  - eth_call allowance(o, s) matches each explicit allowance and
    sampled synthesized random allowances

Runs in TestE2ESuite on all four MPT clients (geth/besu/nethermind/reth)
via internal/e2e_testing/runphases.go. The synthesized values are
re-derived locally using the same (seed, tokenAddr, index) recipe the
writer used, so deterministic random balances/allowances are checked
without storing expected values.

New plumbing:

  - internal/rpcprobe/abi_calls.go: typed eth_call wrappers for the
    six ERC-20 view methods. Hardcoded 4-byte selectors pinned by
    TestSelectorsMatchKeccak.
  - internal/e2e_testing/check_erc20.go: CheckERC20Templates +
    sampleIndices (first/last/middle, cap=5).
  - internal/e2e_testing/spec_setup.go: LoadCISpec returns the parsed
    *spec.Spec alongside PreAlloc so the oracle can read template
    params back. CISpecSeed const replaces the previous Seed: 0 literal.
  - internal/e2e_testing/runphases.go: SuitePhasesCfg gains Spec /
    SpecSeed fields; new Phase 4c invokes CheckERC20Templates when
    Spec is non-nil.

Templates package exports the helpers the oracle needs:
ParseExplicitOwners, ParseExplicitAllowances, ParseNonNegIntParam,
DeterministicRandomBalance, DeterministicRandomOwnerAddress, and the
allowance-side counterparts.

CI fixture extension: examples/spec-ci-baseline.yaml gains three new
ERC-20 entities (E1: explicit-only, E2: bulk-only, E3: combined) so
the cross-client genesis-root aggregator AND the per-suite RPC oracle
both cover every owners/allowances/total_owners/total_allowances
permutation on every MPT client.

reth's eth_call against block tag "0x0" was returning -32001
"block not found: hash 0x22d022..." for every call. Empirically
reproducing the mystery hash showed it is exactly state-actor's
genesis header.Hash() recomputed AFTER the Prague RequestsHash
field is stripped.

Root cause: client/reth/static_files_cgo.go::headerCompactBytes
emitted extra_fields = None unconditionally, dropping
RequestsHash from the Compact byte stream that reth persists to
the headers static-file segment. Col-2 sidecar and HeaderNumbers
both reference the full RLP hash (RequestsHash included), so
forkchoice and eth_getBalance / eth_getCode / eth_getStorageAt
(which key by address) worked fine. eth_call, however, re-decodes
the genesis header from the static file, computes its keccak, and
looks the result up in HeaderNumbers — the recomputed hash is
missing RequestsHash and so isn't there.

Fix: when h.RequestsHash != nil (genesisheader.Build sets it iff
Prague is active), set bit 31 of the LE bitfield and append the
HeaderExt Compact wire-form: a 1-byte inner bitflag (0x01, marking
requests_hash = Some) plus the 32-byte B256. Total 33 bytes of
extra payload for a Prague-active genesis.

TestHeaderCompactBytesPragueExtraFields pins the new structure:
length delta, bit-31 presence, tail bytes match
0x01 || RequestsHash. Existing TestHeaderCompactBytesGenesis is
unaffected (its fixture has RequestsHash == nil).

Previous fix bf054b4 wired bit 31 + appended the inner HeaderExt bytes
(33 bytes: 0x01 || RequestsHash) directly. That crashed reth's decoder
at reth-codecs-0.3.1/src/lib.rs:448:20 — slice-index-out-of-range
inside `Compact for [u8;N]::from_compact`.

Root cause: Option<T>::to_compact in reth-codecs has TWO variants. The
specialized one (used by Option<B256> / Option<u64> for the
withdrawals_root, base_fee_per_gas, blob_gas_used, excess_blob_gas,
parent_beacon_block_root fields) writes raw bytes only. The
non-specialized one (used by Option<HeaderExt> because HeaderExt is a
custom Compact-derived struct, not a primitive) writes
`varuint(N) || T_compact_bytes` — see lib.rs:302-322. Our previous fix
omitted the varuint prefix, so reth's Option<HeaderExt>::from_compact
read the inner bitflag byte as the varuint length, then read 32 bytes
at the wrong offset and ran off the end of buf.

Fix: wrap the HeaderExt payload with appendVarUint(len(inner)). For
Prague-active genesis (only requests_hash Some) the inner is 33 bytes,
varuint(33) is a single byte 0x21, total appended = 34 bytes.

TestHeaderCompactBytesPragueExtraFields updated: delta is 34 (was 33),
tail is [0x21, 0x01, RequestsHash[0..32)] (was [0x01, RequestsHash]).

Verified against:
  reth-codecs-0.3.1/src/lib.rs:302-322   — Option<T>::to_compact
                                            non-specialized branch
  reth-codecs-0.3.1/src/alloy/header.rs   — HeaderExt struct (3 fields,
                                            bitflag_encoded_bytes()==1)
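
The byte arithmetic in this fix can be checked with a short sketch. `appendVarUint` is the helper name used in the commit message; the LEB128-style body below (7 payload bits per byte, high bit as continuation flag) is an assumption about its implementation, and `headerExtTail` is a hypothetical name for the wrapping step.

```go
package main

// appendVarUint appends n as a little-endian base-128 varuint:
// 7 payload bits per byte, high bit set on every byte except the last.
func appendVarUint(dst []byte, n uint64) []byte {
	for n >= 0x80 {
		dst = append(dst, byte(n)|0x80)
		n >>= 7
	}
	return append(dst, byte(n))
}

// headerExtTail builds the extra bytes appended for a Prague-active
// genesis: varuint(len(inner)) || inner, where inner is the 1-byte
// bitflag 0x01 (requests_hash = Some) followed by the 32-byte hash.
func headerExtTail(requestsHash [32]byte) []byte {
	inner := append([]byte{0x01}, requestsHash[:]...) // 33 bytes
	out := appendVarUint(nil, uint64(len(inner)))     // single byte 0x21
	return append(out, inner...)                      // 34 bytes total
}
```

The length prefix is what the earlier attempt omitted: without it, a decoder expecting the prefix would interpret the bitflag byte 0x01 as a length of 1.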
@CPerezz
Collaborator Author

CPerezz commented May 14, 2026

Internal customization completed. Fully working and added to CI tests.

Now going for streaming mode.

CPerezz added 9 commits May 14, 2026 15:54
internal/streamsort is a Pebble-backed sorted-spill store tuned for
write-once-then-read-sorted bulk-sort workloads. Single goroutine,
no transactions, no crash recovery — Close removes the temp dir.

Pebble tuning chosen for this workload (rationales in the package
doc):
  DisableWAL                  true       no recovery; ~50% write win
  MemTableSize                2 GiB      small entities never flush
  MemTableStopWritesThreshold 16         flush never blocks Put
  L0CompactionThreshold       MaxInt     defer until Iterate
  L0StopWritesThreshold       MaxInt     accept high L0 fan-out
  MaxConcurrentCompactions    NumCPU     parallelize lazy compactions
  BytesPerSync                0          disable mid-write fsync rate
  WALBytesPerSync             0          belt-and-braces (WAL off)
  Levels[0].Compression       None       keccak-random data is incompressible
  FormatMajorVersion          Newest     latest SSTable format
  NoSyncOnClose               true       temp dir; no metadata fsync
  Cache                       64 MiB     iterate-phase index/filter blocks

No callers yet — pure addition. The package is the substrate for
both the per-client global temp-Pebble migration (next commit) and
the new per-entity spec-storage streaming path (commit after).

Five unit tests:
  TestStoreSortsRandomInput        — 10k random 32-byte keys sort+complete
  TestStorePutAfterClose           — Put errors after Close
  TestStoreIterateAfterClose       — Iterate errors after Close
  TestStoreCloseIdempotent         — double Close returns nil
  TestStoreIterateYieldErrorPropagates — yield error short-circuits

DisableTableStats is not yet in Pebble v1.1.5 (added in later
versions); add when we bump the Pebble dep.
CPerezz added 23 commits May 20, 2026 02:10
…e_mask at emit

The structural gate-split in commit 46587b9 (mirroring Rust's
store_branch_node) was a necessary but insufficient fix: bench v8
re-run produced an identical BNC mask invariant violation
(state_mask=0x85AF / tree_mask=0xFB33) when reth decoded a
StoragesTrie row from the 215K-entity bloatnet spec. The Go-side
property fuzz (50 trials × random 32-byte keys × value-length
straddling the 32-byte boundary) and the deterministic 32-leaf
fixture both PASS — i.e., the bug lives in a code path those tests
don't cover, likely the extension-node + deep-trie interaction at
500M-slot bloated-EOA scale.

Until we have a deterministic in-repo reproducer for the remaining
case, ship a defense-in-depth layer: AND tree_mask and hash_mask
against state_mask immediately before emit. The masked-off bits are
provably orphan claims — any bit not in state_mask claims a child at
a slot the parent says doesn't exist; reth's TrieWalker reads
tree/hash mask values through a state_mask AND anyway
(crates/trie/trie/src/trie_cursor/subnode.rs:130-159), so dropping
them is a semantic no-op for downstream consumers but shields reth
from the on-decode assertion that otherwise crashes its
payload-builder.

The construction of `hashes` is unaffected — the slice is built by
iterating slots in stateMask AND hashMask, so the masked emittedHashMask
matches len(hashes) by construction.

This is layered on top of the structural fix, not in place of it.
The gate-split prevents the invariant violation for the code paths
we understand. The defensive intersection guarantees the emitted row
is valid regardless. When we identify the remaining algorithmic
trigger (likely an updateMasks corner case under deep extension
chains), we can fix that layer too and the defensive intersection
becomes a true no-op.
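
The defensive intersection amounts to two bitwise ANDs. The sketch below is a minimal illustration, assuming each mask is a 16-bit field with one bit per child nibble; `intersectMasks` and `isSubset` are hypothetical names, not the functions in hash_builder.go.

```go
package main

// intersectMasks clears every tree_mask / hash_mask bit that is not also
// set in state_mask. Each mask carries one bit per child nibble (0..15).
func intersectMasks(stateMask, treeMask, hashMask uint16) (tree, hash uint16) {
	return treeMask & stateMask, hashMask & stateMask
}

// isSubset reports whether every bit set in sub is also set in super,
// i.e. the "tree ⊆ state" invariant the decoder asserts.
func isSubset(sub, super uint16) bool {
	return sub&^super == 0
}
```

Applied to the pattern from the bench run, state_mask=0x85AF and tree_mask=0xFB33 intersect to 0x8123, which satisfies the subset check.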
…sentinel

Adds a SENTINEL-V2-MASK-INTERSECT string literal to verify which build
of the binary is running in production (extract from docker image:
strings /usr/local/bin/state-actor | grep SENTINEL-V2).

Also reconciles the `hashes` slice with the post-intersection
emittedHashMask. The hashes slice was built using the original
hashMask; if the AND-mask drops bits, len(hashes) > popcount(emittedHashMask),
which would trip BranchNodeCompact.EncodeCompact's
"hash_mask popcount != len(hashes)" invariant on serialization. Re-
collect after the mask intersection so the two stay aligned.
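
A rough sketch of that re-collection step, assuming `hashes` holds one 32-byte entry per set bit of the original hashMask in ascending slot order. `filterHashes` is a hypothetical name used only for illustration.

```go
package main

import "math/bits"

// filterHashes returns the post-intersection hash mask together with the
// hashes whose child slot survives it, so that
// popcount(emitted) == len(kept) holds on serialization.
func filterHashes(hashMask, stateMask uint16, hashes [][32]byte) (uint16, [][32]byte) {
	emitted := hashMask & stateMask
	kept := make([][32]byte, 0, bits.OnesCount16(emitted))

	i := 0 // index into hashes, advanced once per set bit of hashMask
	for slot := 0; slot < 16; slot++ {
		bit := uint16(1) << slot
		if hashMask&bit == 0 {
			continue
		}
		if emitted&bit != 0 && i < len(hashes) {
			kept = append(kept, hashes[i])
		}
		i++
	}
	return emitted, kept
}
```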
Four rounds of HashBuilder mask-handling iterations failed to stop
reth from panicking at BranchNodeCompact::new with the same 0x85AF /
0xFB33 bit pattern. That is despite our Go-side encoder's invariant
assertion (trie_format.go) and the gate-split, defensive-intersection,
and hash-slice reconciliation fixes. We need ground-truth visibility
into what is actually on disk versus what we think we wrote.

This tool opens the StoragesTrie table read-only, decodes each row's
SubKey + BranchNodeCompact, and prints the raw bytes + decoded masks.
It is a standalone Go program that runs against any reth datadir
without spinning up the full reth node, so we can:
  - confirm whether the panicking masks (0x85AF / 0xFB33) are actually
    present on disk, vs. constructed in-memory by reth from somewhere else
  - cross-check our encoder's "tree ⊆ state" invariant claims against
    what MDBX actually contains
  - identify if any rows have unexpected SubKey lengths or BNC byte
    counts that would shift reth's parse offsets

Ship + run on the bench host to find the precise discrepancy.
…te legacy

The reth panic at alloy-trie/branch.rs:298 (state=0x85AF, tree=0xFB33) was a
SCHEMA mismatch, not malformed BNCs. dump-storages-trie revealed every
on-disk row had valid masks; reth was reading them at the wrong byte offset.

Root cause: reth's ProviderFactory defaults to StorageSettings::v1() when
the Metadata table has no storage_settings row (database/mod.rs:132). v1
selects the LegacyKeyAdapter, which reads StoragesTrie/AccountsTrie values
via StorageTrieEntry::from_compact (storage.rs:38-43) — that decoder
expects a 65-byte StoredNibblesSubKey (one nibble per byte, right-padded
zeros, length byte at byte 64). Our writer was producing the v2 packed
33-byte form (PackedStoredNibblesSubKey: packed[32] || length[1]), so
reth's BNC parse offset slid 32 bytes into our root_hash field — the
0x85AF/0xFB33 "masks" reth choked on were actually bytes 26-29 of the
root_hash.

Switching the writer to the 65-byte legacy form is the correct fix: it
matches reth's default v1 layout cleanly, and avoids enabling storage_v2
mode (which would cascade into expectations for RocksDB history sidecars
and static-file changesets that a one-shot genesis writer doesn't
produce).
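
To make the 32-byte size difference concrete, here is a small sketch of the two sub-key layouts. The function names are hypothetical, and the high-nibble-first packing order in the 33-byte form is an assumption rather than something this commit states.

```go
package main

import "errors"

var errTooLong = errors.New("path longer than 64 nibbles")

// encodeLegacySubKey writes the 65-byte form: one nibble per byte,
// right-padded with zeros to 64 bytes, then the nibble count at byte 64.
func encodeLegacySubKey(nibbles []byte) ([65]byte, error) {
	var out [65]byte
	if len(nibbles) > 64 {
		return out, errTooLong
	}
	copy(out[:64], nibbles)
	out[64] = byte(len(nibbles))
	return out, nil
}

// encodePackedSubKey writes the 33-byte form: two nibbles per byte
// (assumed high nibble first), zero-padded to 32 bytes, then the count.
func encodePackedSubKey(nibbles []byte) ([33]byte, error) {
	var out [33]byte
	if len(nibbles) > 64 {
		return out, errTooLong
	}
	for i, n := range nibbles {
		if i%2 == 0 {
			out[i/2] |= n << 4
		} else {
			out[i/2] |= n & 0x0f
		}
	}
	out[32] = byte(len(nibbles))
	return out, nil
}
```

A decoder expecting the 65-byte form that is handed the 33-byte form consumes 32 bytes too many, which is the offset slide described above.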

Changes:
- StoredNibbles wire format: Packed[32]||Length[1] → Nibbles[64]||Length[1]
- StorageTrieEntry.EncodeCompact/DecodeCompact: 33-byte → 65-byte SubKey
- nibblestoStoredNibbles: copy unpacked nibbles directly (no shift/pack)
- dump-storages-trie: read 65-byte SubKey (was 33-byte)
- TestGoldenStorageTrieEntry + TestGoldenHashBuilderEmissions: skipped
  (Rust fixture generator pins packed v2 form; algorithmic correctness
  remains covered by TestGoldenHashBuilderRoot, the 50-trial property
  fuzz against go-ethereum StackTrie, and the FullEmissions invariant
  test)

Tested: go test ./internal/reth/... — all green
…s were silently dropped

Reth boots cleanly after the wire-format fix (29/0 pre-spamoor verify) but
panics on every block-time state-root computation with
`alloy_trie::HashBuilder::add_leaf: key == self.key` at the SAME
hashed_address `0x0000ac125530bc598aa4d5c9a4fb380124bf3a436cee44ff9abb541e06cf819d`.

Dumping the on-disk AccountsTrie revealed: 57,531 BNC rows total, but
ZERO rows at depths 0, 1, 2, or 3. Histogram peaks at depth 4 (38K rows)
with a long tail to depth 8. For a 215K-leaf trie the root branch and
every shallow sub-branch MUST exist as on-disk rows — reth's TrieWalker
needs them to navigate the trie. Without them, the walker has no
breadcrumbs at the top of the trie, falls back to a linear
HashedAccounts cursor walk, and combined with the post-state overlay
ends up re-yielding the same hashed_address on the first block-time
StateRoot::calculate iteration.

Root cause: our HashBuilder emits BNCs during unwinds in DESCENDING DEPTH
order (deep → shallow). The 65-byte StoredNibbles wire format pads
shorter paths with trailing zero bytes, so the depth-3 row's key compares
LEXICOGRAPHICALLY SMALLER than the depth-5 row already at the cursor's
write head. `mdbx.Append` rejects out-of-order keys, and HashBuilder's
NodeEmitter contract swallows the error (`_ = b.emit(path, bnc)` at
internal/reth/hash_builder.go:504) by design — so the shallow rows fall
silently on the floor. Only the deeper, lexicographically-larger emits
land.

Fix: use `cur.Put(..., 0)` instead of `cur.Put(..., mdbx.Append)` for the
AccountsTrie emit. We lose the sequential-write fast path (plain inserts
incur B-tree rebalancing), but the AccountsTrie is small (<100K rows for a
215K-account state), so the throughput hit is negligible next to the
benefit of a fully populated trie. StoragesTrie emit is unaffected: it buffers
emissions per-entity into an in-memory `trieRows` slice and writes them
via `txn.Put(..., 0)` already.
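
The ordering problem can be reproduced without MDBX. The sketch below pads two nibble paths into the 65-byte form and checks whether the emit order is ascending; `padKey` and `appendOrderOK` are hypothetical names, and "strictly ascending" is used here as a stand-in for what an append-mode cursor write accepts.

```go
package main

import "bytes"

// padKey renders a nibble path (at most 64 nibbles) in the 65-byte form:
// nibbles, zero padding up to 64 bytes, then the nibble count.
func padKey(nibbles []byte) []byte {
	out := make([]byte, 65)
	copy(out[:64], nibbles)
	out[64] = byte(len(nibbles))
	return out
}

// appendOrderOK reports whether keys arrive in strictly ascending byte
// order, the precondition for an append-only write path.
func appendOrderOK(keys [][]byte) bool {
	for i := 1; i < len(keys); i++ {
		if bytes.Compare(keys[i-1], keys[i]) >= 0 {
			return false
		}
	}
	return true
}
```

Emitting the depth-5 path before its depth-3 prefix, as a HashBuilder unwind does, fails the check; the reverse order passes.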

Tested: existing internal/reth tests still pass; full verification
requires regen + reth boot + spamoor target-tip on the bench.
…inear fallback

Two consecutive bench iterations exposed unworkable failure modes with the
fullEmissions code path:

  - mdbx.Append silently dropped shallow emissions (depth 0-3) because
    HashBuilder unwinds emit in descending-depth order, and 65-byte
    StoredNibbles + trailing-zero padding makes shallow keys sort
    LEXICOGRAPHICALLY SMALLER than deeper ones already at the cursor's
    write head. Switching to plain Put fixed THAT, only to expose...

  - With full BNC coverage (78,075 rows including depth 0=1, depth 1=16,
    depth 2=256, depth 3=4096), reth's tokio runtime threads SIGSEGV
    during block-time state-root traversal — dmesg confirms stack
    overflow signature (sp == fault address, error 6 = user/write/not
    present). RUST_MIN_STACK=16MB doesn't help: the recursion or
    per-frame allocation in alloy_trie/reth_trie explodes on the
    uniformly-saturated 215K-leaf trie. This is a reth-side bug we
    cannot patch from state-actor.

Until the trie walker is investigated upstream, drop fullEmissions
entirely. ComputeStateRootStreaming(iter, nil) computes the correct root
via a no-op emit; reth's payload builder falls back to a linear
HashedAccounts cursor walk on every block. This matches the v7b config
that successfully advanced the chain under spamoor.

Imports trimmed: bytes, mdbx-go, iReth (no longer referenced).
…he real cause of the tokio-rt SIGSEGV)

The previous attempt (e289f45, now reverted) disabled fullEmissions
entirely with a cop-out comment "until reth's trie walker is
investigated upstream". The real root cause is in OUR writer — and it's
a wire-format bug we authored.

ROOT CAUSE

reth has TWO distinct nibble-key types in its database layer:

  - StoredNibbles (used by tables::AccountsTrie as Key) has
    Encode::Encoded = ArrayVec<u8, 64>. The wire form is VARIABLE-length
    raw nibble bytes — NO padding, NO length suffix. Decoded via
    from_compact(value, value.len()) which treats every byte as one
    nibble and recovers length = len(value).

  - StoredNibblesSubKey (used by tables::StoragesTrie as DupSort
    SubKey, sitting inside the StorageTrieEntry value) has
    Encode::Encoded = [u8; 65]. The wire form is FIXED 65 bytes =
    nibbles[64] || length[1].

Citation: reth/crates/storage/db-api/src/models/mod.rs:121-141.

Our writer at internal/reth/trie_format.go had a single EncodeKey
method that wrote the FIXED 65-byte form. We used it for both
AccountsTrie keys (run_cgo.go:213) and StorageTrieEntry subkeys
(EncodeCompact in trie_format.go).

The storage-side use is correct: that 65-byte form goes into the
StorageTrieEntry VALUE, not a key.

The account-side use is broken: reth's tables::AccountsTrie expects
variable-length nibble bytes as the MDBX key. We wrote 65-byte keys
where reth wants 0..=64-byte ones. When reth's TrieWalker decodes one
of our rows during state-root computation at block-time, it interprets
all 65 bytes as nibbles and gets a 65-NIBBLE path (impossible — max
trie depth is 64). The walker proceeds with garbage state, accumulates
MDBX seeks against the corrupt trie shape, and eventually overflows the
tokio main-runtime's 2 MB stack (RUST_MIN_STACK is ignored by tokio).
dmesg shows `tokio-rt[]: segfault at <addr> sp <same addr> error 6`
with IP in ld-linux's __tls_get_addr.

WHY THIS WAS HARD TO SEE

  - Pre-spamoor verify passes 29/0 because eth_getBalance / eth_getCode
    / eth_getTransactionCount route through PlainAccountState +
    HashedAccounts, NEVER reading AccountsTrie. The corruption is
    invisible until reth's walker activates at block production.

  - `mdbx.Append` (pre-fb4d090) "worked longer" because it silently
    dropped the shallow BNCs (lex-smaller padded keys) — the walker
    never reached the corrupt rows from the top. Switching to
    mdbx.Put let them through and exposed the bug.

  - Disabling fullEmissions (e289f45) made the SIGSEGV go away
    because there were no corrupt rows. But that's a workaround that
    keeps reth in slow-fallback mode permanently.

FIX

internal/reth/trie_format.go:
  - Add StoredNibbles.EncodeAccountKey(buf) — writes Nibbles[:Length],
    variable-length, no padding, no length byte.
  - Add StoredNibbles.DecodeAccountKey(b) — mirrors reth's
    from_compact(value, value.len()).
  - Keep EncodeKey unchanged (still serves StorageTrieEntry's 65-byte
    SubKey form inside the value).

client/reth/run_cgo.go: AccountsTrie emit callback now uses
path.EncodeAccountKey(&keyBuf). cur.Put(..., 0) (the fb4d090 change)
stays — still required because the variable-length keys are written in
descending-depth order during HashBuilder unwinds, and Append would
reject the out-of-order shallow rows.

internal/reth/trie_format_test.go: add
TestStoredNibblesEncodeAccountKey_VariableLength — encodes paths of
lengths 0, 1, 3, 32, 64 and asserts len(encoded) == path.Length with
no padding. Roundtrips through DecodeAccountKey.

scripts/dump-storages-trie/main.go: AccountsTrie key inspection now
decodes variable-length keys (depth = len(key), no length byte at
position 64). StoragesTrie sub-key inspection (which lives inside the
value) stays on the 65-byte form.

scripts/verify-trie-consistency/main.go (NEW): walks AccountsTrie and
verifies every tree_mask bit points to an actual child row at
parent_path || slot, and every non-root row has a parent with the
right state_mask bit set. Catches dangling tree_masks AND orphan rows.
Expected on a correct DB: dangling=0 / orphan=0.
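
The two checks can be sketched over an in-memory map. This is a simplified model, assuming child rows sit exactly one nibble below their parent as described above; `row` and `checkTrie` are hypothetical names, and the real script reads rows from MDBX instead.

```go
package main

// row holds the two masks the checks need; paths are map keys made of
// raw nibble bytes (one nibble per byte, "" for the root).
type row struct {
	stateMask uint16
	treeMask  uint16
}

// checkTrie counts tree_mask bits with no child row (dangling) and
// non-root rows whose parent is missing or lacks the state_mask bit (orphan).
func checkTrie(rows map[string]row) (dangling, orphan int) {
	for path, r := range rows {
		for slot := 0; slot < 16; slot++ {
			if r.treeMask&(uint16(1)<<slot) == 0 {
				continue
			}
			if _, ok := rows[path+string([]byte{byte(slot)})]; !ok {
				dangling++
			}
		}
		if len(path) == 0 {
			continue // the root has no parent
		}
		last := path[len(path)-1]
		parent, ok := rows[path[:len(path)-1]]
		if !ok || parent.stateMask&(uint16(1)<<last) == 0 {
			orphan++
		}
	}
	return dangling, orphan
}
```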

OUT OF SCOPE

Patching reth's tokio runtime stack size (currently 2 MB default) is
not addressed here. Once the wire format is correct, the walker won't
accumulate seek pressure and the 2 MB stack should suffice. If it
ever doesn't, that's an upstream reth issue with a minimal repro, not
a writer-side fix.
…--JsonRpc.JwtSecretFile; new nethermind-v8-solo + reth port-clash fix

NETHERMIND
The bench's run-bloatnet.sh nethermind docker arm had TWO CLI bugs that
caused the May 19 nethermind run to silently produce no result.json:

  1. --JsonRpc.JwtSecretFile= (empty value) is rejected by nethermind
     1.37.0 with "Required argument missing for option" — container dumps
     help and exits. Engine-driver then gets "connection refused" on
     port 8545. Default (null) means no JWT required, which is what we
     want (engine-driver runs with --engine-jwt-disabled).
  2. --Init.ChainSpecPath was MISSING entirely. Without it, nethermind
     boots with the default foundation (mainnet) chainspec and refuses
     to read our DB's chain-specific genesis. state-actor writes a
     parity-chainspec.json next to the DB; we just have to point at it.

Verified by probe: with the JwtSecretFile= dropped and
--Init.ChainSpecPath added, nethermind 1.37.0 boots clean, RPC up on
8545, engine API on 8551, "Initialization Completed" banner shown,
ready for engine_forkchoiceUpdated.

New scripts/nethermind-v8-solo.sh mirrors reth-v8-solo.sh: gen → boot →
pre-verify → engine-driver → spamoor 500 blocks → post-verify →
result.json. Uses P2P_PORT=30503 to avoid clashing with reth (30403) or
besu (30303).

RETH
Existing reth-v8-solo.sh defaulted reth's p2p port to 30303, which
clashes if any other client container is still up. Switched to 30403 +
added defensive `docker rm -f` for stale debug containers. This fixes
the boot failure I just hit when a leftover neth-probe container was
holding 30303.
…encode (port from reth/geth/besu)

Nethermind gen takes ~90 min for a 105 GB DB while reth/geth take ~25
min for similar sizes. Audit (3 Explore agents + 2 Opus scrutiny passes)
identified the gap: client/nethermind/entitygen_cgo.go's PreAlloc loop
ran sequentially (one entity at a time), and every code-DB write used a
WAL-enabled per-call grocksdb.Put. Bench host has 96 CPUs — nethermind
was leaving most of them idle.

This commit ports three optimizations that already exist in the geth /
besu / reth writers:

1. Phase 0 worker pool over cfg.PreAlloc (client/nethermind/phase0_cgo.go,
   new). Workers = min(NumCPU, 8) — matches besu's maxPhase0Workers. Each
   worker owns:
     - A nethtrie.Builder (per-worker satisfies the documented
       single-goroutine invariant at internal/neth/trie/builder.go:60-61).
     - A stateDBSink wrapping its own grocksdb.WriteBatch. grocksdb.Write
       is safe to call concurrently across workers — RocksDB serialises
       the commit pipeline internally (besu commit 4847945 verifies this).
   Indices are sorted DESC by len(pe.Storage) before dispatch — long-pole
   scheduling so the 5 bloat EOAs (100M-1B slots each) start at t=0
   across the first workers. FIFO would let a bloat land on a worker
   mid-run and become the wall-clock floor.
   Reference pattern: client/besu/state_writer_cgo.go:308-432.

2. codeDBSink (client/nethermind/genesis_alloc_cgo.go). Mirrors
   stateDBSink: WriteBatch + 64 MiB flush threshold + DisableWAL(true).
   Has an internal sync.Mutex because it's shared across Phase 0 workers
   (codes are <100 bytes typically, lock contention negligible vs the
   storage-trie compute cost). The pre-port code did dbs.code.Put per
   code, with WAL enabled — every code paid an extra fsync.

3. Drop the redundant decode-then-re-encode in sorter.Iterate at
   entitygen_cgo.go:294-310. The stashed RLP bytes ARE
   gethrlp.EncodeToBytes(acc), and nethrlp.EncodeAccount is literally
   that same encoder (internal/neth/rlp/account.go:28-30). Skip the
   gethrlp.DecodeBytes + nethrlp.EncodeAccount round-trip; pass `value`
   straight to builder.AddAccount. Saves ~5-10% of Phase 2 wall on the
   215K-entity bloatnet workload. Byte-equivalence of the two paths is
   covered by the existing golden-root tests.
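
The long-pole dispatch from item 1 can be sketched as below. This is a minimal model under assumptions: `longPoleOrder` and `runPool` are hypothetical names, the per-worker trie builder and write batch are reduced to a callback, and only the worker cap of 8 is taken from the text.

```go
package main

import (
	"runtime"
	"sort"
	"sync"
)

const maxPhase0Workers = 8

// workerCount returns min(NumCPU, maxPhase0Workers).
func workerCount() int {
	if n := runtime.NumCPU(); n < maxPhase0Workers {
		return n
	}
	return maxPhase0Workers
}

// longPoleOrder returns entity indices sorted by descending storage size,
// so the largest entities are dispatched first.
func longPoleOrder(storageLens []int) []int {
	idx := make([]int, len(storageLens))
	for i := range idx {
		idx[i] = i
	}
	sort.SliceStable(idx, func(a, b int) bool {
		return storageLens[idx[a]] > storageLens[idx[b]]
	})
	return idx
}

// runPool feeds entities to a fixed number of workers in the given order.
// process must only touch state owned by the entity it is handed.
func runPool(order []int, workers int, process func(entity int)) {
	jobs := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for e := range jobs {
				process(e)
			}
		}()
	}
	for _, e := range order {
		jobs <- e
	}
	close(jobs)
	wg.Wait()
}
```

Because the channel is unbuffered, the first entities in the order are the first ones picked up, which is what puts the largest entities on workers at t=0.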

Target wall-clock: ≤ 35 min on the bench's 96-CPU host (down from ~90
min). Cross-client state-root invariance is preserved: per-entity
storage roots are content-addressed (keccak), so worker completion
order is irrelevant; the eventual state-trie root only depends on
addrHash-sorted iteration in sorter.Iterate (Phase 2, unchanged).

Out of scope here, tracked for a follow-up issue: intra-entity
parallelism for bloat EOAs (chunking each 100M-1B-slot entity across
multiple sub-tries). After the worker pool lands, each bloat is still
single-worker for 5-20 min — that's the residual Amdahl floor.
eth_getBalance(genesisEOA, latest) returned 0 for ~30% of bulk EOAs once the
chain advanced past genesis. Root cause: reth's BlockchainProvider::latest()
routes 'latest' queries through MemoryOverlayStateProvider once any block has
been produced, which falls through to HistoricalStateProvider.basic_account.
That path consults AccountsHistory; if no row exists AND no PruneCheckpoint
marks the DB as 'pruned-history', history_info() returns NotYetWritten and
basic_account returns Ok(None) -> balance 0.

State-actor correctly gates AccountsHistory/StoragesHistory writes behind
--archive (those are index tables reth's pruner reduces over time, not raw
state), but a non-archive node still needs the read path to know it's pruned.
Reth's pruner writes that marker via finalize_history_prune after each run;
since state-actor skips reth's init via --debug.skip-genesis-validation, we
must replicate the marker write ourselves.

Fix: write two MDBX rows in non-archive mode:
  PruneCheckpoint[AccountHistory] = {Some(0), None, Before(1)}
  PruneCheckpoint[StorageHistory] = {Some(0), None, Before(1)}

This triggers reth's MaybeInPlainState branch (historical.rs:861-867) which
reads PlainAccountState directly without any history-index dependency.

Verified end-to-end on bloatnet: pre-spamoor 29/0, post-spamoor 31/0 (was
26/3 and 27/4). Cross-client genesis state-root invariance preserved.

Also updates verify-bloatnet.sh Phase B to drop the --block 0 pin on the
spamoor sender (historical queries now correctly return StateAtBlockPruned
on a non-archive DB) and adds scripts/verify-bloatnet.sh to the tracked
tree.

Adds internal/reth/prune.go with the PruneCheckpoint Compact encoder, byte-
for-byte aligned with reth-codecs 0.3.1 + reth-prune-types' derive macro.
Unit-tested against hand-traced ground-truth bytes (01 00 02 01 for the
canonical Some(0)/None/Before(1) value).
…nfig

The bloatnet bench's nethermind-v8-solo.sh was booting nethermind without
--Init.BaseDbPath=/data, so nethermind created its own DB at the default
/data/nethermind_db/mainnet/ subdir (298 MB, empty) and ignored state-actor's
gen'd data sitting in /data/{blocks,headers,blockInfos,state,...}. Live
block 0 reported stateRoot=Keccak.EmptyTreeHash (0x56e81f17...) while
state-actor had written 0xe86fef3b...032b900. Pre-spamoor verify: 6/23.

Root cause: missing --Init.BaseDbPath flag. CI's test boot.cfg includes
"BaseDbPath" pointing at the state-actor datadir via JSON config; the
bench script forgot to pass the equivalent CLI flag.

Fix: add --Init.BaseDbPath=/data, plus the rest of CI's e2e boot config
that the bench was missing:
  --Sync.NetworkingEnabled=false      (skip P2P sync)
  --Sync.SynchronizationEnabled=false (skip sync pipeline)
  --Init.PeerManagerEnabled=false     (no peer manager)
  --Init.DiscoveryEnabled=false       (no discovery)
  --Network.ActivePeersMaxCount=0     (bound peer slots)
  --JsonRpc.UnsecureDevNoRpcAuthentication=true (no JWT on engine API)

Verified end-to-end: pre-spamoor verify now passes 29/0 (was 6/23).
Live block 0 stateRoot=0xe86fef3b15a317e040261e50a7aefa1702ab8993b2bf242046ebcd73a032b900
matches state-actor's claimed value exactly. Genesis hash 0xc016188e...d908b
also matches.

Block production via engine-driver still hits "Pre-pivot block, ignored
and returned Syncing" — same behavior as CI nethermind e2e (their
post_spamoor_block: 0 confirms the chain doesn't advance there either).
Pre-existing nethermind sync-pivot interaction with --debug.skip-genesis-
validation / fresh-DB boots; tracked separately as Bug 8.

Also adds nethermind-v8-postgen.sh — a verify-only variant that boots
against an already-gen'd DB without re-running the 34-min Phase 0 gen.
Cuts iteration time from ~50 min to ~10 min when debugging boot/verify
issues against a known-good DB.
…uction

After the BaseDbPath fix landed the genesis-state read (pre-spamoor 29/0),
engine-driver still failed with newPayload status="SYNCING" instead of
"VALID". Nethermind was rejecting all post-genesis payloads with:

  "Pre-pivot block, ignored and returned Syncing. Result of New Block: 1 (...)"

Root cause in Merge.Plugin/Handlers/NewPayloadHandler.cs:151-156:

  bool hasNeverBeenInSync = (_blockTree.Head?.Number ?? 0) == 0;
  if (hasNeverBeenInSync && block.Header.Number <= _blockTree.SyncPivot.BlockNumber) {
      return NewPayloadV1Result.Syncing;
  }

Sync.PivotNumber's ISyncConfig default is 0, but nethermind layers
mainnet.json on top of CLI args at boot ("Loading configuration from
/nethermind/configs/mainnet.json"), which sets a high PivotNumber. With
PivotNumber >= 1, block 1 is "pre-pivot" → SYNCING → engine-driver gives
up after 5 consecutive failures and the chain never advances.

Fix: explicit --Sync.PivotNumber=0 overrides the mainnet.json default.
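
The effect of the override is easy to see if the quoted C# check is transliterated into a small predicate. The Go function below is only an illustration with a hypothetical name; it is not part of state-actor or nethermind.

```go
package main

// returnsSyncing mirrors the quoted check: a node whose head is still at
// block 0 answers SYNCING for any payload at or below the sync pivot.
func returnsSyncing(headNumber, blockNumber, pivotNumber uint64) bool {
	hasNeverBeenInSync := headNumber == 0
	return hasNeverBeenInSync && blockNumber <= pivotNumber
}
```

With a pivot of 0, block 1 is no longer at or below the pivot, so the first post-genesis payload is processed instead of being answered with SYNCING.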

Verified end-to-end:
  pre-spamoor:  29 passed /  0 failed
  spamoor reached target tip (598 >= 595)
  post-spamoor: 31 passed /  0 failed
  latest_bn: 695 (chain advanced past genesis)
  genesis_root: 0xe86fef3b15a317e040261e50a7aefa1702ab8993b2bf242046ebcd73a032b900 (cross-client invariant preserved)

This actually SURPASSES CI's nethermind behavior (CI's result.json shows
post_spamoor_block: 0 — their chain never advances, hidden by SkipBlockProduction
hints in runphases.go and lenient AssertSpamoorOutputs).
Batch 1 (critical issues, 6 fixes):
- C-1 metadata_cgo.go: remove absolute path leak in doc comment
- C-2 hash_builder.go: drop SENTINEL-V2 debug-string + collapse Task-6
  reference to a plain TODO
- C-3 nethermind/e2e_test.go: add "PivotNumber": 0 under "Sync" so CI
  exercises the same flag the bench needs (mainnet.json default would
  otherwise make engine_newPayload return SYNCING)
- C-4 reth/oracle_test.go: add --dev.block-time=1s to both TestE2ESuite
  and TestRethNodeBootEmptyAlloc to match bench scripts (avoids
  MiningMode::Instant deadlock with spamoor under sustained load)
- C-5 reth-v8-solo.sh + run-bloatnet.sh: add --chain /data/chainspec.json
  and --http.api=eth,net,web3,txpool to bench scripts; CI already had
  these — closes parity gap
- C-6 run-bloatnet.sh: change --host-allowlist=* / --engine-host-allowlist=*
  to =all (besu's reserved keyword); the glob form expands against
  besu's entrypoint's /opt/besu/* and the bench worked by accident

Batch 2 (comment sweep, ~250 LOC removed from 23 files):
- Drop commit-hash citations (aa0bfcb / 4847945 / 32ac564) from besu +
  nethermind tuning constants and Close docstrings — git blame remains
  the source of truth.
- Drop OOM bench narratives ("v5 bench paid 4× overhead", "127 GiB
  anon-RSS on 125 GiB box", "May-17 nethermind run's 2:42 wall time")
  from maxPhase0Workers / MemTableSize / flushThresholdBytes /
  stateBatchFlushBytes / perCFWriteBufferBytes / perDBWriteBufferBytes /
  bulkBackgroundJobs docs.
- Drop "Mirrors X / mirrors geth's Y" cross-file cites that the
  per-client copy-pasted scaffold (Phase 0 worker pool, RocksDB sink,
  CompactRange-on-close) makes self-evident.
- Drop internal project_*.md memory references from hash_builder.go,
  run_cgo.go, contracts_writer_cgo.go.
- Compress 22-line bytesPerSlot bench-derivation in sizecal/factors.go
  and 50-line AddCanonicalSystemContracts doc in oracle/syscontracts.go
  to 2-3 line invariants.
- Trim runPhase0 30-line architecture narrative in
  nethermind/phase0_cgo.go to 5 lines of operative facts (what it does
  + the addrHash-prefixed safety invariant).
- Collapse 18-line deposit-contract provenance prose to 4 lines pointing
  at deposit_contract_test.go for drift detection.
- Drop sizecal/doc.go's per-client landings table (bench observation
  that drifts every run).
- Drop generator/config.go Archive doc's "v6 measured 285 GB" bench
  number; keep the technical effect.

No semantic changes — pure comment / doc edits except for:
- Two CI bool/JSON edits (C-3, C-4) which match documented bench-script
  flags.
- Three bench-script flag additions (C-5, C-6) which match CI tests.

Verified: go test -count=1 -short ./internal/{reth,sizecal,oracle,specbuild,
streamsort,besu/trie}/ — all green. cgo packages untested locally
(rocksdb headers absent); CI matrix will exercise on push.
Collapse the 4 per-client golden tests (geth/besu/nethermind/reth) into
thin wrappers calling a shared helper in internal/e2e_testing. The
canonical Osaka-bootable config + system-contracts injection + state-root
assertion live once instead of four near-identical copies.

PR-review batch 3, item 8 (B-4): ~150 LOC removed; single source of truth
for the cross-client invariant. Geth golden test verified locally.
Move syscontracts.go + deposit_contract.go (+ tests) from internal/oracle
to a new internal/syscontracts package. AddCanonicalSystemContracts,
DepositContractCode and DepositContractAddress are production code called
from main.go's --spec path; internal/oracle is left as the test-only
differential-output helpers (Reproduce / devkeys).

PR-review batch 3, item 14 (B-9): clearer "this is production" boundary;
0 LOC saved (pure file move + import rewrite). Builds clean; entitygen,
syscontracts, e2e_testing, and geth golden tests pass.
Move the production engine-API client (EngineDriver type, DriveLoop,
callEngine, JWT HMAC, Fork constants) and its unit tests out of
internal/e2e_testing — that package's name implies test-only code, but
scripts/engine-driver/main.go calls EngineDriver directly as a
production CLI tool. internal/e2e_testing keeps a thin test-only
StartEngineDriver helper that wraps an engineapi.EngineDriver in a
goroutine + cleanup for test use.

PR-review batch 3, item 14 (B-8): clearer "this is production" boundary;
0 LOC saved (file move + import rewrite). Builds clean; engineapi unit
tests + e2e_testing pass.
ComputeStateRoot, ComputeStateRootStreaming and computeStorageRoot grew a
NodeEmitter parameter when AccountsTrie/StoragesTrie persistence was
wired up, but client/reth/streaming_test.go was never updated — leaving
the reth e2e CI job in a build-failed state on every push.

These two tests check root-determinism only (legacy vs streaming RNG
path), not trie-table persistence, so nil is the correct emit (selects
the existing compute-only constructor).
@CPerezz CPerezz merged commit 0848ae8 into ethereum:main May 21, 2026
7 checks passed
CPerezz added a commit that referenced this pull request May 21, 2026
Acts on the two-agent audit of PR #75 by closing every actionable gap
the Mega-PR could close without Docker-driven boots of cgo clients.

CI integration (Part 7 Tier 1):
- client/geth/e2e_test.go: new TestE2ESuiteSpec — loads
  examples/spec-ci-min.yaml, runs Populate, boots geth in --dev, runs
  spamoor, captures genesis state-root, goes through the same
  RunSuitePhases boot→RPC→spamoor→golden-hash pipeline as the
  synthetic-fill suite. Pins the user's bar: "run state-actor with
  this new feature prior to using spamoor and run the common-golden-
  hash checks."
- examples/spec-ci-min.yaml: CI-fast fixture (~350 KB total spec
  footprint, ~12K slots materialized). Exercises every v1 schema
  feature in one file. Documents the per-client calibration-divergence
  trap (use sizecal.NewFixed for cross-client invariants in v1.5).
- .github/workflows/ci.yml geth-suite: -run filter now matches
  'TestE2ESuite$|TestE2ESuiteSpec$' (anchored so partial matches don't
  pull in unrelated functions); timeout bumped 45→60 min for two
  geth boots.
- Cross-client besu/nethermind/reth spec suites + sibling aggregator
  still tracked for v1.5 (need Docker image builds the v1 PR author
  cannot validate locally).

Schema-level rigor:
- internal/templates/{template,registry,raw,eoa,erc20}.go: new
  UserVisible() method on Template. raw + eoa return false (they're
  dispatched from `kind:` directly). erc20 returns true. main.go +
  build_test.go use templates.UserVisibleNames() — user-supplied
  `template: raw` / `template: eoa` now fail unknown-template
  validation cleanly. Closes the schema ambiguity Agent B flagged.
- internal/spec/validate.go: new MaxCodeSize = 24576 (EIP-170);
  Validate rejects code > limit so genesis state can't carry
  EIP-170-violating code that some clients silently accept and others
  reject.
- generator/config.go: Config.Validate now enforces the SPEC.md
  promise — fails loudly when spec storage (estimated at 80 B/slot
  conservative) exceeds --target-size. Previously the docs lied; users
  would have gotten silent truncation.
- internal/templates/erc20.go: honor e.Nonce (was hardcoded Nonce=1).
  Floor at 1 per EIP-161 so unset (zero) gets nonce=1 — preserving the
  go-ethereum genesis convention while letting users override.
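
The three checks above can be sketched roughly as follows. This is an
illustration under stated assumptions, not the code in validate.go,
config.go or erc20.go: the function names checkCodeSize,
checkStorageBudget and effectiveNonce are hypothetical, and the sketch
assumes --target-size is always set to a non-zero byte count.

```go
package main

import (
	"fmt"
	"math"
)

const (
	maxCodeSize          = 24576 // EIP-170 contract code size limit, in bytes
	bytesPerSlotEstimate = 80    // conservative per-slot estimate from the commit message
)

// checkCodeSize rejects code longer than the EIP-170 limit (hypothetical name).
func checkCodeSize(code []byte) error {
	if len(code) > maxCodeSize {
		return fmt.Errorf("code is %d bytes, exceeds EIP-170 limit of %d", len(code), maxCodeSize)
	}
	return nil
}

// checkStorageBudget fails when the estimated spec storage exceeds the
// target size (hypothetical name; assumes targetSizeBytes is non-zero).
func checkStorageBudget(specSlots, targetSizeBytes uint64) error {
	if specSlots > math.MaxUint64/bytesPerSlotEstimate {
		return fmt.Errorf("spec slot count %d overflows the size estimate", specSlots)
	}
	estimated := specSlots * bytesPerSlotEstimate
	if estimated > targetSizeBytes {
		return fmt.Errorf("spec storage ~%d B exceeds target size %d B", estimated, targetSizeBytes)
	}
	return nil
}

// effectiveNonce applies the floor described above: an unset (zero)
// nonce becomes 1, any explicit value is kept (hypothetical name).
func effectiveNonce(specNonce uint64) uint64 {
	if specNonce == 0 {
		return 1
	}
	return specNonce
}

func main() {
	fmt.Println(checkCodeSize(make([]byte, maxCodeSize)))   // exactly at the limit
	fmt.Println(checkCodeSize(make([]byte, maxCodeSize+1))) // one byte over
	fmt.Println(checkStorageBudget(12_000, 1<<20))          // ~960 KB vs 1 MiB
	fmt.Println(checkStorageBudget(20_000, 1<<20))          // ~1.6 MB vs 1 MiB
	fmt.Println(effectiveNonce(0), effectiveNonce(7))
}
```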

Test coverage additions:
- internal/spec/validate_test.go: EIP-170 oversize/exact-max code,
  kind/template case sensitivity (Contract, EOA, ERC20 all rejected).
- internal/spec/parse_test.go: balance rejection table (8 sub-cases:
  unquoted int/float/bool, underscored, scientific, negative,
  alpha-no-prefix, empty), max-uint256 boundary + overflow, address
  edge cases (zero, max, too-long, prefix-only, unquoted-hex),
  code edge cases (empty, prefix-only, single-byte, 23-byte 7702
  marker, odd-length, non-hex).
- internal/templates/erc20_test.go: multi-holder Solidity-equivalence
  (25 holders independently verified), nonce honoring 3 sub-cases.
- internal/specbuild/build_test.go: TestBuildDeterminismEndToEnd
  drains storage iterators twice and compares — pins the strongest
  determinism guarantee.
- generator/prealloc_test.go: TestValidateRejectsSpecExceedingTargetSize
  + TestValidateAcceptsSpecUnderTargetSize pin the new budget check.

Documentation walk-back:
- docs/SPEC.md: removes the false claim that the cross-client
  invariant is pinned in CI. Replaces with what's actually pinned
  (unit-level determinism + geth e2e). Adds the v1.5 follow-up shape.
- CHANGELOG.md: enumerates the audit-driven additions, walks back the
  cross-client-spec-genesis-root over-claim, documents the ERC-20
  nonce-floor surprise.

testdata/valid-all-features.yaml: shrunk approximate_size_bytes from
~100MB to ~100KB per entity so TestBuildDeterminismEndToEnd runs in
<0.3s instead of 8s. The fixture is internal test data, not the CI
fixture (examples/spec-ci-min.yaml is the CI input).

go vet + go build + go test -short ./... green.
CPerezz added a commit that referenced this pull request May 21, 2026
…C verification

Two audit follow-ups from the user:

F1: Removed Config.InjectAddresses field entirely.
- `--inject-accounts` CLI flag was already gone (Part 6); now the Go
  field disappears too. All callers migrated to cfg.PreAlloc.
- Deleted writer code in geth/state_writer.go (lines 133, 184), besu/
  state_writer_cgo.go (lines 116-144), nethermind/run_cgo.go (line 145
  stats), nethermind/entitygen_cgo.go (the seenInjected loop), reth/
  run_cgo.go (Phase 4a inject block) — the per-client special-cased
  999_999_999 ETH injector. Spamoor sender now arrives via the spec
  YAML's first entity.
- Deleted generator.Generator's InjectAddresses loop (binary-trie path).
- Deleted client/reth/options.go:buildInjectedAccount + its test
  (no callers remain).
- Simplified Config.Validate: dropped the
  `InjectAddresses ∩ GenesisAccounts` collision check.
- Updated generator/config_test.go + generator/prealloc_test.go to
  drop the tests of the deleted collision class.
- Cleaned dangling docstrings + the Phase 4a comment in reth/run_cgo.go.

F2: Storage-slot RPC verification for every spec entity at Phase 4.
- Extended CheckInjections in internal/e2e_testing/check_entities.go
  to walk cfg.GenesisStorage. For each entity it samples up to 5 slots
  via the new sampleStorageSlots helper (sorted-by-key,
  first/last/middle-spaced — deterministic) and calls
  rpcprobe.EthGetStorageAt against the booted client, asserting the
  RPC value matches cfg.GenesisStorage[addr][slot] byte-for-byte.
- Bounds RPC roundtrips at O(addresses × 5) per Phase 4 invocation
  regardless of fixture size — ~30 calls for the CI baseline.
- Catches the bug class flagged by the user: ERC-20 holder balances
  injected into the writer but gone by RPC time, 7702 EOA
  storage-bloat slots that never landed, and raw-contract synthesized
  slots that were dropped.
- Added 4 unit tests pinning the sampling logic: returns-all-when-small,
  caps-at-sample-size, deterministic, spread-spans-the-range.
- EthGetStorageAt helper was already in internal/rpcprobe/probe.go;
  no new RPC helper file needed.
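
One way the deterministic sampling described in F2 could look, as a
sketch only. The real sampleStorageSlots signature is not shown in this
PR text, so the name sampleSlots, the [32]byte key and value types and
the even-spacing formula are assumptions made for illustration.

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
)

// sampleSlots is a hypothetical stand-in for the sampling helper: it
// sorts the slot keys and returns at most max of them, always including
// the first and last key with the rest evenly spaced in between.
func sampleSlots(storage map[[32]byte][32]byte, max int) [][32]byte {
	if max <= 0 {
		return nil
	}
	keys := make([][32]byte, 0, len(storage))
	for k := range storage {
		keys = append(keys, k)
	}
	// Sorting removes the dependence on Go's randomized map iteration order.
	sort.Slice(keys, func(i, j int) bool {
		return bytes.Compare(keys[i][:], keys[j][:]) < 0
	})
	if len(keys) <= max {
		return keys
	}
	if max == 1 {
		return keys[:1]
	}
	out := make([][32]byte, 0, max)
	for i := 0; i < max; i++ {
		// Index 0 maps to the first key, index max-1 to the last key.
		out = append(out, keys[i*(len(keys)-1)/(max-1)])
	}
	return out
}

func main() {
	storage := make(map[[32]byte][32]byte)
	for i := 0; i < 100; i++ {
		var k, v [32]byte
		k[31] = byte(i)
		v[31] = byte(i + 1)
		storage[k] = v
	}
	for _, k := range sampleSlots(storage, 5) {
		fmt.Printf("slot ...%02x -> value ...%02x\n", k[31], storage[k][31])
	}
}
```

With five samples per address, the number of RPC calls grows with the
number of spec addresses rather than with the number of slots, which is
the bound the commit message describes.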

Verification: go vet + go build + go test -short ./... green across
all packages. PR #75's CI run will exercise the new RPC checks
end-to-end against the booted geth/besu/nethermind/reth clients.

CHANGELOG updated to reflect:
- The Removed-flag note now says the field is fully gone, not just the
  flag, and that programmatic callers migrate to cfg.PreAlloc.
- CheckInjections's coverage now includes storage with the sampling
  bound documented.
@CPerezz CPerezz deleted the feat/spec-yaml branch May 21, 2026 09:06
Development

Successfully merging this pull request may close these issues.

Fork-aware predeploy preallocation (--fork=cancun|prague|osaka)
