
fix(l1,l2): serialize forkchoice_update #6508

Merged
iovoid merged 1 commit into main from fix-fcu-race on May 7, 2026

Conversation

@iovoid
Contributor

@iovoid iovoid commented Apr 20, 2026

Motivation

Currently if forkchoice_update is called by two sites simultaneously, we could end up with a corrupted canonical chain.

Description

There are two underlying problems:

  • forkchoice_update reads the latest block number outside the atomic write section, so two concurrent callers can observe the same 'current latest'
  • latest_block_header could end up in an inconsistent state

This PR fixes them by serializing writes with a mutex.

It also adds a regression test.
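For illustration, a minimal sketch of the locking shape (field and method names follow the review discussion below; the surrounding types are simplified placeholders, not the actual Store definition):

use std::sync::Arc;
use tokio::sync::Mutex;

struct Store {
    // Other fields elided.
    /// Serializes concurrent forkchoice_update callers.
    fcu_lock: Arc<Mutex<()>>,
}

impl Store {
    async fn forkchoice_update(&self, _new_head: u64) -> Result<(), String> {
        // tokio::sync::Mutex, because in the real code the guard is held across .await points.
        let _guard = self.fcu_lock.lock().await;
        // 1. Update the in-memory latest_block_header cache.
        // 2. Run the DB write transaction (forkchoice_update_inner).
        // Both now happen under the lock, so two callers can no longer interleave.
        Ok(())
    }
}

#[tokio::main]
async fn main() {
    let store = Store { fcu_lock: Arc::new(Mutex::new(())) };
    store.forkchoice_update(1).await.unwrap();
}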

@iovoid iovoid requested a review from a team as a code owner April 20, 2026 19:15
@github-actions github-actions Bot added L1 Ethereum client L2 Rollup client labels Apr 20, 2026
@github-actions

🤖 Codex Code Review

  1. High: the new delete_range is not correct for CANONICAL_BLOCK_HASHES, because that column family stores block numbers as u64::to_le_bytes(), while both RocksDB and the in-memory map apply range deletion in lexicographic byte order, not numeric order. [11, 0, ..] is lexicographically greater than [0, 1, ..], so reorging to head 10 will not delete block 256, and many higher canonical entries can survive incorrectly. This is a consensus-facing correctness bug in canonical chain maintenance. References: crates/storage/store.rs:1034, crates/storage/store.rs:1036, crates/storage/api/tables.rs:4, crates/storage/backend/rocksdb.rs:368, crates/storage/backend/in_memory.rs:184. Suggested fix: do not use range delete on little-endian numeric keys; either keep explicit numeric deletes, or migrate this CF to a sortable encoding such as big-endian before introducing range operations.

  2. Medium: the added regression test would not catch Item 1. It only runs against EngineType::InMemory and only uses block heights 10..15, where little-endian byte order still happens to match numeric order, so the test can pass while the RocksDB implementation is wrong for real chain heights. References: test/tests/storage/fcu_race_tests.rs:30, test/tests/storage/fcu_race_tests.rs:31, test/tests/storage/fcu_race_tests.rs:96, test/tests/storage/fcu_race_tests.rs:98. This should add at least one case that crosses 255 -> 256, and ideally exercise the RocksDB backend too.

I did not run cargo test: the environment’s rustup install tried to write under /home/runner/.rustup/tmp and failed with a read-only filesystem error.


Automated review by OpenAI Codex · gpt-5.4 · custom prompt

@github-actions

github-actions Bot commented Apr 20, 2026

Lines of code report

Total lines added: 9
Total lines removed: 0
Total lines changed: 9

Detailed view
+--------------------------------+-------+------+
| File                           | Lines | Diff |
+--------------------------------+-------+------+
| ethrex/crates/storage/store.rs | 2601  | +9   |
+--------------------------------+-------+------+

@greptile-apps

greptile-apps Bot commented Apr 20, 2026

Greptile Summary

This PR addresses a TOCTOU race in forkchoice_update by adding a tokio::sync::Mutex to serialize concurrent callers and replacing the loop-based canonical-entry deletion with a delete_range call to eliminate the pre-read of LatestBlockNumber. The mutex serialization is correct, but the delete_range approach introduces a new correctness bug: CANONICAL_BLOCK_HASHES keys are stored as little-endian u64, which does not sort lexicographically by numeric value — causing RocksDB's (and the in-memory backend's) byte-wise comparator to silently skip entries whose LE first byte is numerically smaller than (head_number+1)'s first byte (e.g., block 65536 survives a reorg to head 50000). The regression test only uses block numbers ≤ 15, so this breakage is not caught by the new test.

Confidence Score: 3/5

Not safe to merge: the delete_range fix reintroduces canonical-chain corruption for reorgs spanning a byte-boundary in LE-encoded block numbers.

The mutex approach correctly serializes forkchoice_update callers and the TOCTOU race is fixed at the concurrency level. However, the new delete_range on LE-encoded keys is semantically incorrect with RocksDB's default lexicographic comparator, leaving orphan canonical entries for block numbers >= 256 in certain reorg scenarios. The regression test's small block numbers (<=15) cannot catch this. This is a P1 data-integrity issue that needs to be resolved before merging.

crates/storage/backend/rocksdb.rs and crates/storage/backend/in_memory.rs both need the delete_range key-ordering fixed; test/tests/storage/fcu_race_tests.rs should be extended to cover block numbers above 256.

Important Files Changed

crates/storage/store.rs: Adds fcu_lock mutex to serialize concurrent forkchoice_update calls and removes the TOCTOU-prone load_latest_block_number read. The mutex approach is correct, but the delete_range bounds rely on LE key ordering which is incorrect for RocksDB's lexicographic comparator.
crates/storage/backend/rocksdb.rs: Implements delete_range via WriteBatch::delete_range_cf, but CANONICAL_BLOCK_HASHES uses little-endian u64 keys while RocksDB's default comparator is lexicographic — the range will silently miss entries whose LE first byte falls below start_key's first byte.
crates/storage/backend/in_memory.rs: Implements delete_range using Rust slice lexicographic comparison via retain; suffers the same LE byte-ordering bug as the RocksDB path for block numbers ≥ 256.
crates/storage/api/mod.rs: Adds delete_range to the StorageWriteBatch trait with a clear half-open [start, end) contract; clean API surface.
test/tests/storage/fcu_race_tests.rs: Good regression test for the TOCTOU race; however all block numbers are ≤ 15 (BASE+EXT), so the LE byte-ordering bug in delete_range is never exercised.
crates/storage/Cargo.toml: Adds sync feature to the tokio dependency to enable tokio::sync::Mutex; correct and minimal change.
test/tests/storage/mod.rs: Registers the new fcu_race_tests module; trivial plumbing change.

Sequence Diagram

sequenceDiagram
    participant CallerA
    participant CallerB
    participant fcu_lock as fcu_lock (tokio::Mutex)
    participant Store
    participant DB as RocksDB / InMemory

    CallerA->>fcu_lock: lock().await → _guard
    Note over fcu_lock: CallerB blocks here
    CallerA->>Store: latest_block_header.update(new_head_A)
    CallerA->>Store: forkchoice_update_inner(...)
    Store->>DB: txn.put(new_canonical_blocks)
    Store->>DB: txn.delete_range([head+1, u64::MAX)) ⚠️ LE key ordering
    Store->>DB: txn.put(LatestBlockNumber = head_A)
    Store->>DB: txn.commit()
    CallerA->>fcu_lock: drop(_guard)

    CallerB->>fcu_lock: lock().await → _guard
    CallerB->>Store: latest_block_header.update(new_head_B)
    CallerB->>Store: forkchoice_update_inner(...)
    Store->>DB: txn.put / delete_range / commit
    CallerB->>fcu_lock: drop(_guard)


Comment thread crates/storage/backend/rocksdb.rs Outdated
Comment on lines +376 to +382
.cf_handle(table)
.ok_or_else(|| StoreError::Custom(format!("Table {} not found", table)))?;

self.batch.delete_range_cf(&cf, start_key, end_key);
Ok(())
}


P1 Little-endian keys are not monotone under lexicographic order — delete_range will silently skip entries

RocksDB's default comparator is lexicographic (byte-by-byte, left to right). Little-endian u64 encoding does not produce a lexicographically sorted sequence: for example, 256 encodes as [0x00, 0x01, 0, …], which sorts before 11's [0x0B, 0, …] in lex order. As a result, delete_range_cf with start = (head+1).to_le_bytes() will miss any canonical entry whose LE first byte is smaller than the first byte of head+1, leaving orphan canonical entries—exactly the corruption this PR is trying to eliminate.

Concrete failure scenario: chain extends to block 65536 ([0x00, 0x00, 0x01, 0, …]), then reorgs to head = 50000 (delete_start [0x51, 0xC3, 0, …]). Lex: [0x00, 0x00, 0x01, …] < [0x51, 0xC3, …], so block 65536 survives the delete_range and the canonical table remains inconsistent.

The pre-existing loop-based deletion was correct because it generated each exact key. The fix is to either use big-endian encoding (which is lexicographically monotone) or restore the enumeration approach.
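A quick standalone check of the ordering property behind the big-endian suggestion (illustrative only, not part of the diff):

fn main() {
    let (small, large): (u64, u64) = (11, 256);
    // Little-endian: 256 encodes as [0x00, 0x01, ..], which sorts *before* 11's [0x0B, ..].
    assert!(large.to_le_bytes() < small.to_le_bytes());
    // Big-endian byte strings compare in the same order as the numbers they encode.
    assert!(small.to_be_bytes() < large.to_be_bytes());
}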


Comment thread crates/storage/backend/in_memory.rs Outdated

let db_mut = Arc::make_mut(&mut *db);
if let Some(table_ref) = db_mut.get_mut(table) {
table_ref.retain(|key, _| key.as_slice() < start_key || key.as_slice() >= end_key);

P1 Same LE byte-ordering bug as the RocksDB path

Rust's PartialOrd on &[u8] is also lexicographic. The retain predicate key.as_slice() < start_key || key.as_slice() >= end_key applies the same byte-by-byte ordering and will preserve entries whose LE first byte is smaller than start_key's first byte, even if their numeric value is above the new head. The regression test uses block numbers ≤ 15, which all fit in the first byte, so the test never exercises numbers ≥ 256 where the bug manifests.
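One way to make the retain predicate numeric rather than lexicographic, sketched under the assumption that keys in this table are exactly eight little-endian bytes (not the crate's actual code):

fn keep_canonical(key: &[u8], head_number: u64) -> bool {
    match <[u8; 8]>::try_from(key) {
        // Decode the LE block number and compare numerically, not byte-wise.
        Ok(bytes) => u64::from_le_bytes(bytes) <= head_number,
        // Keys of unexpected length are left untouched in this sketch.
        Err(_) => true,
    }
}

fn main() {
    assert!(keep_canonical(&255u64.to_le_bytes(), 255));    // at or below head: kept
    assert!(!keep_canonical(&65536u64.to_le_bytes(), 255)); // above head: dropped
}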


Comment thread crates/storage/store.rs Outdated
Comment on lines +1034 to +1036
let delete_start = (head_number + 1).to_le_bytes();
let delete_end = u64::MAX.to_le_bytes();
txn.delete_range(CANONICAL_BLOCK_HASHES, &delete_start, &delete_end)?;

P1 u64::MAX.to_le_bytes() is not a safe upper-bound sentinel for LE-encoded RocksDB keys

Even if the byte-order issue were resolved, using u64::MAX.to_le_bytes() ([0xFF; 8]) as the exclusive end of a delete_range assumes that [0xFF; 8] is the lexicographic maximum 8-byte key. With big-endian encoding that assumption holds; with little-endian it does not, so this sentinel is conceptually tied to byte ordering. Additionally, delete_range semantics are [start, end), meaning a canonical entry at block u64::MAX (key [0xFF; 8]) would not be removed — but that's an academic concern for this context.


@github-actions

🤖 Claude Code Review



PR Review: fix(l1,l2): serialize forkchoice_update (#6508)

Overview

This PR addresses a TOCTOU race in forkchoice_update where two concurrent callers could observe the same LatestBlockNumber and leave orphaned canonical entries. The fix has two parts:

  • Replace per-key deletion with a delete_range on CANONICAL_BLOCK_HASHES to make the inner function self-contained (no read outside the transaction).
  • Add a tokio::sync::Mutex (fcu_lock) to serialize concurrent callers and protect the latest_block_header cache update.

The concurrency diagnosis is correct and the mutex is the right tool. However, there is a critical correctness bug in the delete_range implementation.


Critical Bug: delete_range is wrong for little-endian keys

This is the most important finding.

CANONICAL_BLOCK_HASHES keys are block numbers encoded as to_le_bytes(). Both the RocksDB backend (default bytewise/memcmp comparator — no custom comparator is set in rocksdb.rs) and the in-memory backend (retain with byte-slice comparison) use lexicographic ordering, which does not match numeric order for little-endian values.

For any n, le8(n) has:

  • byte 0 = n & 0xFF
  • byte 1 = (n >> 8) & 0xFF

Lexicographic comparison inspects byte 0 first. Because LE flips significance, le8(255) = [0xFF, 0x00, …] is greater than le8(256) = [0x00, 0x01, …] under memcmp, even though 255 < 256.

The impact for head_number = 255:

delete_start = le8(256) = [0x00, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00]

A key K = le8(n) satisfies K >= delete_start iff K[0] > 0x00 OR (K[0] == 0x00 AND K[1] >= 0x01). This means:

  • Blocks 1–255 (le8 first byte 0x01–0xFF > 0x00) → incorrectly deleted (they should stay canonical)
  • Block 0 → correctly preserved (it's head-or-before)
  • Block 65536 → le8(65536) = [0x00, 0x00, 0x01, …] → K[0]=0x00, K[1]=0x00 < 0x01 → incorrectly preserved (it is above the new head, so it should have been deleted)

In other words: delete_range(le8(head+1), le8(u64::MAX)) deletes every block whose number is not a multiple of 65536 rather than every block greater than head. For a reorg to block 255, this silently destroys the entire canonical chain from block 1 to 255.
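The same argument as a standalone check (plain array comparison stands in for RocksDB's bytewise comparator; this is not code from the PR):

fn main() {
    let head: u64 = 255;
    let start = (head + 1).to_le_bytes(); // le8(256) = [0x00, 0x01, 0, ..]
    let end = u64::MAX.to_le_bytes();     // [0xFF; 8]
    // What a lexicographic delete_range over [start, end) would remove.
    let deleted = |n: u64| {
        let k = n.to_le_bytes();
        k >= start && k < end
    };
    assert!(deleted(1) && deleted(255)); // should stay canonical, but are removed
    assert!(!deleted(65536));            // should be removed, but survives
    assert!(!deleted(0));                // correctly preserved
}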

store.rs:1034–1036 — the affected lines:

let delete_start = (head_number + 1).to_le_bytes();
let delete_end = u64::MAX.to_le_bytes();
txn.delete_range(CANONICAL_BLOCK_HASHES, &delete_start, &delete_end)?;

The same logic flaw applies to in_memory.rs:197's retain call.

Why the regression test doesn't catch this: BASE = 10, EXT = 5 — all block numbers are < 256. For single-byte numbers, the LE first byte is the number itself, so lexicographic order coincidentally matches numeric order in this range. The test gives false confidence.

Fix options (in order of preference):

  1. Switch CANONICAL_BLOCK_HASHES keys to big-endian (to_be_bytes()): BE encoding preserves numeric order lexicographically, making range deletes correct. This is a one-time DB migration but the cleanest long-term solution.

  2. Keep per-key deletion but move the latest read inside the mutex (before calling forkchoice_update_inner), then pass it in:

    let _guard = self.fcu_lock.lock().await;
    let latest = self.load_latest_block_number().await?.unwrap_or(0);
    self.forkchoice_update_inner(..., latest).await?;

    Since the mutex is held, no other FCU can race between the read and the write. This avoids any range-ordering dependency.


Secondary Issues

forkchoice_update_inner is pub (store.rs:1012)

The mutex is only acquired in forkchoice_update. If any caller invokes forkchoice_update_inner directly, they bypass serialization entirely. Since the function is pub, external crates can call it. It should be pub(crate) at most, or better yet unexposed (fn).

u64 overflow in head_number + 1 (store.rs:1034)

If head_number == u64::MAX, this panics in debug mode or wraps in release. Use head_number.saturating_add(1) or an explicit check. Practically unlikely, but worth fixing for robustness.

latest_block_header update happens before the DB write (store.rs:2258–2261)

This ordering is intentional (referenced comment from issue #3927), but within the mutex the in-memory cache is updated optimistically before forkchoice_update_inner completes. If forkchoice_update_inner returns an error, the cache is now inconsistent with the DB. Consider reverting the cache update on error, or updating it after a successful commit.
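A hedged sketch of the revert-on-error option (every type here is a simplified stand-in, not the crate's actual cache or error types):

use std::sync::Mutex;

#[derive(Clone)]
struct Header { number: u64 }

struct Store {
    latest_block_header: Mutex<Header>, // stand-in for the real header cache
}

impl Store {
    fn forkchoice_update(
        &self,
        new_head: Header,
        db_write: impl FnOnce() -> Result<(), String>,
    ) -> Result<(), String> {
        // Remember the previous cached head before the optimistic update.
        let previous = self.latest_block_header.lock().unwrap().clone();
        *self.latest_block_header.lock().unwrap() = new_head;
        if let Err(e) = db_write() {
            // Roll the cache back so memory and DB stay consistent on failure.
            *self.latest_block_header.lock().unwrap() = previous;
            return Err(e);
        }
        Ok(())
    }
}

fn main() {
    let store = Store { latest_block_header: Mutex::new(Header { number: 10 }) };
    let result = store.forkchoice_update(Header { number: 11 }, || Err("simulated DB failure".into()));
    assert!(result.is_err());
    assert_eq!(store.latest_block_header.lock().unwrap().number, 10); // cache restored
}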

In-memory delete_range is not deferred to commit (in_memory.rs:184–200)

The existing delete method also acquires the lock immediately rather than deferring. The pre-existing FIXME on commit() ("in-memory writes aren't atomic") covers this, and the new method is consistent with existing behavior — just worth noting the atomicity gap still exists.


Positive Notes

  • Adding tokio::sync::Mutex (not std::sync::Mutex) is correct since the lock spans .await points.
  • The Arc clone pattern for the fcu_lock field is consistent with the rest of the Store struct.
  • The test structure (seed → race two FCUs → cleanup probe) is a sound regression design; it just needs larger block numbers to cover the LE ordering edge cases.
  • Adding "sync" to the tokio feature flags is the right minimal change.

Summary

The mutex serialization is sound and addresses the stated race. The delete_range approach is broken for any chain that has crossed block 255 and is the primary blocker for merging. The test needs block numbers > 255 to be meaningful. Recommend resolving the delete_range strategy (option 2 above is the lowest-risk change) and making forkchoice_update_inner non-public before merge.


Automated review by Claude (Anthropic) · sonnet · custom prompt

@github-actions

🤖 Kimi Code Review

This is a high-quality fix for a subtle TOCTOU race condition in the forkchoice update path. The implementation correctly addresses the safety issue while maintaining good documentation.

Critical Correctness: Race Condition Fix

  • Line 2256 (store.rs): The tokio::sync::Mutex correctly serializes concurrent FCU callers. This prevents the interleaving where two threads read the same latest value outside the write transaction, then both commit with stale delete ranges, leaving orphaned canonical entries.
  • Line 1020 (store.rs): The comment accurately describes the safety invariant—the lock ensures the read of LatestBlockNumber and the subsequent commit are atomic with respect to other FCU operations.

Storage Layer Safety: Delete Range Logic

  • Lines 1031-1036 (store.rs): Excellent documentation of the little-endian encoding trap. The explicit iteration over (head_number + 1)..=latest is correct—u64::to_le_bytes() creates non-lexicographic keys where block 256 ([0x00, 0x01]) sorts before block 11 ([0x0B]), making RocksDB's delete_range unsafe here.
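For reference, the shape of that enumeration, sketched against a BTreeMap standing in for the storage backend (the key encoding matches the table; everything else is a placeholder, not the crate's API):

use std::collections::BTreeMap;

// key = u64::to_le_bytes(block_number), value = canonical block hash
fn delete_above_head(canonical: &mut BTreeMap<Vec<u8>, [u8; 32]>, head_number: u64, latest: u64) {
    for number in (head_number + 1)..=latest {
        // Each key is generated exactly, so the backend's comparator never matters.
        canonical.remove(number.to_le_bytes().as_slice());
    }
}

fn main() {
    let mut canonical: BTreeMap<Vec<u8>, [u8; 32]> = [0u64, 250, 255, 256, 65536]
        .into_iter()
        .map(|n| (n.to_le_bytes().to_vec(), [0u8; 32]))
        .collect();
    delete_above_head(&mut canonical, 255, 65536);
    // Only blocks <= 255 remain, regardless of how their LE keys sort byte-wise.
    assert!(canonical.keys().all(|k| u64::from_le_bytes(k.as_slice().try_into().unwrap()) <= 255));
    assert_eq!(canonical.len(), 3);
}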

Concurrency Design

  • Using tokio::sync::Mutex (async-aware) is appropriate since the operation spans async I/O (DB writes via spawn_blocking). The lock is held for the entire duration of forkchoice_update_inner, which is necessary for correctness but acceptable given FCU is not on the hot path for transaction processing.
  • Line 1520 (store.rs): Correct initialization of the new lock field.

Test Coverage

  • fcu_race_tests.rs: Well-designed stress test that actually triggers the race condition using a multi-threaded runtime. The test logic is sound:
    1. Seeds canonical entries at block BASE (250)
    2. Spawns concurrent extend (to 260) and no-op FCUs
    3. Verifies cleanup removes all entries above BASE
  • The choice of BASE=250 with EXT=10 deliberately crosses the byte boundary at 256 to guard against future refactors attempting lexicographic range deletions.

Minor Suggestions

  1. Test assertion style: In race_iteration, returning true for "race detected" is slightly counterintuitive (vs Result or assert!), but the usage is unambiguous.
  2. Lock ordering: Ensure no other code paths acquire locks inside forkchoice_update_inner while holding fcu_lock to prevent potential deadlocks. The current implementation appears safe as the DB operations use spawn_blocking without holding other store locks.

Summary: The fix is correct, well-documented, and properly tested. The endianness comment is particularly valuable for preventing future regressions.


Automated review by Kimi (Moonshot AI) · kimi-k2.5 · custom prompt

@github-actions

🤖 Claude Code Review



PR #6508 Review: fix(l1,l2): serialize forkchoice_update

Overview

This PR fixes a TOCTOU (time-of-check/time-of-use) race in Store::forkchoice_update where concurrent callers could interleave: (a) the LatestBlockNumber read (used to bound the canonical-entry delete loop) and (b) the DB write transaction and in-memory cache update. The fix adds a tokio::sync::Mutex (fcu_lock) that serializes all callers end-to-end. A focused regression test is included.

The approach is sound. Below are specific findings.


Correctness

forkchoice_update_inner is pub — lock contract unenforced (store.rs:1012)

pub async fn forkchoice_update_inner(...)

The invariant "the caller must hold fcu_lock" exists only in a comment. Any code that calls forkchoice_update_inner directly skips the mutex entirely. Currently there is only one caller (line 2262), but pub visibility invites future misuse.

Recommend: change to pub(crate) or private, or add a #[doc(hidden)] attribute plus a strong # Safety doc comment that makes the invariant explicit. If nothing external needs it, making it private is the safest option.


Cache updated before DB commit — error path leaves state inconsistent (store.rs:2257–2269)

let _guard = self.fcu_lock.lock().await;

// Cache updated FIRST
self.latest_block_header.update(new_head);

// DB write happens SECOND — if this fails, the cache is stale
self.forkchoice_update_inner(...).await?;

If forkchoice_update_inner returns an error, latest_block_header now points to a block that the DB does not consider the canonical head. Subsequent nonce/state reads will be wrong until the next successful FCU.

The PR description cites issue #3927 as the reason for this ordering (cache-first avoids nonce inconsistencies during sync), but that rationale should be evaluated against the error case. At minimum, a // SAFETY: comment explaining why the stale-on-error case is acceptable would help future readers.


unwrap_or(0) in forkchoice_update_inner (store.rs:1020)

let latest = self.load_latest_block_number().await?.unwrap_or(0);

When latest is None (empty store), head_number + 1 is typically ≥ 1, so the delete range (head_number+1)..=0 is empty — safe. But calling FCU when head_number > 0 on an uninitialized store returns 0 and skips all cleanup. This edge case appears impossible in practice (genesis is always block 0), but a debug_assert!(head_number == 0 || latest.is_some()) would make the invariant explicit.


Design

tokio::sync::Mutex choice is correct. The lock is held across multiple .await points (load_latest_block_number().await, spawn_blocking(...).await). Using std::sync::Mutex here would be unsound (cannot hold across await). Good.

let _guard naming is correct. A named binding like _guard keeps the guard alive until the end of the scope, whereas a bare _ would drop it immediately and release the lock. No issue here.
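A quick illustration of that distinction with a std mutex (the behavior is the same for the tokio guard):

use std::sync::Mutex;

fn main() {
    let lock = Mutex::new(());
    let _ = lock.lock().unwrap(); // bound to `_`: the guard is dropped right away
    assert!(lock.try_lock().is_ok(), "lock was already released");

    let _guard = lock.lock().unwrap(); // named binding: held until the end of scope
    assert!(lock.try_lock().is_err(), "lock is still held");
}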

Single constructor path. fcu_lock is initialized only in from_backend (line 1514), which is the single path that all public constructors flow through. No risk of a Store instance missing the field.


LE-encoding note in the diff

The added comment explains why delete_range is not used:

// `delete_range` is not safe here: keys are `u64::to_le_bytes()`, and
// RocksDB's lexicographic comparator does not match LE numeric order
// (e.g. block 256 = [0x00, 0x01, ..] sorts before block 11 = [0x0B, ..]),
// so a range-delete would silently miss blocks whose LE first byte is
// smaller than `head+1`'s first byte.

This is technically correct and is a valuable warning for future maintainers. The explanation clearly conveys the subtle LE vs. lexicographic mismatch. Worth keeping.


Test

test/tests/storage/fcu_race_tests.rs — well-structured.

  • #[tokio::test(flavor = "multi_thread", worker_threads = 4)] correctly exercises real concurrent scheduling.
  • The seed → race → cleanup → assert pattern is clear.
  • BASE=250, EXT=10 spans block 256, the boundary where LE byte representations overflow a single byte. This guards against someone silently breaking things by swapping the loop for a delete_range in a future refactor. The choice is clever and worth the inline explanation.
  • 100 iterations is a reasonable signal given the pre-fix hit rate of ~100%.

One minor gap: the test verifies the DB consistency (no orphan canonical entries) but does not assert cache consistency — i.e., that latest_block_header in memory matches the DB after concurrent FCUs. Adding that assertion would cover the second issue cited in the PR description ("latest_block_header could end up in an inconsistent state").


Summary

Finding | Severity | Pre-existing?
forkchoice_update_inner is pub, bypasses lock | Medium | No
Cache updated before DB commit; error leaves state dirty | Low–Medium | Yes, but PR touches this code
Test doesn't cover cache consistency | Low | No
unwrap_or(0) edge case undocumented | Very low | Yes

The core fix (tokio Mutex serialization) is correct and the regression test is solid. The main actionable item is restricting forkchoice_update_inner visibility so the lock invariant is enforced rather than documented.


Automated review by Claude (Anthropic) · sonnet · custom prompt

@github-actions

🤖 Codex Code Review

Findings

  1. Cache/DB inconsistency still exists on the error path in store.rs line 2257. forkchoice_update() updates latest_block_header before awaiting the DB transaction, and if forkchoice_update_inner() fails, the cache is left pointing at a head that never committed. That is externally observable through store.rs line 985 and the fast-path in store.rs line 1168. The new fcu_lock only serializes callers; it does not make cache and DB state atomic. I’d either move the cache update after a successful commit, or restore the previous cached header on error while still holding the lock.

  2. The new regression test can pass spuriously because it ignores both spawned task results at fcu_race_tests.rs line 69, fcu_race_tests.rs line 73, and fcu_race_tests.rs line 77. If either forkchoice_update() returns Err or the task panics, the test keeps going and may report a false negative. This should assert both JoinHandles completed successfully and both inner results are Ok(()).
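A runnable sketch of the tightened assertions (the two async blocks stand in for the real forkchoice_update calls; this is not the test's actual code):

#[tokio::main(flavor = "multi_thread", worker_threads = 4)]
async fn main() {
    let extend = tokio::spawn(async { Ok::<(), String>(()) }); // stands in for the extend FCU
    let noop = tokio::spawn(async { Ok::<(), String>(()) });   // stands in for the no-op FCU
    // First layer: the task must not have panicked. Second layer: the FCU itself must return Ok.
    extend.await.expect("extend FCU task panicked").expect("extend FCU returned Err");
    noop.await.expect("no-op FCU task panicked").expect("no-op FCU returned Err");
}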

Notes

The per-key delete in forkchoice_update_inner looks like the right correctness fix for LE-encoded block numbers; replacing delete_range there is justified.

I wasn’t able to run the tests in this environment because cargo failed trying to update the Rust toolchain in a read-only rustup temp dir.


Automated review by OpenAI Codex · gpt-5.4 · custom prompt

Contributor

@ElFantasma ElFantasma left a comment


Solid TOCTOU fix. The mutex serializes the cache update + DB write so they can't interleave. The cache rollback on DB-write failure is the right complement — without it, a failed FCU would leave the cache pointing at a non-canonical head.

The regression test is excellent: the BASE=250, EXT=10 choice deliberately spans block 256 to catch any future refactor that tries to replace the per-block delete loop with a delete_range, since LE-encoded keys aren't lexicographically monotone. The threshold calibration (4999/5000 hits pre-fix) gives confidence the test would actually catch a regression.

The inline comment about why delete_range isn't safe is gold — exactly the kind of trap that bites people six months later.

Comment thread crates/storage/store.rs
}

// Delete canonical entries above the new head by enumerating each key.
// `delete_range` is not safe here: keys are `u64::to_le_bytes()`, and

This comment is gold — exactly the kind of trap that bites people six months later.

@github-project-automation github-project-automation Bot moved this to In Review in ethrex_l1 May 6, 2026
@iovoid iovoid added this pull request to the merge queue May 7, 2026
Merged via the queue into main with commit 063b8be May 7, 2026
73 of 75 checks passed
@iovoid iovoid deleted the fix-fcu-race branch May 7, 2026 21:27
@github-project-automation github-project-automation Bot moved this from In Review to Done in ethrex_l1 May 7, 2026
@github-project-automation github-project-automation Bot moved this to Done in ethrex_l2 May 7, 2026

Labels

L1 Ethereum client L2 Rollup client

Projects

Status: Done (ethrex_l1)
Status: Done (ethrex_l2)
