Skip to content

feat(l1): parallel body download from multiple peers#6490

Draft
azteca1998 wants to merge 4 commits into
mainfrom
feat/parallel-body-download
Draft

feat(l1): parallel body download from multiple peers#6490
azteca1998 wants to merge 4 commits into
mainfrom
feat/parallel-body-download

Conversation

@azteca1998
Copy link
Copy Markdown
Contributor

Summary

  • Adds request_block_bodies_parallel() to PeerHandler that splits headers into 128-block chunks and dispatches each to a different peer via tokio::spawn
  • Adds download_bodies_from_peer() helper for single-peer body fetch within spawned tasks
  • Replaces the sequential body download loop in sync_cycle_full with the parallel version
  • Failed chunks are automatically retried with other peers; bodies are validated against headers before acceptance
  • Est. +30-50% body download speed

Closes #6481

Test plan

  • Run full sync on hoodi with FULL_SYNC_BLOCK_LIMIT=50000 and verify bodies download correctly
  • Compare body download time vs sequential approach
  • Verify correct ordering: blocks in same order as input headers
  • Test with limited peers to verify retry/fallback behavior

Split block body requests into chunks of MAX_BLOCK_BODIES_TO_REQUEST
and dispatch each chunk to a different peer in parallel using
tokio::spawn, following the same channel-based pattern used by the
existing parallel header download. Failed chunks are automatically
re-queued and retried with other peers, and partial responses are
handled by re-queuing the remainder.

This replaces the sequential loop that sent 128-block requests to
one peer at a time, significantly improving body download throughput
when multiple peers are available.
@github-actions
Copy link
Copy Markdown

🤖 Codex Code Review

  1. High: partial body responses are reassembled in the wrong global order. When a peer returns only a prefix of a chunk, the remainder is pushed as a brand-new result slot in peer_handler.rs:685, and final output is flattened by iterating results in slot order in peer_handler.rs:737. sync_cycle_full then zips that flattened body list with headers in original order in full.rs:137. If chunk 0 returns 64/128 bodies, bodies 65..128 get moved after later chunks, so headers and bodies become mismatched. This is a correctness bug for block execution. Keep original offsets for each subtask, or write into a pre-sized per-header result buffer instead of appending remainders as new chunk indices.

  2. High: the new parallel downloader can hang forever on persistent failures. Failed/invalid chunks are always re-queued in peer_handler.rs:646 and peer_handler.rs:671, but the loop in peer_handler.rs:640 has no retry budget or terminal error path. The old path bounded this with REQUEST_RETRY_ATTEMPTS in peer_handler.rs:555. With the new call site in full.rs:132, a malicious or just unhealthy peer set can stall full sync indefinitely instead of surfacing BodiesNotFound. Add per-chunk attempt counters or an overall deadline and fail once exhausted.

I did not spot any EVM/gas/consensus-rule changes in this PR beyond the sync transport path. I did not run tests locally.


Automated review by OpenAI Codex · gpt-5.4 · custom prompt

@github-actions
Copy link
Copy Markdown

🤖 Claude Code Review

Now I have a complete picture. Here is my review:


PR Review: feat(p2p): parallel body download from multiple peers

Overview

This PR replaces the sequential body download loop in sync_cycle_full with a parallel multi-peer approach. The design mirrors the existing parallel header download (request_block_headers_parallel) and follows the same channel + tokio::spawn pattern. The structural idea is sound, but there are correctness and robustness issues that must be addressed before merging.


Critical Bug: Partial-result ordering is broken

File: peer_handler.rs, lines 676–691

When a peer returns fewer bodies than requested, the remainder is pushed to a new slot appended to the end of results:

let new_idx = results.len();   // always at the tail
results.push(None);
tasks_queue.push_back((new_idx, remaining_headers));

The final reassembly iterates results in index order:

for slot in results {
    all_bodies.extend(bodies);
}

So the remainder of chunk i ends up after all original chunks, not right after its partial sibling. Example with 256 headers (two chunks of 128):

  • Chunk 0 → peer returns only B0..B63 (partial). New task gets idx=2.
  • Chunk 1 → peer returns B128..B255 fully. results[1] set.
  • New task (idx=2) → peer returns B64..B127. results[2] set.
  • Reassembly: [B0..B63, B128..B255, B64..B127]

The zip in sync_cycle_full then pairs B128..B255 with headers H64..H127, producing silently wrong blocks. This is a data-corruption bug on any partially-responding peer.

Fix: track a (chunk_idx, sub_offset) or maintain a flat sorted list of (global_start, bodies) entries that are stitched together in order at the end, similar to how the header download tracks startblock.


No retry limit — potential infinite loop

File: peer_handler.rs, loop body

The sequential request_block_bodies enforced REQUEST_RETRY_ATTEMPTS. The parallel version has no bound. A chunk for which every peer persistently fails (network partition, all peers serve invalid data for those blocks) will be re-queued forever, hanging the sync indefinitely. At minimum, add a per-chunk attempt counter and fail-fast with an error after N attempts.


_peer_id is dead parameter

File: peer_handler.rs line 587

async fn download_bodies_from_peer(
    _peer_id: H256,   // <-- unused
    connection: &mut PeerConnection,
    block_hashes: Vec<H256>,
) -> Result<Vec<BlockBody>, PeerHandlerError> {

The leading underscore signals intent to suppress the warning, but the analogous download_chunk_from_peer uses peer_id for a debug! log. Drop the parameter or add a log line — either way it shouldn't be a silent no-op in the signature.


dec_requests leak on channel-send failure

File: peer_handler.rs, tokio::spawn block (lines 720–732)

inc_requests is called before spawning (line 718). dec_requests is called only inside the rx.try_recv() handler. If the channel send in the spawned task fails (e.g., because the outer loop returned early on a get_best_peer error), dec_requests is never called and the peer's request counter stays elevated permanently. This matches the pattern already present in request_block_headers_parallel, so it's a pre-existing wart — but worth fixing in both places with a guard or by restructuring so the spawn always delivers to the channel.


Unnecessary get_best_peer call when queue is empty

File: peer_handler.rs, lines 699–712

When the task queue is empty but in-flight tasks haven't reported back yet, the loop still calls get_best_peer() on every tick before discovering tasks_queue.pop_front() returns None. This wastes an async call (and possibly a lock acquisition) every 10 ms purely to discover no work is queued. Guard with an explicit check:

if tasks_queue.is_empty() {
    tokio::time::sleep(Duration::from_millis(10)).await;
    continue;
}

before the get_best_peer call.


Channel capacity may be under-provisioned after growth

File: peer_handler.rs, line 638

let (tx, mut rx) = tokio::sync::mpsc::channel::<(...)>(total_chunks + 1);

The channel is sized to total_chunks + 1. But every partial response pushes a new entry into results (and dispatches another task), so the total number of in-flight tasks can exceed total_chunks. If the channel fills up, spawned tasks will block on send, potentially stalling. Size the channel to accommodate the maximum possible concurrent tasks or use an unbounded channel given results are drained on every iteration.


Minor style issues

  • Missing blank line between request_block_bodies_parallel's closing brace and the /// Returns the PeerData… doc comment (peer_handler.rs, line 750–751).
  • The comment on line 694 says "Check if all original data has been collected" but results now contains dynamically added slots — the comment is misleading.
  • bodies variable in the for loop on line 658 shadows the outer bodies from try_recv; consider renaming to body since it's a single item from zip.

What's good

  • The overall architecture (channel + tokio::spawn, inc_requests before spawn, dec_requests on receipt, get_best_peer for load-aware peer selection) correctly mirrors the proven header-download pattern.
  • Validation against headers (validate_block_body) is performed before accepting bodies, consistent with the sequential path.
  • Partial-peer responses trigger re-queueing rather than failing the whole sync — correct defensive behavior (once the ordering bug is fixed).
  • The simplification in sync_cycle_full is clean and easier to follow than the old while !headers.is_empty() loop.

Automated review by Claude (Anthropic) · sonnet · custom prompt

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 16, 2026

Lines of code report

Total lines added: 126
Total lines removed: 5
Total lines changed: 131

Detailed view
+----------------------------------------------+-------+------+
| File                                         | Lines | Diff |
+----------------------------------------------+-------+------+
| ethrex/crates/networking/p2p/peer_handler.rs | 722   | +126 |
+----------------------------------------------+-------+------+
| ethrex/crates/networking/p2p/sync/full.rs    | 287   | -5   |
+----------------------------------------------+-------+------+

@azteca1998 azteca1998 changed the title feat(p2p): parallel body download from multiple peers feat(l1): parallel body download from multiple peers Apr 16, 2026
@github-actions github-actions Bot added the L1 Ethereum client label Apr 16, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

L1 Ethereum client

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

Parallel Body Download from Multiple Peers

1 participant