feat(l1): pipeline body download with execution #6491
Conversation
🤖 Claude Code Review
PR Review:
| Category | Severity | Item |
|---|---|---|
| Correctness | Medium | Single-batch always uses final_batch = true, potentially changing execution mode for large single batches |
| Correctness | Low | Empty batch silently drops final_batch = true signal |
| Design | Low | Download task not aborted/awaited on execution error |
| Style | Medium | Inner body-download loop duplicated in both paths |
| Style | Low | Asymmetric send-error handling in single vs multi-batch |
The pipelining design is solid. The main request before merging is to extract the duplicated download loop into a shared helper and verify the single-batch / EXECUTE_BATCH_SIZE interaction. The test plan (hoodi full sync + timing comparison) is the right way to validate the throughput claim.
Automated review by Claude (Anthropic) · sonnet · custom prompt
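The helper extraction requested above could look roughly like the following sketch. This is an assumption-laden fragment, not code from the PR: the name `download_bodies` and its exact signature are hypothetical, while the loop body mirrors the single-batch excerpt quoted later in this thread.

```rust
// Hypothetical shared helper for the duplicated body-download loop.
// Types (PeerHandler, BlockHeader, Block, SyncError) and the constant
// MAX_BLOCK_BODIES_TO_REQUEST are taken from the excerpts in this PR;
// the signature itself is a sketch.
async fn download_bodies(
    download_peers: &PeerHandler,
    mut batch_headers: Vec<BlockHeader>,
) -> Result<Vec<Block>, SyncError> {
    let mut blocks = Vec::new();
    while !batch_headers.is_empty() {
        let end = min(MAX_BLOCK_BODIES_TO_REQUEST, batch_headers.len());
        let header_batch = &batch_headers[..end];
        match download_peers.request_block_bodies(header_batch).await {
            Ok(Some(bodies)) => {
                // Pair each returned body with its header, draining only
                // as many headers as we actually received bodies for.
                let block_batch = batch_headers
                    .drain(..bodies.len())
                    .zip(bodies)
                    .map(|(header, body)| Block { header, body });
                blocks.extend(block_batch);
            }
            Ok(None) => return Err(SyncError::BodiesNotFound),
            Err(e) => return Err(e.into()),
        }
    }
    Ok(blocks)
}
```

Both the single-batch and multi-batch paths could then call this helper and differ only in how they send results on `body_tx`.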
🤖 Codex Code Review
Findings
Notes: I did not see an EVM/gas-accounting or consensus-rule regression in the diff itself; batch ordering is still FIFO. I couldn’t run […]
Automated review by OpenAI Codex · gpt-5.4 · custom prompt
Greptile Summary
This PR restructures `sync_cycle_full` to overlap block body downloads with block execution via a background download task and an mpsc channel.
Confidence Score: 3/5
Not safe to merge as-is: the download task is not cancelled on execution error, leaking peer connections and network I/O on the sync failure path. One P1 finding (task leak on early exit) affects the error path on every sync failure. The happy path is correct and the pipelining logic is sound, but the missing abort/cancellation is a real resource-management defect. A P1 with a single occurrence caps confidence at 4, and the core sync path being affected brings it to 3.
Key file: crates/networking/p2p/sync/full.rs, specifically the execution loop / download_task lifecycle around lines 227–246.
| Filename | Overview |
|---|---|
| crates/networking/p2p/sync/full.rs | Restructures sync_cycle_full to overlap body downloads with block execution via an mpsc channel; one P1 issue: the download task is not aborted/cancelled when the execution loop exits early via ?, allowing it to continue wasting network I/O for up to one full batch. |
Sequence Diagram
```mermaid
sequenceDiagram
    participant SC as sync_cycle_full
    participant DT as download_task (tokio::spawn)
    participant PH as PeerHandler
    participant CH as mpsc::channel(2)
    participant EL as execution loop
    participant BC as Blockchain
    SC->>DT: tokio::spawn(async move { ... })
    SC->>EL: while let Some(result) = body_rx.recv()
    loop For each batch
        DT->>PH: request_block_bodies(header_batch)
        PH-->>DT: Ok(Some(bodies))
        DT->>CH: body_tx.send(Ok((blocks, final_batch)))
        CH-->>EL: body_rx.recv() → (blocks, final_batch)
        EL->>BC: add_blocks_in_batch(blocks, final_batch)
        BC-->>EL: Ok(())
    end
    DT-->>SC: task completes (channel closed)
    EL-->>SC: body_rx exhausted
    SC->>DT: download_task.await? (join)
    Note over DT,EL: On execution error: EL returns early via ?,<br/>JoinHandle dropped → task detaches (P1 leak)
```
```rust
while let Some(result) = body_rx.recv().await {
    let (blocks, final_batch) = result?;
    info!(
        "Executing {} blocks for full sync. First block hash: {:#?} Last block hash: {:#?}",
        blocks.len(),
        blocks.first().ok_or(SyncError::NoBlocks)?.hash(),
        blocks.last().ok_or(SyncError::NoBlocks)?.hash()
    );
    add_blocks_in_batch(
        blockchain.clone(),
        cancel_token.clone(),
        blocks,
        final_batch,
        store.clone(),
    )
    .await?;
}

// Ensure the download task completes and propagate any panics
download_task.await?;
```
**Download task leaks on execution error**

When `result?` or `add_blocks_in_batch(...).await?` returns an error, `sync_cycle_full` returns early and the `download_task` `JoinHandle` is dropped. In Tokio, dropping a `JoinHandle` detaches the task: it keeps running. The background task will continue downloading block bodies (up to one full batch) until it tries to send on the closed channel. This wastes peer connections and network I/O after a sync failure.

The `cancel_token` is already in scope but not wired into the download task. A minimal fix is to `abort` the task on early exit, or to select on the token inside the download loop:

```rust
// On error exit, ensure the task stops immediately
while let Some(result) = body_rx.recv().await {
    let (blocks, final_batch) = result?;
    // ...
    add_blocks_in_batch(...).await.map_err(|e| {
        download_task.abort();
        e
    })?;
}
```

Or, propagate the existing `cancel_token` into the `tokio::spawn` closure and `select!` on `cancel_token.cancelled()` inside the body-download loop.
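To make the second option concrete, here is a self-contained toy of that pattern. It is an illustrative sketch only, not the PR's actual code: the batch numbers and sleeps stand in for real downloads and execution, and it assumes the `tokio` and `tokio-util` crates.

```rust
use std::time::Duration;
use tokio::sync::mpsc;
use tokio_util::sync::CancellationToken;

#[tokio::main]
async fn main() {
    let cancel_token = CancellationToken::new();
    let (body_tx, mut body_rx) = mpsc::channel::<Result<u64, &'static str>>(2);

    let token = cancel_token.clone();
    let download_task = tokio::spawn(async move {
        for batch in 0u64.. {
            tokio::select! {
                // Stop promptly once the executor signals cancellation.
                _ = token.cancelled() => return,
                // Stand-in for request_block_bodies(...) network I/O.
                _ = tokio::time::sleep(Duration::from_millis(50)) => {
                    if body_tx.send(Ok(batch)).await.is_err() {
                        return; // receiver dropped, stop downloading
                    }
                }
            }
        }
    });

    while let Some(result) = body_rx.recv().await {
        let batch = result.unwrap();
        println!("executed batch {batch}");
        if batch == 2 {
            // Simulated execution error: cancel instead of leaking the task.
            cancel_token.cancel();
            break;
        }
    }
    download_task.await.unwrap();
}
```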
```rust
if !blocks.is_empty()
    && body_tx.send(Ok((blocks, final_batch))).await.is_err()
{
    // Receiver dropped (execution loop stopped), stop downloading
    return;
}
```
**Silent skip when empty blocks on final batch in multi-batch path**

The condition `if !blocks.is_empty() && body_tx.send(...).await.is_err()` short-circuits when `blocks` is empty, which is correct for execution. But if `blocks` is empty on the iteration where `final_batch == true`, the `final_batch = true` signal is never sent. The execution loop then never calls `add_blocks_in_batch` with `final_batch = true` for that iteration, so blocks are executed via batch mode rather than pipeline mode (`add_blocks` checks `sync_head_found`). In practice an empty last batch is unlikely, but the asymmetry between the two code paths (the single-batch path always sends `final_batch = true`) is fragile and worth documenting.
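One possible shape for that fix, as a hedged fragment over the excerpt above; it assumes the execution loop is also taught to tolerate an empty `blocks` vector, which it currently is not (it errors via `SyncError::NoBlocks` on empty input):

```rust
// Sketch: forward the final_batch signal even when blocks is empty so the
// receiver always observes the mode switch. Assumes the execution loop
// skips logging/execution for empty batches instead of erroring.
if final_batch || !blocks.is_empty() {
    if body_tx.send(Ok((blocks, final_batch))).await.is_err() {
        // Receiver dropped (execution loop stopped), stop downloading
        return;
    }
}
```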
Prompt To Fix With AI
This is a comment left during a code review.
Path: crates/networking/p2p/sync/full.rs
Line: 217-222
Comment:
**Silent skip when empty blocks on final batch in multi-batch path**
The condition `if !blocks.is_empty() && body_tx.send(...).await.is_err()` short-circuits when `blocks` is empty — correct for execution. But if `blocks` is empty on the iteration where `final_batch == true`, the `final_batch = true` signal is never sent. This means the execution loop never calls `add_blocks_in_batch` with `final_batch = true` for that iteration, which causes blocks to be executed via batch mode rather than pipeline mode (`add_blocks` checks `sync_head_found`). In practice an empty last batch is unlikely, but the asymmetry between the two code paths (single-batch always sends `final_batch = true`) is fragile and worth documenting.
```rust
let download_task = tokio::spawn(async move {
    // If single_batch, we already have headers in memory — send them as the one and only batch.
    if single_batch {
        let final_batch = true;
        let mut batch_headers = headers;
        let mut blocks = Vec::new();
        while !batch_headers.is_empty() {
            let end = min(MAX_BLOCK_BODIES_TO_REQUEST, batch_headers.len());
            let header_batch = &batch_headers[..end];
            match download_peers.request_block_bodies(header_batch).await {
                Ok(Some(bodies)) => {
                    debug!("Obtained: {} block bodies", bodies.len());
                    let block_batch = batch_headers
                        .drain(..bodies.len())
                        .zip(bodies)
                        .map(|(header, body)| Block { header, body });
                    blocks.extend(block_batch);
                }
                Ok(None) => {
                    let _ = body_tx.send(Err(SyncError::BodiesNotFound)).await;
                    return;
                }
                Err(e) => {
                    let _ = body_tx.send(Err(e.into())).await;
                    return;
                }
            }
        }
        if !blocks.is_empty() {
            let _ = body_tx.send(Ok((blocks, final_batch))).await;
        }
        return;
    }
```
**Single-batch path silently swallows empty-block case**

If `headers` arrives genuinely empty (all headers were canonical and `block_headers.drain` removed them all), `blocks` stays empty and nothing is sent to the channel. The overall control flow is correct and safe, but an explicit comment would spare future readers from wondering whether a send was accidentally dropped.
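A minimal version of the suggested comment, written as a fragment against the excerpt above:

```rust
if !blocks.is_empty() {
    let _ = body_tx.send(Ok((blocks, final_batch))).await;
}
// Intentionally send nothing when `blocks` is empty: every header was
// already canonical and got drained, so there is no batch to execute.
return;
```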
Overlap block body downloads with block execution during full sync by using a background tokio task and an mpsc channel (buffer of 2 batches). The download task iterates through batches, fetches headers from the store, downloads bodies from peers, constructs Block objects, and sends them through the channel. The main loop receives completed batches and executes them immediately. This hides network I/O latency behind execution time for an estimated 12-15% throughput improvement. Closes #6479
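As an illustration of the buffering arithmetic, here is a self-contained toy of the same producer/consumer pattern. Everything in it (batch counts, sleep durations) is made up purely to show the overlap, and it assumes only the `tokio` crate.

```rust
use std::time::{Duration, Instant};
use tokio::sync::mpsc;

#[tokio::main]
async fn main() {
    // 2-slot channel: the downloader can run up to two batches ahead.
    let (tx, mut rx) = mpsc::channel::<Vec<u32>>(2);

    let downloader = tokio::spawn(async move {
        for batch in 0..4u32 {
            tokio::time::sleep(Duration::from_millis(100)).await; // "network I/O"
            if tx.send(vec![batch; 3]).await.is_err() {
                return; // executor stopped
            }
        }
    });

    let start = Instant::now();
    while let Some(blocks) = rx.recv().await {
        tokio::time::sleep(Duration::from_millis(100)).await; // "execution"
        println!("executed {} blocks at {:?}", blocks.len(), start.elapsed());
    }
    downloader.await.unwrap();
    // Roughly 500 ms total instead of the ~800 ms a strictly sequential
    // download-then-execute loop would take: batch N+1 downloads while
    // batch N executes.
}
```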
Force-pushed from 3073071 to 7c9875d
🤖 Codex Code Review
Otherwise the ordering and […] I couldn’t run […]
Automated review by OpenAI Codex · gpt-5.4 · custom prompt
Summary

- Restructures `sync_cycle_full` to overlap body downloads with block execution
- Buffers batches through `mpsc::channel(2)` (2-batch buffer)
- Closes #6479
Test plan

- Full sync with `FULL_SYNC_BLOCK_LIMIT=50000` and verify completion