fix: replace blocking-in-async ops in validator, remote prover, and ntx-builder #2041
Ollie202 wants to merge 8 commits into 0xMiden:main
Conversation
…wn_blocking/block_in_place

Proof verification, VM execution, local signing, and local proving are CPU-bound operations that were running directly on Tokio worker threads, stalling the async runtime. Wraps them in block_in_place or spawn_blocking to isolate blocking work from the async executor.

- validator: wrap proof verification and VM execution in block_in_place
- validator: wrap local SecretKey signing in block_in_place
- remote-prover: wrap LocalBatchProver::prove in spawn_blocking
- remote-prover: wrap LocalBlockProver::prove in block_in_place

Closes 0xMiden#1976
Mirko-von-Leipzig
left a comment
Please don't use block_in_place
Replaces all uses of tokio::task::block_in_place with spawn_blocking across the validator and remote-prover crates, as requested in review. spawn_blocking is preferred because it works with both multi-threaded and current_thread runtimes (including #[tokio::test]). For the TX executor which uses an async API that may not yield, a dedicated current_thread runtime is created inside the spawn_blocking closure to drive the future to completion.
On it 👍🏻
LocalTransactionProver::prove generates a ZK proof which is CPU-intensive and does not yield. Wrap it in spawn_blocking with a current_thread runtime when no remote prover is configured.
The filter_notes and execute steps call executor.execute_transaction / NoteConsumptionChecker.check_notes_consumability, which are CPU-intensive and may not yield between await points. Move data-store creation + filter + execute onto a dedicated blocking thread via spawn_blocking. Use Handle::current().block_on() (not a new runtime) so that the gRPC data-store callbacks made by the VM are driven by the existing I/O driver on the parent runtime threads.
Updated across three commits -- ready for re-review:

Mirko (@Mirko-von-Leipzig): All
Krisztian (@kkovaacs): The PR now covers all the blocking-in-async areas from #1976 except the store (which is in #2008):
The only remaining item is the store's RocksDB blocking ops, already tracked in #2008.
Move the tokio runtime handle capture to before the spawn_blocking closure so it is explicit that the same runtime is used for execution.
Moved.
kkovaacs
left a comment
The ntx-builder execution part with the async-in-blocking context is a bit ugly but I don't have a better suggestion for now...
Partially addresses #1976 (store blocking ops remain in #2008)
Problem
Proof verification, VM execution, local signing, local batch/block proving, local transaction
proving, and ntx-builder VM execution were running CPU-intensive work directly on Tokio worker
threads. As noted in the issue (with dial9 profiling evidence), this starves the async runtime
and degrades throughput.
Changes
crates/validator/src/tx_validation/mod.rs
- TransactionVerifier::verify (proof verification) moved to spawn_blocking
- executor.execute_transaction (VM execution) moved to spawn_blocking with a dedicated current_thread runtime, since the validator data store makes no real async I/O calls

crates/validator/src/signers/mod.rs
- SecretKey::sign (sync ECDSA wrapped in an async fn) moved to spawn_blocking

bin/remote-prover/src/server/prover.rs
- LocalBatchProver::prove was already fixed; LocalBlockProver::prove moved to spawn_blocking
- LocalTransactionProver::prove left as-is -- delegates to the VM async API

crates/ntx-builder/src/actor/execute.rs
- LocalTransactionProver::prove (ZK proof generation) moved to spawn_blocking
- filter_notes + execute (VM execution with data-store callbacks) moved to spawn_blocking with Handle::current().block_on() so gRPC data-store calls use the existing I/O driver

Not covered
Store blocking ops (RocksDB) are being addressed separately in #2008.