Skip to content

investigate timeout issue#396

Merged
xlc merged 10 commits intomasterfrom
timeout
Mar 10, 2026
Merged

investigate timeout issue#396
xlc merged 10 commits intomasterfrom
timeout

Conversation

@xlc
Copy link
Member

@xlc xlc commented Mar 6, 2026

No description provided.

- Introduce a new `lifecycleProbeEnabled` flag, activated by `BOKA_CHILD_PROCESS_TRACE=1` or `GITHUB_ACTIONS=true` environment variables.
- Implement `lifecycleProbe` to log detailed events during child process lifecycle, including spawning, readiness checks, and exit waiting.
- Enhance `sleepFor` to log the reason for sleeping, providing better context for delays.
- This tracing mechanism is designed to help diagnose and prevent hangs observed in CI environments by providing granular insights into child process states and transitions.
Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes introduce a tracing mechanism to debug child process lifecycle issues in CI, which is a valuable addition. However, there is a performance concern in the sleepFor method where the reason autoclosure is evaluated unconditionally, incurring string interpolation overhead even when tracing is disabled.

@codecov
Copy link

codecov bot commented Mar 6, 2026

Codecov Report

❌ Patch coverage is 77.66990% with 46 lines in your changes missing coverage. Please review.
✅ Project coverage is 72.14%. Comparing base (a99a709) to head (249f78b).
⚠️ Report is 1 commits behind head on master.

Files with missing lines Patch % Lines
...ngProtocol/NetworkManager/CEProtocolHandlers.swift 62.74% 19 Missing ⚠️
...ources/PolkaVM/Executors/ChildProcessManager.swift 69.56% 14 Missing ⚠️
.../PolkaVM/Executors/ExecutorFrontendSandboxed.swift 54.54% 5 Missing ⚠️
Networking/Sources/Networking/Peer.swift 88.88% 2 Missing ⚠️
.../Tests/PolkaVMTests/ChildProcessManagerTests.swift 93.75% 2 Missing ⚠️
Blockchain/Sources/Blockchain/Blockchain.swift 66.66% 1 Missing ⚠️
...ain/Sources/Blockchain/Validator/ServiceBase.swift 66.66% 1 Missing ⚠️
Utils/Sources/Utils/EventBus/EventBus.swift 88.88% 1 Missing ⚠️
...ls/Sources/Utils/EventBus/EventSubscriptions.swift 66.66% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #396      +/-   ##
==========================================
- Coverage   73.16%   72.14%   -1.03%     
==========================================
  Files         441      440       -1     
  Lines       37440    37255     -185     
==========================================
- Hits        27392    26876     -516     
- Misses      10048    10379     +331     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

xlc added 9 commits March 9, 2026 09:57
… executable resolution

Removed `SandboxPoolTests.swift` as these tests were previously disabled due to unresolved issues with the sandbox's async runtime in forked child processes.

Refactored `ChildProcessManagerTests.swift` to streamline the resolution of the sandbox executable. The test now directly utilizes `SandboxExecutableResolver` instead of manual environment variable management and a custom helper function.

Added an assertion for a zero exit code to `waitForExit` in `ChildProcessManagerTests` for increased test robustness.
Removes the parity test for the JumpInd instruction (Opcode 50) from JITControlFlowTests.swift.
The sandbox can now be spawned in a "single-shot" mode, where it processes a single IPC request and then exits. This is enabled by passing the `--single-shot` argument to the sandbox executable.

Refactored `ChildProcessManager` to correctly handle command-line arguments for spawned processes using duplicated C strings, ensuring safe and robust `execvp`/`posix_spawnp` calls.

`ExecutorFrontendSandboxed` now utilizes the single-shot mode for sandbox workers, improving resource management and simplifying cleanup by expecting the sandbox to self-terminate.

Updated `IPCServer` and `Sandbox/main.swift` to support and enable the single-shot behavior.

Added a new test `singleShotSandboxExitsAfterReply` to verify the functionality of the single-shot mode.
Modified `JITControlFlowTests` to test jump semantics more directly by jumping to a trap, rather than relying on gas exhaustion in a self-loop.
Removed `sleepForCleanupDelay` and streamlined cleanup logic in `ExecutorFrontendSandboxed`.
Moves connection closing, opening, and marking as closing operations outside of critical sections (e.g., `connections.write` closures).
This prevents potential deadlocks or unexpected behavior that could arise from performing these operations while holding a lock.
Affected functions include `addConnection` and `PeerEventHandler.onConnectionOpened`.
@xlc xlc merged commit a2cdf03 into master Mar 10, 2026
20 of 21 checks passed
@xlc xlc deleted the timeout branch March 10, 2026 20:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant