test(self-talk): Replace superstitious and timing-racy sleep() with proper primitives#5855
Open
alexw91 wants to merge 1 commit into
Open
test(self-talk): Replace superstitious and timing-racy sleep() with proper primitives#5855alexw91 wants to merge 1 commit into
alexw91 wants to merge 1 commit into
Conversation
…roper primitives Eight fork-based self-talk / IO tests used hardcoded sleep() calls as ad-hoc synchronization between the parent (TLS server) and child (TLS client) processes. Three categories of sleeps were present: 1. Superstitious pre-negotiate sleeps: "give the server a chance to listen". Unnecessary because the socketpair underlying the io_pair is valid from the moment of fork(); the kernel buffers writes until the parent starts reading. 2. Superstitious pre-cleanup sleeps: "give the server a chance to avoid a sigpipe". Unnecessary because SIGPIPE is ignored process-wide via signal(SIGPIPE, SIG_IGN) in s2n_io_pair_init, and the disposition is inherited across fork(). 3. Load-bearing sleeps implementing ad-hoc synchronization. These were either a race (parent's pre-SIGCONT sleep in nonblocking_test, which could fire before the child had called raise(SIGSTOP)) or a guess at how long the other process would take to reach a given state (broken_pipe_test's sleeps around the half-closed-pipe assertions). Both flavors are replaced with proper primitives. Changes per file: - s2n_self_talk_client_hello_cb_test.c: removed 2 superstitious sleeps (pre-negotiate and pre-cleanup). - s2n_self_talk_nonblocking_test.c: removed 2 superstitious pre-negotiate sleeps; replaced the parent's pre-SIGCONT sleep with waitpid(pid, &status, WUNTRACED). WUNTRACED causes waitpid to return when the child transitions to the stopped state, eliminating a real race where SIGCONT could arrive before SIGSTOP. - s2n_self_talk_broken_pipe_test.c: removed 2 superstitious sleeps; replaced the two load-bearing sleeps (child's sleep(2) for "let parent observe EPIPE" and parent's sleep(1) for "let child shut down read side") with a pair of control pipes. The child writes to ready_pipe after shutting down its read side; parent blocks on read(ready_pipe) before its s2n_send. Parent writes to done_pipe after completing its assertions; child blocks on read(done_pipe) before teardown. Fully deterministic handoff. - s2n_self_talk_session_id_test.c: removed 4 superstitious sleeps (1 pre-negotiate + 3 pre-cleanup). - s2n_self_talk_tls13_test.c: removed 2 superstitious sleeps. - s2n_self_talk_tls12_test.c: removed 2 superstitious sleeps. - s2n_self_talk_alerts_test.c: removed 1 superstitious pre-negotiate sleep (called 3 times via the outer loop). - s2n_mem_allocator_test.c: removed 2 superstitious sleeps (called twice via the outer loop). All eight tests were run 100x in a flakiness loop after changes with no failures (800 total runs). Runtime on an M4 MacBook Pro: - s2n_self_talk_client_hello_cb_test: 19.54s -> 2.81s (~7x faster) - s2n_self_talk_nonblocking_test: 19.25s -> 4.75s (~4x faster) - s2n_self_talk_broken_pipe_test: 11.36s -> 2.88s (~4x faster) - s2n_self_talk_tls12_test: 8.54s -> 5.41s (~1.6x faster) - s2n_self_talk_session_id_test: 6.12s -> 2.69s (~2.3x faster) - s2n_self_talk_tls13_test: 5.56s -> 4.08s (~1.4x faster) - s2n_self_talk_alerts_test: 5.76s -> 2.86s (~2.0x faster) - s2n_mem_allocator_test: 9.14s -> 5.45s (~1.7x faster) Refs: aws#5704
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Goal
Replace ad-hoc
sleep()synchronization with proper primitives across eight fork-based self-talk / IO tests, removing it entirely where it was superstitious and replacing it with deterministic synchronization where it was load-bearing.Why
See: #5704
Eight fork-based tests used hardcoded
sleep()calls between the parent (TLS server) and child (TLS client) processes. The sleeps fell into three categories:mock_clientbefores2n_negotiate. Unnecessary because the socketpair is valid from the moment offork()and the kernel buffers writes until the parent reads.signal(SIGPIPE, SIG_IGN)ins2n_io_pair_init, inherited acrossfork().Problems with this: runtime bloat (most tests had 2-8 seconds of sleep per run), and potential flakes on slow or loaded CI where the chosen sleep duration could be too short.
How
Superstitious sleeps removed. All pre-negotiate and pre-cleanup
sleep(1)calls inmock_client-style functions across all eight tests, replaced with a short comment explaining why the sleep was unnecessary.Load-bearing sleep in
s2n_self_talk_nonblocking_test.creplaced withwaitpid(pid, &status, WUNTRACED). The parent previouslysleep(1)'d beforekill(pid, SIGCONT)to give the child time toraise(SIGSTOP).WUNTRACEDcauseswaitpidto return when the child transitions to the stopped state, so the parent is guaranteed to only send SIGCONT after the child has stopped. This also fixes a latent race: on a loaded system the parent's SIGCONT could arrive before the child had called SIGSTOP.Load-bearing sleeps in
s2n_self_talk_broken_pipe_test.creplaced with a pair of control pipes. The test needs two process-to-process events:s2n_send(otherwise no EPIPE). Replaced withread(ready_pipe).s2n_shutdown). Replaced withread(done_pipe).Per-file breakdown:
s2n_self_talk_client_hello_cb_test.cs2n_self_talk_nonblocking_test.csleep(1)→waitpid WUNTRACEDs2n_self_talk_broken_pipe_test.csleep(2)/sleep(1)→ control pipess2n_self_talk_session_id_test.cs2n_self_talk_tls13_test.cs2n_self_talk_tls12_test.cs2n_self_talk_alerts_test.cs2n_mem_allocator_test.cCallouts
nanosleep()calls inself_talk_tls12_test,self_talk_tls13_test, ands2n_mem_allocator_testthat exercise the dynamic-record-size inactivity-timeout reset. Those genuinely need real elapsed time. A follow-up could replace them with a mock-clock override onconn->config->wall_clock, but that's a larger change worth doing separately.sleep(1)ins2n_locking_test.c, which is inside a#if !(S2N_OPENSSL_VERSION_AT_LEAST(1, 1, 0))compat block that only runs on libcrypto < 1.1.0. The sleep there gives a thread time to block on a mutex before asserting the mutex is held, and there's no cheap primitive to replace it in that context.WUNTRACEDchange innonblocking_testand the control-pipe change inbroken_pipe_testare correctness improvements, not just speedups. The original sleeps were races that happened to work in practice on fast machines but could fail under load.Testing
Ran each test 100 times in a tight loop after the changes. All 800 runs passed.
Runtime on an M4 MacBook Pro:
s2n_self_talk_client_hello_cb_tests2n_self_talk_nonblocking_tests2n_self_talk_broken_pipe_tests2n_self_talk_tls12_tests2n_self_talk_session_id_tests2n_self_talk_tls13_tests2n_self_talk_alerts_tests2n_mem_allocator_testRelated
Refs #5704