PRO-3800 Fix abort outcome completion in Docker and Tarball R overseers#446
PRO-3800 Fix abort outcome completion in Docker and Tarball R overseers#446YaroslavTir wants to merge 1 commit into
Conversation
| return new ExecutionOutcome(Stage.INITIALIZE, throwable.getMessage(), out); | ||
| outcome.complete(new ExecutionOutcome(Stage.INITIALIZE, throwable.getMessage(), out)); | ||
| } else { | ||
| executor.scheduleWithFixedDelay(this::writeLogs, updateInterval, updateInterval, TimeUnit.MILLISECONDS); |
There was a problem hiding this comment.
If an exception happens here or anywhere above outcome.complete, probably we would not get an outcome with this exception. Wrap with try catch to handle this case please.
| } | ||
| } else { | ||
| if (result.isDone()) { | ||
| return result.join(); |
There was a problem hiding this comment.
maybe lets retain the determinism of this scenario to avoid a race condition? CompletableFuture.complete() is a no-op if already completed, so outcome.join() may return either the ABORTED outcome or the natural COMPLETED/EXECUTE outcome depending on timing. The original explicitly handled the "already done" case (if (result.isDone()) return result.join();). The new logic loses that determinism — there's an inherent race between waitContainerCmd completing and the abort completing outcome.
There was a problem hiding this comment.
fixed: exit code 0 always yields COMPLETED regardless of abort timing. For non-zero codes the race is inherent and both ABORTED/EXECUTE are acceptable outcomes.
| public CompletableFuture<ExecutionOutcome> abort() { | ||
| if (process.isAlive()) { | ||
| log.info("Overseer [{}] processing abort request", id); | ||
| if (terminate()) { |
There was a problem hiding this comment.
boolean dead = terminate();
if (dead) {
outcome.complete(new ExecutionOutcome(Stage.ABORTED, null, stdout.toString()));
} else {
callback.accept(Stage.ABORT, "Timed out waiting for termination");
outcome.complete(new ExecutionOutcome(Stage.ABORT, "Timed out waiting for termination", stdout.toString()));
}
what do you think?
…rt support and integration test - Replace shared-executor blocking wait with private 2-thread ScheduledThreadPoolExecutor: one thread blocks on waitContainerCmd, one runs the periodic log flush - Add volatile aborting flag; abort() sets flag + stops container asynchronously, returns immediate ABORT ack; outcome resolves ABORTED when container exits - TarballROverseer: add aborting flag; complete outcome on kill timeout; produce ABORTED stage when process exits after abort was requested - Add DockerOverseerIntegrationTest (skipped when Docker is unavailable)
DockerOverseer and TarballROverseer now complete the ABORTED outcome
before attempting to stop the container/process, not after.
Previously, if stopping failed or timed out the outcome future could be
left unresolved, blocking the caller. The fix makes abort completion
unconditional — cleanup is best-effort.
Test plan
received promptly
COMPLETED/EXECUTE outcomes