Skip to content

PRO-3800 Fix abort outcome completion in Docker and Tarball R overseers#446

Open
YaroslavTir wants to merge 1 commit into
masterfrom
PRO-3800
Open

PRO-3800 Fix abort outcome completion in Docker and Tarball R overseers#446
YaroslavTir wants to merge 1 commit into
masterfrom
PRO-3800

Conversation

@YaroslavTir

Copy link
Copy Markdown
Collaborator

DockerOverseer and TarballROverseer now complete the ABORTED outcome
before attempting to stop the container/process, not after.

Previously, if stopping failed or timed out the outcome future could be
left unresolved, blocking the caller. The fix makes abort completion
unconditional — cleanup is best-effort.

Test plan

  • Cancel a running R/Docker submission — verify ABORTED status is
    received promptly
  • Cancel a running R/Tarball submission — same
  • Let a job complete normally — verify no regression in
    COMPLETED/EXECUTE outcomes

@YaroslavTir YaroslavTir added bug java Pull requests that update java code labels Jun 9, 2026
return new ExecutionOutcome(Stage.INITIALIZE, throwable.getMessage(), out);
outcome.complete(new ExecutionOutcome(Stage.INITIALIZE, throwable.getMessage(), out));
} else {
executor.scheduleWithFixedDelay(this::writeLogs, updateInterval, updateInterval, TimeUnit.MILLISECONDS);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If an exception happens here or anywhere above outcome.complete, probably we would not get an outcome with this exception. Wrap with try catch to handle this case please.

}
} else {
if (result.isDone()) {
return result.join();

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe lets retain the determinism of this scenario to avoid a race condition? CompletableFuture.complete() is a no-op if already completed, so outcome.join() may return either the ABORTED outcome or the natural COMPLETED/EXECUTE outcome depending on timing. The original explicitly handled the "already done" case (if (result.isDone()) return result.join();). The new logic loses that determinism — there's an inherent race between waitContainerCmd completing and the abort completing outcome.

@YaroslavTir YaroslavTir Jun 11, 2026

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed: exit code 0 always yields COMPLETED regardless of abort timing. For non-zero codes the race is inherent and both ABORTED/EXECUTE are acceptable outcomes.

public CompletableFuture<ExecutionOutcome> abort() {
if (process.isAlive()) {
log.info("Overseer [{}] processing abort request", id);
if (terminate()) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

boolean dead = terminate();
if (dead) {
outcome.complete(new ExecutionOutcome(Stage.ABORTED, null, stdout.toString()));
} else {
callback.accept(Stage.ABORT, "Timed out waiting for termination");
outcome.complete(new ExecutionOutcome(Stage.ABORT, "Timed out waiting for termination", stdout.toString()));
}

what do you think?

@oleg-odysseus oleg-odysseus left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

few comments

…rt support and integration test

- Replace shared-executor blocking wait with private 2-thread ScheduledThreadPoolExecutor:
  one thread blocks on waitContainerCmd, one runs the periodic log flush
- Add volatile aborting flag; abort() sets flag + stops container asynchronously, returns
  immediate ABORT ack; outcome resolves ABORTED when container exits
- TarballROverseer: add aborting flag; complete outcome on kill timeout; produce ABORTED
  stage when process exits after abort was requested
- Add DockerOverseerIntegrationTest (skipped when Docker is unavailable)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug java Pull requests that update java code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants