
actions/artifact/cache/restore: Add retry with exponential backoff for TCP timeouts #3765

Draft
Copilot wants to merge 3 commits into main from copilot/fix-tcp-timeout-artifact-download


Copilot AI commented Feb 27, 2026

gh api | tar has no built-in retry; a mid-stream TCP timeout produces a truncated archive and a hard failure with no recovery path.

Changes

  • Retry loop: wraps the gh api | tar pipeline in up to max-retries attempts, exiting immediately on success
  • Exponential backoff: configurable initial retry-delay (default 10s), doubling after each failed attempt
  • Partial extraction cleanup: find -mindepth 1 -delete between retries removes partially written files, including dotfiles
  • Annotations: ::warning:: on non-final failures; ::error:: after all retries are exhausted
  • Configurable via inputs: max-retries (default 3) and retry-delay (default 10) are action inputs passed through env:, consistent with existing variable handling
- uses: envoyproxy/toolshed/actions/github/artifact/cache/restore@...
  with:
    max-retries: 5      # optional, default 3
    retry-delay: 15     # optional, default 10s
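The two inputs translate into a small, predictable wait schedule. A minimal sketch, assuming the values arrive as the MAX_RETRIES and RETRY_DELAY environment variables used in the prompt's script (how the final action names them is not shown here) and using a hypothetical backoff_schedule helper, prints the sleep that follows each non-final failure:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the wait (in seconds) that follows each failed,
# non-final attempt for a given max-retries / retry-delay pair.
set -euo pipefail

backoff_schedule() {
    local max_retries="$1" delay="$2" attempt
    # Accept only positive attempt counts and non-negative delays without
    # leading zeros (bash arithmetic would read "010" as octal).
    if ! [[ "$max_retries" =~ ^[1-9][0-9]*$ && "$delay" =~ ^(0|[1-9][0-9]*)$ ]]; then
        echo "invalid max-retries/retry-delay: '$max_retries' '$delay'" >&2
        return 1
    fi
    # max_retries attempts need only max_retries - 1 waits.
    for ((attempt = 1; attempt < max_retries; attempt++)); do
        echo "$delay"
        delay=$((delay * 2))
    done
}

# Fall back to the documented defaults (3 attempts, 10s initial delay).
backoff_schedule "${MAX_RETRIES:-3}" "${RETRY_DELAY:-10}"
```

With the example inputs (max-retries: 5, retry-delay: 15) the waits are 15, 30, 60 and 120 seconds, so a run that fails every attempt spends 225s sleeping before the final ::error::.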
Original prompt

Problem

The actions/github/artifact/cache/restore action is failing intermittently in Envoy CI with TCP connection timeouts during artifact download:

read tcp 10.1.0.252:35018->20.209.113.193:443: read: connection timed out
/*stdin*\ : Read error (39) : premature end 
tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now
Error: Process completed with exit code 2.

The root cause is that the gh api call in actions/github/artifact/cache/restore/action.yml streams the artifact download directly into tar via a pipe. When the TCP connection times out mid-stream, unzstd and tar receive a truncated archive, and tar aborts with "Unexpected EOF in archive." Nothing retries the download: gh api has no --retry flag (cli/cli#7533).
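The failure mode can be reproduced locally without a network. This sketch makes two simplifying assumptions: it uses an uncompressed tar archive instead of the zstd-compressed artifact, and it uses head -c to simulate the connection dropping halfway through the stream.

```shell
#!/usr/bin/env bash
# Cut a tar stream short, as a mid-stream TCP timeout would, and pipe it
# into tar. Plain tar (no zstd) keeps the sketch free of extra tools.
set -uo pipefail

work="$(mktemp -d)"
trap 'rm -rf "$work"' EXIT
mkdir -p "$work/src" "$work/out"

# 1 MiB of random data so the cut lands in the middle of the file's contents.
head -c 1048576 /dev/urandom > "$work/src/blob.bin"
tar -cf "$work/cache.tar" -C "$work/src" blob.bin

# Deliver only the first half of the archive, then stop.
head -c 524288 "$work/cache.tar" | tar -xf - -C "$work/out"
status=$?
echo "tar exit status: $status"
```

The exact message and exit status depend on the tar implementation (GNU tar reports "Unexpected EOF in archive"), but the status is non-zero either way, and the output directory may already hold partially written data.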

Proposed Fix

Wrap the existing gh api | tar pipeline in a bash retry loop with exponential backoff. This adds zero overhead on the happy path (first attempt succeeds, exits immediately) while making transient network failures recoverable.
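The same pattern can be factored into a reusable bash function. This is a generic sketch, not the action's code: retry_with_backoff and its argument order are hypothetical, and the ::warning::/::error:: lines simply reuse the workflow-command annotations from the proposed fix.

```shell
#!/usr/bin/env bash
# Hypothetical helper: retry a command with exponential backoff.
# Usage: retry_with_backoff MAX_RETRIES INITIAL_DELAY COMMAND [ARGS...]
set -uo pipefail

retry_with_backoff() {
    local max_retries="$1" delay="$2" attempt
    shift 2
    for ((attempt = 1; attempt <= max_retries; attempt++)); do
        echo "Attempt $attempt of $max_retries" >&2
        if "$@"; then
            return 0                  # success: no extra work on the happy path
        fi
        if ((attempt < max_retries)); then
            echo "::warning::Attempt $attempt failed, retrying in ${delay}s..." >&2
            sleep "$delay"
            delay=$((delay * 2))      # double the wait after each failure
        fi
    done
    echo "::error::Command failed after $max_retries attempts" >&2
    return 1
}

# Example: a stand-in command that fails on its first two calls.
calls=0
flaky() {
    calls=$((calls + 1))
    [ "$calls" -ge 3 ]
}

retry_with_backoff 3 1 flaky && echo "succeeded after $calls calls"   # prints "succeeded after 3 calls"
```

Passing the command as "$@" keeps arguments intact; a pipeline such as gh api ... | tar ... would need to be wrapped in its own function first, because "$@" runs a single command.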

In actions/github/artifact/cache/restore/action.yml, replace the current run: block (lines 42-51) with:

# Make the pipeline fail if gh api fails, not only if tar fails
set -o pipefail
MAX_RETRIES=3
RETRY_DELAY=10
for attempt in $(seq 1 "$MAX_RETRIES"); do
    echo "Attempt $attempt of $MAX_RETRIES"
    # Stream the artifact straight into tar (no intermediate file)
    if gh api \
        -H "Accept: application/vnd.github+json" \
        -H "X-GitHub-Api-Version: 2022-11-28" \
        "/repos/${REPOSITORY}/actions/artifacts/${ARTIFACT_ID}/zip" \
        | tar --warning=no-timestamp \
              --keep-directory-symlink \
              -xI unzstd \
              -f - \
              -C "${OUTPUT_PATH}"; then
        echo "Cache restored successfully"
        exit 0
    fi
    # Clean up partial extraction (including dotfiles) before retrying
    find "${OUTPUT_PATH:?}" -mindepth 1 -delete
    # Only wait if another attempt follows
    if [ "$attempt" -lt "$MAX_RETRIES" ]; then
        echo "::warning::Attempt $attempt failed, retrying in ${RETRY_DELAY}s..."
        sleep "$RETRY_DELAY"
        RETRY_DELAY=$((RETRY_DELAY * 2))
    fi
done
echo "::error::Failed to restore artifact cache after $MAX_RETRIES attempts"
exit 1
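Because the retry condition is an if wrapped around a pipeline, it only notices a failing gh api when pipefail is in effect. GitHub Actions documents that an explicit shell: bash runs steps as bash --noprofile --norc -eo pipefail {0}, so composite actions declaring shell: bash already get it. Setting it inside the script makes the behavior independent of how the step is invoked. The difference in a nutshell:

```shell
#!/usr/bin/env bash
# By default a pipeline's exit status is that of its last command.
false | true
echo "without pipefail: $?"   # prints "without pipefail: 0"

# With pipefail, the pipeline fails if any stage fails.
set -o pipefail
false | true
echo "with pipefail: $?"      # prints "with pipefail: 1"
```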

Key design decisions:

  • Retry the whole pipe from scratch rather than downloading to a file first: a download-to-file intermediary would make every CI run roughly 2x slower or more, to guard against something that only fails occasionally
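  • Clean up partial extraction between retries, since tar may have written partial files before the connection died. The cleanup must also remove dotfiles: a plain rm -rf "${OUTPUT_PATH:?}"/* does not match them, whereas find "${OUTPUT_PATH:?}" -mindepth 1 -delete does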
  • Exponential backoff (waits of 10s, then 20s, between the three default attempts; each additional attempt doubles the wait again) gives transient network issues time to resolve
  • ::warning:: annotations make retries visible in the Actions log without being noisy
  • ::error:: annotation on final failure for clear diagnostics
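The dotfile caveat behind the cleanup step is easy to check. This sketch builds two throwaway directories that mimic a partial extraction (the file names are made up) and compares the glob-based rm -rf with find -mindepth 1 -delete. Note that -mindepth and -delete are GNU/BSD find extensions rather than POSIX, which is fine on GitHub-hosted Linux runners.

```shell
#!/usr/bin/env bash
# Compare two ways of emptying a directory that holds a partial extraction.
set -euo pipefail

# Create a temp dir with a regular file, a dotfile, and a nested dotfile.
make_partial() {
    local dir
    dir="$(mktemp -d)"
    mkdir -p "$dir/sub"
    touch "$dir/visible.txt" "$dir/.hidden" "$dir/sub/.also-hidden"
    echo "$dir"
}

first="$(make_partial)"
second="$(make_partial)"
trap 'rm -rf "$first" "$second"' EXIT

# The glob skips top-level dotfiles, so .hidden survives.
rm -rf "${first:?}"/*
echo "after rm -rf glob: $(ls -A "$first")"      # prints "after rm -rf glob: .hidden"

# find removes everything below the directory but keeps the directory itself.
find "${second:?}" -mindepth 1 -delete
echo "after find -delete: $(ls -A "$second")"    # lists nothing
```

Keeping the directory itself matters because tar -C requires OUTPUT_PATH to exist on the next attempt.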

The env vars (GH_TOKEN, ARTIFACT_ID, OUTPUT_PATH, REPOSITORY) remain unchanged.

This pull request was created from Copilot chat.




netlify bot commented Feb 27, 2026

Deploy Preview for nifty-bassi-e26446 ready!

🔨 Latest commit: 347b26e
🔍 Latest deploy log: https://app.netlify.com/projects/nifty-bassi-e26446/deploys/69a17a15dd5ac400088d0f9a
😎 Deploy Preview: https://deploy-preview-3765--nifty-bassi-e26446.netlify.app

Co-authored-by: phlax <454682+phlax@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix TCP connection timeout during artifact download" to "actions/artifact/cache/restore: Add retry with exponential backoff" on Feb 27, 2026
Co-authored-by: phlax <454682+phlax@users.noreply.github.com>
Copilot AI changed the title from "actions/artifact/cache/restore: Add retry with exponential backoff" to "actions/artifact/cache/restore: Add retry with exponential backoff for TCP timeouts" on Feb 27, 2026