Skip to content

feat: compress termination messages to fit more results in 4KB limit#9682

Merged
tekton-robot merged 1 commit into
tektoncd:mainfrom
waveywaves:feat/compress-termination-results
May 12, 2026
Merged

feat: compress termination messages to fit more results in 4KB limit#9682
tekton-robot merged 1 commit into
tektoncd:mainfrom
waveywaves:feat/compress-termination-results

Conversation

@waveywaves
Copy link
Copy Markdown
Member

@waveywaves waveywaves commented Mar 28, 2026

Changes

Add optional flate compression for termination messages, gated behind the
enable-termination-message-compression alpha feature flag. When enabled, results
are compressed with Go stdlib compress/flate and base64-encoded before writing
to the termination message path, effectively increasing the usable capacity from
~33 to ~187 results (5.7x) for typical image digest payloads.

The problem

Kubernetes limits termination messages to 4KB per container, and the total across
all containers in a pod is capped at 12KB (divided by container count). With multiple
steps, each container may get as little as 1KB for results (#4808). The sidecar logs
method (TEP-0127) bypasses this but adds a sidecar container to every TaskRun,
which is wasteful when most tasks don't need large results (#8448).

The solution

Compress the JSON payload before writing to the termination message path. The parser
auto-detects compressed messages via a tknz: prefix, maintaining full backward
compatibility with uncompressed messages.

Step writes result → entrypoint reads → JSON → flate compress → base64 → termination message
                                                                              ↓
Reconciler reads ← JSON ← flate decompress ← base64 decode ← termination message

E2E test results (kind cluster, k8s v1.29.2)

Results Without Compression With Compression
30 3776 bytes ✅
35 >4096 bytes ❌ 899 bytes ✅
100 2241 bytes ✅
150 3307 bytes ✅
180 3952 bytes ✅
185 4061 bytes ✅
200 >4096 bytes ❌

Max capacity: ~33 → ~187 results (5.7x improvement)

Implementation

  • Feature flag: enable-termination-message-compression (alpha, default: false)
    • Uses PerFeatureFlag with AlphaAPIFields stability
    • Documented in config/config-feature-flags.yaml
    • Only meaningful when results-from: termination-message (default)
  • Write path: pkg/termination/write.go compresses with flate.BestCompression + base64.RawStdEncoding, prefixed with tknz:
    • Smart fallback: if compressed output is larger than plain JSON, falls back automatically
    • Fixed pre-existing O_TRUNC bug in file rewrite
  • Parse path: pkg/termination/parse.go auto-detects tknz: prefix, decompresses transparently
  • Decompression safety: io.LimitReader with 1MB cap + explicit size check
  • Entrypoint wiring: ConfigMap → pkg/pod/pod.gocmd/entrypoint/main.goEntrypointer
  • Zero new dependencies — Go stdlib only

Test coverage

  • 18 unit tests: round-trip, size comparison, append, auto-detect, corrupt data, capacity benchmark
  • Feature flag tests (default, all-flags-set, invalid value)
  • Example TaskRun and Pipeline in examples/v1/*/alpha/

Related Issues

/kind feature

Submitter Checklist

  • Has Docs if any changes are user facing
  • Has Tests included if any functionality added or changed
  • pre-commit Passed
  • Follows the commit message standard
  • Meets the Tekton contributor standards
  • Has a kind label
  • Release notes block below has been updated

Release Notes

Added optional termination message compression (alpha feature flag
enable-termination-message-compression) that uses flate compression to fit
approximately 5.7x more results in the 4KB Kubernetes termination message limit.
The parser auto-detects compressed messages for full backward compatibility.
Zero new dependencies — uses Go stdlib only.

@tekton-robot tekton-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/feature Categorizes issue or PR as related to a new feature. labels Mar 28, 2026
@tekton-robot tekton-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Mar 28, 2026
@waveywaves
Copy link
Copy Markdown
Member Author

E2E Test Results — Kind Cluster (k8s v1.29.2)

Deployed Tekton from this branch via ko apply with KO_DOCKER_REPO=kind.local.

Without compression (baseline)

Results Termination Message Status
30 3776 bytes PASS
35 >4096 bytes FAILTermination message is above max allowed size 4096, caused by large task result.

Max capacity without compression: ~33 results

With compression (enable-termination-message-compression: "true")

Results Termination Message Status
35 899 bytes PASS
100 2241 bytes PASS
150 3307 bytes PASS
175 3849 bytes PASS
180 3952 bytes PASS
185 4061 bytes PASS
200 >4096 bytes FAIL

Max capacity with compression: ~187 results

Summary

Result type: ~100 byte image digest (gcr.io/project/image-NNN@sha256:64-hex-chars)
Without compression: ~33 results
With compression:    ~187 results
Improvement:         5.7x capacity increase

All results verified intact after round-trip (first/last result values checked, result count matched). The tknz: prefix was present in the termination message, confirming compression was active.

@waveywaves waveywaves requested review from aThorp96 and vdemeester and removed request for pritidesai March 28, 2026 16:39
Copy link
Copy Markdown
Member

@vdemeester vdemeester left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few things to address before this can move forward.

Should fix

  1. No e2e test with the flag enabled. The unit tests in pkg/termination/ are thorough, but there's no integration test proving the full loop on a real cluster: feature flag → pod builder → entrypoint → compressed write → reconciler parse → results in .status.taskResults. At minimum, add one test to the alpha e2e suite that creates a TaskRun with enough results to exceed 4KB uncompressed and asserts they all appear in status.

  2. No tests in pkg/pod/pod_test.go or pkg/entrypoint/entrypointer_test.go for the new flag plumbing. The termination package is well-covered but the integration points aren't.

  3. OpenAPI spec not regenerated. EnableTerminationMessageCompression was added to the FeatureFlags struct but openapi_generated.go wasn't updated. Run ./hack/update-openapigen.sh.

  4. CRD schema entries missing description. The enableTerminationMessageCompression field is added in 8 places across the CRD YAMLs but without a description, unlike all other feature flags there.

Strategic context

I want to be transparent about the bigger picture here. There is a TEP in progress (TEP-0164) that addresses the root cause — storing results in external backends (OCI/S3/GCS) so the 4KB limit doesn't apply at all. That TEP also unifies results with TEP-0147 artifacts and converges with TEP-0139 trusted artifacts.

That said, TEP-0164 is still in proposal stage and won't land for a while. Users are hitting the 4KB wall today (as the issues you reference show), and this is a reasonable opt-in stop-gap: zero new deps, backward compatible, meaningful capacity improvement.

I'd suggest two things to make the relationship clear:

  1. Add a note to the feature flag description (in config-feature-flags.yaml and docs) that this is a short-term measure, and that external result storage (TEP-0164) will address the underlying limitation.
  2. Reference TEP-0164 in the PR description alongside the other related issues, so the community can see the full picture.

With the technical items above addressed, I'm supportive of this going in as an alpha feature.

Comment thread pkg/pod/pod.go
Comment thread pkg/entrypoint/entrypointer.go Outdated
Comment on lines +204 to +210
var wErr error
if e.CompressTerminationMessage {
wErr = termination.WriteCompressedMessage(e.TerminationPath, output)
} else {
wErr = termination.WriteMessage(e.TerminationPath, output)
}
if wErr != nil {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Must fix: This exact if compress { WriteCompressed } else { Write } block is duplicated at L478-485 in readResultsFromDisk. Extract a helper to avoid divergence:

func (e Entrypointer) writeTerminationMessage(path string, results []result.RunResult) error {
	if e.CompressTerminationMessage {
		return termination.WriteCompressedMessage(path, results)
	}
	return termination.WriteMessage(path, results)
}

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Extracted writeTerminationMessage helper method. Both call sites now use it. Added TestWriteTerminationMessage_Compression with 2 subtests (plain vs compressed output).

Comment thread go.mod
github.com/prometheus/client_golang v1.23.2 // indirect
github.com/prometheus/client_model v0.6.2 // indirect
github.com/prometheus/common v0.67.5 // indirect
github.com/prometheus/common v0.67.5
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Unrelated change — github.com/prometheus/common went from // indirect to direct but nothing in the codebase imports it. Looks like an accidental go mod tidy artifact, please revert.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reverted.

Comment thread pkg/termination/parse.go Outdated
if err := json.Unmarshal([]byte(msg), &r); err != nil {
return nil, fmt.Errorf("parsing message json: %w, msg: %s", err, msg)
if err := json.Unmarshal([]byte(jsonMsg), &r); err != nil {
return nil, fmt.Errorf("parsing message json: %w, msg: %s", err, jsonMsg)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: On JSON parse failure after decompression, this includes the full decompressed payload in the error message. For large payloads that could produce enormous log entries. Consider truncating to first N bytes.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Truncated to 256 bytes with ...(truncated) suffix.

Comment thread pkg/apis/config/feature_flags.go Outdated
Comment on lines +234 to +235
EnableWaitExponentialBackoff bool `json:"enableWaitExponentialBackoff,omitempty"`
EnableTerminationMessageCompression bool `json:"enableTerminationMessageCompression,omitempty"`
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Inconsistent alignment — these two fields have extra whitespace padding that doesn't match the rest of the struct.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@waveywaves waveywaves force-pushed the feat/compress-termination-results branch from e4cec69 to bf1e5f0 Compare March 31, 2026 09:02
@waveywaves waveywaves marked this pull request as ready for review April 2, 2026 14:00
@tekton-robot tekton-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 2, 2026
@waveywaves waveywaves force-pushed the feat/compress-termination-results branch from bf1e5f0 to b57cd0e Compare April 2, 2026 23:08
@vdemeester vdemeester self-assigned this Apr 7, 2026
Comment thread pkg/termination/write.go Outdated

// maxDecompressedSize is the upper bound for decompressed termination messages.
// The pre-compression JSON was at most 4KB, but with compression we allow
// larger payloads. 1MB is a safe limit that prevents decompression bombs
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't 1MB too generous? How much max payload are we expecting? 128 KB? Or is 1 MB really possible except in case of attack?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. The pre-compression input is capped at 4KB by Kubernetes (termination message limit). With flate, the theoretical max decompressed output from 4KB compressed is about 64KB for typical JSON.

1MB is a safety net against decompression bombs. 128KB would work for legitimate use — happy to lower it. The key constraint: must be bounded (not unlimited) and larger than any legitimate 4KB-compressed payload could produce.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's put it to 128 KB

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done — changed to 128KB.

Comment thread pkg/termination/write.go
@waveywaves waveywaves force-pushed the feat/compress-termination-results branch from b57cd0e to 23c80a4 Compare April 9, 2026 09:47
@waveywaves
Copy link
Copy Markdown
Member Author

/retest

@waveywaves waveywaves force-pushed the feat/compress-termination-results branch from 23c80a4 to bf8451c Compare April 14, 2026 09:33
@tekton-robot tekton-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Apr 14, 2026
@waveywaves waveywaves force-pushed the feat/compress-termination-results branch from bf8451c to 9a62b83 Compare April 16, 2026 11:12
@tekton-robot tekton-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 16, 2026
@waveywaves
Copy link
Copy Markdown
Member Author

/retest

Copy link
Copy Markdown
Member

@afrittoli afrittoli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @waveywaves - it looks good, just a few minor comments.

/approve

Comment thread pkg/pod/pod.go
Comment thread examples/v1/taskruns/no-ci/compressed-results.yaml Outdated
Comment thread pkg/pod/pod.go
@tekton-robot tekton-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 20, 2026
@waveywaves waveywaves force-pushed the feat/compress-termination-results branch from 9a62b83 to eb2d027 Compare May 11, 2026 10:32
@waveywaves
Copy link
Copy Markdown
Member Author

@khrm addressed all review feedback, ready for another look when you get a chance

@tekton-robot
Copy link
Copy Markdown
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: afrittoli, vdemeester

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:
  • OWNERS [afrittoli,vdemeester]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@vdemeester vdemeester added this to the v1.13.0 milestone May 11, 2026
@waveywaves
Copy link
Copy Markdown
Member Author

/retest

1 similar comment
@waveywaves
Copy link
Copy Markdown
Member Author

/retest

Copy link
Copy Markdown
Contributor

@khrm khrm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@tekton-robot tekton-robot added the lgtm Indicates that a PR is ready to be merged. label May 12, 2026
@khrm
Copy link
Copy Markdown
Contributor

khrm commented May 12, 2026

/retest

@tekton-robot tekton-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed lgtm Indicates that a PR is ready to be merged. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels May 12, 2026
@waveywaves waveywaves force-pushed the feat/compress-termination-results branch 3 times, most recently from b28f312 to b0006b2 Compare May 12, 2026 07:40
When enable-termination-message-compression is enabled, the entrypoint
compresses results with zlib before writing to the termination message.
The reconciler auto-detects compressed vs plain JSON for backward
compatibility. Increases effective capacity from ~33 to ~187 results
(5.7x improvement).

- Gated behind enable-termination-message-compression PerFeatureFlag
  (alpha stability)
- No effect when results-from is set to sidecar-logs (logs warning)
- Max decompressed size capped at 128KB
- Feature flag documented in additional-configs.md
- Alpha e2e test setup enables the flag

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@waveywaves waveywaves force-pushed the feat/compress-termination-results branch from b0006b2 to 1ca2b4e Compare May 12, 2026 08:58
Copy link
Copy Markdown
Contributor

@khrm khrm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@tekton-robot tekton-robot added the lgtm Indicates that a PR is ready to be merged. label May 12, 2026
@waveywaves
Copy link
Copy Markdown
Member Author

/hold

@tekton-robot tekton-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 12, 2026
@khrm
Copy link
Copy Markdown
Contributor

khrm commented May 12, 2026

/hold

@waveywaves
Copy link
Copy Markdown
Member Author

/hold cancel

@tekton-robot tekton-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 12, 2026
@tekton-robot tekton-robot merged commit 872f5a3 into tektoncd:main May 12, 2026
30 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. kind/feature Categorizes issue or PR as related to a new feature. lgtm Indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants