fix: serialize DPB access across pipelined encode slots by urwrstkn8mare · Pull Request #1 · porkloin/pixelforge

urwrstkn8mare · 2026-06-05T05:32:19Z

Follow-up against the branch for hgaiser#12 (pipelining). This branch is based on porkloin:pipelining, not on my fork's dev branch.

Problem

The depth-2 pipelining change gives each encode slot its own input image, command buffer, bitstream buffer, fence, and query pool. However, the DPB images and reference-tracking state still live on the encoder (dpb_images, current_dpb_slot, reference lists, etc.).

With two slots in flight, the next frame can be submitted before the previous frame has finished using or updating that shared DPB state. That can race reference reads/writes across submits. In a live stream this can show up as accumulating artifacts/stutter, and if the encode work wedges then the later drain waits forever on wait_for_fences(..., u64::MAX) — this results in a frozen stream that only updates when resumed.
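To make the hazard concrete, here is a minimal toy model, not pixelforge code. It assumes a simple policy that the PR does not spell out: frame N reconstructs into DPB slot `N % dpb_len` and references the slot written by frame N - 1. `DpbAccess`, `dpb_access`, and `hazard` are hypothetical names used only for illustration.

```rust
/// DPB slots touched by one encode submit in this toy model (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct DpbAccess {
    /// Slot holding the reference picture this frame reads (None for the first frame).
    reads: Option<usize>,
    /// Slot this frame writes its reconstructed picture into.
    writes: usize,
}

/// Assumed policy: frame N reconstructs into slot N % dpb_len and
/// references the slot that frame N - 1 wrote.
fn dpb_access(frame: u64, dpb_len: usize) -> DpbAccess {
    assert!(dpb_len > 0, "the DPB needs at least one slot");
    let slot = |f: u64| (f % dpb_len as u64) as usize;
    let writes = slot(frame);
    let reads = frame.checked_sub(1).map(slot);
    DpbAccess { reads, writes }
}

/// True if two submits that may execute concurrently touch the same slot
/// with at least one write (read-after-write, write-after-read, or
/// write-after-write).
fn hazard(earlier: DpbAccess, later: DpbAccess) -> bool {
    later.reads == Some(earlier.writes)
        || earlier.reads == Some(later.writes)
        || earlier.writes == later.writes
}
```

Under this assumed policy, `hazard(dpb_access(n, len), dpb_access(n + 1, len))` is true for every pair of adjacent frames, because frame N + 1 always reads the slot that frame N is still writing. That is the ordering the two in-flight slots have to respect.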

Fix

Chain pipelined encode submits with a per-encoder timeline semaphore. Each encode submit signals the next timeline value, and each following encode submit waits on the previous value before its command buffer can execute.

That keeps DPB/reference access ordered on the GPU without blocking the CPU between submissions. Bitstream readback is still delayed and drained through the slot pipeline, so the useful part of the depth-2 pipeline remains: submit/readback overlap without racing the shared DPB.

This does not create true encode/encode overlap for normal dependent P-frames; N+1 still cannot safely encode before N has produced its reconstructed reference. It just expresses that dependency to Vulkan instead of relying on host-side waits.
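The chaining described in this section can be modelled on the CPU side as plain counter bookkeeping. The sketch below is an assumption-laden illustration rather than the code in this PR: `EncodeTimeline`, `SubmitSync`, and `next_submit` are hypothetical names, and the actual Vulkan submit is left out. In a real submit the two values would plausibly be passed as the wait and signal values of a timeline semaphore (for example through `VkTimelineSemaphoreSubmitInfo`).

```rust
/// Hypothetical CPU-side bookkeeping for a per-encoder timeline semaphore.
#[derive(Debug, Default)]
struct EncodeTimeline {
    /// Value signalled by the most recent submit (0 means nothing submitted yet).
    last_signalled: u64,
}

/// Timeline values one encode submit would hand to the queue.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct SubmitSync {
    /// Value that must be reached before this submit may execute.
    /// None for the very first submit, which has no predecessor.
    wait_value: Option<u64>,
    /// Value this submit signals once its encode work completes.
    signal_value: u64,
}

impl EncodeTimeline {
    /// Reserve the next link in the chain: wait on the previous submit's
    /// value, then signal the next value.
    fn next_submit(&mut self) -> SubmitSync {
        let wait_value = if self.last_signalled == 0 {
            None
        } else {
            Some(self.last_signalled)
        };
        self.last_signalled += 1;
        SubmitSync {
            wait_value,
            signal_value: self.last_signalled,
        }
    }
}
```

Calling `next_submit` three times on a fresh `EncodeTimeline` yields signal values 1, 2, 3, with each wait value equal to the previous signal value. The host never blocks in this scheme; the ordering is enforced on the GPU when each submit's wait is satisfied.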

Benchmarks

Benchmarked with moonshine's new moonshine-bench tool (hgaiser/moonshine#107): h264, vkcube as the test app, 40 s runs with 10 s warmup, RTX 5070 (driver 610.43.02). To build against current moonshine, each pixelforge variant had upstream main (v0.5.0) merged in locally — all three merge cleanly.

| Variant | 1080p60 total / encode (avg µs) | 4K120 total / encode (avg µs) |
| --- | --- | --- |
| upstream `main` (v0.5.0) | 1583 / 1240 | 6450 / 5910 |
| `pipelining` (hgaiser#12) | 397 / 72 | 644 / 109 |
| `pipelining` + this PR | 386 / 70 | 665 / 111 |

Multi-run averages; run-to-run spread is ~±50 µs. All variants sustained the target frame rate (60 / 120 fps).

Takeaways:

- The timeline-semaphore chaining in this PR costs nothing measurable compared to unsynchronized pipelining: the two are identical within noise at both resolutions, and both retain the large reduction in per-frame blocking time over main.
- With pipelining, the bench's per-frame total/encode stats measure CPU-blocking time (submit + drain of the prior slot), not GPU encode duration. The GPU still spends ~1.2 ms (1080p) / ~5.9 ms (4K) per frame, overlapped with the next frame's stages.
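The "within noise" reading can be checked with a few lines of arithmetic on the averages reported in the table. This is only a sketch: treating a difference of at most 50 µs as noise is an assumption taken from the stated run-to-run spread, and the helper names are hypothetical.

```rust
/// Run-to-run spread reported for the benchmark, in microseconds.
const NOISE_US: f64 = 50.0;

/// Assumption: two averages are indistinguishable if they differ by no more
/// than the reported run-to-run spread.
fn within_noise(a_us: f64, b_us: f64) -> bool {
    (a_us - b_us).abs() <= NOISE_US
}

/// Time available per frame at a given frame rate, in microseconds.
fn frame_budget_us(fps: f64) -> f64 {
    1_000_000.0 / fps
}

/// Fraction of the per-frame budget spent blocked on the CPU.
fn budget_fraction(blocking_us: f64, fps: f64) -> f64 {
    blocking_us / frame_budget_us(fps)
}
```

With the table's totals, `within_noise(397.0, 386.0)` and `within_noise(644.0, 665.0)` both hold, while `within_noise(1583.0, 397.0)` does not. `budget_fraction(665.0, 120.0)` comes out just under 0.08, so the CPU-blocking time of the pipelined variants is a small share of the 4K120 frame budget.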

urwrstkn8mare · 2026-06-05T06:40:38Z

there might be a better way to go about this so i'll try implementing that - then you can have a look at it because you probably know a lot more about it. also lmk if your benchmark suite is ready so i can test with that.

urwrstkn8mare · 2026-06-05T07:33:30Z

just tested no more frame freezing after a while 🎊


urwrstkn8mare · 2026-06-11T05:09:03Z

@porkloin just added some benchmarks to the PR desc. - don't see any performance regressions from the fixes.

fix: serialize DPB access across pipelined encode slots

9c1021a

urwrstkn8mare mentioned this pull request Jun 5, 2026

feat: depth-2 encoder pipelining hgaiser/pixelforge#12

Open

urwrstkn8mare marked this pull request as draft June 5, 2026 06:39

perf: chain pipelined encodes with timeline semaphores

a941683

urwrstkn8mare marked this pull request as ready for review June 5, 2026 07:04

urwrstkn8mare added a commit to urwrstkn8mare/pixelforge that referenced this pull request Jun 11, 2026

Merge porkloin#1 (DPB serialization + timeline-semaphore encode chain…

ea2f454

…ing)


fix: serialize DPB access across pipelined encode slots #1
urwrstkn8mare wants to merge 2 commits into porkloin:pipelining from urwrstkn8mare:fix/dpb-pipelining-pr12
