fix: serialize DPB access across pipelined encode slots #1

Open
urwrstkn8mare wants to merge 2 commits into porkloin:pipelining from urwrstkn8mare:fix/dpb-pipelining-pr12
@urwrstkn8mare urwrstkn8mare commented Jun 5, 2026

Follow-up against the branch for hgaiser#12 (pipelining). This branch is based on porkloin:pipelining, not on my fork's dev branch.

Problem

The depth-2 pipelining change gives each encode slot its own input image, command buffer, bitstream buffer, fence, and query pool. However, the DPB images and reference-tracking state still live on the encoder (dpb_images, current_dpb_slot, reference lists, etc.).
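To make the ownership split concrete, here is a rough sketch of the layout described above. It is not the actual pixelforge code: `Handle`, `EncodeSlot`, `Encoder`, `PIPELINE_DEPTH`, and the round-robin `slot_for_frame` are hypothetical stand-ins, and only `dpb_images` and `current_dpb_slot` are names taken from the encoder.

```rust
/// Placeholder for a Vulkan handle (assumption: real code uses the binding's handle types).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub struct Handle(pub u64);

/// Resources that each in-flight frame owns privately.
#[derive(Debug, Default)]
pub struct EncodeSlot {
    pub input_image: Handle,
    pub command_buffer: Handle,
    pub bitstream_buffer: Handle,
    pub fence: Handle,
    pub query_pool: Handle,
}

/// Depth-2 pipeline: two slots can be in flight at once.
pub const PIPELINE_DEPTH: usize = 2;

/// State that stays on the encoder and is therefore shared by every slot.
pub struct Encoder {
    pub slots: [EncodeSlot; PIPELINE_DEPTH],
    pub dpb_images: Vec<Handle>,
    pub current_dpb_slot: usize,
}

impl Encoder {
    pub fn new(dpb_len: usize) -> Self {
        Self {
            slots: [EncodeSlot::default(), EncodeSlot::default()],
            dpb_images: vec![Handle::default(); dpb_len],
            current_dpb_slot: 0,
        }
    }

    /// Slot index for a frame, assuming simple round-robin assignment (hypothetical).
    pub fn slot_for_frame(&self, frame_index: u64) -> usize {
        (frame_index % PIPELINE_DEPTH as u64) as usize
    }
}
```

Under that assumption, frames N and N+1 always land in different slots, but both still go through the same `dpb_images` and `current_dpb_slot`.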

With two slots in flight, the next frame can be submitted before the previous frame has finished reading or updating that shared DPB state, so reference reads and writes can race across submits. In a live stream this can show up as accumulating artifacts or stutter. If the encode work wedges, the later drain waits forever on wait_for_fences(..., u64::MAX), which results in a frozen stream that only updates when resumed.

Fix

Chain pipelined encode submits with a per-encoder timeline semaphore. Each encode submit signals the next timeline value, and each following encode submit waits on the previous value before its command buffer can execute.

That keeps DPB/reference access ordered on the GPU without blocking the CPU between submissions. Bitstream readback is still delayed and drained through the slot pipeline, so the useful part of the depth-2 pipeline remains: submit/readback overlap without racing the shared DPB.

This does not create true encode/encode overlap for normal dependent P-frames; N+1 still cannot safely encode before N has produced its reconstructed reference. It just expresses that dependency to Vulkan instead of relying on host-side waits.
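The chaining can be modelled on the host side as simple counter bookkeeping. This is a minimal sketch, not the code in this PR: `EncodeTimeline`, `SubmitSync`, and `can_execute` are hypothetical names, and it assumes the timeline semaphore is created with an initial value of 0. In a real submit, the two values would be supplied as the wait and signal values for the timeline semaphore (for example via `VkTimelineSemaphoreSubmitInfo`).

```rust
/// Wait/signal values for one encode submit on the per-encoder timeline.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SubmitSync {
    pub wait_value: u64,
    pub signal_value: u64,
}

/// Host-side bookkeeping for the timeline semaphore (hypothetical type).
#[derive(Debug, Default)]
pub struct EncodeTimeline {
    /// Last value handed out as a signal value; 0 is the assumed initial semaphore value.
    last_signaled: u64,
}

impl EncodeTimeline {
    pub fn new() -> Self {
        Self { last_signaled: 0 }
    }

    /// Each submit waits on the value the previous submit signals,
    /// and signals the next value on the timeline.
    pub fn next_submit(&mut self) -> SubmitSync {
        let wait_value = self.last_signaled;
        let signal_value = wait_value + 1;
        self.last_signaled = signal_value;
        SubmitSync { wait_value, signal_value }
    }
}

/// Model of the GPU-side rule: a timeline wait is satisfied once the
/// semaphore counter has reached the wait value.
pub fn can_execute(sync: SubmitSync, gpu_counter: u64) -> bool {
    gpu_counter >= sync.wait_value
}
```

With this bookkeeping the first submit waits on 0 (already satisfied) and signals 1, and the second waits on 1 and signals 2. The host can hand out both pairs back to back without blocking, while the second submit cannot start executing until the first has signalled.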

Benchmarks

Benchmarked with moonshine's new moonshine-bench tool (hgaiser/moonshine#107): h264, vkcube as the test app, 40 s runs with 10 s warmup, RTX 5070 (driver 610.43.02). To build against current moonshine, each pixelforge variant had upstream main (v0.5.0) merged in locally — all three merge cleanly.

| Variant | 1080p60 total / encode (avg µs) | 4K120 total / encode (avg µs) |
| --- | --- | --- |
| upstream main (v0.5.0) | 1583 / 1240 | 6450 / 5910 |
| pipelining (hgaiser#12) | 397 / 72 | 644 / 109 |
| pipelining + this PR | 386 / 70 | 665 / 111 |

Multi-run averages; run-to-run spread is ~±50 µs. All variants sustained the target frame rate (60 / 120 fps).

Takeaways:

  • The timeline-semaphore chaining in this PR costs nothing measurable compared to unsynchronized pipelining — identical within noise at both resolutions, while both retain the large reduction in per-frame blocking time over main.
  • With pipelining, the bench's per-frame total/encode stats measure CPU-blocking time (submit + drain of the prior slot), not GPU encode duration — the GPU still spends ~1.2 ms (1080p) / ~5.9 ms (4K) per frame, overlapped with the next frame's stages.
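For anyone who wants to check the arithmetic behind the first takeaway, here is a small sketch using the totals from the table. The helper names are hypothetical, and treating the ~±50 µs run-to-run spread as a plain threshold on the difference of two averages is a simplifying assumption, not a statistical test.

```rust
/// Approximate run-to-run spread reported for the benchmark, in microseconds.
pub const SPREAD_US: f64 = 50.0;

/// Average per-frame totals (µs) from the table: (1080p60, 4K120).
pub const MAIN_TOTAL: (f64, f64) = (1583.0, 6450.0);
pub const PIPELINING_TOTAL: (f64, f64) = (397.0, 644.0);
pub const THIS_PR_TOTAL: (f64, f64) = (386.0, 665.0);

/// True if two averages differ by no more than the reported spread.
pub fn within_noise(a_us: f64, b_us: f64) -> bool {
    (a_us - b_us).abs() <= SPREAD_US
}

/// Ratio by which per-frame blocking time shrinks relative to a baseline.
pub fn reduction_factor(baseline_us: f64, variant_us: f64) -> f64 {
    baseline_us / variant_us
}
```

The totals differ by 11 µs at 1080p60 and 21 µs at 4K120 between the two pipelined variants, both inside the spread, while per-frame blocking time against main drops by roughly 4x and 10x respectively.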

@urwrstkn8mare

Author

there might be a better way to go about this, so i'll try implementing that, then you can have a look at it because you probably know a lot more about it. also lmk if your benchmark suite is ready so i can test with that.

@urwrstkn8mare urwrstkn8mare marked this pull request as ready for review June 5, 2026 07:04
@urwrstkn8mare

Author

just tested, no more frame freezing after a while 🎊

urwrstkn8mare added a commit to urwrstkn8mare/pixelforge that referenced this pull request Jun 11, 2026
@urwrstkn8mare

Author

@porkloin just added some benchmarks to the PR desc. - don't see any performance regressions from the fixes.
