Conversation

@jholveck
Copy link
Contributor

@jholveck jholveck commented Jan 17, 2026

Changes proposed in this PR

It seems that quite a few users rely on MSS for video encoding. Add a demo showing how to do this.

Like the TinyTV demo, this again highlights multithreaded pipelining, but it is more accessible (few users have a TinyTV). It also shows a number of common video-encoding pitfalls, such as failing to record timestamps correctly.

  • Tests added/updated - N/A
  • Documentation updated
  • Changelog entry added
  • ./check.sh passed

I'll probably break this into a simple and advanced version too.
I may have to take out the audio code.

This also currently uses some of my work in the (unmerged) feat-buffer
branch, so I'll need to switch it to use what's available now.
Very much incomplete, sometimes stopping mid-sentence.  But I've
written enough that I don't want to lose it, so here's an intermediate
commit.
Also reformat the comments.
@BoboTiG
Copy link
Owner

BoboTiG commented Jan 17, 2026

Nice!!

@BoboTiG
Copy link
Owner

BoboTiG commented Jan 17, 2026

This is really good information here, priceless!

If I wanted to use h265, is it as easy as setting "h265", or does it need special handling?

@jholveck
Copy link
Contributor Author

You could pass --codec libx265 on the command line. The value you pass there is the same as what ffmpeg's -c:v flag takes.

You can get a list of available codecs in the PyAV build using python3 -m av --codecs; libx265 is among those.

You'd need to comment out the "profile":"high" in CODEC_OPTIONS: libx265 doesn't recognize the high profile. Most, if not all, of the features in the H.264 "high" profile are already part of the H.265 "main" (default) profile.

You can look at other flags for libx265 using ffmpeg --help encoder=libx265, if your ffmpeg build has libx265 compiled in. The libx265 library is GPL-only, so some builds might not include it, but the one included in PyAV does.
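
For reference, here is a rough sketch of that options change (the exact contents of the demo's CODEC_OPTIONS may differ; these dicts are illustrative):

# Options suitable for libx264 (the demo's current default):
CODEC_OPTIONS_H264 = {"profile": "high"}   # H.264-specific profile name

# For libx265, drop the profile; H.265 "main" already covers most of H.264 "high":
CODEC_OPTIONS_H265 = {}                    # or x265-specific options, e.g. {"preset": "fast"}

# Either dict is passed straight through to the encoder, much like ffmpeg's flags:
#   video_stream = avmux.add_stream("libx265", options=CODEC_OPTIONS_H265)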

I'll add some comments to this effect.

@bboudaoud-nv
Copy link

This is really good information here, priceless!

If I wanted to use h265, is it as easy as setting "h265", or does it need special handling?

@BoboTiG codec names can be a bit finicky here. IIRC, on Linux x265 should work for encoding. On Windows I believe it's hevc. The optional parameters passed in, as well as the supported encoding formats for frames, may change with the codec depending on the system, though. Currently h264 encoding is definitely the safe choice for multi-platform.

You could also experiment with codec names like mpeg or h264 to see if you get better multiplatform support here.
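
If it helps, here is a hedged sketch of probing for a usable encoder at runtime instead of hard-coding one (the candidate names are illustrative and vary by platform/build):

import av

def pick_encoder(candidates=("libx264", "h264_videotoolbox", "libx265", "hevc", "mpeg4")):
    for name in candidates:
        try:
            av.Codec(name, "w")  # "w" asks for an encoder with that name
            return name
        except Exception:        # unknown or decode-only names raise here
            continue
    raise RuntimeError("No usable video encoder found in this PyAV build")

print(pick_encoder())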

@BoboTiG
Copy link
Owner

BoboTiG commented Jan 20, 2026

Currently h264 encoding is definitely the safe choice for multi-platform.

Fully agree on that.

Thank you for the useful review :)

@BoboTiG
Copy link
Owner

BoboTiG commented Jan 20, 2026

@jholveck should we do checks for third-party modules?

While testing, I found I was missing multiple modules and had to look them up on PyPI, a process which could be improved.

Should we check at import time, something like this?

import sys

try:
    import av
except ImportError:
    print("The PyAV module is missing, run: `python -m pip install av`")
    sys.exit(1)

try:
    import si_prefix
except ImportError:
    print("The si-prefix module is missing, run: `python -m pip install si-prefix`")
    sys.exit(1)

This feels like a horrible solution haha, I'm simply putting an idea into words.

@BoboTiG
Copy link
Owner

BoboTiG commented Jan 20, 2026

Overall, it works great! I see a small difference in colors, on Linux, but I guess this is due to the JPEG compression.

@jholveck
Copy link
Contributor Author

Overall, it works great! I see a small difference in colors, on Linux, but I guess this is due to the JPEG compression.

You might try turning on DISPLAY_IS_SRGB.

@jholveck
Copy link
Contributor Author

@jholveck should we do checks for third-party modules?

I think that doing explicit checks like that might be more mess than success. But I can easily add a comment above where we import third-party modules giving the right pip command to install them.

@jholveck
Copy link
Contributor Author

Overall, it works great! I see a small difference in colors, on Linux, but I guess this is due to the JPEG compression.
You might try turning on DISPLAY_IS_SRGB.

By the way, this relates to #207. I think that tagging screenshots with the display's colorspace would be a useful future addition to MSS, for just this sort of thing.

@BoboTiG
Copy link
Owner

BoboTiG commented Jan 20, 2026

@jholveck should we do checks for third-party modules?

I think that doing explicit checks like that might be more mess than success. But I can easily add a comment above where we import third-party modules giving the right pip command to install them.

Yes, let's keep it simple then: at the top of the file, one line to install everything.
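
For instance, something like this at the top of the demo (the module list is illustrative; adjust it to whatever the final script actually imports):

# Third-party requirements -- install everything in one go:
#   python -m pip install mss av si-prefix
import av         # PyAV, for encoding and muxing
import si_prefix  # human-readable rates/sizes
import mss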

@BoboTiG
Copy link
Owner

BoboTiG commented Jan 20, 2026

Out of curiosity, do you plan to add more stuff in that PR? Wondering if you keep it as a draft for a specific reason :)

@jholveck
Copy link
Contributor Author

jholveck commented Jan 21, 2026

Out of curiosity, do you plan to add more stuff in that PR? Wondering if you keep it as a draft for a specific reason :)

Not specifically, but I've asked my colleague, @bboudaoud-nv, to review this. He's worked with a number of internal devs who use MSS and other libraries for video encoding, so I wanted to get any insights he might have. He's already provided comments on the simple demo; I've asked him to look at the full one as well.

Once he's done his review, I expect that I'll be ready for your final review and commit.

Edit: Actually, I'll probably also fix what's needed to make it Windows-compatible, as he mentioned in his review comments.

@BoboTiG
Copy link
Owner

BoboTiG commented Jan 21, 2026

Overall, it works great! I see a small difference in colors, on Linux, but I guess this is due to the JPEG compression.

You might try turning on DISPLAY_IS_SRGB.

Actually, DISPLAY_IS_SRGB being True or False changes nothing, and that's OK; I was just noting the fact. No need to provide a script to handle all cases.

# Timestamps (PTS/DTS)
# --------------------
#
# Every frame has a *presentation timestamp* (PTS): when the viewer should see it.
Copy link

@bboudaoud-nv bboudaoud-nv Jan 21, 2026


Might be worth noting that a PTS is an integer value scaled by the time base (and maybe introduce the time base before this section for logical flow).
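
Something like this could make the relationship concrete (a standalone sketch; the 90 kHz time base is just a common example, not necessarily the demo's):

from fractions import Fraction

TIME_BASE = Fraction(1, 90_000)            # 90 kHz, a common video time base
elapsed_seconds = 2.5                      # wall-clock time since the first frame

# A PTS is an integer count of time_base ticks, not a float number of seconds:
pts = round(elapsed_seconds / TIME_BASE)   # 2.5 s / (1/90000 s) = 225000 ticks
assert pts == 225_000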

# Constant Frame Rate (CFR) and Variable Frame Rate (VFR)
# -------------------------------------------------------
#
# Many video files run at a fixed frame rate, like 30 fps. Each frame is shown at 1/30 sec intervals. This is called


Might be good to point out that in cases like this the time base can just be the frame time (1/frame rate), and then the PTS values are simply integers increasing by 1 at each frame.
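
For strict CFR that would look roughly like this (just an illustration):

from fractions import Fraction

fps = 30
time_base = Fraction(1, fps)   # one tick per frame interval

# With this time base, the PTS is simply the frame index:
for frame_index in range(5):
    pts = frame_index          # frames shown at 0/30 s, 1/30 s, 2/30 s, ...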


# Keep running this loop until the main thread says we should stop.
while not shutdown_requested.is_set():
    # Wait until we're ready. This should, ideally, happen every 1/fps second.


Same as in the other script, I'd be tempted to capture and then sleep for the remaining time, to retain per-call perf and avoid dropping below the target rate.

Copy link
Contributor Author


I don't quite get that concept. Can you elaborate on the difference?

One reason for doing the sleep right before the capture is because the capture is what we want to do at precise intervals. Putting the sleep here means that jitter in the time taken by other steps (such as blocking in the yield for the mailbox to become empty) doesn't translate into the capture interval.

I'm still not clear on why you are saying that this would impact per-call perf and slow the overall rate. If we were sleeping 1/30 sec each time, I'd agree with you, but here we're sleeping until 1/30 sec since the previous frame's target time. Is there something I'm missing?


This is the same idea as above: the goal for video is to have each frame be "captured" 1/fps apart. Falling behind once and then getting many short frames to catch up is less desirable than simply resuming regularly spaced frames afterwards. As you point out, technically which side of the capture this is on isn't all that important. But if you want to maintain the frame rate as well as possible, I'd do something like:

dt = 1 / fps
while not shutdown_requested.is_set():
    now = time.time()
    screenshot = sct.grab()
    dur = time.time() - now
    yield screenshot, now
    remaining = dt - dur
    if remaining > 0:
        time.sleep(remaining)

This way each frame is timed independently and one long frame doesn't impact the other frames (e.g., the fps will resume rather than "catch up" on the other side of the hitch).


One thing I'm assuming here is that there is more runtime variation in the sct.grab() call duration than there is in the sleep() call here. That may not be true on all platforms.

if first_frame_at is None:
    first_frame_at = timestamp
frame.pts = int((timestamp - first_frame_at) / TIME_BASE)
frame.time_base = TIME_BASE


I don't believe you need to set the individual frame time bases here, just the PTS value. In fact, I think you may even be able to set frame.time and let it handle the conversion to PTS here, but I haven't tried that lately.

Copy link
Contributor Author


You're absolutely correct that you can set just the PTS value. If the frame doesn't have a time_base set, then PyAV will assume the destination stream's time_base when it encodes the frame. (This is a feature of PyAV, not ffmpeg, as far as I can tell.)

However, you do need to set the time base if you want to use frame.time, which can be quite handy for debugging. It's also necessary if you want to, for instance, put your frame times in nanoseconds (for simplicity) and just have them get converted to the stream time_base at encoding time; that's what I've done in some code I've written.

It's not mandatory, but it does seem to be a reasonable practice, rather than relying on the default "whatever I get put into" behavior. Thanks for the point, but I think I'll leave it as-is. I've gotten burned in the past by not setting the time_base in all the right places, so I think being explicit is reasonable here.

FWIW, you can't set frame.time.
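
For the record, a minimal sketch of the nanosecond-time_base approach mentioned above (the frame construction here is just for illustration, not the demo's code):

import time
from fractions import Fraction

import av

NS = Fraction(1, 1_000_000_000)
capture_start_ns = time.monotonic_ns()

frame = av.VideoFrame(640, 480, "rgb24")            # placeholder frame for illustration
frame.time_base = NS
frame.pts = time.monotonic_ns() - capture_start_ns  # integer nanoseconds since capture start

print(frame.time)  # float seconds (pts * time_base), handy for debugging
# At encode time, PyAV rescales this PTS into the destination stream's time_base.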

# The rate= parameter here is just the nominal frame rate: some tools (like file browsers) might display
# this as the frame rate. But we actually control timing via the pts and time_base values on the frames
# themselves.
video_stream = avmux.add_stream(codec, rate=fps, options=CODEC_OPTIONS)


The use of the rate arg here with the fps may in some cases set a different time base/assume a CFR encoding. I tend to avoid rate here as it's not required.

Copy link
Contributor Author


Can you elaborate on those circumstances? I haven't seen that happen when I also provide an explicit time base on the stream, as long as I assign the time base after setting the rate.

As the comment mentions, the rate= parameter does set a nominal frame rate for the file, which some tools use for quick display or estimation, or to convert to CFR. It seems to make sense to include that in the metadata if possible. But if it causes problems, then yeah, I'll eliminate it.
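
Concretely, the pattern I'm describing is roughly this (a sketch, with an illustrative microsecond time base rather than the demo's exact constants):

from fractions import Fraction

import av

fps = 30
TIME_BASE = Fraction(1, 1_000_000)   # microsecond ticks, just as an example

avmux = av.open("capture.mkv", mode="w")
# rate= only records a nominal frame rate in the container metadata;
# the explicit time_base assigned afterwards is what governs frame timestamps.
video_stream = avmux.add_stream("libx264", rate=fps)
video_stream.time_base = TIME_BASE   # assign *after* setting the rate
# ... encode frames here, then avmux.close() ...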


I think this is more confusing to folks than an actual problem. I believe what happens here is that the time base gets set using rate, then that just gets overridden later in the process when PTS values are assigned to each frame. I'm not sure whether adding rate to this constructor has any other effect on the encoded video.

Copy link
Contributor Author


It's hard for me to tell quite when it's used, but it's in places other than the time base.

For instance, it's definitely written to the Matroska (MKV) headers, if you're using that format, hence my comment about it setting a nominal frame rate that "might" be displayed. (Try ./video-capture.py --fps 123 --output foo.mkv --duration 10 && ffprobe foo.mkv, and note that "123" is in the nominal frame rate display shown there.) I think it might be used in an MP4's tmcd atom, but I can't really tell how. It's like a lot of things in ffmpeg: it's impossible to tell quite what the effect is in all the different use cases, because there are so many different data paths for all these different multimedia formats.

The ffmpeg docs say on this variable (AVStream::avg_frame_rate), "May be set by the caller before avformat_write_header()". So that's why I'm including it.

Comment on lines +176 to +177
while (now := time.monotonic()) < next_frame_at:
    time.sleep(next_frame_at - now)


This seems biased towards oversleeping the goal. Not a problem for relaxed CFR video, but certainly something to consider. On many platforms a threaded sleep is a promise to sleep for "at least X", not exactly X or under. For example, if you sleep to within, say, < 1 ms of the goal, you might just want to call that good enough here.

Copy link
Contributor Author


Yeah, I thought about your precision sleep library when I wrote that. I decided it wasn’t worth the added complexity for this purpose. But to speak to your point, I'm afraid I'll need to get a little pedantic.

I believe we've already accounted for bias, although not variance. This loop structure is designed to eliminate bias (since we only care about the capture frequency, not phase).

Suppose the sleep implementation (just as an example) tends to oversleep by a mean of 2 ms, with a standard deviation of 0 ms: it’s all bias, no variance.

The first loop iteration, let’s say, starts at midnight. Since next_frame_at increments by 1/30 sec each frame, regardless of how long anything in the loop took (including the sleep itself), it will schedule frames at 0 s, 0.033 s, 0.067 s, and 0.100 s.

Now, let’s look at when the sleeps exit, and the captures happen. The sleep implementation doesn’t return until 2 ms after it’s scheduled. That means the captures will be at 0.002 s, 0.035 s, 0.069 s, and 0.102 s.

These are still at 1/30 sec intervals. They’re not the exact times that we sent to sleep, but they’re still at the desired frequency, just at a different phase.

Now, let’s add in variance. By definition, the variance term averages to 0. For now, let’s assume that the sleep still has a mean of 2 ms, but now its variance is always ±1 ms (easier to write the numbers than σ²=1 ms). It can overshoot by 1 or 3 ms. So what happens then?

We still are using the same sequence of deadlines, of course: 0 s, 0.033 s, 0.067 s, 0.100 s. Now, the sleep exits at 0.003 s, 0.034 s, 0.070 s, and 0.101 s.

Our frames are still scheduled at a mean interval of exactly 1/30 sec, although now with a timing error of 0 ± 1 ms: jitter. Unfortunately, that's not something that we can get rid of without a precision sleep implementation like yours. However, that timing noise is a lot less of a problem than a systematic error, since that would cause issues with A/V sync, seeking, etc.

I believe the loop structure — incrementing the target time by 1/30 sec each iteration — eliminates the systematic error. There’s no bias in the frequency. This is different than the most common naïve implementations I’ve seen: either adding 1/30 sec to the previous loop’s actual start time (now in our code), or worse yet, sleeping 1/30 sec each loop. Both of those will show a bias in the frame timings.

If this code does still allow a bias, a systematic error with a non-zero mean, then I'd want to look for ways to eliminate that. But I think the present implementation, as it stands, does eliminate the bias, and ensures that any timing errors in sleep only propagate into the frame timings through their variance, not their bias.
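
Concretely, the structural difference is (a sketch, not the demo's exact code):

import time

FPS = 30
FRAME_INTERVAL = 1 / FPS

# Drift-free scheduling: the deadline advances by a fixed step every iteration,
# regardless of how late the previous sleep actually woke up.
next_frame_at = time.monotonic()
for _ in range(10):
    while (now := time.monotonic()) < next_frame_at:
        time.sleep(next_frame_at - now)
    # ... capture the frame here ...
    next_frame_at += FRAME_INTERVAL   # oversleep becomes a phase offset, not a frequency error

# Naive alternative (biased): calling time.sleep(FRAME_INTERVAL) every loop adds the
# capture time *and* the sleep overshoot to each interval, so the rate drifts low.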


I believe naptime (our accurate sleep library) should be publicly available shortly. So if you want to use it, it's a small, low-dependency library that would improve this. It will increase system load marginally with more busy-waiting, but it is useful in a case like this.

Generally speaking, on Linux time.sleep() variance is lower, as the scheduler does a better job of servicing tasks at a fine time granularity. But on Windows, as system load increases, I've seen > 4 ms sleep "increments" and minimum sleep times of up to 16 ms. This can be a pain when trying to accurately wait for just a few ms.

Copy link
Contributor Author


Glad to hear it!

Another bit of good news: in Python 3.11 and later, on Windows 10 and newer, Python will use a high-resolution timer. This provides a resolution of 100 ns. I'm not sure how this interacts with the scheduler, but it probably won't be nearly as bad as what you're seeing presently. I hadn't realized that the Windows sleep implementation was that coarse!
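
If anyone wants to sanity-check their own platform, here is a quick (and unscientific) probe of time.sleep() overshoot:

import statistics
import time

def sleep_overshoot_ms(target_s=0.001, samples=200):
    """Measure how far time.sleep() overshoots a short target on this platform."""
    overshoots = []
    for _ in range(samples):
        t0 = time.perf_counter()
        time.sleep(target_s)
        overshoots.append((time.perf_counter() - t0 - target_s) * 1000)
    return statistics.mean(overshoots), statistics.stdev(overshoots)

mean_ms, stdev_ms = sleep_overshoot_ms()
print(f"mean oversleep: {mean_ms:.3f} ms, stdev: {stdev_ms:.3f} ms")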

@bboudaoud-nv
Copy link

I'm definitely picking nits at this point; this code is very well written and should be a nice example of how to (actually) screen-capture video with PyAV, as opposed to the weird muxing example they provide in their docs!

@jholveck
Copy link
Contributor Author

I'm definitely picking nits at this point; this code is very well written and should be a nice example of how to (actually) screen-capture video with PyAV, as opposed to the weird muxing example they provide in their docs!

I'm so glad to hear it! Thank you for your diligent review! At this point, I think I'll review your comments again and see if there are ways I can improve the comments to explain some of the sticking points, and then @BoboTiG can merge it.

@jholveck jholveck mentioned this pull request Jan 28, 2026
@jholveck jholveck marked this pull request as ready for review January 28, 2026 03:24
@jholveck
Copy link
Contributor Author

@BoboTiG I'm happy with merging it as it stands. I think @bboudaoud-nv and I have some different ideas about how to handle the finer details of the timing, but nothing that's significant enough to worry about for a demo. I may come back and change the relevant code later, but as it stands I think it's in great shape to commit. Then we can get back to focusing on improvements to functionality.

@BoboTiG BoboTiG merged commit b0ae026 into BoboTiG:main Jan 28, 2026
21 checks passed