Conversation

@jholveck
Copy link
Contributor

@jholveck jholveck commented Jan 17, 2026

Changes proposed in this PR

It seems that quite a few users rely on MSS for video encoding. Add a demo showing how to do this.

Like the TinyTV demo, this again highlights multithreaded pipelining, but it is more accessible (few users have a TinyTV). It also shows a number of common video-encoding pitfalls, such as failing to record timestamps correctly.

  • Tests added/updated - N/A
  • Documentation updated
  • Changelog entry added
  • ./check.sh passed

I'll probably break this into a simple and advanced version too.
I may have to take out the audio code.

This also currently uses some of my work in the (unmerged) feat-buffer
branch, so I'll need to switch it to use what's available now.
Very much incomplete, sometimes stopping mid-sentence.  But I've
written enough that I don't want to lose it, so here's an intermediate
commit.
Also reformat the comments.
@BoboTiG
Copy link
Owner

BoboTiG commented Jan 17, 2026

Nice!!

@BoboTiG
Copy link
Owner

BoboTiG commented Jan 17, 2026

This is really good information here, priceless!

If I wanted to use h265, is it as easy as setting "h265", or does it need special handling?

@jholveck
Copy link
Contributor Author

You could pass --codec libx265 on the command line. The value you pass there is the same as what ffmpeg's -c:v flag takes.

You can get a list of available codecs in the PyAV build using python3 -m av --codecs; libx265 is among those.

You'd need to comment out the "profile":"high" in CODEC_OPTIONS: libx265 doesn't recognize the high profile. Most, if not all, of the features in the H.264 "high" profile are already part of the H.265 "main" (default) profile.

You can look at other flags for libx265 using ffmpeg --help encoder=libx265, if your ffmpeg build has libx265 compiled in. The libx265 library is GPL-only, so some builds might not include it, but the one included in PyAV does.
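
For reference, here is a rough sketch of that options change (the exact contents of the demo's CODEC_OPTIONS may differ; these dicts are illustrative):

# Options suitable for libx264 (the demo's current default):
CODEC_OPTIONS_H264 = {"profile": "high"}   # H.264-specific profile name

# For libx265, drop the profile; H.265 "main" already covers most of H.264 "high":
CODEC_OPTIONS_H265 = {}                    # or x265-specific options, e.g. {"preset": "fast"}

# Either dict is passed straight through to the encoder, much like ffmpeg's flags:
#   video_stream = avmux.add_stream("libx265", options=CODEC_OPTIONS_H265)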

I'll add some comments to this effect.

@bboudaoud-nv
Copy link

This is really good information here, priceless!

If I wanted to use h265, is it as easy as setting "h265", or does it need special handling?

@BoboTiG codec names can be a bit finicky here. IIRC, on Linux x265 should work for encoding. On Windows I believe it's hevc. The optional parameters passed in, as well as the supported encoding formats for frames, may change with the codec depending on the system, though. Currently h264 encoding is definitely the safe choice for multi-platform.

You could also experiment with codec names like mpeg or h264 to see if you get better multiplatform support here.
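
If it helps, here is a hedged sketch of probing for a usable encoder at runtime instead of hard-coding one (the candidate names are illustrative and vary by platform/build):

import av

def pick_encoder(candidates=("libx264", "h264_videotoolbox", "libx265", "hevc", "mpeg4")):
    for name in candidates:
        try:
            av.Codec(name, "w")  # "w" asks for an encoder with that name
            return name
        except Exception:        # unknown or decode-only names raise here
            continue
    raise RuntimeError("No usable video encoder found in this PyAV build")

print(pick_encoder())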

@BoboTiG
Copy link
Owner

BoboTiG commented Jan 20, 2026

Currently h264 encoding is definitely the safe choice for multi-platform.

Fully agree on that.

Thank you for the useful review :)

@BoboTiG
Copy link
Owner

BoboTiG commented Jan 20, 2026

@jholveck should we do checks for third-party modules?

While testing, I found I was missing multiple modules and had to look them up on PyPI, a process which could be improved.

Should we check at import time, something like this?

import sys

try:
    import av
except ImportError:
    print("The PyAV module is missing, run: `python -m pip install av`")
    sys.exit(1)

try:
    import si_prefix
except ImportError:
    print("The si-prefix module is missing, run: `python -m pip install si-prefix`")
    sys.exit(1)

This feels like a horrible solution haha, I'm simply putting an idea into words.

@BoboTiG
Copy link
Owner

BoboTiG commented Jan 20, 2026

Overall, it works great! I see a small difference in colors, on Linux, but I guess this is due to the JPEG compression.

@jholveck
Copy link
Contributor Author

Overall, it works great! I see a small difference in colors, on Linux, but I guess this is due to the JPEG compression.

You might try turning on DISPLAY_IS_SRGB.

@jholveck
Copy link
Contributor Author

@jholveck should we do checks for third-party modules?

I think that doing explicit checks like that might be more mess than success. But I can easily add a comment above where we import third-party modules giving the right pip command to install them.

@jholveck
Copy link
Contributor Author

Overall, it works great! I see a small difference in colors, on Linux, but I guess this is due to the JPEG compression.
You might try turning on DISPLAY_IS_SRGB.

By the way, this relates to #207. I think that tagging screenshots with the display's colorspace would be a useful future addition to MSS, for just this sort of thing.

@BoboTiG
Copy link
Owner

BoboTiG commented Jan 20, 2026

@jholveck should we do checks for third-party modules?

I think that doing explicit checks like that might be more mess than success. But I can easily add a comment above where we import third-party modules giving the right pip command to install them.

Yes, let's keep it simple then: at the top of the file, one line to install everything.
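
For instance, something like this at the top of the demo (the module list is illustrative; adjust it to whatever the final script actually imports):

# Third-party requirements -- install everything in one go:
#   python -m pip install mss av si-prefix
import av         # PyAV, for encoding and muxing
import si_prefix  # human-readable rates/sizes
import mss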

@BoboTiG
Copy link
Owner

BoboTiG commented Jan 20, 2026

Out of curiosity, do you plan to add more stuff in that PR? Wondering if you keep it as a draft for a specific reason :)

@jholveck
Copy link
Contributor Author

jholveck commented Jan 21, 2026

Out of curiosity, do you plan to add more stuff in that PR? Wondering if you keep it as a draft for a specific reason :)

Not specifically, but I've asked my colleague, @bboudaoud-nv, to review this. He's worked with a number of internal devs who use MSS and other libraries for video encoding, so I wanted to get any insights he might have. He's already provided comments on the simple demo; I've asked him to look at the full one as well.

Once he's done his review, I expect that I'll be ready for your final review and commit.

Edit: Actually, I'll probably also fix what's needed to make it Windows-compatible, as he mentioned in his review comments.

@BoboTiG
Copy link
Owner

BoboTiG commented Jan 21, 2026

Overall, it works great! I see a small difference in colors, on Linux, but I guess this is due to the JPEG compression.

You might try turning on DISPLAY_IS_SRGB.

Actually, DISPLAY_IS_SRGB being True or False changes nothing, and that's OK; I was just noting the fact. No need to provide a script to handle all cases.

# Timestamps (PTS/DTS)
# --------------------
#
# Every frame has a *presentation timestamp* (PTS): when the viewer should see it.
Copy link

@bboudaoud-nv bboudaoud-nv Jan 21, 2026


Might be worth noting that a PTS is an integer value scaled by the time base (and maybe introduce the time base before this section for logical flow).
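
Something like this could make the relationship concrete (a standalone sketch; the 90 kHz time base is just a common example, not necessarily the demo's):

from fractions import Fraction

TIME_BASE = Fraction(1, 90_000)            # 90 kHz, a common video time base
elapsed_seconds = 2.5                      # wall-clock time since the first frame

# A PTS is an integer count of time_base ticks, not a float number of seconds:
pts = round(elapsed_seconds / TIME_BASE)   # 2.5 s / (1/90000 s) = 225000 ticks
assert pts == 225_000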

# Constant Frame Rate (CFR) and Variable Frame Rate (VFR)
# -------------------------------------------------------
#
# Many video files run at a fixed frame rate, like 30 fps. Each frame is shown at 1/30 sec intervals. This is called


Might be good to point out that in cases like this the time base can just be the frame time (1/frame rate), and then the PTS values are simply integers increasing by 1 at each frame.
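
For strict CFR that would look roughly like this (just an illustration):

from fractions import Fraction

fps = 30
time_base = Fraction(1, fps)   # one tick per frame interval

# With this time base, the PTS is simply the frame index:
for frame_index in range(5):
    pts = frame_index          # frames shown at 0/30 s, 1/30 s, 2/30 s, ...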


# Keep running this loop until the main thread says we should stop.
while not shutdown_requested.is_set():
    # Wait until we're ready. This should, ideally, happen every 1/fps second.


Same as in the other script, I'd be tempted to capture and then sleep for the remaining time, to retain per-call perf and avoid dropping below the target rate.

Copy link
Contributor Author


I don't quite get that concept. Can you elaborate on the difference?

One reason for doing the sleep right before the capture is because the capture is what we want to do at precise intervals. Putting the sleep here means that jitter in the time taken by other steps (such as blocking in the yield for the mailbox to become empty) doesn't translate into the capture interval.

I'm still not clear on why you are saying that this would impact per-call perf and slow the overall rate. If we were sleeping 1/30 sec each time, I'd agree with you, but here we're sleeping until 1/30 sec since the previous frame's target time. Is there something I'm missing?


This is the same idea as above: the goal for video is to have each frame be "captured" 1/fps apart. Falling behind once and then getting many short frames to catch up is less desirable than simply resuming regularly spaced frames afterwards. As you point out, technically which side of the capture this is on isn't all that important. But if you want to maintain the frame rate as well as possible, I'd do something like:

dt = 1 / fps
while not shutdown_requested.is_set():
    now = time.time()
    screenshot = sct.grab()
    dur = time.time() - now
    yield screenshot, now
    remaining = dt - dur
    if remaining > 0:
        time.sleep(remaining)

This way each frame is timed independently and one long frame doesn't impact the other frames (e.g., the fps will resume rather than "catch up" on the other side of the hitch).


One thing I'm assuming here is that there is more runtime variation in the sct.grab() call duration than there is in the sleep() call here. That may not be true on all platforms.

if first_frame_at is None:
    first_frame_at = timestamp
frame.pts = int((timestamp - first_frame_at) / TIME_BASE)
frame.time_base = TIME_BASE


I don't believe you need to set the individual frame time bases here, just the PTS value. In fact, I think you may even be able to set frame.time and let it handle the conversion to PTS here, but I haven't tried that lately.

Copy link
Contributor Author


You're absolutely correct that you can set just the PTS value. If the frame doesn't have a time_base set, then PyAV will assume the destination stream's time_base when it encodes the frame. (This is a feature of PyAV, not ffmpeg, as far as I can tell.)

However, you do need to set the time base if you want to use frame.time, which can be quite handy for debugging. It's also necessary if you want to, for instance, put your frame times in nanoseconds (for simplicity) and just have them get converted to the stream time_base at encoding time; that's what I've done in some code I've written.

It's not mandatory, but it does seem to be a reasonable practice, rather than relying on the default "whatever I get put into" behavior. Thanks for the point, but I think I'll leave it as-is. I've gotten burned in the past by not setting the time_base in all the right places, so I think being explicit is reasonable here.

FWIW, you can't set frame.time.
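
For the record, a minimal sketch of the nanosecond-time_base approach mentioned above (the frame construction here is just for illustration, not the demo's code):

import time
from fractions import Fraction

import av

NS = Fraction(1, 1_000_000_000)
capture_start_ns = time.monotonic_ns()

frame = av.VideoFrame(640, 480, "rgb24")            # placeholder frame for illustration
frame.time_base = NS
frame.pts = time.monotonic_ns() - capture_start_ns  # integer nanoseconds since capture start

print(frame.time)  # float seconds (pts * time_base), handy for debugging
# At encode time, PyAV rescales this PTS into the destination stream's time_base.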

# The rate= parameter here is just the nominal frame rate: some tools (like file browsers) might display
# this as the frame rate. But we actually control timing via the pts and time_base values on the frames
# themselves.
video_stream = avmux.add_stream(codec, rate=fps, options=CODEC_OPTIONS)


The use of the rate arg here with the fps may in some cases set a different time base/assume a CFR encoding. I tend to avoid rate here as it's not required.

Copy link
Contributor Author


Can you elaborate on those circumstances? I haven't seen that happen when I also provide an explicit time base on the stream, as long as I assign the time base after setting the rate.

As the comment mentions, the rate= parameter does set a nominal frame rate for the file, which some tools use for quick display or estimation, or to convert to CFR. It seems to make sense to include that in the metadata if possible. But if it causes problems, then yeah, I'll eliminate it.
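
Concretely, the pattern I'm describing is roughly this (a sketch, with an illustrative microsecond time base rather than the demo's exact constants):

from fractions import Fraction

import av

fps = 30
TIME_BASE = Fraction(1, 1_000_000)   # microsecond ticks, just as an example

avmux = av.open("capture.mkv", mode="w")
# rate= only records a nominal frame rate in the container metadata;
# the explicit time_base assigned afterwards is what governs frame timestamps.
video_stream = avmux.add_stream("libx264", rate=fps)
video_stream.time_base = TIME_BASE   # assign *after* setting the rate
# ... encode frames here, then avmux.close() ...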


I think this is more confusing to folks than an actual problem. I believe what happens here is that the time base gets set using rate, then that just gets overridden later in the process when PTS values are assigned to each frame. I'm not sure whether adding rate to this constructor has any other effect on the encoded video.

Copy link
Contributor Author


It's hard for me to tell quite when it's used, but it's in places other than the time base.

For instance, it's definitely written to the Matroska (MKV) headers, if you're using that format, hence my comment about it setting a nominal frame rate that "might" be displayed. (Try ./video-capture.py --fps 123 --output foo.mkv --duration 10 && ffprobe foo.mkv, and note that "123" is in the nominal frame rate display shown there.) I think it might be used in an MP4's tmcd atom, but I can't really tell how. It's like a lot of things in ffmpeg: it's impossible to tell quite what the effect is in all the different use cases, because there are so many different data paths for all these different multimedia formats.

The ffmpeg docs say on this variable (AVStream::avg_frame_rate), "May be set by the caller before avformat_write_header()". So that's why I'm including it.

Comment on lines +176 to +177
while (now := time.monotonic()) < next_frame_at:
    time.sleep(next_frame_at - now)


This seems biased towards oversleeping the goal. Not a problem for relaxed CFR video, but certainly something to consider. On many platforms a threaded sleep is a promise to sleep for "at least X", not exactly X or under. For example, if you sleep to within, say, < 1 ms of the goal, you might just want to call that good enough here.

Copy link
Contributor Author


Yeah, I thought about your precision sleep library when I wrote that. I decided it wasn’t worth the added complexity for this purpose. But to speak to your point, I'm afraid I'll need to get a little pedantic.

I believe we've already accounted for bias, although not variance. This loop structure is designed to eliminate bias (since we only care about the capture frequency, not phase).

Suppose the sleep implementation (just as an example) tends to oversleep by a mean of 2 ms, with a standard deviation of 0 ms: it’s all bias, no variance.

The first loop iteration, let’s say, starts at midnight. Since next_frame_at increments by 1/30 sec each frame, regardless of how long anything in the loop took (including the sleep itself), it will schedule frames at 0 s, 0.033 s, 0.067 s, and 0.100 s.

Now, let’s look at when the sleeps exit, and the captures happen. The sleep implementation doesn’t return until 2 ms after it’s scheduled. That means the captures will be at 0.002 s, 0.035 s, 0.069 s, and 0.102 s.

These are still at 1/30 sec intervals. They’re not the exact times that we sent to sleep, but they’re still at the desired frequency, just at a different phase.

Now, let’s add in variance. By definition, the variance term averages to 0. For now, let’s assume that the sleep still has a mean of 2 ms, but now its variance is always ±1 ms (easier to write the numbers than σ²=1 ms). It can overshoot by 1 or 3 ms. So what happens then?

We still are using the same sequence of deadlines, of course: 0 s, 0.033 s, 0.067 s, 0.100 s. Now, the sleep exits at 0.003 s, 0.034 s, 0.070 s, and 0.101 s.

Our frames are still scheduled at a mean interval of exactly 1/30 sec, although now with a timing error of 0 ± 1 ms: jitter. Unfortunately, that's not something that we can get rid of without a precision sleep implementation like yours. However, that timing noise is a lot less of a problem than a systematic error, since that would cause issues with A/V sync, seeking, etc.

I believe the loop structure — incrementing the target time by 1/30 sec each iteration — eliminates the systematic error. There’s no bias in the frequency. This is different than the most common naïve implementations I’ve seen: either adding 1/30 sec to the previous loop’s actual start time (now in our code), or worse yet, sleeping 1/30 sec each loop. Both of those will show a bias in the frame timings.

If this code does still allow a bias, a systematic error with a non-zero mean, then I'd want to look for ways to eliminate that. But I think the present implementation, as it stands, does eliminate the bias, and ensures that any timing errors in sleep only propagate into the frame timings through their variance, not their bias.
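
Concretely, the structural difference is (a sketch, not the demo's exact code):

import time

FPS = 30
FRAME_INTERVAL = 1 / FPS

# Drift-free scheduling: the deadline advances by a fixed step every iteration,
# regardless of how late the previous sleep actually woke up.
next_frame_at = time.monotonic()
for _ in range(10):
    while (now := time.monotonic()) < next_frame_at:
        time.sleep(next_frame_at - now)
    # ... capture the frame here ...
    next_frame_at += FRAME_INTERVAL   # oversleep becomes a phase offset, not a frequency error

# Naive alternative (biased): calling time.sleep(FRAME_INTERVAL) every loop adds the
# capture time *and* the sleep overshoot to each interval, so the rate drifts low.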


I believe naptime (our accurate sleep library) should be publicly available shortly. So if you want to use it, it's a small, low-dependency library that would improve this. It will increase system load marginally with more busy-waiting, but it is useful in a case like this.

Generally speaking, on Linux time.sleep() variance is lower, as the scheduler does a better job of servicing tasks at a fine time granularity. But on Windows, as system load increases, I've seen > 4 ms sleep "increments" and minimum sleep times of up to 16 ms. This can be a pain when trying to accurately wait for just a few ms.

Copy link
Contributor Author


Glad to hear it!

Another bit of good news: in Python 3.11 and later, on Windows 10 and newer, Python will use a high-resolution timer. This provides a resolution of 100 ns. I'm not sure how this interacts with the scheduler, but it probably won't be nearly as bad as what you're seeing presently. I hadn't realized that the Windows sleep implementation was that coarse!
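
If anyone wants to sanity-check their own platform, here is a quick (and unscientific) probe of time.sleep() overshoot:

import statistics
import time

def sleep_overshoot_ms(target_s=0.001, samples=200):
    """Measure how far time.sleep() overshoots a short target on this platform."""
    overshoots = []
    for _ in range(samples):
        t0 = time.perf_counter()
        time.sleep(target_s)
        overshoots.append((time.perf_counter() - t0 - target_s) * 1000)
    return statistics.mean(overshoots), statistics.stdev(overshoots)

mean_ms, stdev_ms = sleep_overshoot_ms()
print(f"mean oversleep: {mean_ms:.3f} ms, stdev: {stdev_ms:.3f} ms")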

@bboudaoud-nv
Copy link

I'm definitely picking nits at this point; this code is very well written and should be a nice example of how to (actually) screen-capture video with PyAV, as opposed to the weird muxing example they provide in their docs!

@jholveck
Copy link
Contributor Author

I'm definitely picking nits at this point; this code is very well written and should be a nice example of how to (actually) screen-capture video with PyAV, as opposed to the weird muxing example they provide in their docs!

I'm so glad to hear it! Thank you for your diligent review! At this point, I think I'll review your comments again and see if there are ways I can improve the comments to explain some of the sticking points, and then @BoboTiG can merge it.

@jholveck jholveck mentioned this pull request Jan 28, 2026
@jholveck jholveck marked this pull request as ready for review January 28, 2026 03:24
@jholveck
Copy link
Contributor Author

@BoboTiG I'm happy with merging it as it stands. I think @bboudaoud-nv and I have some different ideas about how to handle the finer details of the timing, but nothing that's significant enough to worry about for a demo. I may come back and change the relevant code later, but as it stands I think it's in great shape to commit. Then we can get back to focusing on improvements to functionality.

@BoboTiG BoboTiG merged commit b0ae026 into BoboTiG:main Jan 28, 2026
21 checks passed