Skip to content

feat(context-drop): drop a screen video clip, not just a screenshot#1859

Merged
liususan091219 merged 9 commits into
mainfrom
feat/drop-video-clip
Jul 1, 2026
Merged

feat(context-drop): drop a screen video clip, not just a screenshot#1859
liususan091219 merged 9 commits into
mainfrom
feat/drop-video-clip

Conversation

@liususan091219

Copy link
Copy Markdown
Collaborator

Problem

Context-drop can drop a screenshot (⌃S), but a single still often can't reproduce a bug — the carousel stuck-loading issue we chased today is invisible in a screenshot. A short screen video is a far better repro to hand the agent.

What this adds

A "Drop Video Clip" action (⌃R) that records a few seconds of screen and drops the .mov as a task, mirroring the existing screenshot drop.

  • src/screen-capture-server.py — new GET /capture-video?seconds=N&display=D. Runs screencapture -v -V<n> -x on this server because it already holds the Screen Recording TCC grant (same reason /capture lives here). Duration clamped 1..60s, display 1..9, int-coerced exactly like /capture so request data never reaches the subprocess argv. Verifies a non-empty .mov before returning 200 {status, path, seconds}.
  • src/Sutando/main.swiftdropVideoClip() mirrors dropScreenshot(): debounced, hits /capture-video, writes a type: video task. It pre-notifies "Recording Ns…" since (unlike a still) it takes a few seconds. Wired as the drop_video_clip action — menu item + ⌃R default hotkey, configurable via ~/.config/sutando/hotkeys.json like the others.

The task pipeline already handles .mov, so dropped clips flow through unchanged.

Test

  • python3 -m py_compile src/screen-capture-server.py — clean.
  • swiftc -O main.swift SutandoConfig.swift -framework Cocoa -framework Carbon -framework ApplicationServices -framework AVFoundation (the startup.sh build command) — compiles clean.
  • screencapture -v -V1 -x out.mov produces a valid .mov on macOS 26.
  • Manual ⌃R end-to-end (record → task drop) pending a human pass.

Notes

  • The capture server is single-threaded, so a video drop blocks it for seconds (the vision ticker's /capture waits). Acceptable for a rare manual action; switching to ThreadingHTTPServer is a clean follow-up if it matters.

liususan091219 and others added 3 commits June 30, 2026 11:55
Adds a "Drop Video Clip" hotkey (⌃R) that records a few seconds of screen
and drops the .mov as a task. A short video repro is far easier to act on
than a single still, and it captures UI bugs a screenshot misses (e.g. the
stuck-loading carousel we just chased).

- screen-capture-server.py: new GET /capture-video?seconds=N&display=D.
  Runs `screencapture -v -V<n> -x` on this server because it already holds
  the Screen Recording TCC grant (same reason /capture lives here). Duration
  clamped 1..60s, display 1..9, int-coerced like /capture to keep request
  taint out of the subprocess argv. Verifies a non-empty .mov before 200.
- main.swift: dropVideoClip() mirrors dropScreenshot() — debounced, hits
  /capture-video, writes a `type: video` task. Pre-notifies "Recording Ns…"
  since unlike a still it takes a few seconds. Wired as drop_video_clip
  (menu item + ⌃R default hotkey, configurable via hotkeys.json).

The task pipeline already handles .mov, so dropped clips flow through
unchanged.

Tested: server py_compiles; main.swift + SutandoConfig.swift compile with the
startup.sh build command; `screencapture -v -V1 -x` produces a valid .mov on
macOS 26. Manual hotkey end-to-end pending (UI).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…hot branch

/capture-video also matches startswith("/capture"), so the screenshot
branch intercepted it and returned a PNG. Exclude it explicitly so the
video endpoint actually records. Caught self-testing before merge.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Locks the routing bug fixed in 6eea6a2: /capture-video must record a .mov
and not be swallowed by the /capture screenshot branch. Mocks screencapture
so it runs headless. Also asserts /capture still returns .png, duration
clamping, and the source-level routing guard. Independently confirmed as the
sole finding in Sutando-Pro's review.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@liususan091219

Copy link
Copy Markdown
Collaborator Author

Pushed two commits to this branch (the commits list shows authored-time; these reached GitHub via a push at 17:15 UTC):

  • 6eea6a2 fix(screen-capture): route /capture-video before the /capture screenshot branch/capture-video also matches startswith("/capture"), so the screenshot handler was intercepting it and returning a PNG (the Swift hotkey would then submit a screenshot as type:video). Now excluded explicitly. Caught self-testing; same finding as the review.
  • b43f505 test(screen-capture): regression guard for /capture-video routing — mocks screencapture so it runs headless; asserts /capture-video -> .mov, /capture -> .png, duration clamping, and the source-level guard.

CI re-ran green. Branch is now feature + fix + test.

@bassilkhilo-ag2 bassilkhilo-ag2 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Routing fix is correct — the explicit and not self.path.startswith('/capture-video') guard is the right fix for the prefix collision (would have silently returned a PNG otherwise). Duration and display index both sanitized before injecting into argv. Test mocks screencapture cleanly. Approve.

Use patch.object + addCleanup instead of bare attribute assignment so the
fake screencapture (and silenced notify helpers) are restored after each test,
not leaked into later tests sharing the process. Addresses Sutando-Pro's review
finding on b43f505.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@liususan091219

Copy link
Copy Markdown
Collaborator Author

Pushed 4f7365b — addresses @Sutando-Pro's review finding on the test.

The setUp did a bare self.mod.subprocess.run = _fake_run; since self.mod.subprocess is the shared stdlib module, that leaked the fake into any later test in the same process. Switched to mock.patch.object(...) + addCleanup(p.stop) so the recorder (and the silenced notify helpers) are restored after each test. Verified: still 4/4 green, and subprocess.run is confirmed restored after the suite runs.

Addresses Sutando-Pro's review finding: /capture-video was an unauthenticated
side-effectful GET, so a drive-by web page could silently trigger a 1-60s
screen recording. The server now generates a token at startup, writes it to a
0600 file (~/.config/sutando/screen-capture-token), and requires it in the
X-Sutando-Capture-Token header, checked before any side effect. The Ctrl-R
hotkey caller reads the file and sends the header.

A browser can neither read the 0600 file nor set a custom header on a
no-cors/<img> request, so it cannot reach the endpoint. Pre-existing /capture
hole left for a follow-up per the review.

Test: adds a 403-without-token case; /capture-video tests send the token.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@liususan091219

Copy link
Copy Markdown
Collaborator Author

Pushed 5c4a13c — token guard on /capture-video (addresses @Sutando-Pro's security finding).

  • Server generates a token at startup → 0600 file ~/.config/sutando/screen-capture-token. /capture-video requires it in the X-Sutando-Capture-Token header, checked before any side effect (no menu flash, no recording on a bad/missing token → 403).
  • Swift dropVideoClip reads that file and sends the header.
  • A browser can neither read the 0600 file nor set a custom header on a no-cors/<img> request, so the drive-by vector is blocked. (An Origin/Referer-only check would be bypassable via referrerpolicy=no-referrer, per the review.)
  • Test: added a 403-without-token case; /capture-video tests now send the token. 5/5 green, Swift builds.
  • The pre-existing /capture (screenshot) hole is left for a follow-up, per the review.

@bassilkhilo-ag2 bassilkhilo-ag2 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clean implementation that mirrors dropScreenshot precisely. Key correctness points: (1) token gate on /capture-video so a web page in the capture server process cannot trigger recording via ; (2) notify() call before the blocking request so the user knows capture started (a silent 5s wait would feel broken); (3) debounce guard reuses lastDropTime; (4) duration clamped server-side 1..60s; (5) generous timeout = seconds+30 to cover capture + network. The ⌃R binding does not conflict with any existing hotkey. ✓

@bassilkhilo-ag2 bassilkhilo-ag2 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good feature. ⌃R hotkey, debounced same as dropScreenshot, notify-before-wait (user sees 'Recording 5s...' instead of silence). Token auth via X-Sutando-Capture-Token mirrors the screenshot pattern. Generous 35s timeout for the blocking capture. LGTM.

Per Susan: ⌃R should start recording, then stop+drop on a second press —
user-controlled length, not a fixed duration. /capture-video gains
?action=start (open-ended screencapture -v via Popen, returns at once) and
?action=stop (SIGINT finalizes the .mov, returns path), with a single-recording
lock + a MAX_RECORDING_SECONDS watchdog so a forgotten recording can't run
forever. No-action requests keep the legacy fixed-duration path (backward
compat). Swift dropVideoClip() becomes a stateful toggle. Tests cover
start->stop, stop-when-idle, double-start guard, and token gate on start.
…eo path

The fixed-duration -V capture was the pre-toggle behavior and had no callers
left once ⌃R became a start/stop toggle. Remove it so /capture-video is
toggle-only: no/unknown action now returns 400. Tests updated (drop the
fixed-duration cases, add no-action->400).
@github-actions

github-actions Bot commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

@cla-assistant check

@liususan091219 liususan091219 merged commit 583a430 into main Jul 1, 2026
6 checks passed
@liususan091219 liususan091219 deleted the feat/drop-video-clip branch July 1, 2026 15:48
liususan091219 added a commit that referenced this pull request Jul 1, 2026
…keys off terminal keys (#1890)

* fix(context-drop): capture audio in drop-video-clip + move hotkeys off terminal keys

drop-video-clip (added in #1859) recorded silent video: the capture server
ran `screencapture -v -x` with no audio flag. Add `-g` so the clip captures
the default audio input. The source follows System Settings > Sound > Input:
the mic for narration/room audio, or BlackHole (with output routed there) for
system/app audio. `?audio=off` disables sound; `?device=<id>` pins a specific
input via `-G<id>`.

Also move the two global hotkeys that shadowed terminal keys:
- drop_context:    Ctrl-C -> Ctrl-Shift-C (was intercepting SIGINT/cancel)
- drop_video_clip: Ctrl-R -> Ctrl-Shift-R (was intercepting reverse-i-search)

Global hotkeys match the exact modifier mask, so plain Ctrl-C / Ctrl-R still
reach the terminal; only the shifted variants trigger Sutando.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(context-drop): disclose audio in recording HUD + fix stale Ctrl-R label

Addresses review of #1890: the drop-video-clip start notification read
"Recording screen - press Ctrl-R again to stop", which (a) didn't disclose that
mic audio is now captured and (b) still referenced Ctrl-R after the hotkey moved
to Ctrl-Shift-R. Update to "Recording screen + mic - press Ctrl-Shift-R again to
stop".

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants