
fix: buffer Telegram photo-album messages into a single Claude request #188

Open
IliyaBrook wants to merge 1 commit into RichardAtCT:main from IliyaBrook:fix/186-buffer-photo-album

@IliyaBrook

Follow-up to the discussion in #186: @MatveyF flagged that the same
"one logical message, N responses" problem happens with images:

> if I attach images to my message I get a response per image.
> E.g. if I send a piece of text with 3 images it will respond 3 times —
> one for the text and the first image, second for the second image,
> and third for the third image

Root cause

When a user sends a Telegram album (text + N photos), Telegram delivers
it as N separate Updates that share a common message.media_group_id;
the caption lives on only one of them. The agentic photo handler fires
once per Update, so Claude is invoked N times independently instead of
seeing the album as a single message.
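
To make the delivery pattern concrete, here is a minimal sketch of how album parts can be recognised and regrouped. `FakeMessage`, `group_by_media_group` and `album_caption` are hypothetical stand-ins written for illustration, not code from this PR. The only thing taken from the description above is that every part of an album shares one `media_group_id` and that only one part carries the caption.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class FakeMessage:
    """Hypothetical stand-in for the message carried by one Telegram Update."""
    message_id: int
    media_group_id: Optional[str]
    caption: Optional[str] = None


def group_by_media_group(messages: List[FakeMessage]) -> Dict[Optional[str], List[FakeMessage]]:
    """Bucket messages by media_group_id (None means a standalone photo)."""
    groups: Dict[Optional[str], List[FakeMessage]] = defaultdict(list)
    for message in messages:
        groups[message.media_group_id].append(message)
    return dict(groups)


def album_caption(items: List[FakeMessage]) -> Optional[str]:
    """Return the caption from whichever album part carries it, if any."""
    return next((m.caption for m in items if m.caption), None)


# Three Updates for one album: only the first carries the text.
updates = [
    FakeMessage(101, "g1", caption="compare these"),
    FakeMessage(102, "g1"),
    FakeMessage(103, "g1"),
]
album = group_by_media_group(updates)["g1"]
```

A handler that runs once per Update sees each of these three messages in isolation, which is why the second and third photos reach Claude without the caption.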

Fix

  • New MediaGroupBuffer (src/bot/utils/media_group_buffer.py) —
    debounces photos keyed by (user_id, chat_id, thread_id, media_group_id).
    Any photo carrying a media_group_id is buffered; a short timer
    (default 1.0 s, configurable via MEDIA_GROUP_BUFFER_TIMEOUT, allowed
    range 0.3 to 5.0 s) fires after the last photo and flushes all photos +
    caption as a single payload.
  • agentic_photo refactored into a dispatcher: standalone photos take
    the existing fast path; album photos go through the buffer.
  • Shared work extracted to _process_photo_batch — the first photo is
    processed with the caption (so the prompt template keeps user intent),
    the rest are processed for image data only, and a single
    claude_integration.run_command is issued with all images attached.
  • The Stop button cancels any pending media-group buffers for the user.
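
The debounce behaviour described in the list above can be sketched as follows. This is an illustrative approximation under stated assumptions, not the actual `MediaGroupBuffer` from `src/bot/utils/media_group_buffer.py`: the class name `DebounceBufferSketch`, its methods, the flush callback signature, and the choice to clamp out-of-range timeouts (rather than reject them) are all hypothetical. The key tuple, the 1.0 s default, the 0.3 to 5.0 s range, and per-user cancellation come from the description.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Optional, Tuple

# (user_id, chat_id, thread_id, media_group_id), as listed in the PR description.
Key = Tuple[int, int, Optional[int], str]
FlushFn = Callable[[Key, List[str], Optional[str]], Awaitable[None]]


def clamp_timeout(value: float, low: float = 0.3, high: float = 5.0) -> float:
    """Assumption: out-of-range values are clamped into the allowed range."""
    return max(low, min(high, value))


class DebounceBufferSketch:
    """Hypothetical debouncer: flush once no new photo arrives within the timeout."""

    def __init__(self, flush: FlushFn, timeout: float = 1.0) -> None:
        self._flush = flush
        self._timeout = clamp_timeout(timeout)
        self._photos: Dict[Key, List[str]] = {}
        self._captions: Dict[Key, str] = {}
        self._timers: Dict[Key, asyncio.Task] = {}

    def add(self, key: Key, photo: str, caption: Optional[str] = None) -> None:
        """Store one album part and restart the timer for its group."""
        self._photos.setdefault(key, []).append(photo)
        if caption:
            self._captions[key] = caption
        pending = self._timers.get(key)
        if pending is not None:
            pending.cancel()  # a newer photo arrived, so restart the countdown
        self._timers[key] = asyncio.create_task(self._fire_later(key))

    async def _fire_later(self, key: Key) -> None:
        await asyncio.sleep(self._timeout)
        # Drop the buffered state first so a late photo starts a fresh group.
        self._timers.pop(key, None)
        photos = self._photos.pop(key, [])
        caption = self._captions.pop(key, None)
        await self._flush(key, photos, caption)

    def cancel_user(self, user_id: int) -> int:
        """Discard every pending group for one user (the Stop-button case)."""
        keys = [k for k in self._timers if k[0] == user_id]
        for k in keys:
            self._timers.pop(k).cancel()
            self._photos.pop(k, None)
            self._captions.pop(k, None)
        return len(keys)
```

In this sketch `add` must be called from inside a running event loop, because it schedules the timer with `asyncio.create_task`. Restarting the timer on every photo is what lets one flush cover the whole album regardless of how the Updates are spaced, as long as each gap is shorter than the timeout.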

americodias added a commit to americodias/claude-code-telegram that referenced this pull request Apr 27, 2026
…oup batching

Three-piece hybrid that preserves upstream's native multimodal SDK content
blocks while adding Cortex-specific Obsidian vault integration.

## 1. .media.telegram/ persistence layer

New MediaArchive helper saves every Telegram-uploaded image, document
and voice file to a vault-relative archive directory before forwarding
to Claude:

  .media.telegram/images/<chat_id>_<message_id>.<ext>
  .media.telegram/pdfs/<chat_id>_<message_id>_<original_name>
  .media.telegram/documents/<chat_id>_<message_id>_<original_name>
  .media.telegram/audios/<timestamp>/{received,sent}.{ogg,txt}

Image bytes still flow into the SDK as base64 content blocks (upstream's
v1.6.0 native multimodal behaviour), but the prompt is augmented with
the saved path so Claude can reference the file via Obsidian
``![[name]]`` wiki-links if a note is being written. Voice handler now
takes an optional media_archive and persists received audio + transcript
into a fresh paired-audio dir whose path it returns on ProcessedVoice;
the TTS reply later writes its sent.* peer into the same dir, keeping
voice exchanges grouped on disk.

Settings: MEDIA_ARCHIVE_ENABLED, MEDIA_ARCHIVE_DIR.
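
As a rough illustration of the archive layout above, the path construction for images and documents might look like the sketch below. The function names, the sanitising rule, and the choice to route by a `.pdf` suffix are assumptions made for this example. The commit message only states the resulting directory layout and that a sanitize step is tested.

```python
import re
from pathlib import PurePosixPath


def sanitize_name(original_name: str) -> str:
    """Hypothetical rule: drop directories, replace unusual characters with '_'."""
    base = PurePosixPath(original_name).name
    return re.sub(r"[^A-Za-z0-9._-]", "_", base)


def image_archive_path(archive_dir: str, chat_id: int, message_id: int, ext: str) -> PurePosixPath:
    """Build <archive>/images/<chat_id>_<message_id>.<ext>."""
    filename = f"{chat_id}_{message_id}.{ext.lstrip('.')}"
    return PurePosixPath(archive_dir) / "images" / filename


def document_archive_path(
    archive_dir: str, chat_id: int, message_id: int, original_name: str
) -> PurePosixPath:
    """Build <archive>/{pdfs,documents}/<chat_id>_<message_id>_<original_name>."""
    safe = sanitize_name(original_name)
    # Assumption: PDFs are recognised by their file extension.
    kind = "pdfs" if safe.lower().endswith(".pdf") else "documents"
    return PurePosixPath(archive_dir) / kind / f"{chat_id}_{message_id}_{safe}"
```

Prefixing every file with `<chat_id>_<message_id>` keeps names unique across chats even when two users upload files with the same original name.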

## 2. Post-turn ![[...]] reference detector → 5-Attachments promotion

After every Claude turn (text, document, photo, voice paths), the
orchestrator collects .md file paths Claude touched via Edit/Write/
MultiEdit calls and scans them for embed references (``![[name]]``).
Any match whose target lives in the archive is copied into

  5-Attachments/<type>/YYYY-MM/<name>

Both copies remain: the archive copy stays as the raw record
(gitignored), the attachments copy becomes what Obsidian renders. Copy
is content-hash idempotent — re-running on the same note never
duplicates work. Type subdirectories mirror the archive layout (images /
pdfs / documents / audios), falling back to the file extension when the
archive location does not identify the type.

Settings: ATTACHMENT_PROMOTE_ENABLED, ATTACHMENT_DIR.
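
A minimal sketch of the two mechanisms in this section, embed detection and a content-hash idempotent copy, is shown below. The regular expression, the function names, and the use of SHA-256 are assumptions made for illustration. The commit message does not say which hash or pattern the real promoter uses.

```python
import hashlib
import re
import shutil
from pathlib import Path
from typing import List

# Matches the target of an Obsidian embed such as ![[photo.jpg]] or ![[scan.pdf|200]].
# Plain links like [[note]] are ignored because they lack the leading "!".
EMBED_RE = re.compile(r"!\[\[([^\]|#]+)")


def find_embeds(markdown: str) -> List[str]:
    """Return embed targets in document order, without size or heading suffixes."""
    return [target.strip() for target in EMBED_RE.findall(markdown)]


def _digest(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def promote_if_changed(src: Path, dest: Path) -> bool:
    """Copy src to dest unless dest already holds identical bytes.

    Returns True when a copy happened, False when the call was a no-op.
    """
    if dest.exists() and _digest(dest) == _digest(src):
        return False
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dest)
    return True
```

Comparing digests rather than timestamps means a second run over the same note does no work even if the archive file was touched in between, which matches the idempotency claim above.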

## 3. Media-group batching

Telegram delivers a multi-file selection (media group) as N separate
Update events sharing the same media_group_id. Without batching, the
default agentic handlers fire one Claude session and one reply per
item — N "Working..." messages racing on the same session id.

New MediaGroupBuffer keyed on (chat_id, media_group_id) downloads
each item via the existing flow, debounces for
TELEGRAM_MEDIA_GROUP_WINDOW_SECONDS (default 2.5s), then runs Claude
once with a combined prompt listing every saved path and replies
once. Single-file uploads (media_group_id is None) keep the existing
fast path. The buffer stays UI-agnostic — orchestrator supplies the
flush callback that builds SDK content blocks and dispatches through
``_handle_agentic_media_message``.

Settings: MEDIA_GROUP_WINDOW_SECONDS, MEDIA_GROUP_MAX_FILES.
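
The routing decision and the combined prompt described above could be sketched like this. `route_upload`, `build_combined_prompt`, the prompt wording, and the default cap of 10 are hypothetical. The commit message only says that single-file uploads keep the fast path and that the combined prompt lists every saved path, subject to a file cap.

```python
from typing import List, Optional


def route_upload(media_group_id: Optional[str]) -> str:
    """Single-file uploads have no media_group_id and skip the buffer."""
    return "fast_path" if media_group_id is None else "buffer"


def build_combined_prompt(
    caption: Optional[str], saved_paths: List[str], max_files: int = 10
) -> str:
    """Build one prompt for the whole group, listing each archived file path.

    max_files stands in for MEDIA_GROUP_MAX_FILES; the default here is an assumption.
    """
    kept = saved_paths[:max_files]
    lines = [caption or "The user sent several files.", "Saved files:"]
    lines.extend(f"- {path}" for path in kept)
    omitted = len(saved_paths) - len(kept)
    if omitted > 0:
        lines.append(f"({omitted} more file(s) omitted by the cap)")
    return "\n".join(lines)


prompt = build_combined_prompt(
    "compare these",
    [".media.telegram/images/42_7.jpg", ".media.telegram/images/42_8.jpg"],
)
```

Because the prompt is assembled only at flush time, the handler issues one Claude request and one reply per group instead of one per item.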

## Upstream-PR posture

Pieces 1 and 3 are generic enough to upstream against
RichardAtCT/claude-code-telegram (PR RichardAtCT#188 already proposes media-group
batching with a different shape). Piece 2 is Cortex-specific because
it assumes an Obsidian vault layout — stays local.

## Tests

- tests/unit/test_bot/test_media_archive.py — save_image / save_document /
  pair_dir uniqueness / sanitize / transcript optional
- tests/unit/test_bot/test_attachment_promoter.py — collect_modified_md_paths
  filters / multi-input-key support / promote happy path / disabled / idempotent
  / pdf kind
- tests/unit/test_media_group_buffer.py — buffer happy path, debounce, cap,
  cancellation