fix: buffer Telegram photo-album messages into a single Claude request #188
Open
IliyaBrook wants to merge 1 commit into RichardAtCT:main from
americodias added a commit to americodias/claude-code-telegram that referenced this pull request Apr 27, 2026
…oup batching
Three-piece hybrid that preserves upstream's native multimodal SDK content
blocks while adding Cortex-specific Obsidian vault integration.
## 1. .media.telegram/ persistence layer
New MediaArchive helper saves every Telegram-uploaded image, document
and voice file to a vault-relative archive directory before forwarding
to Claude:
.media.telegram/images/<chat_id>_<message_id>.<ext>
.media.telegram/pdfs/<chat_id>_<message_id>_<original_name>
.media.telegram/documents/<chat_id>_<message_id>_<original_name>
.media.telegram/audios/<timestamp>/{received,sent}.{ogg,txt}
Image bytes still flow into the SDK as base64 content blocks (upstream's
v1.6.0 native multimodal behaviour), but the prompt is augmented with
the saved path so Claude can reference the file via Obsidian
``![[name]]`` wiki-links if a note is being written. Voice handler now
takes an optional media_archive and persists received audio + transcript
into a fresh paired-audio dir whose path it returns on ProcessedVoice;
the TTS reply later writes its sent.* peer into the same dir, keeping
voice exchanges grouped on disk.
Settings: MEDIA_ARCHIVE_ENABLED, MEDIA_ARCHIVE_DIR.
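
As a rough illustration of the layout above, the path construction might look like the following minimal sketch. The function names (`image_path`, `document_path`, `sanitize`) and the sanitisation rule are assumptions made for this example, not the actual MediaArchive API; only the directory layout comes from the description above.

```python
import re
from pathlib import PurePosixPath

# Assumed value of MEDIA_ARCHIVE_DIR, relative to the vault root.
ARCHIVE_ROOT = PurePosixPath(".media.telegram")


def sanitize(name: str) -> str:
    """Hypothetical sanitiser: keep the basename, replace unsafe characters."""
    base = name.replace("\\", "/").rsplit("/", 1)[-1]
    cleaned = re.sub(r"[^A-Za-z0-9._-]", "_", base)
    return cleaned or "file"


def image_path(chat_id: int, message_id: int, ext: str) -> PurePosixPath:
    """Build images/<chat_id>_<message_id>.<ext>."""
    return ARCHIVE_ROOT / "images" / f"{chat_id}_{message_id}.{ext.lstrip('.')}"


def document_path(
    chat_id: int, message_id: int, original_name: str, kind: str = "documents"
) -> PurePosixPath:
    """Build <kind>/<chat_id>_<message_id>_<original_name> (kind: pdfs or documents)."""
    return ARCHIVE_ROOT / kind / f"{chat_id}_{message_id}_{sanitize(original_name)}"
```

Prefixing every file with the chat and message ids keeps names unique per upload, so two files with the same original name do not overwrite each other.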
## 2. Post-turn ![[...]] reference detector → 5-Attachments promotion
After every Claude turn (text, document, photo, voice paths), the
orchestrator collects .md file paths Claude touched via Edit/Write/
MultiEdit calls and scans them for embed references (``![[name]]``).
Any match whose target lives in the archive is copied into
5-Attachments/<type>/YYYY-MM/<name>
Both copies remain: the archive copy stays as the raw record
(gitignored), the attachments copy becomes what Obsidian renders. Copy
is content-hash idempotent — re-running on the same note never
duplicates work. Type subdirs map archive layout (images / pdfs /
documents / audios) with a fallback by file extension.
Settings: ATTACHMENT_PROMOTE_ENABLED, ATTACHMENT_DIR.
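
The detection and idempotent copy steps could be sketched roughly as below. The names `find_embeds` and `promote` and the exact regular expression are illustrative assumptions; the real promoter may parse embeds and compare hashes differently.

```python
import hashlib
import re
import shutil
from pathlib import Path

# Matches the target of an Obsidian embed such as ![[name]] or ![[name|alias]].
# Plain [[links]] without the leading "!" are intentionally ignored.
EMBED_RE = re.compile(r"!\[\[([^\]|#]+)")


def find_embeds(markdown: str) -> list[str]:
    """Return embed targets in the order they appear in the note."""
    return [target.strip() for target in EMBED_RE.findall(markdown)]


def _sha256(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def promote(src: Path, dest: Path) -> bool:
    """Copy src to dest unless dest already holds identical bytes.

    Returns True when a copy was made, False when it was skipped.
    """
    if dest.exists() and _sha256(dest) == _sha256(src):
        return False
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dest)
    return True
```

Comparing content hashes rather than timestamps means a second run over the same note is a no-op, while a changed archive file is still re-copied.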
## 3. Media-group batching
Telegram delivers a multi-file selection (media group) as N separate
Update events sharing the same media_group_id. Without batching, the
default agentic handlers fire one Claude session and one reply per
item — N "Working..." messages racing on the same session id.
New MediaGroupBuffer keyed on (chat_id, media_group_id) downloads
each item via the existing flow, debounces for
TELEGRAM_MEDIA_GROUP_WINDOW_SECONDS (default 2.5s), then runs Claude
once with a combined prompt listing every saved path and replies
once. Single-file uploads (media_group_id is None) keep the existing
fast path. The buffer stays UI-agnostic — orchestrator supplies the
flush callback that builds SDK content blocks and dispatches through
``_handle_agentic_media_message``.
Settings: MEDIA_GROUP_WINDOW_SECONDS, MEDIA_GROUP_MAX_FILES.
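
The debounce behaviour described above can be illustrated with a small asyncio sketch. This is not the MediaGroupBuffer implementation: the class name `DebounceBuffer`, its constructor arguments and the flush-on-cap behaviour are assumptions made for the example.

```python
import asyncio
from typing import Awaitable, Callable, Hashable

# Hypothetical callback signature: receives the group key and buffered items.
FlushCallback = Callable[[Hashable, list[str]], Awaitable[None]]


class DebounceBuffer:
    """Collect items per key and flush once after a quiet window."""

    def __init__(self, window: float, max_files: int, on_flush: FlushCallback):
        self._window = window
        self._max_files = max_files
        self._on_flush = on_flush
        self._items: dict[Hashable, list[str]] = {}
        self._timers: dict[Hashable, asyncio.Task] = {}

    async def add(self, key: Hashable, item: str) -> None:
        self._items.setdefault(key, []).append(item)
        # Every new item cancels the pending timer and restarts the window.
        timer = self._timers.pop(key, None)
        if timer is not None:
            timer.cancel()
        if len(self._items[key]) >= self._max_files:
            # Assumed behaviour: flush immediately once the cap is reached.
            await self._flush(key)
        else:
            self._timers[key] = asyncio.create_task(self._flush_later(key))

    async def _flush_later(self, key: Hashable) -> None:
        await asyncio.sleep(self._window)
        # Unregister before flushing so a late add() cannot cancel the flush.
        self._timers.pop(key, None)
        await self._flush(key)

    async def _flush(self, key: Hashable) -> None:
        items = self._items.pop(key, [])
        if items:
            await self._on_flush(key, items)
```

Keeping the flush logic behind a callback mirrors the UI-agnostic design: the buffer only decides when a group is complete, and the caller decides what to send to Claude.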
## Upstream-PR posture
Pieces 1 and 3 are generic enough to upstream against
RichardAtCT/claude-code-telegram (PR RichardAtCT#188 already proposes media-group
batching with a different shape). Piece 2 is Cortex-specific because
it assumes an Obsidian vault layout — stays local.
## Tests
- tests/unit/test_bot/test_media_archive.py — save_image / save_document /
pair_dir uniqueness / sanitize / transcript optional
- tests/unit/test_bot/test_attachment_promoter.py — collect_modified_md_paths
filters / multi-input-key support / promote happy path / disabled / idempotent
/ pdf kind
- tests/unit/test_media_group_buffer.py — buffer happy path, debounce, cap,
cancellation
Follow-up to the discussion in #186 — @MatveyF flagged that the same
"one logical message, N responses" problem happens with images:
Root cause
When a user sends a Telegram album (text + N photos), Telegram delivers it as N separate `Update`s that share a common `message.media_group_id`; the caption lives on only one of them. The agentic photo handler fires once per `Update`, so Claude is invoked N times independently instead of seeing the album as a single message.
Fix
- `MediaGroupBuffer` (`src/bot/utils/media_group_buffer.py`): debounces photos keyed by `(user_id, chat_id, thread_id, media_group_id)`. Any photo carrying a `media_group_id` is buffered; a short timer (default `1.0s`, configurable via `MEDIA_GROUP_BUFFER_TIMEOUT`, range `0.3 – 5.0`) fires after the last photo and flushes all photos + caption as a single payload.
- `agentic_photo` refactored into a dispatcher: standalone photos take the existing fast path; album photos go through the buffer.
- `_process_photo_batch`: the first photo is processed with the caption (so the prompt template keeps user intent), the rest are processed for image data only, and a single `claude_integration.run_command` is issued with all images attached.
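
To make the keying and caption handling concrete, here is a minimal sketch of how buffered album updates might be merged into one payload. `PhotoUpdate`, `buffer_key` and `merge_album` are hypothetical names for this example; the PR's actual handler works on Telegram `Update` objects and attaches image data rather than file ids.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PhotoUpdate:
    """Simplified stand-in for the fields read from a Telegram Update."""

    user_id: int
    chat_id: int
    thread_id: Optional[int]
    media_group_id: Optional[str]
    file_id: str
    caption: Optional[str] = None


def buffer_key(update: PhotoUpdate) -> tuple:
    """Key under which photos of the same album are buffered together."""
    return (update.user_id, update.chat_id, update.thread_id, update.media_group_id)


def merge_album(updates: list[PhotoUpdate]) -> dict:
    """Combine buffered updates into one payload.

    The caption lives on only one update, so take the first non-empty one.
    """
    caption = next((u.caption for u in updates if u.caption), None)
    return {"caption": caption, "file_ids": [u.file_id for u in updates]}
```

Including `user_id` and `thread_id` in the key keeps albums from different users or forum topics in the same chat from being merged by accident.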