Skip to content

feat: add send_file_to_user MCP tool for arbitrary file delivery#191

Open
timurvafin wants to merge 3 commits intoRichardAtCT:mainfrom
timurvafin:feat/send-file-to-user
Open

feat: add send_file_to_user MCP tool for arbitrary file delivery#191
timurvafin wants to merge 3 commits intoRichardAtCT:mainfrom
timurvafin:feat/send-file-to-user

Conversation

@timurvafin
Copy link
Copy Markdown

Summary

Adds a new MCP tool send_file_to_user so Claude can deliver any file type (PDF, zip, csv, logs, binaries, …) to the Telegram user as a document. The existing send_image_to_user keeps working and is marked deprecated — no existing MCP configs or prompts break.

Closes the gap noted in #148 (closed, superseded by the bundled #152). Intentionally narrow scope: no streaming-drafts / per-event UX changes, no file upload direction change — just outbound file delivery.

Design

Tool-side validation (MCP server) is intentionally minimal — absolute path, exists, 0 < size ≤ 50 MB (Telegram Bot API document upload limit). The tool doesn't have access to the bot's runtime configuration, so it can't do security checks.

Full validation lives bot-side in src/bot/utils/file_extractor.py:

  • APPROVED_DIRECTORY containment via Path.resolve().relative_to()
  • Secrets blocklist reusing SecurityValidator.FORBIDDEN_FILENAMES and DANGEROUS_FILE_PATTERNS (.env, id_rsa, *.pem, certificates, etc.)
  • Size limit duplicated defensively

Silent-mislead guard: when the bot-side rejects a path the tool had already told Claude was queued, the user would otherwise see only Claude's \"Файл отправлен!\" with no file arriving. The orchestrator and classic message handler now collect rejected paths and append a short 🚫 Отклонено политикой безопасности… summary. Only security rejections (outside_approved, blocked_secret) are surfaced — too_large/empty/not_absolute are already reported by the MCP tool itself, so Claude describes those accurately and we don't duplicate.

Delivery path mirrors _send_images: new _send_documents() iterates collected FileAttachments, sends each via reply_document with per-file caption, handles per-file exceptions, appends failures to the same summary. Documents aren't grouped into albums (Telegram doesn't support it).

Relation to prior PRs

Test plan

  • make test — 554 passed (17 new for validate_file_path, 7 new for send_file_to_user MCP tool)
  • make lint — black/isort/flake8 clean on all new/changed files
  • Live tested on @timurv_daily_bot with full scenario matrix:
    • test.txt, test.csv, test.zip — delivered
    • файл с пробелами.txt — Unicode filename preserved (file_name: 'файл с пробелами.txt' in Telegram API response)
    • ✅ Multi-file with per-call captions — test.txt with caption текст, test.csv with caption таблица, both delivered independently
    • /etc/passwd — Claude said "sent!", bot surfaced 🚫 Отклонено политикой безопасности (нельзя отправлять вне APPROVED_DIRECTORY или секреты): passwd
    • .env — same security rejection
    • big.bin (60 MB) — MCP tool returned Error: file too large, Claude explained the limit to user with workaround suggestions, no duplicate bot-side summary
  • Regression: send_image_to_user tests still green; no changes to _send_images logic

Files

  • src/mcp/telegram_server.py — new send_file_to_user, send_image_to_user marked deprecated in docstring, MAX_FILE_SIZE_BYTES constant
  • src/bot/utils/file_extractor.py (new) — FileAttachment, validate_file_path(...) -> (FileAttachment | None, reason | None), REJECTION_SURFACE_TO_USER contract
  • src/bot/orchestrator.py — tool-call intercept, _send_documents, three call-sites plumbed with mcp_files + mcp_rejected_files
  • src/bot/handlers/message.py — same intercept + inline document sending for classic mode
  • tests/unit/test_bot/test_file_extractor.py (new) — 17 cases
  • tests/unit/test_mcp/test_telegram_server.py — 7 new cases

🤖 Generated with Claude Code

timurvafin and others added 2 commits April 24, 2026 09:43
Adds a new MCP tool `send_file_to_user` that lets Claude deliver any file
type (PDF, zip, csv, logs, etc.) to the Telegram user as a document. The
existing `send_image_to_user` is kept working as a deprecated alias so no
existing MCP configs or prompts break.

Tool-side validation in src/mcp/telegram_server.py is intentionally minimal
(absolute path, exists, 0 < size ≤ 50 MB — the Telegram Bot API limit) so
the tool can run without access to the bot's runtime configuration.
Full security validation lives bot-side in src/bot/utils/file_extractor.py:

- APPROVED_DIRECTORY containment via Path.resolve().relative_to()
- Secrets blocklist reusing SecurityValidator.FORBIDDEN_FILENAMES
  and DANGEROUS_FILE_PATTERNS (.env, id_rsa, *.pem, etc.)
- Size limit duplicated defensively

When bot-side validation rejects a path that the tool had already told
Claude was "queued", the user would otherwise see only Claude's "file
sent!" reply with no file arriving. To avoid this silent-mislead, the
orchestrator and classic message handler now collect rejected paths and
send a short "🚫 Отклонено политикой безопасности..." summary after the
response. Only outside-approved and secrets-blocklist rejections are
surfaced — size/empty/not-absolute are already reported by the tool
itself, so Claude's explanation is accurate and we don't duplicate it.

Delivery path mirrors _send_images: a new _send_documents() iterates
mcp_files, sends each via reply_document with per-file caption, logs
per-file exceptions, and appends failures to the same summary.

Tests: 17 new cases for validate_file_path (positive, Unicode filenames,
path traversal, secrets, size, empty) and 7 for send_file_to_user MCP
tool. All 554 tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User-facing messages from _send_documents (orchestrator) and the parallel
inline handler in classic mode were in Russian, unlike the rest of the
project. Translate them to English to match the codebase convention:

- "⚠️ Не удалось отправить" → "⚠️ Failed to send"
- "🚫 Отклонено политикой безопасности (нельзя отправлять вне
  APPROVED_DIRECTORY или секреты)" → "🚫 Rejected by security policy
  (outside APPROVED_DIRECTORY or blocked secret file)"

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@timurvafin
Copy link
Copy Markdown
Author

Follow-ups in this PR

1. English-only user-facing strings (commit 76351bb)
Translated the _send_documents summary messages from Russian to English to match the rest of the project.

2. Why a new _send_documents() instead of reusing _send_images()

This came up in review-style feedback, so worth making explicit in the PR:

_send_images() already has a reply_document branch — but it's for image-specific fallbacks (SVG and large rasters that exceed the Telegram photo API's 10 MB inline limit). Reusing it for arbitrary files would have forced several concessions:

  • Shape/routing. _send_images() splits inputs via should_send_as_photo() into a send_media_group album path and a reply_document fallback. Our files are always documents and never group into albums (Telegram doesn't support document albums). Funneling FileAttachment through the image router would mean dead branches and an extra "is this actually an image" predicate.
  • Caption semantics. _send_images() attaches a single caption only to the first album item (intentional, matches Telegram album UX) and returns caption_sent: bool so callers know not to duplicate the text message. Files have per-file captions (each send_file_to_user(path, caption) call is independent) and never suppress the text reply. Different contract.
  • Blast radius. _send_images() is on the happy path of the existing screenshot/image flow. Broadening it to "any file" risks regressions in image handling for a feature that's independent.
  • The plan explicitly committed to not touching _send_images logic, only adding _send_documents alongside it (design doc constraint, kept scope narrow).

The new _send_documents() is ~40 lines, single-purpose (iterate → reply_document with per-file caption → collect failures → append security rejections to the same summary), and parallel to _send_images() in structure without sharing the image-specific branching.

@timurvafin timurvafin marked this pull request as draft April 24, 2026 06:55
Applies fixes from an independent Codex review of PR RichardAtCT#191.

1. TOCTOU between validate and send (critical). `validate_file_path()`
   now captures the file's `st_ino` and `st_dev` into FileAttachment.
   Both delivery paths (orchestrator._send_documents + classic inline)
   open the path with `O_NOFOLLOW` and verify identity via `fstat` before
   handing the fd to `reply_document`. If the path was swapped with a
   symlink or the inode changed between stream-callback validation and
   send, the file is refused and surfaced in the failure summary.

2. Caption length unbounded (important). Telegram rejects captions
   longer than 1024 chars, which would have turned a valid send into a
   generic "failed to send" for the user. Caption is now truncated to
   MAX_CAPTION_CHARS=1024 at validation time.

3. Missing symlink test coverage (important). image_extractor has
   coverage for `symlink-inside-approved -> outside` but file_extractor
   did not. Added two tests: one asserting the escape is caught
   (`outside_approved`), one asserting a symlink that resolves inside
   approved dir is accepted (with identity captured from the target).

4. Docstring drift (nice-to-have). `_send_documents()` said `rejected`
   carries `too_large` too — it doesn't, REJECTION_SURFACE_TO_USER
   filters it out. Docstring now matches the actual contract.

All 559 tests pass; live-smoke-tested on @timurv_daily_bot.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@timurvafin
Copy link
Copy Markdown
Author

Codex review follow-ups (commit 98d1f77)

Ran the diff through an independent Codex review. Four findings, all applied:

  • 🔴 TOCTOU between validate and send. validate_file_path() now captures st_ino + st_dev into FileAttachment. Both delivery paths (_send_documents and the parallel classic-mode inline sender) now open with O_NOFOLLOW and verify identity via fstat before handing the fd to reply_document. If the path gets swapped with a symlink or the inode changes between stream-callback validation and send, the file is refused and surfaced via the failure summary.
  • 🟡 Caption length unbounded. Telegram rejects captions >1024 chars, so a long caption would turn a valid send into a generic failure. Caption is now truncated to MAX_CAPTION_CHARS=1024 at validation time.
  • 🟡 Missing symlink test coverage. Added test_symlink_to_outside_rejected (catches the escape) and test_symlink_within_approved_ok (accepts valid symlink, identity captured from target).
  • 🟢 Docstring drift on _send_documents. Said rejected includes too_large — doesn't; REJECTION_SURFACE_TO_USER filters it out. Docstring now matches the contract.

All 559 tests pass; live-smoke-tested on a running bot to confirm the TOCTOU-safe opener doesn't break the happy path (sendDocument HTTP 200 OK on a normal .txt).

@timurvafin timurvafin marked this pull request as ready for review April 24, 2026 13:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant