Add block-based runtime caching for remote media #2887

Merged
pfefferle merged 42 commits into trunk from add/remote-media-cache-separation on Feb 11, 2026

@pfefferle pfefferle commented Feb 6, 2026

Fixes #2742, #2743, #2744

Proposed changes:

Introduces a dedicated Cache namespace for remote media caching with improved architecture:

  • New Cache classes: Cache\Avatar, Cache\Media, Cache\Emoji - each handles type-specific caching with lazy loading via activitypub_remote_media_url filter
  • Block-based runtime caching: Emoji and media are wrapped in WordPress blocks at insert time, cached lazily at render time
    • activitypub/emoji block: Wraps emoji shortcodes, renders with cached URLs
    • activitypub/image, activitypub/audio, activitypub/video blocks: Wrap remote media in posts (comments strip remote images for security)
  • Audio/video support: Incoming posts with audio or video attachments now render as native <audio> and <video> elements via dedicated blocks
  • CLI commands: wp activitypub cache status and wp activitypub cache clear [--type=<type>] for cache management
  • Filter-first architecture: All remote media URLs pass through activitypub_remote_media_url filter, allowing CDN plugins (Jetpack Photon, Cloudflare) to intercept
  • Improved security: Uses finfo for reliable MIME validation, escapes glob metacharacters, strips remote images from comments
  • Lazy avatar resolution: Avatars are no longer eagerly downloaded and stored in post meta (_activitypub_avatar_url). Instead, they resolve lazily at render time through the activitypub_remote_media_url filter
  • Simplified Attachments class: Incoming post attachments are no longer imported into the Media Library. Instead, remote media is cached as files in the uploads directory via the new Cache classes. The Attachments class now only handles outgoing post attachments
  • Granular control: Can disable globally (ACTIVITYPUB_DISABLE_MEDIA_CACHE) or per-type via filters
  • New media helper functions: is_remote_url(), process_remote_images(), process_remote_media(), and block generator functions in functions-media.php

Breaking changes:

  • activitypub_store_attachments_locally filter removed — This filter controlled whether incoming post attachments were imported into the Media Library. It is replaced by the activitypub_should_cache_url and activitypub_cache_{type}_enabled filters, and the ACTIVITYPUB_DISABLE_MEDIA_CACHE constant
  • _activitypub_avatar_url post meta no longer used — Avatar URLs are now resolved lazily at render time instead of being stored in post meta

Implementation Advantages

  1. Block-based approach preserves original URLs — Original remote URLs are stored in block attributes, allowing cache to be cleared and regenerated without data loss. No destructive content modification.

  2. Runtime rendering vs insert-time replacement — Content processing is deferred to display time. Easy to toggle between cached/uncached without re-processing existing posts. CDN plugins can intercept at render time.

  3. Filter-first architecture — All remote URLs pass through activitypub_remote_media_url filter. Jetpack Photon, Cloudflare, and other CDN plugins can intercept. Plugin authors can implement custom caching strategies.

  4. Lazy caching (on-demand) — Media is only cached when actually rendered. No upfront processing of all content. Reduces unnecessary disk I/O and bandwidth.

  5. Separation into dedicated Cache namespace — Each cache type (Avatar, Media, Emoji) has its own class with clear responsibility boundaries. Attachments class simplified to focus only on outgoing post attachments.

  6. Comment-specific handling: Comment::render_blocks() selectively renders only activitypub/* blocks. Remote images are stripped from comments for security, while emoji are preserved.

  7. CLI management commands — Built-in operational tooling with wp activitypub cache status and wp activitypub cache clear --type=<type> for visibility and control.
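
The filter-first idea in point 3 can be modeled outside WordPress. The following Python sketch is a simplified stand-in for the add_filter()/apply_filters() mechanism, not the plugin's code: the `FilterRegistry` class, both callbacks, and the example hosts are hypothetical, and only the filter name `activitypub_remote_media_url` comes from this PR. The real filter reportedly takes four parameters; the sketch passes only the URL.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List, Tuple


class FilterRegistry:
    """Minimal stand-in for WordPress add_filter()/apply_filters() (hypothetical)."""

    def __init__(self) -> None:
        self._filters: DefaultDict[str, List[Tuple[int, Callable[[str], str]]]] = defaultdict(list)

    def add_filter(self, name: str, callback: Callable[[str], str], priority: int = 10) -> None:
        self._filters[name].append((priority, callback))
        # Stable sort: lower priority numbers run first, ties keep insertion order.
        self._filters[name].sort(key=lambda item: item[0])

    def apply_filters(self, name: str, value: str) -> str:
        for _, callback in self._filters[name]:
            value = callback(value)
        return value


def local_cache(url: str) -> str:
    """Hypothetical local cache: rewrites remote URLs to a local uploads path."""
    if url.startswith("https://remote.example/"):
        return "/wp-content/uploads/activitypub/posts/1/" + url.rsplit("/", 1)[-1]
    return url


def cdn_proxy(url: str) -> str:
    """Hypothetical CDN plugin: proxies any URL that is still remote."""
    if url.startswith("https://"):
        return "https://cdn.example/proxy?src=" + url
    return url


# Caching enabled: the local cache handles the URL before the CDN sees it.
with_cache = FilterRegistry()
with_cache.add_filter("activitypub_remote_media_url", local_cache, priority=10)
with_cache.add_filter("activitypub_remote_media_url", cdn_proxy, priority=20)

# Caching disabled: only the CDN callback is registered, and it still intercepts.
without_cache = FilterRegistry()
without_cache.add_filter("activitypub_remote_media_url", cdn_proxy)

remote = "https://remote.example/media/cat.png"
cached_url = with_cache.apply_filters("activitypub_remote_media_url", remote)
proxied_url = without_cache.apply_filters("activitypub_remote_media_url", remote)
```

Because every URL goes through the same named filter, turning local caching off only removes one callback; other callbacks registered on the filter keep working unchanged.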

Block-based caching flow:

Insert time (storing content):
  Content with :emoji: shortcode  → <!-- wp:activitypub/emoji {"url":"..."} -->:emoji:<!-- /wp:... -->
  Content with <img src="...">    → <!-- wp:activitypub/image {"url":"..."} --><img><!-- /wp:... -->
  Audio attachment                → <!-- wp:activitypub/audio {"url":"..."} --><audio><!-- /wp:... -->
  Video attachment                → <!-- wp:activitypub/video {"url":"..."} --><video><!-- /wp:... -->

Render time (displaying content):
  Block render_callback → apply_filters('activitypub_remote_media_url') → cached URL in output
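
The two phases above can be sketched in Python as a rough model. This is an illustration under assumptions, not the plugin's implementation: `wrap_images()` and `render_blocks()` are hypothetical names, the sketch handles only the activitypub/image block, and it uses regular expressions where the PR's commit log describes WP_HTML_Tag_Processor.

```python
import json
import re
from typing import Callable

# Matches a simple <img> tag and captures its src attribute.
IMG_RE = re.compile(r'<img\b[^>]*\bsrc="([^"]+)"[^>]*>')

# Matches a wrapped image block: group 1 is the JSON attributes, group 2 the inner HTML.
BLOCK_RE = re.compile(
    r"<!-- wp:activitypub/image (\{.*?\}) -->(.*?)<!-- /wp:activitypub/image -->",
    re.DOTALL,
)


def wrap_images(content: str) -> str:
    """Insert time: wrap each <img> in a block comment that stores the original URL."""

    def wrap(match: "re.Match[str]") -> str:
        attrs = json.dumps({"url": match.group(1)})
        return (
            f"<!-- wp:activitypub/image {attrs} -->"
            f"{match.group(0)}"
            "<!-- /wp:activitypub/image -->"
        )

    return IMG_RE.sub(wrap, content)


def render_blocks(content: str, url_filter: Callable[[str], str]) -> str:
    """Render time: replace the stored URL with whatever the filter returns."""

    def render(match: "re.Match[str]") -> str:
        url = json.loads(match.group(1))["url"]
        return match.group(2).replace(url, url_filter(url))

    return BLOCK_RE.sub(render, content)


original = '<p>Hi</p><img src="https://remote.example/a.png" alt="a">'
stored = wrap_images(original)

# With a caching filter, the output points at the cached copy.
rendered_cached = render_blocks(stored, lambda url: "/cache/a.png")

# With a pass-through filter, the output is the original markup again.
rendered_plain = render_blocks(stored, lambda url: url)
```

The stored content never loses the remote URL, so in this model clearing the cache or switching the filter changes only what is rendered, not what is saved.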

CLI Commands:

# Show cache status (files, size, enabled state)
wp activitypub cache status

# Show status as JSON
wp activitypub cache status --format=json

# Clear all caches
wp activitypub cache clear

# Clear specific cache type
wp activitypub cache clear --type=avatar
wp activitypub cache clear --type=media
wp activitypub cache clear --type=emoji

Storage structure:

/wp-content/uploads/activitypub/
├── actors/{actor_id}/{hash}.webp      # Avatars
├── emoji/{domain}/{hash}.webp         # Emoji by source domain
├── posts/{post_id}/{hash}.webp        # Post media
└── comments/{comment_id}/{hash}.webp  # Comment media
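
As a rough illustration of how a path in this layout could be derived, the Python sketch below builds `{subdir}/{key}/{hash}.{extension}` from a type, an owner key, and a URL. The commit log mentions a full 32-character MD5 hash; hashing the complete remote URL, the `cache_path()` name, the `SUBDIRS` mapping, and the default extension are assumptions made for this example.

```python
import hashlib
from pathlib import PurePosixPath

BASE = PurePosixPath("/wp-content/uploads/activitypub")

# Hypothetical mapping from cache type to the directory names shown above.
SUBDIRS = {"avatar": "actors", "emoji": "emoji", "post": "posts", "comment": "comments"}


def cache_path(kind: str, key: str, url: str, extension: str = "webp") -> PurePosixPath:
    """Build a cache path; `key` is the actor ID, source domain, post ID, or comment ID."""
    # Assumption: the hash input is the full remote URL. MD5 yields 32 hex characters.
    digest = hashlib.md5(url.encode("utf-8")).hexdigest()
    return BASE / SUBDIRS[kind] / key / f"{digest}.{extension}"


post_file = cache_path("post", "42", "https://remote.example/media/cat.png")
emoji_file = cache_path("emoji", "remote.example", "https://remote.example/emoji/blob.png")
```

Keying the filename on a hash of the whole URL means two remote files that share a basename, such as two different `cat.png` images, land in different cache files.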

Other information:

  • Have you written new tests for your changes, if applicable?

Testing instructions:

  1. Enable the plugin and follow a remote actor with an avatar
  2. Verify avatar is cached in /wp-content/uploads/activitypub/actors/
  3. Create/receive a federated post with images and custom emoji
  4. Verify media is cached in appropriate directories
  5. Verify emoji shortcodes in post content are wrapped with blocks (check post_content in database)
  6. Receive a post with audio/video attachments, verify they render as <audio>/<video> elements
  7. Test CLI commands:
    wp activitypub cache status
    wp activitypub cache clear --type=avatar --yes
  8. Test disabling cache via constant:
    define( 'ACTIVITYPUB_DISABLE_MEDIA_CACHE', true );
  9. Run tests: npm run env-test

Changelog entry

  • Automatically create a changelog entry from the details below.
Changelog Entry Details

Significance

  • Patch
  • Minor
  • Major

Type

  • Added - for new features
  • Changed - for changes in existing functionality
  • Deprecated - for soon-to-be removed features
  • Removed - for now removed features
  • Fixed - for any bug fixes
  • Security - in case of vulnerabilities

Message

Add Cache namespace for remote media caching with CLI commands, block-based runtime caching, and filter-based architecture.

Introduces a new caching system for remote media files (avatars, post
media, emoji) with improved MIME validation and a filter-first architecture.

New files:
- includes/class-cache.php - Orchestrator (like Handler pattern)
- includes/cache/class-file.php - Abstract base with shared logic
- includes/cache/class-avatar.php - Actor avatar caching
- includes/cache/class-media.php - Post/comment media caching
- includes/cache/class-emoji.php - Custom emoji caching

Features:
- Hooks into existing code via activitypub_remote_media_url filter
- Can be disabled via ACTIVITYPUB_DISABLE_REMOTE_CACHE constant
- Per-type filters for granular control
- Improved MIME type validation using finfo
- Backwards-compatible storage paths

Related: #2742, #2743, #2744

- Avatar caching: Lazy via `activitypub_remote_media_url` filter,
  clears meta on actor update for re-caching on next access
- Media caching: Via `save_post_ap_post` hook, attachment URLs
  passed via `meta_input` (new posts) or stored before update
- Emoji caching: Lazy via filter, unchanged
- Attachments class: Removed direct file caching, now only handles
  content markup generation (galleries, file blocks)
- Remote_Actors: Removed `cache_avatar()`, `get_avatar_url()` now
  passes URL through filter for lazy caching
- Posts: Extracts attachment URLs before save so they're available
  at hook time, uses unhook/rehook to prevent infinite loops

Custom emoji shortcodes in remote post content are now replaced
with cached local <img> tags when the post is created/updated,
matching the behavior for comments.
- Cache\Emoji now handles emoji replacement via hooks:
  - `save_post_ap_post` for post content
  - `wp_insert_comment` for comment content
- Posts/Interactions store emoji data in meta for hook processing
- Removes direct Emoji::prepare_comment_data calls
- Consistent with Media/Avatar hook-based caching pattern
- Cache\Emoji now handles file operations only (download, validate, store)
- Main Emoji class handles content transformation (replacing shortcodes)
- Uses `activitypub_remote_media_url` filter for caching (hookable)
- Posts/Interactions use `Emoji::prepare_*` at insert-time (no temp meta)
- Removes double-save pattern for emoji replacement
- Updates Attachments tests to disable Cache\Media during tests
- Fix Media::cache_post_media to not invalidate before checking for remote URLs
- Fix Cache\Emoji::maybe_cache to delegate to import() preserving activitypub_pre_import_emoji filter
- Allow http:// emoji URLs when caching disabled (with proper filter_var validation)
- Standardize activitypub_remote_media_url filter to 4 parameters across all usages
- Update Avatar filter signature to match (4 params)
@pfefferle pfefferle force-pushed the add/remote-media-cache-separation branch from 1384c57 to cd9c2f8 on February 6, 2026 09:55
- Add wp_check_filetype_and_ext validation to File::validate_mime_type()
- Refactor Media class to use the filter instead of direct caching
- Rename cache_post_media() to process_post_media() for clarity
- Always register save_post hook (allows CDN plugins when caching disabled)
- Only register maybe_cache filter when local caching is enabled
- Add tests for CDN plugin support when caching is disabled

This enables third-party CDN plugins (like Jetpack Photon) to intercept
all remote media URLs via the activitypub_remote_media_url filter,
regardless of whether local caching is enabled.
- Increase hash length from 8 to 16 characters to reduce collision risk
- Add is_safe_url() validation using wp_http_validate_url()
- Add file_is_displayable_image() check for additional image validation
- Replace custom get_unique_path() with WordPress native wp_unique_filename()

Inspired by improvements in PR #2889.

- Remove SVG from default allowed MIME types (XSS risk)
- Add activitypub_sanitize_svg filter for opt-in SVG support
- Use full MD5 hash (32 chars) to reduce collision probability
- Add activitypub_should_cache_url filter to skip specific URLs
- Add activitypub_media_cached action for logging/analytics
- Add activitypub_cache_allowed_mime_types filter for extensibility
- Add activitypub_cache_is_safe_url filter for URL validation override
- Fix tests by bypassing DNS lookups for example.com URLs

Copilot AI left a comment

Pull request overview

This pull request introduces a new caching system for remote media files (avatars, post media, emoji) as part of addressing issues #2742, #2743, and #2744. The changes separate caching concerns from the Attachments class into a dedicated Activitypub\Cache namespace with improved architecture, MIME validation, and filter-based lazy loading.

Changes:

  • Adds new Cache namespace with abstract File class and concrete implementations (Avatar, Media, Emoji) for type-specific caching
  • Implements filter-based lazy loading architecture using activitypub_remote_media_url filter, allowing CDN plugins to intercept
  • Refactors Attachments class to focus solely on Media Library imports, removing direct file caching logic
  • Updates Remote_Actors, Posts, Emoji, and Interactions classes to use new filter-based caching approach

Reviewed changes

Copilot reviewed 20 out of 20 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| includes/class-cache.php | New orchestrator class for initializing cache handlers |
| includes/cache/class-file.php | Abstract base class providing shared caching functionality with robust MIME validation using finfo |
| includes/cache/class-avatar.php | Avatar caching with lazy loading via filter and automatic cleanup on actor delete |
| includes/cache/class-media.php | Post/comment media caching with hook-based processing and CDN filter support |
| includes/cache/class-emoji.php | Emoji caching using filename-based hashing for backwards compatibility |
| includes/class-attachments.php | Refactored to focus on Media Library imports, removing direct file caching |
| includes/class-emoji.php | Updated to use filter-based caching and allow remote URLs when caching disabled |
| includes/collection/class-remote-actors.php | Simplified avatar handling using lazy caching via filter |
| includes/collection/class-posts.php | Updated to pass attachment URLs to cache via meta for hook-time processing |
| includes/collection/class-interactions.php | Minor comment updates for emoji handling |
| includes/constants.php | Added ACTIVITYPUB_DISABLE_REMOTE_CACHE constant |
| activitypub.php | Replaced Attachments::init() with Cache::init() |
| tests/* | Comprehensive test coverage for all new cache classes and updated behavior |

- Use cached finfo instance (avoids PHP 8.5 finfo_close() deprecation)
- Remove SVG from default allowed MIME types (XSS risk)
- Add activitypub_sanitize_svg filter for opt-in SVG support
- Use full MD5 hash (32 chars) to reduce collision probability
- Add activitypub_should_cache_url filter to skip specific URLs
- Add activitypub_pre_download_url filter for test mocking
- Add activitypub_media_cached action for logging/analytics
- Add activitypub_cache_allowed_mime_types filter for extensibility
- Add activitypub_cache_is_safe_url filter for URL validation override
- Fix tests by mocking file downloads directly
Replace only URLs within <img src=""> attributes instead of anywhere in
content. This prevents accidental URL replacement in links or other
contexts.

Addresses GitHub Copilot review feedback on PR #2887.

Resolve conflicts by keeping the Cache namespace approach:
- Attachments: Focus on Media Library imports only
- Remote_Actors: Use activitypub_remote_media_url filter for lazy avatar caching
- Emoji: Use filter-based caching approach
- Tests: Keep simplified tests that match Cache namespace implementation
@pfefferle pfefferle marked this pull request as ready for review February 6, 2026 14:00
New commands:
- `wp activitypub cache status` - Show cache statistics (files, size, enabled state)
- `wp activitypub cache clear` - Clear all caches or specific type
- `wp activitypub cache clear --type=avatar|media|emoji` - Clear specific cache type
@pfefferle
Member Author

@jeherve How does Jetpack's Photon cache remote images? Is this done at runtime?

@pfefferle pfefferle changed the title Add Cache namespace for remote media caching Add block-based runtime caching for remote media Feb 7, 2026
@pfefferle pfefferle requested a review from Copilot February 7, 2026 22:40

Copilot AI left a comment

Pull request overview

Copilot reviewed 28 out of 28 changed files in this pull request and generated 4 comments.

- Fix CLI cache status to use global Cache::is_enabled() check
- Improve is_remote_url() with strict host comparison and prefix matching
- Fix emoji hash collisions by using full URL path hash with legacy fallback
- Restrict remote emoji URLs to only when caching is disabled (privacy)
- Add local path validation in Attachments to prevent file disclosure
- Add remote URL validation with wp_http_validate_url() for defense-in-depth
- Document unused comment media constants as reserved for future use
The legacy system stored emoji directly in HTML without file caching,
so there are no existing cached files to maintain compatibility with.
Add a filter to permit the test directory for local file imports during attachment tests (adds allow_test_directory and registers/unregisters the filter in set_up/tear_down). Also add a guard in the Litespeed Cache integration test to skip the write-permissions scenario when running as root (posix_getuid() == 0), since root can bypass file permission restrictions and would make the test invalid.
- Remove unused `emoji-regex(*SKIP)(?!)` placeholder from esc_hashtag()
- Refactor wrap_media_in_content() to use WP_HTML_Tag_Processor for
  robust HTML parsing (handles any attribute order)
- Fix URL encoding: use wp_json_encode() instead of esc_url() to
  preserve original URL for str_replace matching in render_media_block()
- Add (*SKIP)(?!) pattern to prevent double-wrapping already-wrapped
  images when content is processed multiple times
- Add tests for wrap_media_in_content() covering wrapping, attribute
  order, double-wrap prevention, URL preservation, and local images
Refactor to use a single code path for all remote media caching:

- Remove _activitypub_attachments meta and process_attachments_meta()
- Attachments not already in content are appended as media blocks
- All remote images (inline and appended) are wrapped uniformly
- Caching happens lazily at render time via block render_callback

Changes:
- Posts: Add append_attachment_media_blocks() to inject missing attachments
- Posts: Update activity_to_post() to wrap content first, then append
- Media: Remove save_post hook and process_attachments_meta()
- Tests: Update to trigger lazy caching via do_blocks() with post context
- Tests: Add tests for attachment appending and deduplication
- Rename activitypub/media block to activitypub/image
- Rename wrap_media_in_content() to process_remote_images()
- Rename generate_media_block() to generate_image_block()
- Unify inline image wrapping and attachment appending into single function
- Remove append_attachment_media_blocks() from Posts collection
- Add render_post_content() test helper to reduce duplication
Preserve the ActivityPub `updated` timestamp in emoji block attributes
so the cache layer can detect upstream changes. Previously only the URL
was stored, making staleness checks impossible for block-rendered emoji.

Fix wp_check_filetype_and_ext() call in validate_mime_type() which
received a plain MIME list instead of the expected extension-to-MIME map,
causing every valid image to be rejected. Use WordPress defaults and the
finfo-detected extension instead.

Remove SVG handling, custom mime_to_extension(), and GIF skip.

Audio and video attachments from federated posts were silently dropped
because extract_attachments() skipped non-image media types. This adds
activitypub/audio and activitypub/video custom blocks (matching the
existing activitypub/image pattern) so these attachments are preserved
in imported content.

- Register activitypub/audio and activitypub/video blocks with render
  callbacks that apply the activitypub_remote_media_url filter
- Add generate_audio_block() and generate_video_block() helpers
- Add process_remote_media() to route attachments by type
- Update extract_attachments() to include audio/video with a type key
- Extract media functions into dedicated functions-media.php
- Move media tests to class-test-functions-media.php
The ap_ prefix is redundant since the directory is already under
/activitypub/, and inconsistent with the other cache directories
(actors, comments, emoji).
Member

@jeherve jeherve left a comment

This tests well overall. I played with the CLI commands, checked cached posts, emoji, ap_posts, and actors. Most of it seems to work well. I only have 2 notes.


I tried trashing, then deleting an ap_post, but the cached image in that post remained available in the activitypub/ap_posts directory. This works well for newer posts in the activitypub/posts directory, though; the cache directory does get deleted there.

@pfefferle
Member Author

@jeherve activitypub/ap_posts was the old cache directory. Because the old caching replaced the URL directly in the HTML, this folder can't be easily removed (yet).

Replace non-existent self::mime_to_extension() with the WordPress core
function wp_get_default_extension_for_mime_type(), matching the pattern
already used in the parent File::cache() method.
Update .github/workflows/gardening.yml to add two new path-to-label mappings for CLI: includes/class-cli.php and includes/cli, both labeled as "[Feature] CLI" so PRs touching those files are automatically tagged.
Clean up .github/workflows/gardening.yml by adding the includes/cli entry alongside includes/class-cli.php and removing the duplicate trailing CLI entries. This prevents duplicate label entries in the auto-label configuration and tidies the workflow file.
@pfefferle pfefferle requested a review from jeherve February 11, 2026 15:04
@pfefferle pfefferle merged commit 0879fee into trunk Feb 11, 2026
10 checks passed
@pfefferle pfefferle deleted the add/remote-media-cache-separation branch February 11, 2026 19:08

Development

Successfully merging this pull request may close these issues.

Cached remote media filenames might collide

3 participants