Add block-based runtime caching for remote media #2887
Introduces a new caching system for remote media files (avatars, post media, emoji) with improved MIME validation and a filter-first architecture.

New files:
- includes/class-cache.php - Orchestrator (like Handler pattern)
- includes/cache/class-file.php - Abstract base with shared logic
- includes/cache/class-avatar.php - Actor avatar caching
- includes/cache/class-media.php - Post/comment media caching
- includes/cache/class-emoji.php - Custom emoji caching

Features:
- Hooks into existing code via activitypub_remote_media_url filter
- Can be disabled via ACTIVITYPUB_DISABLE_REMOTE_CACHE constant
- Per-type filters for granular control
- Improved MIME type validation using finfo
- Backwards-compatible storage paths

Related: #2742, #2743, #2744
- Avatar caching: Lazy via `activitypub_remote_media_url` filter, clears meta on actor update for re-caching on next access
- Media caching: Via `save_post_ap_post` hook, attachment URLs passed via `meta_input` (new posts) or stored before update
- Emoji caching: Lazy via filter, unchanged
- Attachments class: Removed direct file caching, now only handles content markup generation (galleries, file blocks)
- Remote_Actors: Removed `cache_avatar()`, `get_avatar_url()` now passes URL through filter for lazy caching
- Posts: Extracts attachment URLs before save so they're available at hook time, uses unhook/rehook to prevent infinite loops
Custom emoji shortcodes in remote post content are now replaced with cached local <img> tags when the post is created/updated, matching the behavior for comments.
- Cache\Emoji now handles emoji replacement via hooks:
  - `save_post_ap_post` for post content
  - `wp_insert_comment` for comment content
- Posts/Interactions store emoji data in meta for hook processing
- Removes direct Emoji::prepare_comment_data calls
- Consistent with Media/Avatar hook-based caching pattern
- Cache\Emoji now handles file operations only (download, validate, store)
- Main Emoji class handles content transformation (replacing shortcodes)
- Uses `activitypub_remote_media_url` filter for caching (hookable)
- Posts/Interactions use `Emoji::prepare_*` at insert-time (no temp meta)
- Removes double-save pattern for emoji replacement
- Updates Attachments tests to disable Cache\Media during tests
- Fix Media::cache_post_media to not invalidate before checking for remote URLs
- Fix Cache\Emoji::maybe_cache to delegate to import() preserving activitypub_pre_import_emoji filter
- Allow http:// emoji URLs when caching disabled (with proper filter_var validation)
- Standardize activitypub_remote_media_url filter to 4 parameters across all usages
- Update Avatar filter signature to match (4 params)
1384c57 to
cd9c2f8
- Add wp_check_filetype_and_ext validation to File::validate_mime_type()
- Refactor Media class to use the filter instead of direct caching
- Rename cache_post_media() to process_post_media() for clarity
- Always register save_post hook (allows CDN plugins when caching disabled)
- Only register maybe_cache filter when local caching is enabled
- Add tests for CDN plugin support when caching is disabled

This enables third-party CDN plugins (like Jetpack Photon) to intercept all remote media URLs via the activitypub_remote_media_url filter, regardless of whether local caching is enabled.
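To illustrate the CDN interception described above, here is a minimal sketch of a third-party callback on `activitypub_remote_media_url`. The function name `example_cdn_rewrite()`, the host `cdn.example.com`, and the `url` query parameter are hypothetical. The filter is said to take 4 parameters, but since their meaning is not spelled out here, the sketch only relies on the first one being the URL.

```php
<?php
/**
 * Sketch: rewrite a remote media URL to a CDN proxy URL.
 * The CDN host and the "url" query parameter are hypothetical.
 */
function example_cdn_rewrite( $url ) {
	// Leave anything that is not an absolute http(s) URL untouched.
	$scheme = parse_url( (string) $url, PHP_URL_SCHEME );
	if ( ! in_array( $scheme, array( 'http', 'https' ), true ) ) {
		return $url;
	}

	return 'https://cdn.example.com/proxy?url=' . rawurlencode( $url );
}

// Register only inside WordPress; the guard keeps the sketch runnable standalone.
if ( function_exists( 'add_filter' ) ) {
	// Only the first argument (assumed to be the URL) is used, so accepted_args is 1.
	add_filter( 'activitypub_remote_media_url', 'example_cdn_rewrite', 10, 1 );
}
```

Because the callback returns non-http(s) input unchanged, it should be safe to run on URLs it does not understand.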
- Increase hash length from 8 to 16 characters to reduce collision risk
- Add is_safe_url() validation using wp_http_validate_url()
- Add file_is_displayable_image() check for additional image validation
- Replace custom get_unique_path() with WordPress native wp_unique_filename()

Inspired by improvements in PR #2889.
- Remove SVG from default allowed MIME types (XSS risk)
- Add activitypub_sanitize_svg filter for opt-in SVG support
- Use full MD5 hash (32 chars) to reduce collision probability
- Add activitypub_should_cache_url filter to skip specific URLs
- Add activitypub_media_cached action for logging/analytics
- Add activitypub_cache_allowed_mime_types filter for extensibility
- Remove SVG from default allowed MIME types (XSS risk)
- Add activitypub_sanitize_svg filter for opt-in SVG support
- Use full MD5 hash (32 chars) to reduce collision probability
- Add activitypub_should_cache_url filter to skip specific URLs
- Add activitypub_media_cached action for logging/analytics
- Add activitypub_cache_allowed_mime_types filter for extensibility
- Add activitypub_cache_is_safe_url filter for URL validation override
- Fix tests by bypassing DNS lookups for example.com URLs
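The move from a truncated hash to the full MD5 hash can be sketched as follows. This is an illustration only: `example_cache_filename()` and the `<hash>.<ext>` layout are assumptions, not the naming scheme used in the PR.

```php
<?php
/**
 * Sketch: derive a cache file name from a remote URL.
 * The function name and the "<hash>.<ext>" layout are assumptions.
 */
function example_cache_filename( $url, $extension, $hash_length = 32 ) {
	// md5() returns 32 hex characters; substr() models the earlier, shorter variants.
	$hash = substr( md5( $url ), 0, $hash_length );

	return $hash . '.' . $extension;
}

// Full 32-character hash versus the original 8-character prefix.
echo example_cache_filename( 'https://remote.example/media/cat.png', 'png' ), "\n";
echo example_cache_filename( 'https://remote.example/media/cat.png', 'png', 8 ), "\n";
```

Each hex character carries 4 bits, so 8, 16, and 32 characters correspond to 32, 64, and 128 bits of hash, which is why the longer names collide less often.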
Pull request overview
This pull request introduces a new caching system for remote media files (avatars, post media, emoji) as part of addressing issues #2742, #2743, and #2744. The changes separate caching concerns from the Attachments class into a dedicated Activitypub\Cache namespace with improved architecture, MIME validation, and filter-based lazy loading.
Changes:
- Adds new `Cache` namespace with abstract `File` class and concrete implementations (`Avatar`, `Media`, `Emoji`) for type-specific caching
- Implements filter-based lazy loading architecture using `activitypub_remote_media_url` filter, allowing CDN plugins to intercept
- Refactors `Attachments` class to focus solely on Media Library imports, removing direct file caching logic
- Updates `Remote_Actors`, `Posts`, `Emoji`, and `Interactions` classes to use new filter-based caching approach
Reviewed changes
Copilot reviewed 20 out of 20 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| includes/class-cache.php | New orchestrator class for initializing cache handlers |
| includes/cache/class-file.php | Abstract base class providing shared caching functionality with robust MIME validation using finfo |
| includes/cache/class-avatar.php | Avatar caching with lazy loading via filter and automatic cleanup on actor delete |
| includes/cache/class-media.php | Post/comment media caching with hook-based processing and CDN filter support |
| includes/cache/class-emoji.php | Emoji caching using filename-based hashing for backwards compatibility |
| includes/class-attachments.php | Refactored to focus on Media Library imports, removing direct file caching |
| includes/class-emoji.php | Updated to use filter-based caching and allow remote URLs when caching disabled |
| includes/collection/class-remote-actors.php | Simplified avatar handling using lazy caching via filter |
| includes/collection/class-posts.php | Updated to pass attachment URLs to cache via meta for hook-time processing |
| includes/collection/class-interactions.php | Minor comment updates for emoji handling |
| includes/constants.php | Added ACTIVITYPUB_DISABLE_REMOTE_CACHE constant |
| activitypub.php | Replaced Attachments::init() with Cache::init() |
| tests/* | Comprehensive test coverage for all new cache classes and updated behavior |
- Use cached finfo instance (avoids PHP 8.5 finfo_close() deprecation)
- Remove SVG from default allowed MIME types (XSS risk)
- Add activitypub_sanitize_svg filter for opt-in SVG support
- Use full MD5 hash (32 chars) to reduce collision probability
- Add activitypub_should_cache_url filter to skip specific URLs
- Add activitypub_pre_download_url filter for test mocking
- Add activitypub_media_cached action for logging/analytics
- Add activitypub_cache_allowed_mime_types filter for extensibility
- Add activitypub_cache_is_safe_url filter for URL validation override
- Fix tests by mocking file downloads directly
Replace only URLs within <img src=""> attributes instead of anywhere in content. This prevents accidental URL replacement in links or other contexts. Addresses GitHub Copilot review feedback on PR #2887.
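A minimal sketch of what "replace only inside `<img src>`" can look like with a regular expression. The helper `example_replace_img_src()` and its pattern are assumptions for illustration and are not taken from the PR.

```php
<?php
/**
 * Sketch: replace a URL only where it is the src of an <img> tag.
 * Function name and regex are assumptions, not the PR's code.
 */
function example_replace_img_src( $content, $old_url, $new_url ) {
	// Group 1: the tag up to "src=", group 2: the quote character, then the exact URL.
	$pattern = '/(<img\b[^>]*?\ssrc=)(["\'])' . preg_quote( $old_url, '/' ) . '\2/i';

	return preg_replace_callback(
		$pattern,
		function ( $m ) use ( $new_url ) {
			// Rebuild the attribute with the same quote style and the new URL.
			return $m[1] . $m[2] . $new_url . $m[2];
		},
		$content
	);
}

$html = '<a href="https://r.example/a.png"><img alt="x" src="https://r.example/a.png"></a>';
echo example_replace_img_src( $html, 'https://r.example/a.png', '/cache/a.png' ), "\n";
```

In this example the `href` on the surrounding link keeps the original URL, while the `src` of the image is swapped.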
Resolve conflicts by keeping the Cache namespace approach:
- Attachments: Focus on Media Library imports only
- Remote_Actors: Use activitypub_remote_media_url filter for lazy avatar caching
- Emoji: Use filter-based caching approach
- Tests: Keep simplified tests that match Cache namespace implementation
New commands:
- `wp activitypub cache status` - Show cache statistics (files, size, enabled state)
- `wp activitypub cache clear` - Clear all caches or specific type
- `wp activitypub cache clear --type=avatar|media|emoji` - Clear specific cache type
@jeherve How does Jetpack's Photon cache remote images? Is this done at runtime?
- Fix CLI cache status to use global Cache::is_enabled() check
- Improve is_remote_url() with strict host comparison and prefix matching
- Fix emoji hash collisions by using full URL path hash with legacy fallback
- Restrict remote emoji URLs to only when caching is disabled (privacy)
- Add local path validation in Attachments to prevent file disclosure
- Add remote URL validation with wp_http_validate_url() for defense-in-depth
- Document unused comment media constants as reserved for future use
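The "strict host comparison" idea can be sketched like this. `example_is_remote_url()` is a hypothetical helper that takes the site URL as an explicit argument; the PR's `is_remote_url()` may have a different signature, and the prefix matching it mentions is not covered here.

```php
<?php
/**
 * Sketch: decide whether a URL points at another host.
 * Hypothetical helper; the PR's is_remote_url() may differ in signature and rules.
 */
function example_is_remote_url( $url, $site_url ) {
	$url_host  = parse_url( (string) $url, PHP_URL_HOST );
	$site_host = parse_url( (string) $site_url, PHP_URL_HOST );

	// Relative or malformed URLs have no host and are treated as not remote.
	if ( ! is_string( $url_host ) || ! is_string( $site_host ) ) {
		return false;
	}

	// Strict, case-insensitive equality instead of a substring check.
	return strtolower( $url_host ) !== strtolower( $site_host );
}
```

Comparing whole hosts means a URL such as `https://mysite.example.evil.test/a.png` is still treated as remote for a site at `https://mysite.example`, which a naive substring check would get wrong.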
The legacy system stored emoji directly in HTML without file caching, so there are no existing cached files to maintain compatibility with.
Add a filter to permit the test directory for local file imports during attachment tests (adds allow_test_directory and registers/unregisters the filter in set_up/tear_down). Also add a guard in the Litespeed Cache integration test to skip the write-permissions scenario when running as root (posix_getuid() == 0), since root can bypass file permission restrictions and would make the test invalid.
- Remove unused `emoji-regex(*SKIP)(?!)` placeholder from esc_hashtag()
- Refactor wrap_media_in_content() to use WP_HTML_Tag_Processor for robust HTML parsing (handles any attribute order)
- Fix URL encoding: use wp_json_encode() instead of esc_url() to preserve original URL for str_replace matching in render_media_block()
- Add (*SKIP)(?!) pattern to prevent double-wrapping already-wrapped images when content is processed multiple times
- Add tests for wrap_media_in_content() covering wrapping, attribute order, double-wrap prevention, URL preservation, and local images
Refactor to use a single code path for all remote media caching:
- Remove _activitypub_attachments meta and process_attachments_meta()
- Attachments not already in content are appended as media blocks
- All remote images (inline and appended) are wrapped uniformly
- Caching happens lazily at render time via block render_callback

Changes:
- Posts: Add append_attachment_media_blocks() to inject missing attachments
- Posts: Update activity_to_post() to wrap content first, then append
- Media: Remove save_post hook and process_attachments_meta()
- Tests: Update to trigger lazy caching via do_blocks() with post context
- Tests: Add tests for attachment appending and deduplication
- Rename activitypub/media block to activitypub/image
- Rename wrap_media_in_content() to process_remote_images()
- Rename generate_media_block() to generate_image_block()
- Unify inline image wrapping and attachment appending into single function
- Remove append_attachment_media_blocks() from Posts collection
- Add render_post_content() test helper to reduce duplication
Preserve the ActivityPub `updated` timestamp in emoji block attributes so the cache layer can detect upstream changes. Previously only the URL was stored, making staleness checks impossible for block-rendered emoji.

Fix wp_check_filetype_and_ext() call in validate_mime_type() which received a plain MIME list instead of the expected extension-to-MIME map, causing every valid image to be rejected. Use WordPress defaults and the finfo-detected extension instead.

Remove SVG handling, custom mime_to_extension(), and GIF skip.
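The difference between a plain MIME list and an extension-to-MIME map can be shown with a small standalone sketch. `example_match_mime()` is a simplified stand-in written for this illustration; it imitates the way WordPress matches file names against the keys of the `$mimes` array, but it is not WordPress code.

```php
<?php
/**
 * Sketch: look up a file name in an extension-to-MIME map, the shape that
 * wp_check_filetype_and_ext() expects for its $mimes argument.
 * example_match_mime() is a simplified stand-in, not WordPress code.
 */
function example_match_mime( $filename, array $mimes ) {
	foreach ( $mimes as $ext_pattern => $mime ) {
		// Keys are extension alternations such as "jpg|jpeg|jpe".
		if ( preg_match( '!\.(' . $ext_pattern . ')$!i', $filename ) ) {
			return $mime;
		}
	}

	return false;
}

// Correct shape: extension pattern => MIME type.
$map = array(
	'jpg|jpeg|jpe' => 'image/jpeg',
	'png'          => 'image/png',
);

// Plain list: the keys are 0 and 1, so no real extension can match.
$list = array( 'image/jpeg', 'image/png' );

var_dump( example_match_mime( 'avatar.png', $map ) );  // matches the "png" key
var_dump( example_match_mime( 'avatar.png', $list ) ); // no key matches
```

With the plain list, the lookup fails for every file name, which is consistent with the described symptom of every valid image being rejected.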
Audio and video attachments from federated posts were silently dropped because extract_attachments() skipped non-image media types. This adds activitypub/audio and activitypub/video custom blocks (matching the existing activitypub/image pattern) so these attachments are preserved in imported content.

- Register activitypub/audio and activitypub/video blocks with render callbacks that apply the activitypub_remote_media_url filter
- Add generate_audio_block() and generate_video_block() helpers
- Add process_remote_media() to route attachments by type
- Update extract_attachments() to include audio/video with a type key
- Extract media functions into dedicated functions-media.php
- Move media tests to class-test-functions-media.php
The ap_ prefix is redundant since the directory is already under /activitypub/, and inconsistent with the other cache directories (actors, comments, emoji).
jeherve left a comment
This tests well overall. I played with the CLI commands, checked cached posts, emoji, ap_posts, and actors. Most of it seems to work well. I only have 2 notes.
I tried trashing, then deleting an ap_post, but the cached image in that post remained available in the activitypub/ap_posts directory. This works well for newer posts in the activitypub/posts directory, though; the cache directory does get deleted there.
@jeherve
Replace non-existent self::mime_to_extension() with the WordPress core function wp_get_default_extension_for_mime_type(), matching the pattern already used in the parent File::cache() method.
Update .github/workflows/gardening.yml to add two new path-to-label mappings for CLI: includes/class-cli.php and includes/cli, both labeled as "[Feature] CLI" so PRs touching those files are automatically tagged.
Clean up .github/workflows/gardening.yml by adding the includes/cli entry alongside includes/class-cli.php and removing the duplicate trailing CLI entries. This prevents duplicate label entries in the auto-label configuration and tidies the workflow file.
Fixes #2742, #2743, #2744
Proposed changes:
Introduces a dedicated `Cache` namespace for remote media caching with improved architecture:

- `Cache\Avatar`, `Cache\Media`, `Cache\Emoji`: each handles type-specific caching with lazy loading via the `activitypub_remote_media_url` filter
- `activitypub/emoji` block: wraps emoji shortcodes, renders with cached URLs
- `activitypub/image`, `activitypub/audio`, `activitypub/video` blocks: wrap remote media in posts (comments strip remote images for security)
- Audio and video attachments are rendered as `<audio>` and `<video>` elements via dedicated blocks
- `wp activitypub cache status` and `wp activitypub cache clear [--type=<type>]` for cache management
- All remote URLs pass through the `activitypub_remote_media_url` filter, allowing CDN plugins (Jetpack Photon, Cloudflare) to intercept
- Uses `finfo` for reliable MIME validation, escapes glob metacharacters, strips remote images from comments
- Avatar URLs are no longer stored in post meta (`_activitypub_avatar_url`). Instead, they resolve lazily at render time through the `activitypub_remote_media_url` filter
- Caching can be disabled globally (`ACTIVITYPUB_DISABLE_MEDIA_CACHE`) or per-type via filters
- Adds `is_remote_url()`, `process_remote_images()`, `process_remote_media()`, and block generator functions in `functions-media.php`

Breaking changes:

- `activitypub_store_attachments_locally` filter removed: This filter controlled whether incoming post attachments were imported into the Media Library. It is replaced by the `activitypub_should_cache_url` and `activitypub_cache_{type}_enabled` filters, and the `ACTIVITYPUB_DISABLE_MEDIA_CACHE` constant
- `_activitypub_avatar_url` post meta no longer used: Avatar URLs are now resolved lazily at render time instead of being stored in post meta

Implementation Advantages
Block-based approach preserves original URLs: Original remote URLs are stored in block attributes, allowing cache to be cleared and regenerated without data loss. No destructive content modification.
Runtime rendering vs insert-time replacement: Content processing is deferred to display time. Easy to toggle between cached/uncached without re-processing existing posts. CDN plugins can intercept at render time.
Filter-first architecture: All remote URLs pass through the `activitypub_remote_media_url` filter. Jetpack Photon, Cloudflare, and other CDN plugins can intercept. Plugin authors can implement custom caching strategies.
Lazy caching (on-demand): Media is only cached when actually rendered. No upfront processing of all content. Reduces unnecessary disk I/O and bandwidth.
Separation into dedicated Cache namespace: Each cache type (Avatar, Media, Emoji) has its own class with clear responsibility boundaries. Attachments class simplified to focus only on outgoing post attachments.
Comment-specific handling: `Comment::render_blocks()` selectively renders only `activitypub/*` blocks. Remote images stripped from comments for security while preserving emoji.
CLI management commands: Built-in operational tooling with `wp activitypub cache status` and `wp activitypub cache clear --type=<type>` for visibility and control.

Block-based caching flow:
CLI Commands:
Storage structure:
Other information:
Testing instructions:
- `/wp-content/uploads/activitypub/actors/`
- `<audio>`/`<video>` elements
- `npm run env-test`

Changelog entry
Changelog Entry Details
Significance
Type
Message
Add Cache namespace for remote media caching with CLI commands, block-based runtime caching, and filter-based architecture.