Releases: josego85/pdf-content-search

1.5.0

08 Nov 17:27
0b10827

[1.5.0] - 2025-11-08

Added

  • Professional Search UI:
    • Modern gradient background design (gray-50 → blue-50 → gray-100)
    • Hero section with centered icon and professional typography
    • Enhanced search box with keyboard shortcuts (/ to focus, ESC to clear)
    • Search performance metrics display (result count and duration in ms)
    • Grid/List view toggle for search results
    • Favorites system with localStorage persistence
    • Professional loading states with dual-ring spinner
    • Improved empty states with actionable suggestions
    • Initial state showing feature benefits (Lightning Fast, Smart Highlighting, In-Page Highlighting)
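
The favorites system can be sketched as a small store that serializes a set of result IDs to localStorage. This is an illustration only: the storage key `pdf-search:favorites` and the `createFavoritesStore` helper are hypothetical, and the storage object is injected so the sketch also runs outside a browser.

```javascript
// Hypothetical storage key; the key actually used by the app is not documented.
const FAVORITES_KEY = "pdf-search:favorites";

// Minimal favorites store backed by any object with the Web Storage
// getItem/setItem interface (window.localStorage in the browser).
function createFavoritesStore(storage, key = FAVORITES_KEY) {
  // Restore persisted IDs; fall back to an empty set on missing or corrupt data.
  let ids;
  try {
    ids = new Set(JSON.parse(storage.getItem(key) ?? "[]"));
  } catch {
    ids = new Set();
  }

  // Write the current set back as a JSON array after every change.
  const persist = () => storage.setItem(key, JSON.stringify([...ids]));

  return {
    isFavorite: (id) => ids.has(id),
    // Add or remove an ID and return whether it is now a favorite.
    toggle(id) {
      if (ids.has(id)) ids.delete(id);
      else ids.add(id);
      persist();
      return ids.has(id);
    },
    list: () => [...ids],
  };
}
```

In the browser the store would be created with `createFavoritesStore(window.localStorage)`; injecting the storage keeps the logic testable without a DOM.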

Changed

  • Comprehensive Responsive Design Implementation:
    • CRITICAL FIX: Added viewport meta tag to base.html.twig (essential for proper mobile rendering)
    • Mobile-First Strategy: Implemented progressive enhancement from mobile (320px) → tablet (640px+) → desktop (1024px+)
    • Responsive Breakpoints:
      • Mobile (default): 0-639px - Single column layout, compact UI, essential features
      • Tablet (sm): 640px+ - Two column grid, medium spacing, expanded features
      • Desktop (md/lg): 768px+ (md) and 1024px+ (lg) - Full feature set, maximum spacing, complete text labels
    • Touch Optimization:
      • All interactive elements meet 44x44px minimum touch target size
      • Added the touch-manipulation utility (touch-action: manipulation) to disable double-tap zoom and make taps respond faster
      • Improved button and control sizes for tablet/mobile devices
    • Typography Scaling:
      • Hero title: text-3xl (mobile) → sm:text-4xl (tablet) → md:text-5xl (desktop)
      • Search input: py-3 (mobile) → sm:py-4 (tablet) → md:py-5 (desktop)
      • All text elements scale progressively across breakpoints
    • Component-Specific Improvements:
      • Search Container: Responsive padding px-4 sm:px-6 lg:px-8, py-6 sm:py-8 md:py-12
      • Hero: Scaled icons and text with horizontal padding to prevent clipping
      • Search Bar: Optimized input padding, simplified placeholder for mobile, responsive clear button
      • Controls: Conditional text display (hide verbose labels on mobile), responsive icons
      • Results Grid: Smart breakpoints grid-cols-1 sm:grid-cols-2, progressive gap sizing
      • Result Cards: Compact badges on mobile, icon-only "View PDF" button on small screens, improved text truncation
      • State Components: All loading, empty, error, and initial states fully responsive
    • Layout Optimizations:
      • Progressive spacing: smaller margins/padding on mobile, larger on desktop
      • Responsive border radius: rounded-xl sm:rounded-2xl
      • Flexible grid layouts with proper breakpoint transitions
    • Accessibility Enhancements:
      • Added ARIA labels to all interactive buttons
      • Improved semantic HTML with lang="en" attribute
      • Better keyboard navigation support
      • Enhanced screen reader compatibility
    • Technical Improvements:
      • Added flex-shrink-0 to prevent unwanted layout collapse
      • Used break-words for proper long text handling
      • Implemented min-w-0 for correct flexbox text truncation
      • Replaced space-x with gap utilities so spacing stays correct when flex items wrap on narrow screens
      • All interactive elements include active: variants for touch feedback
  • Modular Component Architecture (SOLID Principles):
    • Refactored SearchComponent.vue (440 lines) into 9 specialized components
    • Applied Single Responsibility Principle for better maintainability
    • Component structure: search/Search.vue as the parent, composing Hero, Bar, Controls, Results, ResultCard, and 4 state components (loading, empty, error, initial)
  • Vue.js Optimization:
    • Enabled runtime-only build (~33KB bundle size reduction)
    • Updated webpack.config.js with runtimeCompilerBuild: false
    • Migrated from DOM template compilation to direct component mounting
    • Simplified templates/search.html.twig (removed <search-component> tag)
  • Component Naming Convention:
    • Adopted Vue 3 Style Guide enterprise conventions
    • Path-based naming: components named by context, not redundant prefixes
    • Cleaner imports: import Hero from './Hero.vue' instead of import SearchHero from './SearchHero.vue'
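
The mobile-first breakpoints above follow Tailwind's min-width semantics: an unprefixed utility applies at every width, and a prefixed one such as sm: takes over from its breakpoint upward. The sketch below resolves which utility in a responsive class list is in effect at a given viewport width. It assumes Tailwind's default breakpoints (sm 640px, md 768px, lg 1024px, xl 1280px, 2xl 1536px) and that all classes in the list set the same CSS property; `activeUtility` is a hypothetical helper, not part of Tailwind.

```javascript
// Tailwind's default breakpoints (min-width in px). A custom
// tailwind.config.js may override these.
const BREAKPOINTS = new Map([
  ["sm", 640],
  ["md", 768],
  ["lg", 1024],
  ["xl", 1280],
  ["2xl", 1536],
]);

// Given responsive classes for ONE property, e.g.
// "text-3xl sm:text-4xl md:text-5xl", return the utility in effect at
// `width`: the one whose breakpoint min-width is the largest value <= width.
// Unprefixed utilities count as min-width 0 (the mobile-first default).
function activeUtility(classList, width) {
  let best = null;
  let bestMin = -1;
  for (const cls of classList.trim().split(/\s+/)) {
    const colon = cls.indexOf(":");
    const prefix = colon === -1 ? null : cls.slice(0, colon);
    const utility = colon === -1 ? cls : cls.slice(colon + 1);
    const min = prefix === null ? 0 : BREAKPOINTS.get(prefix);
    if (min === undefined) continue; // skip state variants such as hover: or active:
    if (min <= width && min > bestMin) {
      best = utility;
      bestMin = min;
    }
  }
  return best;
}
```

For the hero title, `activeUtility("text-3xl sm:text-4xl md:text-5xl", 375)` yields "text-3xl", while at 640px the sm: variant already wins, which is why the mobile range ends at 639px.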

Removed

  • Monolithic SearchComponent.vue replaced by modular architecture

What's Changed

  • feat: comprehensive responsive design implementation for mobile and tablet by @josego85 in #5
  • Feature/improve search UI by @josego85 in #6

Full Changelog: v1.4.0...v1.5.0

1.4.0

08 Nov 14:55
b86a610

[1.4.0] - 2025-11-08

Added

  • PDF Highlighting System:
    • Intelligent hybrid word boundary detection for accurate highlighting
    • Automatic detection of malformed PDF text layers (words without spaces)
    • Context-aware matching: strict word boundaries for normal text, permissive for malformed PDFs
    • Support for special characters in word boundaries (bullets, em-dashes, etc.)
    • Position mapping system for precise character-level highlighting across normalized and original text
  • Search Architecture (SOLID):
    • QueryBuilderInterface contract for search engine abstraction
    • SearchStrategy enum (HYBRID, EXACT, PREFIX) for configurable search behavior
    • QueryParser service for advanced search operators ("quotes", +required, -exclude)
    • SearchQueryBuilder with intelligent hybrid search: exact matches prioritized, fuzzy matching only for words of 5+ characters
  • Docker Infrastructure:
    • Multi-stage Docker setup (development and production)
    • Alpine-based images: 525MB dev image (71% smaller than the previous 1.82GB image) and ~250MB prod image
    • Separate Dockerfiles for dev and prod environments
    • .dockerignore for optimized builds
    • Comprehensive Docker documentation in docs/docker.md
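
The operators handled by QueryParser can be illustrated with a small tokenizer. This is a JavaScript sketch, not the project's PHP QueryParser: the `parseQuery` name and the `{ phrases, required, excluded, terms }` result shape are assumptions made for illustration.

```javascript
// Split a raw query into quoted phrases, +required terms, -excluded terms,
// and plain (optional) terms.
function parseQuery(input) {
  const result = { phrases: [], required: [], excluded: [], terms: [] };
  // A token is either an optionally +/- prefixed "quoted phrase" or a bare word.
  const tokenRe = /([+-]?)"([^"]*)"|(\S+)/g;
  for (const m of input.matchAll(tokenRe)) {
    if (m[2] !== undefined) {
      // Quoted phrase: a leading "-" excludes the whole phrase.
      const phrase = m[2].trim();
      if (!phrase) continue;
      if (m[1] === "-") result.excluded.push(phrase);
      else result.phrases.push(phrase);
      continue;
    }
    const word = m[3];
    // A lone "+" or "-" is kept as a plain term rather than an empty operator.
    if (word.length > 1 && word[0] === "+") result.required.push(word.slice(1));
    else if (word.length > 1 && word[0] === "-") result.excluded.push(word.slice(1));
    else result.terms.push(word);
  }
  return result;
}
```

For example, `parseQuery('"machine learning" +python -java data')` separates the phrase, the required term, the excluded term, and the optional term, which a query builder can then map onto must, must_not, and should clauses.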

Changed

  • PDF.js Upgrade:
    • Upgraded PDF.js from v2.16.105 (2022) to v5.4.394 (2025)
    • Migrated to modern PDF.js v5 API (TextLayer class instead of TextLayerBuilder)
    • Updated webpack configuration to copy .mjs worker files for PDF.js v5
    • Improved text layer rendering with better spacing and positioning
  • PDF Highlighting:
    • Refactored highlighting algorithm to mark all occurrences (previously only first occurrence)
    • Implemented word boundary validation to prevent false matches (e.g., "java" in "javascript")
    • Highlights use minimal CSS (all: unset) to prevent text duplication
    • Highlight color changed to soft yellow (#fef3c7) matching search results preview
    • Text rendered on canvas with transparent text layer overlay for clean highlighting
    • Removed debugging console.log statements for production-ready code
  • Search Logic:
    • Refactored search to prioritize exact matches (10x boost), then word matches (5x), then fuzzy (1x)
    • Fixes an issue where "jos" incorrectly matched "job"; fuzzy matching is now limited to words of 5+ characters, so short words return only exact or close matches
    • SearchController now depends on QueryBuilderInterface (Dependency Inversion Principle)
  • Docker Configuration:
    • Migrated from Debian to Alpine Linux base images
    • Reorganized Docker files: .docker/dev/ and .docker/prod/ structure
    • Renamed compose.yaml to docker-compose.yml (production base)
    • docker-compose.override.yml auto-loaded for development
    • Apache and PHP configs moved to .docker/dev/ subdirectories
  • Documentation:
    • Moved Docker documentation from README to docs/docker.md
    • Simplified README with link to detailed Docker docs
  • build: Updated PHP from version 8.4.11 to 8.4.14
  • build: Updated Elasticsearch from version 8.17.1 to 8.17.10
  • build: Updated Kibana from version 8.17.1 to 8.17.10
  • build(deps): Updated Composer dependencies to latest compatible versions
  • build(deps): Updated npm dependencies:
    • Vue.js from 3.5.13 to 3.5.24
    • @vue/compiler-sfc from 3.5.13 to 3.5.24
    • postcss from 8.5.3 to 8.5.6
    • Pinned dependency versions (removed ^ ranges) for reproducible builds
    • Fixed 2 low severity npm vulnerabilities (brace-expansion, tmp)
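
The scoring tiers described above (exact phrase 10x, word match 5x, fuzzy 1x, with fuzziness only for words of 5+ characters) map naturally onto an Elasticsearch bool query with boosted should clauses. The sketch assumes a text field named `content` and a hypothetical `buildHybridQuery` helper; the project's actual SearchQueryBuilder is written in PHP and may structure the query differently.

```javascript
// Minimum word length before fuzzy matching is allowed.
const FUZZY_MIN_LENGTH = 5;

// Build an Elasticsearch request body that ranks exact phrase matches above
// all-words matches, and both above fuzzy matches.
function buildHybridQuery(text, field = "content") {
  const words = text.trim().split(/\s+/).filter(Boolean);
  const should = [
    // Tier 1: the exact phrase (10x boost).
    { match_phrase: { [field]: { query: text, boost: 10 } } },
    // Tier 2: every word present, in any order (5x boost).
    { match: { [field]: { query: text, operator: "and", boost: 5 } } },
  ];
  // Tier 3: fuzzy matching (1x boost), only for words long enough that a
  // small edit distance is unlikely to land on an unrelated word.
  for (const word of words.filter((w) => w.length >= FUZZY_MIN_LENGTH)) {
    should.push({ match: { [field]: { query: word, fuzziness: "AUTO", boost: 1 } } });
  }
  return { query: { bool: { should, minimum_should_match: 1 } } };
}
```

With `fuzziness: "AUTO"`, Elasticsearch allows one edit for terms of 3 to 5 characters, so a fuzzy clause for "jos" could match "job"; skipping fuzziness for short words avoids that class of false positive.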

Removed

  • Root Dockerfile in favor of organized .docker/dev/ and .docker/prod/ structure
  • compose.override.yaml replaced by docker-compose.override.yml
  • Makefile commands (replaced by standard docker-compose commands)

Fixed

  • PDF Highlighting Issues:
    • Fixed text duplication/overlapping in PDF viewer caused by visible text in both canvas and text layer
    • Corrected word boundary detection to properly skip compound words like "javascript" when searching "java"
    • Fixed highlighting to find all occurrences instead of just the first one per span
    • Resolved issues with highlighting words containing accents (e.g., "José" when searching "jose")
    • Fixed text layer dimensions to match viewport size in PDF.js v5
  • Elasticsearch single-node configuration: disabled disk-based shard allocation thresholds (cluster.routing.allocation.disk.threshold_enabled=false)
  • ElasticsearchService::deleteIndex() now checks index existence before deletion
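
The highlighting fixes above (all occurrences, whole words only, accent-insensitive matching) can be combined in one routine that searches a normalized copy of the text and maps each hit back to positions in the original string. This is a simplified sketch: `findHighlights` is a hypothetical name, it works on a single plain string rather than PDF.js text-layer spans, and it omits the permissive mode for malformed text layers.

```javascript
// Fold case and strip diacritics one source character at a time, recording
// for each normalized character the index of the original character it came from.
function normalizeWithMap(text) {
  let normalized = "";
  const map = [];
  for (let i = 0; i < text.length; i++) {
    const folded = text[i].normalize("NFD").toLowerCase().replace(/\p{M}/gu, "");
    for (const ch of folded) {
      normalized += ch;
      map.push(i);
    }
  }
  return { normalized, map };
}

// Letters and digits in any script count as word characters.
const isWordChar = (ch) => ch !== undefined && /[\p{L}\p{N}]/u.test(ch);

// Return [start, end) ranges in the ORIGINAL text for every whole-word,
// case- and accent-insensitive occurrence of `term`.
function findHighlights(text, term) {
  const { normalized, map } = normalizeWithMap(text);
  const needle = normalizeWithMap(term).normalized;
  const ranges = [];
  if (!needle) return ranges;
  let from = 0;
  while (true) {
    const at = normalized.indexOf(needle, from);
    if (at === -1) break;
    const end = at + needle.length;
    // Word-boundary check: reject hits glued to other letters or digits,
    // e.g. "java" inside "javascript".
    if (!isWordChar(normalized[at - 1]) && !isWordChar(normalized[end])) {
      ranges.push([map[at], map[end - 1] + 1]);
    }
    from = at + 1; // keep scanning so every occurrence is found
  }
  return ranges;
}
```

Because matching happens on the normalized string but the returned ranges index the original, a search for "jose" can wrap the original "José" in a mark element without altering the displayed text.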

What's Changed

Full Changelog: v1.3.1...v1.4.0

1.3.1

05 Aug 16:52
d99e95e

[1.3.1] - 2025-08-05

Added

  • Support for the PHP intl extension to enhance internationalization features and improve overall performance.

Changed

  • build: Upgraded Symfony from version 7.2 to 7.3.
  • build: Upgraded PHP from version 8.3 to 8.4.
  • docs: Updated application version badge in README.md.

Fixed

  • Missing intl PHP extension warning in Symfony during runtime.

What's Changed

Full Changelog: v1.3.0...v1.3.1

1.3.0

05 Aug 15:17
3ec1b30

[1.3.0] - 2025-08-05

Added

  • PDF Viewer Integration:

    • Added a new PDF viewer route (/viewer) that allows users to open a PDF document at a specific page using ?path=...&page=....
    • Integrated PDF.js from Mozilla to render PDF pages directly in the browser using a <canvas> and a dynamic text layer.
    • Implemented search term highlighting for a given query using ?q=..., applied on the specified page.
    • Highlighting is case-insensitive and styled using <mark> elements injected into the text layer.
    • The highlight feature now retrieves terms directly from the Elasticsearch results (search highlights), enabling a more seamless experience when navigating between results.
    • Added parsing and injection of the highlighted terms into the PDF viewer dynamically, improving the user experience.
    • Limitations: Currently highlights only the first occurrence of the search term per span (this limitation will be improved in future versions).
    • Improved the highlight feature to correctly show all matches including those with accented characters, fixing issues where accented terms were partially or incorrectly highlighted.
  • Project Management:

    • Added TODO.md document to track pending features, improvements, and technical debt.
    • Serves as a lightweight roadmap for contributors and team members.
  • Accent and Special Character Normalization:

    • The indexer now replaces accented characters and special variations of vowels (e.g., á, é, í, ó, ú, ü) with their plain equivalents (a, e, i, o, u) during indexing.
    • This improves the consistency of search queries and results when users omit accents.
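
A link into the viewer route described above can be assembled with URLSearchParams, which percent-encodes file paths and non-ASCII query terms. The `buildViewerUrl` and `readViewerParams` helpers are hypothetical; only the /viewer route and its path, page, and q parameters come from this release.

```javascript
// Build a /viewer link that opens `path` at `page` and highlights `query`.
function buildViewerUrl(path, page, query) {
  const params = new URLSearchParams({ path, page: String(page) });
  if (query) params.set("q", query);
  return `/viewer?${params.toString()}`;
}

// Read the same parameters back inside the viewer page (e.g. from
// window.location.search), falling back to page 1 for missing or invalid values.
function readViewerParams(search) {
  const params = new URLSearchParams(search); // a leading "?" is ignored
  const page = Number.parseInt(params.get("page") ?? "1", 10);
  return {
    path: params.get("path"),
    page: Number.isInteger(page) && page > 0 ? page : 1,
    query: params.get("q") ?? "",
  };
}
```

Encoding and decoding through the same API keeps paths with spaces and accented search terms such as "José" intact on the round trip from the results list to the viewer.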

Changed

  • Indexer and Search Refactoring:

    • Major refactor of the indexer and search logic to improve maintainability and search consistency.
    • Normalized input during both indexing and querying phases to better handle special characters and improve match accuracy.
  • Search Results Handling:

    • The highlight terms fetched from Elasticsearch are now processed and passed to the PDF viewer for more accurate highlighting.
    • The search terms in Elasticsearch are parsed to ensure they are appropriately reflected in the PDF viewer.
    • Enhanced handling of search results to integrate smoothly with the PDF viewer.
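
Normalizing in both phases matters because a match requires the indexed text and the query to pass through the same transformation. A minimal sketch, assuming Unicode NFD decomposition followed by removal of combining marks (which covers á, é, í, ó, ú, ü and other Latin diacritics); `normalizeForSearch` is a hypothetical name and the project's PHP indexer may implement this differently.

```javascript
// Shared normalization applied to page text at index time AND to the user's
// query at search time, so "José", "JOSE" and "jose" all become "jose".
function normalizeForSearch(text) {
  return text
    .normalize("NFD") // split "é" into "e" + combining acute accent
    .replace(/\p{M}/gu, "") // drop combining marks (accents, diaeresis, tilde)
    .toLowerCase()
    .replace(/\s+/g, " ") // collapse runs of whitespace
    .trim();
}
```

Note that stripping all combining marks also folds "ñ" to "n", which is broader than the vowel-only replacement described for the indexer; a stricter version would map only the listed vowels.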

Known Issues

  • Character Encoding:
    • There may be issues with certain characters (e.g., accented characters) not being properly highlighted in the PDF viewer. This issue will be addressed in future versions.
    • The rendering of accented characters such as José in the highlights might not be perfect due to encoding differences between the PDF content and the search terms.

What's Changed

New Contributors

Full Changelog: v1.2.2...v1.3.0

v1.2.2

09 Apr 16:58

[1.2.2] - 2025-04-09

Added

  • .php-cs-fixer.cache added to .gitignore to avoid committing temporary fixer cache files.
  • Code Style Enforcement:
    • Introduced Husky to run style checks automatically before each commit.
    • Configured a pre-commit Git hook to run composer cs-check and block commits if style violations are detected.
  • .editorconfig added to enforce consistent formatting across editors:
    • Enforces 4-space indentation for PHP, 2 spaces for YAML, JSON, and JS files.
    • Uses LF line endings and trims trailing whitespace.
    • Ensures a final newline at the end of each file and UTF-8 encoding.

Changed

  • Enhanced PHP-CS-Fixer configuration:
    • Added full rule set aligned with PHP 8.3 best practices.
    • Enforced stricter and explicit code style across the codebase.
  • Applied PHP-CS-Fixer rules to refactor and reformat multiple PHP files for consistency.

v1.2.1

08 Apr 20:11

[1.2.1] - 2025-04-08

Added

  • Documentation Improvements:
    • Expanded README.md with detailed sections:
      • Features: Highlighting key functionalities like page-level PDF search, real-time results, and content highlighting.
      • Technologies: Comprehensive list of tools and frameworks used.
      • Requirements: Clear prerequisites for running the project.
      • Installation: Step-by-step guide for setting up the project.
      • Docker Setup: Instructions for building and running containers.
      • Configuration: Explanation of environment variables and service bindings.
      • PDF Management: Instructions for organizing and indexing PDFs.
      • Usage: Detailed guide on how to use the application.
      • Development: Added frontend and backend development workflows.
      • Elasticsearch: Commands for managing indices and monitoring cluster health.
      • Maintenance: Steps for clearing caches, updating dependencies, and rebuilding containers.
      • Troubleshooting: Common issues and solutions for Elasticsearch, frontend, and PDF indexing.
      • Security: Recommendations for securing the application in production.
      • Contributing: Guidelines for contributing to the project.

Changed

  • Refactored PDF indexing process:
    • Split PDFs into individual pages for better granularity.
    • Improved text extraction accuracy using pdftotext.
    • Enhanced metadata handling (e.g., total page count, file paths).
    • Improved error reporting for failed indexing operations.
  • SearchController constructor refactoring:
    • Injected pdfPagesIndex from configuration instead of hardcoding the index name.
  • Dockerfile cleanup:
    • Removed unused system package previously required for older workflows.
  • PDF folder restructuring:
    • Moved indexed PDFs from var/pdfs/ to the web-accessible public/pdfs/ directory so they can be linked and opened directly.

Fixed

  • Fixed Search Issues:
    • Resolved issue where similar words (e.g., "lose" instead of "Jose") were incorrectly highlighted.
    • Adjusted frontend logic to highlight only exact matches for search terms using regular expressions.
    • Enhanced backend query precision for Elasticsearch highlighting.
    • Fixed context display in search results for better readability.
  • Addressed missing or unclear instructions in the README.md:
    • Added steps for verifying dependencies and services.
    • Clarified Docker commands for starting and stopping containers.
    • Included examples for debugging and troubleshooting common issues.

v1.2.0

08 Apr 20:00

[1.2.0] - 2025-04-08

Added

  • PDF Page-Level Search:
    • Individual page indexing for PDFs
    • Page content extraction with context
    • Page number tracking in search results
    • Direct PDF page links in results
  • Enhanced Search Results:
    • Context snippets with highlighted matches
    • Page-specific navigation in PDFs
    • PDF preview integration in browser
    • Page count information display
  • Command Improvements:
    • Page-by-page PDF processing
    • Unique ID generation per page
    • Better error handling per page
    • Progress indicators for indexing
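
One way to implement page-by-page processing with a unique ID per page is to split the extracted text on the form-feed characters that pdftotext inserts between pages (unless run with -nopgbrk) and derive each ID deterministically from the file path and page number, so re-running the indexer overwrites existing documents instead of duplicating them. This is an assumption, not the project's documented scheme; `splitPages`, `pageDocumentId`, `pageDocuments`, and the document field names are hypothetical.

```javascript
const crypto = require("crypto");

// pdftotext separates pages with "\f"; drop the empty chunk after the final one.
function splitPages(raw) {
  const pages = raw.split("\f");
  if (pages.length > 0 && pages[pages.length - 1] === "") pages.pop();
  return pages;
}

// Stable document ID for one page: SHA-1 of the file path plus the page number.
function pageDocumentId(filePath, pageNumber) {
  const digest = crypto.createHash("sha1").update(filePath).digest("hex");
  return `${digest}_p${pageNumber}`;
}

// Turn per-page texts into indexable documents with page metadata.
function pageDocuments(filePath, pageTexts) {
  return pageTexts.map((content, i) => ({
    id: pageDocumentId(filePath, i + 1), // page numbers are 1-based
    path: filePath,
    page: i + 1,
    total_pages: pageTexts.length,
    content,
  }));
}
```

Hashing the path keeps IDs short and filesystem-independent, while the page suffix keeps them unique per page.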

Changed

  • Refactored PDF indexing process:
    • Split PDFs into individual pages
    • Improved text extraction accuracy
    • Enhanced metadata handling
    • Better error reporting
  • Updated search interface:
    • Added page-specific result display
    • Improved result highlighting
    • Enhanced PDF viewer integration
    • Better result organization

Fixed

  • PDF page counting accuracy
  • Text extraction reliability
  • Search result context display
  • PDF viewer integration issues
  • Improved Content Highlighting:
    • Resolved issue where similar words (e.g., "lose" instead of "Jose") were incorrectly highlighted.
    • Adjusted frontend logic to highlight only exact matches for search terms.
    • Enhanced backend query to improve precision in highlighting.

v1.1.0

08 Apr 13:49

[1.1.0] - 2025-04-08

Added

  • Created SearchEngineInterface for search service abstraction
  • Improved error handling in Elasticsearch operations
  • Added type hints and return types for better code clarity
  • Frontend Search Implementation:
    • Vue.js search component with real-time feedback
    • Tailwind CSS styling and responsive design
    • Search results highlighting
    • Loading states and error handling
    • Debounced search functionality
    • Document metadata display (date, score)
  • Development Tools:
    • Added PHP-CS-Fixer for code style enforcement
    • Configured Symfony and PSR-12 coding standards
    • Added composer scripts for style checking
    • VS Code integration setup
  • Monitoring Tools:
    • Added Kibana 8.17.1 integration
    • Configured health checks for Kibana
    • Added Elasticsearch monitoring dashboard
    • Integrated with existing Elasticsearch setup
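
Debounced search delays the request until the user pauses typing, so one query is sent per burst of keystrokes instead of one per key. A minimal, framework-agnostic sketch; the `debounce` helper and its 300 ms default are assumptions, not values taken from the project's Vue component.

```javascript
// Return a wrapper that postpones calling `fn` until `waitMs` milliseconds
// have passed without another call; only the last call's arguments are used.
function debounce(fn, waitMs = 300) {
  let timer = null;
  const debounced = (...args) => {
    clearTimeout(timer); // restart the countdown on every call
    timer = setTimeout(() => {
      timer = null;
      fn(...args);
    }, waitMs);
  };
  // Let callers (e.g. a component's unmount hook) drop a pending call.
  debounced.cancel = () => {
    clearTimeout(timer);
    timer = null;
  };
  return debounced;
}
```

In a component, an input handler would be wrapped once, for example `const onInput = debounce((q) => runSearch(q), 300)`, where `runSearch` stands for whatever function issues the search request.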

Changed

  • Refactored ElasticsearchService to implement SearchEngineInterface
  • Improved Elasticsearch client configuration
  • Enhanced exception handling for Elasticsearch operations

v1.0.0

08 Apr 09:24

v1.0.0 (Pre-release)

[1.0.0] - 2025-04-08

Added

  • Initial project setup with Symfony 7.2
  • Docker infrastructure:
    • PostgreSQL 16 with health checks
    • Apache 2.4 web server
    • PHP-FPM 8.4 configuration
    • Elasticsearch 8.17.1 integration
  • Basic project configuration:
    • Docker Compose setup
    • Environment variables structure
    • Project documentation
  • Elasticsearch features:
    • Health checks implementation
    • Volume persistence
    • Memory optimization
    • Security configuration
  • Apache and PHP integration
  • Database configuration and persistence
  • Console Commands:
    • PDF indexer command (app:index-pdfs)
    • Automatic text extraction from PDFs
    • Elasticsearch document indexing
  • Documentation:
    • Comprehensive README.md with:
      • Project description
      • Installation instructions
      • Docker setup guide
      • Usage examples
      • Development guidelines
      • Contributing guidelines
      • License information

Changed

  • N/A

Deprecated

  • N/A

Removed

  • N/A

Fixed

  • N/A

Security

  • Disabled Elasticsearch security for development
  • Basic authentication setup for services