A Symfony application to search content within PDF files using Elasticsearch and Vue.js.
- Features
- Description
- Technologies
- Requirements
- Installation
- Docker Setup
- Configuration
- PDF Management
- Usage
- Development
- Elasticsearch
- Maintenance
- Troubleshooting
- Security
- Contributing
- License
- Page-Level PDF Search
- Real-time Search Results
- Intelligent Content Highlighting
  - Smart word boundary detection (avoids "java" in "javascript")
  - Support for accented characters (finds "José" when searching "jose")
  - Handles malformed PDF text layers automatically
  - Highlights all occurrences on the page
- Relevance Scoring
- Responsive Design
- Fast Elasticsearch Backend
- Automatic PDF Processing
- Page Context Display
- Direct PDF Page Links with In-Page Highlighting
- Search Analytics via Kibana
This application indexes the text of PDF files in Elasticsearch and lets users search that content through a Vue.js frontend.
- PHP 8.4.14
- Symfony 7.3.6
- Elasticsearch 8.17.10
- Kibana 8.17.10
- Vue.js 3.5.24
- PDF.js 5.4.394
- Tailwind CSS 3.4.17
- Docker 27.5.1 & Docker Compose
- Node.js 22.x
- PostgreSQL 16
- Apache 2.4
- Docker 27.5.1 and Docker Compose
- PHP 8.4.14
- Composer 2.x
- Node.js 22.x and npm
- pdftotext utility (poppler-utils)
- At least 4GB RAM (for Elasticsearch)
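Before installing, it can help to confirm that the command-line tools from the list above are on the `PATH`. The following is a minimal sketch, not part of the project; the helper name `missing_tools` is hypothetical, and it checks only for presence, not for the versions listed.

```shell
#!/bin/sh
# Print each required command that is not found on PATH, one per line.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Tool names taken from the requirements list above.
missing=$(missing_tools docker php composer node npm pdftotext)
if [ -n "$missing" ]; then
  printf 'Missing tools:\n%s\n' "$missing" >&2
else
  echo "All required tools found."
fi
```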
- Clone the repository:
git clone [email protected]:yourusername/pdf-content-search.git
cd pdf-content-search
- Install dependencies:
composer install
npm install
- Install pdftotext utility:
sudo apt-get install poppler-utils
- Build frontend assets:
npm run dev
Quick Start:
docker compose build
docker compose up -d
For detailed Docker documentation, configuration, and production setup, see docs/docker.md.
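Containers started in detached mode may need some time before their services accept requests, and Elasticsearch in particular can be slow to start. The sketch below shows one way to wait for it; the helper `wait_for` is hypothetical, and the attempt count and delay in the commented example are arbitrary assumptions.

```shell
# Retry a command until it succeeds or the attempts run out.
# Usage: wait_for ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_for() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    # Discard the command's output; only its exit status matters here.
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Example: poll the Elasticsearch health endpoint up to 30 times, 2 seconds apart.
# wait_for 30 2 curl -fsS http://localhost:9200/_cluster/health
```

The function returns 0 as soon as the command succeeds and 1 if every attempt fails, so it can gate a later step such as indexing.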
# PostgreSQL
POSTGRES_DB=app
POSTGRES_PASSWORD=!ChangeMe!
POSTGRES_USER=app
POSTGRES_VERSION=16
# Elasticsearch
ELASTICSEARCH_HOST=http://elasticsearch:9200
- Application: http://localhost
- Elasticsearch: http://localhost:9200
- Kibana: http://localhost:5601
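The `POSTGRES_PASSWORD` value shown in the configuration is a placeholder. As a rough sketch, a random replacement could be generated as follows; the helper name `random_hex` is hypothetical, and which environment file the value belongs in depends on how the project is set up, which is not stated here.

```shell
# Generate 32 hexadecimal characters from the kernel's random source.
random_hex() {
  head -c 16 /dev/urandom | od -An -tx1 | tr -d ' \n'
}

# Print a replacement line for the placeholder password.
printf 'POSTGRES_PASSWORD=%s\n' "$(random_hex)"
```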
- Create PDF directories:
mkdir -p public/pdfs
- Place your PDFs in public/pdfs/
- Index the PDFs:
docker compose exec php bin/console app:index-pdfs
- Access the application at http://localhost
- Use the search bar to find content in PDFs
- Results will show:
  - PDF filename
  - Page number
  - Content context with highlighted matches
  - Direct link to PDF page
- Click on "View PDF at this page" to see the PDF with:
  - All matching words highlighted in yellow
  - Smart highlighting that respects word boundaries
  - Support for accented and special characters
  - Automatic handling of malformed PDF text layers
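The word-boundary behaviour described above can be illustrated outside the application with `grep -w`, which only accepts matches delimited by non-word characters. This is an illustration of the idea, not the application's highlighting code; the helper `contains_word` is hypothetical, and accent folding is not covered.

```shell
# Succeed when TERM occurs in TEXT as a whole word, ignoring case.
# -F treats TERM as a literal string; -w requires whole-word matches.
contains_word() {
  printf '%s\n' "$2" | grep -qiwF -- "$1"
}

contains_word java "Notes on Java and Kotlin" && echo "whole-word match"
contains_word java "A javascript tutorial" || echo "no match inside javascript"
```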
- Start development environment:
docker compose up -d
npm run watch
- Run tests:
docker compose exec php bin/phpunit
- Check code style:
# Check for violations without fixing
docker compose exec php vendor/bin/php-cs-fixer fix --dry-run
# Check with detailed diff output
docker compose exec php vendor/bin/php-cs-fixer fix --dry-run --diff
# Fix code style violations
docker compose exec php vendor/bin/php-cs-fixer fix
- Frontend Development:
  - Components in assets/components/
  - Styles in assets/css/
  - Build: npm run build
  - Watch: npm run watch
- Check cluster health:
curl http://localhost:9200/_cluster/health
- View indices:
curl http://localhost:9200/_cat/indices
- Monitor with Kibana:
  - Access Kibana at http://localhost:5601
  - View index management
  - Monitor cluster health
  - Analyze search performance
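The cluster health response reports a status of green, yellow, or red. The sketch below maps that status to a short message and an exit code; the helper `check_status` is hypothetical. On a single-node development cluster, yellow is common when an index is configured with replicas, because a replica cannot be placed on the same node as its primary. Whether that applies to this project's index is an assumption.

```shell
# Map an Elasticsearch cluster status to a short message and an exit code.
check_status() {
  case "$1" in
    green)  echo "green: all shards assigned" ;;
    yellow) echo "yellow: primaries assigned, some replicas unassigned" ;;
    red)    echo "red: at least one primary shard unassigned"; return 1 ;;
    *)      echo "unknown status: $1"; return 2 ;;
  esac
}

# Example against a running cluster (the _cat API returns plain text):
# check_status "$(curl -fsS 'http://localhost:9200/_cat/health?h=status' | tr -d '[:space:]')"
```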
- Clear caches:
docker compose exec php bin/console cache:clear
- Update dependencies:
docker compose exec php composer update
docker compose exec php npm update
- Rebuild containers:
docker compose down
docker compose build --no-cache
docker compose up -d
- Elasticsearch Issues:
# Check health
docker compose exec elasticsearch curl -X GET "localhost:9200/_cluster/health"
# View logs
docker compose logs elasticsearch
- Frontend Issues:
# Clear cache
npm cache clean --force
# Rebuild
npm run build
- PDF Indexing Issues:
# Check directory
ls public/pdfs/
# Verbose indexing
docker compose exec php bin/console app:index-pdfs -vv
- Change default PostgreSQL credentials
- Enable Elasticsearch security in production
- Configure HTTPS for production
- Set proper file permissions
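As a small aid for the first item, a deployment script could refuse to continue while the placeholder password is still in place. This is a sketch only; the helper `uses_default_password` is hypothetical, and the `.env` path in the commented example is an assumption about where the values live.

```shell
# Succeed when FILE still contains the placeholder password from the sample configuration.
uses_default_password() {
  grep -q '^POSTGRES_PASSWORD=!ChangeMe!$' "$1"
}

# Example (the .env path is an assumption):
# uses_default_password .env && echo "Change POSTGRES_PASSWORD before deploying" >&2
```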
- Fork the repository
- Create feature branch (git checkout -b feature/AmazingFeature)
- Commit changes (git commit -m 'Add AmazingFeature')
- Push branch (git push origin feature/AmazingFeature)
- Open Pull Request
Licensed under the GNU General Public License v3.0; see the LICENSE file.