# PDF Analyzer Chatbot

An interactive chatbot that lets you chat with your PDFs using LlamaIndex, Ollama, and HuggingFace embeddings.
The project loads your PDF files, indexes them, and allows you to ask questions. Answers are generated with detailed context, and sources (file names and page numbers) are displayed for full transparency.
## Features

- 📂 Upload PDFs into the `Content/` folder and chat with them.
- 🤖 Ollama LLM (Llama 3) for answering your queries.
- 🔍 HuggingFace embeddings (`all-MiniLM-L6-v2`) for efficient PDF search.
- 📖 Detailed responses powered by the `tree_summarize` query mode.
- 📝 Source references (file name + page numbers).
- 🖥️ Simple command-line interface for Q&A.
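The source-reference feature boils down to collecting the file name and page number attached to each retrieved chunk. A minimal sketch of such a helper (`format_sources` is a hypothetical name; here the nodes are plain dicts shaped like LlamaIndex source-node metadata):

```python
def format_sources(source_nodes):
    """Render deduplicated (file name, page) pairs as one citation line.

    Each node is assumed to carry a `metadata` dict with `file_name`
    and `page_label` keys, as LlamaIndex source nodes typically do.
    """
    seen = []
    for node in source_nodes:
        ref = (node["metadata"].get("file_name", "unknown"),
               node["metadata"].get("page_label", "?"))
        if ref not in seen:  # avoid citing the same page twice
            seen.append(ref)
    return "; ".join(f"{name} (p. {page})" for name, page in seen)
```

Given two chunks from page 3 of `sample1.pdf` and one from page 1 of `sample2.pdf`, this prints a single deduplicated citation line.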
## Prerequisites

- Python 3.9+
- Ollama installed and running locally
- A virtual environment (recommended)
## Installation

### 1. Clone the repository

```bash
git clone https://github.com/Cirarshi/PDF-Analyzer-Chatbot.git
cd PDF-Analyzer-Chatbot
```

### 2. Create a virtual environment
```bash
python -m venv .venv
```

Activate it:

**Windows (PowerShell):**

```powershell
.venv\Scripts\activate
```

**Linux/macOS:**

```bash
source .venv/bin/activate
```

### 3. Install dependencies
```bash
pip install -r requirements.txt
```

### 4. Install and run Ollama
Download Ollama from [Ollama's website](https://ollama.com).

Once installed, pull the Llama 3 model:

```bash
ollama pull llama3
```

Keep Ollama running in the background:

```bash
ollama run llama3
```

### 5. Add your PDFs

Place all your PDFs inside the `Content/` folder:
```
Content/
├── sample1.pdf
└── sample2.pdf
```

### 6. Run the chatbot
```bash
python app.py
```

Once started, the program will continuously prompt for your questions.
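For orientation, here is a minimal sketch of what the core of `app.py` might look like. It uses the modern `llama-index` package layout (module paths and the `Settings` singleton may differ between versions, so treat this as an assumed API, not the project's exact code):

```python
# Sketch of the pipeline: local Llama 3 via Ollama + MiniLM embeddings.
from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.ollama import Ollama

# Wire up the local LLM and the embedding model used for retrieval.
Settings.llm = Ollama(model="llama3", request_timeout=120.0)
Settings.embed_model = HuggingFaceEmbedding(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)

# Index every PDF in Content/ and answer questions in a loop.
documents = SimpleDirectoryReader("Content").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine(response_mode="tree_summarize")

while True:
    question = input("Ask a question (or 'quit' to exit): ").strip()
    if question.lower() in {"quit", "exit"}:
        break
    response = query_engine.query(question)
    print(response)
    # Show which file and page each answer fragment came from.
    for node in response.source_nodes:
        meta = node.metadata
        print(f"  source: {meta.get('file_name')} (page {meta.get('page_label')})")
```

Running this requires the `llama-index` packages installed and an Ollama server with `llama3` pulled.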
## Project structure

```
.
├── app.py             # Main chatbot code
├── requirements.txt   # Project dependencies
├── Content/           # Folder containing your PDFs
├── .venv/             # Virtual environment (ignored in git)
└── README.md          # Project documentation
```

## Response modes

You can pick whichever response mode suits your needs from the options below:
- "compact" → short answer (default earlier).
- "tree_summarize" → merges context from multiple sources, detailed + structured.
- "refine" → builds answer step by step, refining it each time (more nuanced).
- "accumulate" → dumps relevant chunks without much summarization (great for research).
## Roadmap

- Add a Streamlit UI for a web interface
- Enable selection among multiple embedding models
- Support summarizing entire PDFs in one command
## Contributing

Pull requests are welcome! Please open an issue first to discuss any major changes.
## Future ideas

Here are some ideas to take this project further:
- **Interactive Web UI**
  - Build a frontend using Streamlit, Gradio, or React for a more user-friendly interface.
  - Allow drag-and-drop PDF uploads and a live chat experience.
- **Multi-Modal Support**
  - Extend beyond PDFs to support Word docs, Excel sheets, and images (OCR).
- **Improved Citation Handling**
  - Highlight exact text snippets from the PDF instead of just page numbers.
  - Add clickable links to jump directly to the referenced section.
- **Customizable LLM Options**
  - Let users pick between different local models (e.g., Mistral, Gemma, Llama 2/3) depending on their hardware.
- **Enhanced Search**
  - Add hybrid keyword + semantic search for faster and more accurate retrieval.
- **Multi-User / Study Group Mode**
  - Deploy as a local or cloud-based service where multiple users can query the same knowledge base.
- **Memory & Notes**
  - Add a feature for saving queries and answers for future revision.
  - Option to export results into a personal study notebook.
- **Voice Interface**
  - Integrate speech-to-text and text-to-speech for hands-free usage.
- **Fine-Tuned Models**
  - Train embeddings or fine-tune the LLM on domain-specific study material (e.g., law, medicine, engineering).
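Of the ideas above, the Memory & Notes feature is the most self-contained. A minimal sketch, assuming a JSON "notebook" file (`notes.json` and `save_note` are hypothetical names; nothing like this exists in the project yet):

```python
import json
import time
from pathlib import Path

def save_note(question, answer, path="notes.json"):
    """Append a Q&A pair (with a timestamp) to a JSON notebook file.

    Returns the total number of notes saved so far.
    """
    p = Path(path)
    notes = json.loads(p.read_text()) if p.exists() else []
    notes.append({"q": question, "a": answer, "ts": time.time()})
    p.write_text(json.dumps(notes, indent=2))
    return len(notes)
```

Calling `save_note(question, str(response))` after each query in the chat loop would build up a revision notebook; exporting it to Markdown would be a simple further step.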