This project is a conversational AI application designed to answer user queries using a multilingual large language model (LLM) called Llama 3.2. The system leverages a combination of advanced natural language processing (NLP) tools and a vector database to provide contextual, accurate, and efficient responses. The AI can maintain a history of conversations and use this history alongside relevant external documents to generate meaningful answers.
- Contextual Query Handling:
  - Answers questions using both conversation history and a vector-based document retrieval system.
  - If a question is unrelated to the retrieved context or the history, the system gracefully responds with "pass" (a prompt sketch follows).
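A minimal sketch of such a prompt, assuming LangChain's `ChatPromptTemplate`; the exact wording the project uses is an assumption:

```python
# Hypothetical prompt illustrating the "pass" fallback; the project's
# actual template wording may differ.
from langchain.prompts import ChatPromptTemplate

PROMPT = ChatPromptTemplate.from_template(
    "Answer using only the conversation history and the retrieved context.\n"
    "If the question is unrelated to both, reply with exactly: pass\n\n"
    "History:\n{history}\n\nContext:\n{context}\n\nQuestion: {question}"
)
```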
- Multilingual LLM Integration:
  - Uses Llama 3.2, a state-of-the-art multilingual large language model, for text-in/text-out operations.
- Conversation Memory:
  - Maintains a history of the last 5 exchanges between the user and the model to enrich future responses (a minimal sketch follows).
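A minimal sketch of the 5-exchange memory, assuming a simple in-process store; the project may implement it differently:

```python
# Keep only the 5 most recent (user, assistant) exchanges; deque drops the
# oldest entry automatically once the limit is reached.
from collections import deque

history = deque(maxlen=5)

def remember(user_msg: str, ai_msg: str) -> None:
    history.append((user_msg, ai_msg))

def history_text() -> str:
    return "\n".join(f"User: {u}\nAI: {a}" for u, a in history)
```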
- Real-Time Web Interface:
  - Built with NiceGUI to provide a clean, user-friendly web application for interaction.
- PDF Document Processing:
  - Parses and splits PDF documents into manageable chunks for efficient retrieval (sketched below).
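A sketch of this ingestion step, assuming LangChain's community loaders; module paths vary across LangChain versions, and the chunk sizes are illustrative:

```python
# Load every PDF in the data directory and split pages into overlapping
# chunks; chunk_size/chunk_overlap are illustrative, not the project's values.
from langchain_community.document_loaders import PyPDFDirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

documents = PyPDFDirectoryLoader("data").load()
chunks = RecursiveCharacterTextSplitter(
    chunk_size=800,
    chunk_overlap=80,
).split_documents(documents)
```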
- Efficient Database Management:
  - Uses the Chroma vector database to store and retrieve embeddings of textual data (see the sketch below).
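A sketch of persisting those chunks to Chroma with Ollama embeddings; the persist directory and model tag are assumptions:

```python
# Embed the chunks from the ingestion sketch above and persist them locally.
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import Chroma

db = Chroma.from_documents(
    chunks,                                  # from the splitting sketch above
    OllamaEmbeddings(model="llama3.2"),      # assumed Ollama model tag
    persist_directory="chroma",              # assumed storage path
)
```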
- Logging:
  - Logs user queries and AI responses for debugging and tracking (a logging sketch follows).
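A minimal sketch of this logging, assuming Python's standard `logging` module and a hypothetical log file name:

```python
# Timestamped, structured log of each exchange; "conversation.log" is a
# hypothetical file name.
import logging

logging.basicConfig(
    filename="conversation.log",
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
)

def log_exchange(query: str, answer: str) -> None:
    logging.info("QUERY: %s", query)
    logging.info("RESPONSE: %s", answer)
```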
- Real-Time Conversation Contextualization:
  - Integrates contextual memory and external document retrieval in real time for more accurate responses.
- Adaptive Language Model Tuning:
  - Supports custom fine-tuning to adapt the model to specialized domains and particular use cases.
- Interactive User Feedback:
  - Users can provide feedback on answers, which helps improve future responses by adjusting memory (see the sketch below).
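A minimal sketch of recording that feedback, assuming a JSONL file as the store; the project may persist it differently:

```python
# Append one JSON record per rating; "feedback.jsonl" is a hypothetical path.
import json
from datetime import datetime

def record_feedback(question: str, answer: str, rating: int) -> None:
    entry = {
        "time": datetime.now().isoformat(),
        "question": question,
        "answer": answer,
        "rating": rating,  # e.g. 1 (poor) to 5 (excellent)
    }
    with open("feedback.jsonl", "a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")
```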
- Llama 3.2:
  - A collection of pretrained and instruction-tuned generative models in 1B and 3B sizes.
  - Optimized for multilingual dialogue, agentic retrieval, and summarization tasks.
  - Outperforms many open-source and closed chat models on common industry benchmarks.
- LangChain:
  - Builds the conversational pipeline, including prompt templates and retrieval-based queries (an end-to-end sketch follows this list).
- Ollama Embeddings:
  - Converts text into numerical representations for similarity search.
- Chroma Vector Database:
  - Stores embeddings of text documents for fast semantic search.
- PyPDFDirectoryLoader:
  - Extracts text content from PDF files in a directory.
- Recursive Character Text Splitter:
  - Splits large texts into manageable chunks for better search and retrieval.
- External File Format Support:
  - Future support is planned for additional file types such as DOCX, TXT, and HTML.
- NiceGUI:
  - A Python-based framework for creating modern web interfaces.
  - Enables real-time user interaction with the AI.
- Real-Time Conversation Handling:
  - The web interface is optimized for live interactions, offering a seamless conversation flow.
- Argparse:
  - Handles command-line arguments for flexible usage.
- Logging:
  - Tracks user inputs and model outputs in a structured log file.
- Datetime and Asyncio:
  - Used for timestamping log entries and managing real-time asynchronous tasks.
- User Feedback Mechanism:
  - A feedback loop that lets users rate answers, improving response generation.
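A sketch of how these pieces could fit together end to end; module paths and the model tag are assumptions that vary across LangChain versions:

```python
# Retrieve the most relevant chunks for a query and let Llama 3.2 (via
# Ollama) answer from them. All names here are illustrative.
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.llms import Ollama
from langchain_community.vectorstores import Chroma
from langchain.prompts import ChatPromptTemplate

embeddings = OllamaEmbeddings(model="llama3.2")
db = Chroma(persist_directory="chroma", embedding_function=embeddings)
model = Ollama(model="llama3.2")

def answer(query: str) -> str:
    docs = db.similarity_search(query, k=5)
    context = "\n\n---\n\n".join(d.page_content for d in docs)
    prompt = ChatPromptTemplate.from_template(
        "Answer the question using only the context below.\n"
        "Context:\n{context}\n\nQuestion: {question}"
    ).format(context=context, question=query)
    return model.invoke(prompt)
```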
- PDF Processing:
  - Upload PDF documents to the `data` directory.
  - The system processes and splits the documents into chunks with metadata, storing them in the Chroma database.
- Querying:
  - Users input a query through the web interface or the command line.
  - The query is converted into an embedding, and the most relevant document chunks are retrieved (see the sketch below).
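A sketch of the retrieval step with similarity scores, reusing the assumed persist directory and model tag from the earlier sketches; `k` and the query are illustrative:

```python
# Retrieve the top-k chunks together with distance scores so weak matches
# can be filtered out before prompting the model.
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import Chroma

db = Chroma(
    persist_directory="chroma",  # assumed path, matching the earlier sketches
    embedding_function=OllamaEmbeddings(model="llama3.2"),
)
results = db.similarity_search_with_score("Your query here", k=5)
context = "\n\n".join(doc.page_content for doc, _score in results)
```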
- Conversational Memory:
  - The AI uses the last 5 exchanges to add depth and context to its answers.
- Response Generation:
  - Combines the retrieved documents and conversation history to craft a response with the Llama 3.2 model (sketched below).
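A sketch of this step that reuses the hypothetical helpers from earlier sketches (`PROMPT`, `history_text`, `remember`); the model tag remains an assumption:

```python
from langchain_community.llms import Ollama

model = Ollama(model="llama3.2")  # assumed Ollama model tag

def respond(question: str, context: str) -> str:
    text = PROMPT.format(history=history_text(), context=context, question=question)
    answer = model.invoke(text)
    remember(question, answer)  # keep the exchange in the 5-turn memory
    return answer
```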
- Logging:
  - All interactions are logged for later analysis.
- Interactive Feedback:
  - Users can rate responses or provide feedback, which can influence future model behavior and improve the system's adaptability.
The Llama 3.2 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction-tuned generative models in 1B and 3B sizes (text in/text out). The Llama 3.2 instruction-tuned text-only models are optimized for multilingual dialogue use cases, including agentic retrieval and summarization tasks. They outperform many of the available open-source and closed chat models on common industry benchmarks.
- Python 3.8 or later
- Required Python packages (install via `pip install -r requirements.txt`):
  - langchain
  - nicegui
  - chromadb
  - PyPDF2
- Populate the Database:
  - Place your PDF documents in the `data` directory.
  - Run `populate_database.py` to process and store the document embeddings.
- Start the Web Interface:
  - Run `ui_utils.py` to launch the NiceGUI-based web app (a minimal sketch follows).
  - Access the app at http://localhost:8080.
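A minimal NiceGUI sketch of such an interface; the real `ui_utils.py` is more elaborate, and `answer()` refers to the hypothetical retrieval function sketched earlier:

```python
from nicegui import ui

query_box = ui.input(label="Ask a question")
output = ui.markdown()

def on_ask() -> None:
    output.set_content(answer(query_box.value))  # answer() sketched earlier

ui.button("Ask", on_click=on_ask)
ui.run(port=8080)  # serves the app at http://localhost:8080
```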
- Command-Line Query:
  - Run `main.py` with a query argument: `python main.py "Your query here"` (a sketch of the entry point follows).
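A sketch of what the argparse entry point in `main.py` might look like; the exact argument names are assumptions:

```python
import argparse

parser = argparse.ArgumentParser(description="Query the PDF knowledge base.")
parser.add_argument("query", help="question to ask the model")
args = parser.parse_args()
print(answer(args.query))  # answer() as sketched in the retrieval example
```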
- User Feedback:
  - Provide ratings or comments on responses directly from the web interface.
- Contextual Question Answering:
  - The AI retrieves relevant content from PDFs to answer questions accurately.
- Multilingual Conversations:
  - Supports queries and responses in multiple languages.
- Agentic Retrieval:
  - Summarizes large documents and provides concise answers.
- Interactive Feedback Loop:
  - Users can help fine-tune future responses by providing feedback.
- Add support for additional file types (e.g., DOCX, TXT).
- Enhance multilingual capabilities with additional fine-tuned models.
- Use a larger LLM for a better user experience.
- Deploy the application on a live personal website.
For questions, suggestions, or contributions, feel free to reach out at [email protected].