Building a Scalable RAG AI Chatbot with Microsoft Azure: A Step-by-Step Guide
Recently, we embarked on an exciting project to develop a Retrieval-Augmented Generation (RAG) chatbot aimed at helping beneficiaries easily access funding information from a large database of documents. One of the key requirements for our project was to exclusively use Microsoft Azure services. This choice was motivated by the client's trust in Azure's secure, scalable, and reliable cloud infrastructure, and we gladly accepted the challenge.
Here’s a comprehensive walkthrough of how we implemented a scalable and production-ready solution using Azure's suite of tools.
Setting Up Azure Resources
To develop our RAG chatbot, we utilized several essential Azure resources:
- Azure OpenAI:
- GPT-4o-Mini: We employed GPT-4o-Mini as our large language model (LLM) to generate human-like responses. This model is pre-trained on a vast corpus of text, making it proficient in understanding and generating natural language.
- Text-Embedding-Ada-002: This service was used to create embeddings, which are high-dimensional vectors representing the semantic meaning of text. These embeddings are crucial for efficiently searching through our document database.
- Azure AI Search (formerly Azure Cognitive Search):
- Integrated to index and search through the embeddings. This service ensures that when a user query is made, the most relevant document vectors are retrieved and used to form an answer.
- Azure Blob Storage:
- Used to store the vast repository of documents securely. This scalable storage solution ensured that we could handle the large volume of data without performance issues.
- Azure App Service Plans:
- To support our infrastructure, we selected App Service plans that offered the necessary computational power and storage capacity. These plans gave us the flexibility to scale the chatbot's compute up or down as demand changed.
- Azure App Service:
- Frontend Exposure: To expose a user-friendly frontend for our chatbot, we used Azure App Service. This let us host an interactive web interface where users could easily converse with the chatbot. The frontend was designed to be intuitive and responsive, enhancing the overall user experience.
- Backend Proxy: To link the frontend to the various Azure services (e.g., Azure OpenAI), we developed a backend proxy, also hosted on Azure App Service (a minimal sketch follows this list). The proxy acted as an intermediary between the frontend interface and the underlying Azure resources, letting us manage API calls, data retrieval, and response generation in one place.
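To make the proxy layer concrete, here is a minimal sketch of what such an endpoint could look like. This is illustrative rather than our production code: the FastAPI framework, the `gpt-4o-mini` deployment name, and the environment variable names are all assumptions.

```python
# Minimal sketch of a backend proxy endpoint (illustrative, not production code).
# Assumes FastAPI for the web layer and the `openai` Python SDK for Azure OpenAI;
# endpoint, key, and deployment names are placeholders.
import os
from fastapi import FastAPI
from pydantic import BaseModel
from openai import AzureOpenAI

app = FastAPI()

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",
)

class ChatRequest(BaseModel):
    question: str

@app.post("/chat")
def chat(req: ChatRequest):
    # Forward the user's question to Azure OpenAI and return the answer.
    # In the full system, retrieved document chunks are injected here (see below).
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # Azure deployment name (assumed)
        messages=[{"role": "user", "content": req.question}],
    )
    return {"answer": response.choices[0].message.content}
```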
Implementing the RAG Framework
The fundamental idea of the RAG (Retrieval-Augmented Generation) framework is straightforward:
- Document Chunking: The documents are divided into smaller chunks.
- Embedding Creation: Each chunk is transformed into embeddings and inserted into a database.
- Query Processing: When a request is made, the content of the most relevant vectors is retrieved and used to prompt an LLM to generate a coherent and contextually appropriate answer.
Azure's out-of-the-box integration with OpenAI made these steps straightforward to implement. The sketch below condenses the flow into code.
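This sketch uses the `openai` and `azure-search-documents` Python SDKs and presumes an Azure AI Search index with `id`, `content`, and vector `embedding` fields already exists; the index name, field names, and deployment names are assumptions.

```python
# Condensed sketch of the RAG flow: embed chunks, index them, then answer a query.
# Assumes an Azure AI Search index with "id", "content", and a vector "embedding"
# field already exists; all names and deployments are placeholders.
import os
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizedQuery
from openai import AzureOpenAI

openai_client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",
)
search_client = SearchClient(
    endpoint=os.environ["AZURE_SEARCH_ENDPOINT"],
    index_name="funding-docs",  # assumed index name
    credential=AzureKeyCredential(os.environ["AZURE_SEARCH_API_KEY"]),
)

def embed(text: str) -> list[float]:
    resp = openai_client.embeddings.create(model="text-embedding-ada-002", input=text)
    return resp.data[0].embedding

# Steps 1-2) Embedding creation: store each chunk together with its vector.
def index_chunks(chunks: list[str]) -> None:
    docs = [{"id": str(i), "content": c, "embedding": embed(c)} for i, c in enumerate(chunks)]
    search_client.upload_documents(documents=docs)

# Step 3) Query processing: retrieve the closest chunks and prompt the LLM with them.
def answer(question: str) -> str:
    vq = VectorizedQuery(vector=embed(question), k_nearest_neighbors=3, fields="embedding")
    hits = search_client.search(search_text=None, vector_queries=[vq])
    context = "\n\n".join(hit["content"] for hit in hits)
    resp = openai_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Answer only from the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return resp.choices[0].message.content
```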
Overcoming Challenges with Advanced Techniques
Preserving Context in Large Documents: One significant challenge we encountered was handling very large documents (up to 100 pages). Naive chunking can sever context when a split lands at an awkward point, such as mid-section or mid-argument. To address this, we used LangChain, an open-source framework whose text splitters support overlapping chunks: each new chunk repeats a portion of the previous one, preserving context across chunk boundaries.
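As a rough illustration, LangChain's recursive character splitter handles the overlap for us. The chunk sizes and file name below are placeholder values, not the ones we tuned for production.

```python
# Overlapping chunking with LangChain (chunk sizes are illustrative placeholders).
# Each chunk repeats chunk_overlap characters from the previous one, so content
# that straddles a split point survives in both chunks.
from langchain_text_splitters import RecursiveCharacterTextSplitter

with open("document.txt", encoding="utf-8") as f:  # placeholder document
    long_document_text = f.read()

splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,    # max characters per chunk (tuned per corpus in practice)
    chunk_overlap=200,  # characters carried over from the previous chunk
)
chunks = splitter.split_text(long_document_text)
```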
Handling Complex Tables: Another challenge was dealing with large tables within documents. For these, we leveraged Azure Document Intelligence to extract table data accurately. We then applied dynamic chunking that retains each column's header alongside its values, ensuring that the context within tables was not lost.
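To sketch the idea: the layout model in Azure Document Intelligence returns tables cell by cell, which can then be serialized row by row with the column headers repeated, so every chunk keeps its column context. The "Header: value" row format below is our own illustrative choice, not a fixed API; the SDK calls follow the `azure-ai-formrecognizer` package.

```python
# Sketch: extract tables with Azure Document Intelligence's layout model and
# serialize each row with its column headers so chunks keep column context.
# File name and env vars are placeholders.
import os
from azure.core.credentials import AzureKeyCredential
from azure.ai.formrecognizer import DocumentAnalysisClient

client = DocumentAnalysisClient(
    endpoint=os.environ["AZURE_DOCINTEL_ENDPOINT"],
    credential=AzureKeyCredential(os.environ["AZURE_DOCINTEL_KEY"]),
)

with open("report.pdf", "rb") as f:  # placeholder file name
    result = client.begin_analyze_document("prebuilt-layout", document=f).result()

for table in result.tables:
    # Take the first row's cells as column headers, keyed by column position.
    headers = {c.column_index: c.content for c in table.cells if c.row_index == 0}
    for row in range(1, table.row_count):
        row_cells = [c for c in table.cells if c.row_index == row]
        # One text line per row, with the header repeated before each value.
        line = "; ".join(f"{headers.get(c.column_index, '')}: {c.content}" for c in row_cells)
        print(line)  # in practice this line becomes (part of) one chunk
```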
Processing Scanned PDFs: Many documents were scanned PDFs, necessitating an additional preprocessing step. We used Azure's Optical Character Recognition (OCR) capabilities to extract text from the scanned pages. This text extraction was a crucial preliminary step before applying the chunking and embedding process.
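For the OCR step, the same Document Intelligence client can run the prebuilt read model over a scanned PDF; a minimal sketch follows, with the file name and environment variables assumed.

```python
# Sketch: OCR a scanned PDF with Document Intelligence's prebuilt "read" model.
# The extracted plain text then feeds the chunking and embedding steps above.
import os
from azure.core.credentials import AzureKeyCredential
from azure.ai.formrecognizer import DocumentAnalysisClient

client = DocumentAnalysisClient(
    endpoint=os.environ["AZURE_DOCINTEL_ENDPOINT"],
    credential=AzureKeyCredential(os.environ["AZURE_DOCINTEL_KEY"]),
)

with open("scanned.pdf", "rb") as f:  # placeholder file name
    result = client.begin_analyze_document("prebuilt-read", document=f).result()

text = result.content  # full recognized text, ready for chunking
```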
How We Prevent AI System Hallucinations
In developing a RAG chatbot for a governmental institution, preventing "hallucinations" (instances where the AI generates plausible but incorrect information) was a key focus: erroneous responses can undermine public trust, spread misinformation, and cause beneficiaries to lose money or miss deadlines. Here's how we tackled this challenge:
- Prompt Engineering:
- Contextual Prompts: Providing context within the prompt helped the model understand the query better. For instance, you can instruct the model by providing a system message:
"You are an assistant that works for X organisation. Your job is to help beneficiaries with information about funding. Your responses must be limited only to provided documents, and you should not add any information outside that knowledge base."
- Instruction-Based Prompts: Explicit instructions guided the model. For example:
"Based on the document, summarise the key points without adding any information that is not explicitly mentioned." - Few-Shot Examples:
- We included examples of user queries and correct responses within the prompt, helping the model learn the desired output pattern (a combined request sketch follows this list). For example:
User: "Who are you?"
AI: "I am a helpful assistant that works for X organisation and helps beneficiaries find information about funding sessions easily."
User: "Who is the founder of Microsoft?"
AI: "I cannot help you with that. The requested information is not provided in my knowledge base."
Final Thoughts
Working on this chatbot gave us fresh insights into deploying AI effectively while ensuring accuracy. Being able to chat with your data saves users valuable time, and Microsoft Azure's well-integrated tooling made the process much simpler, allowing us to deliver a reliable and effective solution.