Creating a Medical Question-Answering Chatbot Using Open-Source BioMistral LLM, LangChain, Chroma’s Vector Storage, and RAG: A Step-by-Step Guide

In this tutorial, we’ll build a powerful, PDF-based question-answering chatbot tailored for medical or health-related content. We’ll leverage the open-source BioMistral LLM and LangChain’s flexible data orchestration capabilities to process PDF documents into manageable text chunks. We’ll then encode these chunks using Hugging Face embeddings, capturing deep semantic relationships and storing them in a Chroma vector database for high-efficiency retrieval. Finally, by employing a Retrieval-Augmented Generation (RAG) system, we’ll integrate the retrieved context directly into our chatbot’s responses, ensuring clear, authoritative answers for users. This approach allows us to rapidly sift through large volumes of medical PDFs, providing context-rich, accurate, and easy-to-understand insights.

Setting up tools

!pip install langchain sentence-transformers chromadb llama-cpp-python langchain_community pypdf

# Document loading, chunking, embeddings, vector store, local LLM, and QA chains
from langchain_community.document_loaders import PyPDFDirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_community.llms import LlamaCpp
from langchain.chains import RetrievalQA, LLMChain

import textwrap
from IPython.display import display, Markdown


def to_markdown(text):
    # Render plain text as an indented Markdown blockquote, converting bullets to list items.
    text = text.replace('•', '  *')
    return Markdown(textwrap.indent(text, '> ', predicate=lambda _: True))


# Mount Google Drive so the notebook can read the PDFs and the GGUF model file.
from google.colab import drive
drive.mount('/content/drive')

First, we install and configure the Python packages for document processing, embedding generation, local LLM inference, and retrieval-based workflows with LlamaCpp. We use langchain_community for PDF loading, embeddings, the Chroma vector store, and the LlamaCpp wrapper, pull in LangChain’s RecursiveCharacterTextSplitter plus RetrievalQA and LLMChain for question answering, and define a to_markdown display utility before mounting Google Drive.
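As a quick illustration (using a made-up sample string, not text from the tutorial’s PDFs), the helper renders bulleted text as an indented Markdown blockquote:

# Illustrative check of the to_markdown helper on a sample string.
sample = "Key risk factors:\n• High blood pressure\n• Smoking\n• Diabetes"
to_markdown(sample)  # displays as a '>'-quoted block with Markdown bullets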

Setting up API key access

from google.colab import userdata
# Or use `os.getenv('HUGGINGFACEHUB_API_TOKEN')` to fetch an environment variable.
import os
from getpass import getpass


HF_API_KEY = userdata.get("HF_API_KEY")  # stored as a Colab secret
os.environ["HF_API_KEY"] = HF_API_KEY

Here, we securely fetch the Hugging Face API key from Colab’s secrets store and expose it as an environment variable, so the credential never appears directly in the code. You can also rely on the HUGGINGFACEHUB_API_TOKEN environment variable instead, as sketched below.
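If you are running outside Colab, a minimal alternative (a sketch, not part of the original notebook) is to prompt for the token interactively and export it under the name most Hugging Face integrations look for:

import os
from getpass import getpass

# Prompt only if the token is not already present in the environment.
if not os.getenv("HUGGINGFACEHUB_API_TOKEN"):
    os.environ["HUGGINGFACEHUB_API_TOKEN"] = getpass("Enter your Hugging Face token: ")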

Loading and Extracting PDFs from a Directory

loader = PyPDFDirectoryLoader('/content/drive/My Drive/Data')
docs = loader.load()

We use PyPDFDirectoryLoader to scan the specified folder for PDFs, extract their text into a document list, and lay the groundwork for tasks like question answering, summarization, or keyword extraction.
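An optional sanity check (assuming at least one PDF was found in the folder) confirms what was loaded:

print(f"Loaded {len(docs)} pages from the PDF directory")
print(docs[0].metadata)            # e.g. source file path and page number
print(docs[0].page_content[:200])  # preview the first 200 characters of extracted text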

Splitting Loaded Text Documents into Manageable Chunks

text_splitter = RecursiveCharacterTextSplitter(chunk_size=300, chunk_overlap=50)
chunks = text_splitter.split_documents(docs)

In this code snippet, RecursiveCharacterTextSplitter breaks each document in docs into segments of at most 300 characters, with a 50-character overlap so that context is preserved across chunk boundaries.
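To verify the split (an optional check, not in the original walkthrough), you can inspect the chunk count and a sample chunk:

print(f"Created {len(chunks)} chunks from {len(docs)} pages")
print(chunks[0].page_content)  # each chunk is at most ~300 characters, overlapping its neighbor by 50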

Initializing Hugging Face Embeddings

embeddings = HuggingFaceEmbeddings(model_name="BAAI/bge-base-en-v1.5")

We instantiate HuggingFaceEmbeddings with the BAAI/bge-base-en-v1.5 model, which converts each text chunk into a dense numerical vector that captures its semantic meaning.
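As a quick optional check, you can embed a sample query and inspect the result; bge-base models produce 768-dimensional vectors:

sample_vector = embeddings.embed_query("heart disease risk factors")
print(len(sample_vector))  # 768 for BAAI/bge-base-en-v1.5
print(sample_vector[:5])   # first few components of the dense vector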

Building a Vector Store and Running a Similarity Search

vectorstore = Chroma.from_documents(chunks, embeddings)
query = "who is at risk of heart disease"
search = vectorstore.similarity_search(query)
to_markdown(search[0].page_content)

We first build a Chroma vector store (Chroma.from_documents) from the text chunks and the specified embedding model. Next, we run a similarity search for the query “who is at risk of heart disease” against the stored embeddings. The top result (search[0].page_content) is then converted to Markdown for clearer display.
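If you also want to gauge how close each match is, Chroma’s similarity_search_with_score returns a distance alongside each document (lower means more similar); an optional check:

results = vectorstore.similarity_search_with_score(query, k=3)
for doc, score in results:
    print(f"distance={score:.4f} | {doc.page_content[:100]}")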

Creating a Retriever and Fetching Relevant Documents

retriever = vectorstore.as_retriever(
    search_kwargs={'k': 5}
)
retriever.get_relevant_documents(query)

We convert the Chroma vector store into a retriever (vectorstore.as_retriever) that fetches the five most relevant chunks (k=5) for a given query.
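To preview what the retriever will later hand to the RAG chain, you can print the source and opening text of each retrieved chunk (an optional check):

retrieved_docs = retriever.get_relevant_documents(query)
for i, doc in enumerate(retrieved_docs, 1):
    print(f"[{i}] {doc.metadata.get('source', 'unknown')} | {doc.page_content[:80]}")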

Initializing the BioMistral-7B Model with LlamaCpp

llm = LlamaCpp(
    model_path="/content/drive/MyDrive/Model/BioMistral-7B.Q4_K_M.gguf",  # pre-downloaded GGUF file
    temperature=0.3,  # lower temperature for more factual, less random answers
    max_tokens=2048,  # upper bound on generated tokens per response
    top_p=1,          # nucleus sampling threshold (1 = consider the full distribution)
)

We set up an open-source local BioMistral LLM using LlamaCpp, pointing to a pre-downloaded model file. We also configure generation parameters such as temperature, max_tokens, and top_p, which control randomness, the maximum tokens generated, and the nucleus sampling strategy.
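Before wiring the model into a chain, a quick smoke test (optional; LlamaCpp implements the standard LangChain LLM interface, so invoke works) confirms it loads and generates text:

# Direct, retrieval-free generation just to confirm the local model runs.
print(llm.invoke("In one sentence, what is hypertension?"))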

Setting Up a Retrieval-Augmented Generation (RAG) Chain with a Custom Prompt

from langchain.schema.runnable import RunnablePassthrough
from langchain.schema.output_parser import StrOutputParser
from langchain.prompts import ChatPromptTemplate


# Join the retrieved chunks into a single context string for the prompt.
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)


template = """
<|context|>
You are an AI assistant that follows instructions extremely well.
Please be truthful and give direct answers based on the context below.
{context}

<|user|>
{query}

<|assistant|>
"""
prompt = ChatPromptTemplate.from_template(template)
rag_chain = (
    {'context': retriever | format_docs, 'query': RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

Using the above, we set up a RAG pipeline with LangChain. The custom prompt carries instructions plus {context} and {query} placeholders; the retriever (piped through a small format_docs helper) fills in the retrieved context, RunnablePassthrough forwards the user’s query unchanged, the ChatPromptTemplate assembles the prompt, the LLM generates the response, and StrOutputParser reduces it to a clean text string.
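As an alternative sketch (not the tutorial’s main path), the RetrievalQA chain imported earlier offers a one-step setup and can return the supporting chunks alongside the answer:

qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",            # stuff all retrieved chunks into a single prompt
    retriever=retriever,
    return_source_documents=True,  # include the supporting chunks in the output
)
result = qa_chain.invoke({"query": "Why should I care about my heart health?"})
print(result["result"])
for doc in result["source_documents"]:
    print(doc.metadata.get("source", "unknown"))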

Invoking the RAG Chain to Answer a Health-Related Query

response = rag_chain.invoke("Why should I care about my heart health?")
to_markdown(response)

Now, we call the previously constructed RAG chain with a user’s query. It passes the query to the retriever, retrieves relevant context from the document collection, and feeds that context into the LLM to generate a concise, accurate answer.
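The same chain can answer a batch of questions in a loop (the queries below are illustrative examples, not from the original post):

questions = [
    "What lifestyle changes reduce the risk of heart disease?",
    "What are the common symptoms of high blood pressure?",
]
for q in questions:
    print(q)
    print(rag_chain.invoke(q))
    print("-" * 60)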

In conclusion, by integrating BioMistral via LlamaCpp and taking advantage of LangChain’s flexibility, we are able to build a context-aware medical RAG chatbot. From chunk-based indexing to the seamless RAG pipeline, it streamlines the process of mining large volumes of PDF data for relevant insights. Users receive clear and easily readable answers because final responses are formatted in Markdown. This design can be extended or tailored to other domains, ensuring scalability and precision in knowledge retrieval across diverse documents.


Use the Colab Notebook here.
