Nov 2025
Research project
TrustMed AI
A RAG-powered medical chatbot that combines authoritative clinical sources (Mayo Clinic, CDC, MedlinePlus) with real patient discussions from Reddit to answer questions about diabetes and cardiovascular disease.
- Python
- LangChain
- AWS Bedrock
- Amazon OpenSearch
- Llama 3
- Chainlit
- BeautifulSoup
01
What it does
TrustMed AI answers medical questions about diabetes and cardiovascular disease by retrieving context from two source types: authoritative clinical articles (Mayo Clinic, CDC, MedlinePlus) and real patient discussions scraped from Reddit. A hybrid retrieval pipeline (k-NN + BM25) ranks the retrieved passages, and the top-ranked ones are passed to Llama 3 via AWS Bedrock for answer generation.
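To illustrate how a hybrid ranker can merge the two signals, here is a minimal sketch in plain Python. It assumes min-max normalization followed by a weighted mean, which is one common way to combine k-NN and BM25 scores; the project's actual combination method is not stated here. The passage IDs, scores, and the `knn_weight` parameter are hypothetical.

```python
from typing import Dict, List, Tuple


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; if all scores are equal, map each to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def fuse(
    knn: Dict[str, float],
    bm25: Dict[str, float],
    knn_weight: float = 0.5,  # assumed weighting, not taken from the project
) -> List[Tuple[str, float]]:
    """Weighted mean of normalized scores, sorted best first.

    A passage missing from one result list contributes 0 for that signal.
    """
    knn_n, bm25_n = min_max(knn), min_max(bm25)
    fused = {
        doc_id: knn_weight * knn_n.get(doc_id, 0.0)
        + (1 - knn_weight) * bm25_n.get(doc_id, 0.0)
        for doc_id in set(knn_n) | set(bm25_n)
    }
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical scores: cosine similarities from k-NN, raw BM25 scores.
knn_scores = {"mayo-12": 0.91, "reddit-88": 0.84, "cdc-3": 0.40}
bm25_scores = {"reddit-88": 7.2, "cdc-3": 6.9, "medline-5": 2.1}
ranked = fuse(knn_scores, bm25_scores)
```

In this toy example the passage that scores well on both signals (`reddit-88`) ranks above the one that leads only on semantic similarity (`mayo-12`), which is the behavior hybrid retrieval is meant to provide.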
02
How it works
- Scraped 201 medical articles and 1,525 Reddit threads using BeautifulSoup
- Chunked the documents, embedded the chunks, and indexed them in Amazon OpenSearch for hybrid k-NN + BM25 search
- Built a LangChain retrieval chain with AWS Bedrock (Llama 3) for generation
- Evaluated with RAGAS metrics: context relevance, answer relevance, and groundedness
- Served through a Chainlit chat interface for interactive Q&A
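The chunking step in the pipeline above could look roughly like the following sketch. It assumes fixed-size word windows with overlap; the function name `chunk_words` and the default `size` and `overlap` values are illustrative, not the project's actual settings.

```python
from typing import List


def chunk_words(text: str, size: int = 200, overlap: int = 40) -> List[str]:
    """Split text into windows of `size` words, each sharing `overlap` words
    with the previous window so that sentences cut at a boundary still
    appear intact in at least one chunk."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


# Tiny example: 10 words, windows of 4 with 1 word of overlap.
sample = " ".join(f"w{i}" for i in range(10))
sample_chunks = chunk_words(sample, size=4, overlap=1)
```

Each resulting chunk would then be embedded and indexed as its own document, so retrieval returns passages rather than whole articles or threads.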
03
Key decisions
- Hybrid retrieval (semantic + keyword) to handle both clinical terminology and colloquial patient language
- Dual-source design so answers reflect both medical authority and lived patient experience
- OpenSearch over a pure vector DB for its native BM25 support alongside k-NN
- RAGAS evaluation framework to quantify retrieval and generation quality
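As a rough illustration of the OpenSearch decision, the sketch below builds an index body that holds both a BM25-searchable text field and a k-NN vector field, plus a request body using OpenSearch's `hybrid` query type. The field names (`text`, `source`, `embedding`), the embedding dimension, and the choice of the `hybrid` query type are assumptions; the project may have combined the two searches differently. The functions only build dictionaries, so they run without a cluster.

```python
from typing import Any, Dict, List


def index_body(dimension: int) -> Dict[str, Any]:
    """Index settings and mappings with a text field for BM25 and a
    knn_vector field for semantic search (field names are hypothetical)."""
    return {
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                "text": {"type": "text"},       # scored with BM25
                "source": {"type": "keyword"},  # e.g. "clinical" or "reddit"
                "embedding": {"type": "knn_vector", "dimension": dimension},
            }
        },
    }


def hybrid_query(question: str, vector: List[float], k: int = 5) -> Dict[str, Any]:
    """Request body that runs a keyword match and a k-NN search together."""
    return {
        "size": k,
        "query": {
            "hybrid": {
                "queries": [
                    {"match": {"text": {"query": question}}},
                    {"knn": {"embedding": {"vector": vector, "k": k}}},
                ]
            }
        },
    }


# Hypothetical 4-dimensional embedding, for illustration only.
body = hybrid_query("what is a normal a1c level", [0.1, 0.2, 0.3, 0.4], k=3)
mapping = index_body(dimension=4)
```

In OpenSearch, a `hybrid` query is executed through a search pipeline that includes a normalization processor, which rescales and combines the sub-query scores. Keeping both fields in one index is what lets a single request cover clinical terminology (keyword match) and colloquial phrasing (vector similarity).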