Nov 2025 · Research project

TrustMed AI

A RAG-powered medical chatbot that combines authoritative clinical sources (Mayo Clinic, CDC, MedlinePlus) with real patient discussions from Reddit to answer questions about diabetes and cardiovascular disease.

Stack
  • Python
  • LangChain
  • AWS Bedrock
  • Amazon OpenSearch
  • Llama 3
  • Chainlit
  • BeautifulSoup
01

What it does

TrustMed AI answers medical questions about diabetes and cardiovascular disease by retrieving context from two source types: authoritative clinical articles (Mayo Clinic, CDC, MedlinePlus) and real patient discussions scraped from Reddit. A hybrid retrieval pipeline (k-NN vector search plus BM25 keyword search) ranks the candidate passages, and the top-ranked ones are passed to Llama 3 via AWS Bedrock to generate the answer.
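
To make the dual-source flow concrete, here is a minimal sketch of how retrieved passages might be grouped by source type before being sent to the model. The `Passage` class, the `"clinical"` / `"patient"` labels, and the prompt wording are all assumptions for illustration, not the project's actual code.

```python
from dataclasses import dataclass


@dataclass
class Passage:
    text: str
    source: str       # e.g. "Mayo Clinic", "CDC", "MedlinePlus", "Reddit"
    source_type: str  # "clinical" or "patient" (hypothetical labels)


def build_prompt(question: str, passages: list[Passage]) -> str:
    """Group retrieved passages by source type and build a grounded prompt."""
    clinical = [p for p in passages if p.source_type == "clinical"]
    patient = [p for p in passages if p.source_type == "patient"]

    def render(items: list[Passage]) -> str:
        # Tag every passage with its origin so the answer can attribute it.
        if not items:
            return "(none retrieved)"
        return "\n".join(f"- [{p.source}] {p.text}" for p in items)

    return (
        "Answer the question using only the context below.\n\n"
        f"Clinical sources:\n{render(clinical)}\n\n"
        f"Patient discussions:\n{render(patient)}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

Keeping the two source types in separate, labeled sections lets the model distinguish clinical guidance from patient anecdotes instead of blending them.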

02

How it works

  • Scraped 201 medical articles and 1,525 Reddit threads using BeautifulSoup
  • Chunked the documents, embedded each chunk, and indexed them in Amazon OpenSearch for hybrid k-NN + BM25 search
  • Built a LangChain retrieval chain with AWS Bedrock (Llama 3) for generation
  • Evaluated with RAGAS metrics: context relevance, answer relevance, and groundedness (faithfulness in RAGAS terms)
  • Served through a Chainlit chat interface for interactive Q&A
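
The chunking step above can be sketched as a word-based sliding window with overlap. The chunk size, the overlap, and the `chunk_text` helper are assumptions for illustration; the project's real splitter and parameters are not stated here.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into word-based chunks; consecutive chunks share `overlap` words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, which tends to help retrieval on long articles and threads.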
03

Key decisions

  • Hybrid retrieval (semantic + keyword) to handle both clinical terminology and colloquial patient language
  • Dual-source design so answers reflect both medical authority and lived patient experience
  • OpenSearch over a pure vector DB for its native BM25 support alongside k-NN
  • RAGAS evaluation framework to quantify retrieval and generation quality