Repository Record

Clinical Tools

MedAgentBench

by Stanford ML Group

212 stars
46 forks
Language: Python
License: MIT

About

MedAgentBench is a realistic virtual EHR environment for benchmarking medical LLM agents on clinical tasks. Built on AgentBench, it provides a Docker-based FHIR server that simulates electronic health record interactions. The benchmark evaluates how well LLM agents navigate clinical workflows, make decisions based on patient data, and execute tasks in a standardized healthcare information system. Models such as GPT-4o, Gemini, and Claude are supported through configurable agent settings.
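To make the FHIR interaction concrete, here is a minimal sketch of how an agent-side helper might query such a server using the standard FHIR REST search pattern ([base]/ResourceType?param=value). The base URL, including the /fhir path, and the helper names build_search_url, resources_from_bundle, and search are assumptions for illustration and are not taken from the MedAgentBench code.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the FHIR API is exposed under /fhir on the published port 8080.
FHIR_BASE = "http://localhost:8080/fhir"


def build_search_url(resource_type, params, base=FHIR_BASE):
    """Build a FHIR search URL such as [base]/Patient?family=Smith."""
    query = urlencode(params)
    url = f"{base.rstrip('/')}/{resource_type}"
    return f"{url}?{query}" if query else url


def resources_from_bundle(bundle):
    """Return the resources carried in the entries of a FHIR Bundle."""
    return [e["resource"] for e in bundle.get("entry", []) if "resource" in e]


def search(resource_type, params, base=FHIR_BASE, timeout=10):
    """Run the search against a live server and return the matching resources."""
    url = build_search_url(resource_type, params, base)
    with urlopen(url, timeout=timeout) as resp:
        return resources_from_bundle(json.load(resp))
```

With the container running, a call such as search("Patient", {"family": "Smith"}) would return the Patient resources from the server's search result, provided the assumed base URL matches the actual deployment.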

Tech Stack

Python, Docker, FHIR, OpenAI API, Vertex AI, AgentBench

Quick Start

conda create -n medagentbench python=3.9
conda activate medagentbench
pip install -r requirements.txt
docker pull jyxsu6/medagentbench:latest
docker run -p 8080:8080 jyxsu6/medagentbench:latest
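The container may take some time to start before it accepts requests. A minimal sketch of a readiness check is shown here; it only confirms that something is listening on the published port 8080, not that the FHIR server has finished loading its data. The function name wait_for_port and its defaults are illustrative assumptions, not part of the repository.

```python
import socket
import time


def wait_for_port(host="localhost", port=8080, timeout=60.0, interval=1.0):
    """Return True once host:port accepts TCP connections, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means a process is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: wait briefly and retry.
            time.sleep(interval)
    return False
```

Calling wait_for_port() before starting a benchmark run avoids sending the first requests to a server that is not yet listening.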