anuraagr/enterprise-data-agents-app-fabric-api

Healthcare Data Agent — Microsoft Fabric + Azure AI

An enterprise-grade AI chatbot that lets Health and Life Sciences (HLS) professionals query synthetic patient data using plain English. It converts natural-language questions into SQL, executes them against a Microsoft Fabric Lakehouse, and returns actionable insights — no SQL knowledge required.

Python 3.11 | Streamlit | Fabric | Azure


Why This Matters for Health & Life Sciences

| Challenge | How This App Helps |
| --- | --- |
| Clinical analysts need data but don't write SQL | Natural language → SQL via Fabric Data Agent |
| Healthcare data is sensitive and siloed | Azure AD authentication; source data stays in Fabric |
| Reporting cycles are slow | Answers in seconds from a chat interface (complex joins can take 60–120 s) |
| Onboarding new analysts takes weeks | Self-service: just type a question and get an answer |

Architecture

┌─────────────────┐     HTTPS / OAuth 2.0      ┌───────────────────────┐
│  Streamlit UI   │ ◄──────────────────────────►│  Microsoft Fabric     │
│  (Azure         │     OpenAI-compatible       │  Data Agent           │
│   Container     │     Assistants API          │  (NL → SQL engine)    │
│   Apps)         │                             │                       │
└─────────────────┘                             │  ┌─────────────────┐  │
                                                │  │ Fabric Lakehouse│  │
        Azure AD Service Principal              │  │ (Synthea data)  │  │
        or Managed Identity                     │  └─────────────────┘  │
                                                └───────────────────────┘

API flow: Create assistant → Create thread → Post question → Poll for run completion → Retrieve answer → Clean up thread
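
The six steps above can be sketched in Python as follows. This is a minimal illustration, not the app's actual code: it assumes `client` exposes the Assistants surface of the OpenAI Python SDK (`client.beta.assistants`, `client.beta.threads`) and is already pointed at the Data Agent endpoint with a valid token. The function name `ask_data_agent`, the `"not used"` model placeholder, and the polling defaults are all assumptions.

```python
import time


def ask_data_agent(client, question, poll_interval=2.0, timeout=180.0, sleep=time.sleep):
    """Send one question through the six-step flow and return the answer text.

    `client` is assumed to look like an OpenAI SDK client; the endpoint URL
    and authentication are configured elsewhere and not shown here.
    """
    # 1. Create assistant (the model value is a placeholder, an assumption).
    assistant = client.beta.assistants.create(model="not used")
    # 2. Create thread.
    thread = client.beta.threads.create()
    try:
        # 3. Post question.
        client.beta.threads.messages.create(
            thread_id=thread.id, role="user", content=question
        )
        # 4. Start a run and poll until it leaves the pending states.
        run = client.beta.threads.runs.create(
            thread_id=thread.id, assistant_id=assistant.id
        )
        waited = 0.0
        while run.status in ("queued", "in_progress"):
            if waited >= timeout:
                raise TimeoutError(f"run still {run.status!r} after {timeout} s")
            sleep(poll_interval)
            waited += poll_interval
            run = client.beta.threads.runs.retrieve(
                thread_id=thread.id, run_id=run.id
            )
        if run.status != "completed":
            raise RuntimeError(f"run ended with status {run.status!r}")
        # 5. Retrieve answer: the last assistant message in the thread.
        messages = client.beta.threads.messages.list(thread_id=thread.id, order="asc")
        answers = [m for m in messages.data if m.role == "assistant"]
        if not answers:
            raise RuntimeError("run completed but no assistant message was found")
        return answers[-1].content[0].text.value
    finally:
        # 6. Clean up thread, even when a step above failed.
        client.beta.threads.delete(thread_id=thread.id)
```

Putting the cleanup in `finally` means a failed or timed-out run does not leave a thread behind.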


Dataset: Synthea Synthetic Healthcare

The Lakehouse contains ~150,000 records across 19 tables generated by Synthea, a widely used open-source synthetic patient generator. No real PHI is present.

| Category | Tables | Example Columns |
| --- | --- | --- |
| Patient | patients, allergies, careplans, immunizations | Id, Gender, Race, BirthDate, Healthcare_Expenses |
| Clinical | conditions, medications, procedures, observations | Description, Code, Base_Cost, TotalCost, Value, Units |
| Administrative | encounters, organizations, providers, payers | EncounterClass, Total_Claim_Cost, Revenue, Speciality |

Project Structure

├── src/                          # Application source
│   ├── Home.py                   # Landing page
│   ├── pages/
│   │   └── 01-Healthcare_Agent.py  # Chat interface
│   ├── services/
│   │   ├── agent_provider.py     # Azure AI Foundry integration
│   │   ├── tool_provider.py      # Fabric & Genie tool init
│   │   └── genie_functions.py    # Databricks Genie integration
│   ├── config.json               # Agent namespace config
│   ├── requirements.txt          # Python deps
│   └── env.example               # Environment variable template
├── tests/
│   ├── stress_test_healthcare_agent.py   # 50+ queries, 8 categories
│   └── quick_test.py                     # Connectivity smoke test
├── docs/
│   └── STRESS_TEST_SUMMARY.md    # Test results & methodology
├── Dockerfile                    # Container build
├── deploy-azure.ps1              # One-command Azure deployment
└── README.md

Getting Started

Prerequisites

| Requirement | Details |
| --- | --- |
| Python | 3.11 or later |
| Azure CLI | Installed and authenticated (az login) |
| Microsoft Fabric | Workspace with a Data Agent configured (F64+ capacity) |
| Azure AD App Registration | With Fabric API permissions granted |

1. Clone & install

git clone https://github.com/anuraagr/enterprise-data-agents-app-fabric-api.git
cd enterprise-data-agents-app-fabric-api

python -m venv .venv
.venv\Scripts\activate        # Windows
# source .venv/bin/activate   # macOS / Linux

pip install -r src/requirements.txt

2. Configure environment

cp src/env.example src/.env

Open src/.env and fill in the required values:

| Variable | Description |
| --- | --- |
| FABRIC_WORKSPACE_ID | GUID of your Fabric workspace |
| FABRIC_ARTIFACT_ID | GUID of your Data Agent artifact |
| FABRIC_CLIENT_ID | Azure AD App Registration client ID |
| FABRIC_CLIENT_SECRET | Azure AD App Registration client secret |
| FABRIC_TENANT_ID | Azure AD tenant ID |

Where to find these: Open Microsoft Fabric → your workspace → Data Agent → Settings. The workspace ID and artifact ID are in the URL. The App Registration values come from Azure Portal → App registrations.
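
A quick way to catch a missing or blank value before starting the app is a small check like the one below. It is a sketch only: the variable names come from the table above, but the helper name `load_fabric_settings` and the choice to fail fast with a `RuntimeError` are assumptions, not the app's actual behaviour. It also assumes the values from src/.env have already been loaded into the process environment.

```python
import os

# Variable names taken from the configuration table above.
REQUIRED_VARS = (
    "FABRIC_WORKSPACE_ID",
    "FABRIC_ARTIFACT_ID",
    "FABRIC_CLIENT_ID",
    "FABRIC_CLIENT_SECRET",
    "FABRIC_TENANT_ID",
)


def load_fabric_settings(env=None):
    """Return the required settings as a dict, or raise if any is missing or blank.

    Hypothetical helper: `env` defaults to os.environ and can be any mapping,
    which keeps the check easy to exercise without touching real settings.
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name, "").strip()]
    if missing:
        raise RuntimeError("Missing settings in src/.env: " + ", ".join(missing))
    return {name: env[name].strip() for name in REQUIRED_VARS}
```

Calling `load_fabric_settings()` at startup turns a blank GUID into an immediate, named error instead of a failed API call later.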

3. Run locally

cd src
streamlit run Home.py

The app opens at http://localhost:8501. Navigate to the Healthcare Agent page and start asking questions.

4. Deploy to Azure Container Apps (optional)

.\deploy-azure.ps1

This creates a resource group, ACR, and Container App with managed identity — all in one command.


Sample Queries

Once the agent is running, try these:

| Question | What it does |
| --- | --- |
| "What tables are available?" | Schema discovery |
| "Show the top 10 conditions by patient count" | Condition prevalence analysis |
| "What is the average medication cost by drug?" | Pharmacy cost analytics |
| "How many encounters per encounter class?" | Utilization breakdown |
| "Which providers have the most patients?" | Provider workload distribution |
| "Show healthcare spending by gender and race" | Health equity / demographic analysis |
| "List patients with diabetes and their medications" | Comorbidity + treatment join |

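To give a feel for what the agent does behind the scenes, here is roughly the kind of SQL a question like "Show the top 10 conditions by patient count" might translate to. This is an illustration only: it runs against a tiny in-memory SQLite table rather than the Lakehouse, the `Patient` column name and sample rows are assumptions, and the SQL the Data Agent actually generates is not shown in this README. A T-SQL endpoint would use `TOP 10` where SQLite uses `LIMIT 10`.

```python
import sqlite3

# Hypothetical miniature of the `conditions` table (column names assumed).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE conditions (Patient TEXT, Description TEXT)")
conn.executemany(
    "INSERT INTO conditions VALUES (?, ?)",
    [
        ("p1", "Diabetes"), ("p2", "Diabetes"), ("p2", "Hypertension"),
        ("p3", "Diabetes"), ("p3", "Hypertension"), ("p3", "Asthma"),
        ("p1", "Diabetes"),  # repeat diagnosis for the same patient
    ],
)

# Count each patient once per condition, then keep the ten most common.
TOP_CONDITIONS_SQL = """
SELECT Description, COUNT(DISTINCT Patient) AS patient_count
FROM conditions
GROUP BY Description
ORDER BY patient_count DESC, Description
LIMIT 10
"""

rows = conn.execute(TOP_CONDITIONS_SQL).fetchall()
for description, patient_count in rows:
    print(f"{description}: {patient_count}")
```

`COUNT(DISTINCT Patient)` matters here: a patient with the same diagnosis recorded twice should still count once.
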
Testing

# Quick connectivity check
python tests/quick_test.py

# Run all 50+ stress-test queries (with 3-second delay between queries)
python tests/stress_test_healthcare_agent.py --all --delay 3

# Run a single category
python tests/stress_test_healthcare_agent.py --category patient_queries

See docs/STRESS_TEST_SUMMARY.md for detailed results.


Authentication

The app tries credentials in this order:

  1. Client Secret — production deployments with an App Registration
  2. Managed Identity — Azure Container Apps with system-assigned identity
  3. Azure CLI — local development (az login)

For local development, having a valid az login session is the simplest path.
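
The "try in order" behaviour can be pictured with a small fallback loop. This is a sketch under assumptions: `first_available_token` and the provider callables are hypothetical names, and the README does not say how the app implements the chain.

```python
def first_available_token(providers):
    """Try each (name, get_token) pair in order; return the first success.

    `providers` is a list of (name, zero-argument callable) pairs. A provider
    signals "not available" by raising any exception.
    """
    errors = []
    for name, get_token in providers:
        try:
            return name, get_token()
        except Exception as exc:  # any failure means "try the next one"
            errors.append(f"{name}: {exc}")
    raise RuntimeError("No credential worked: " + "; ".join(errors))
```

For example, with providers listed as client secret, managed identity, then Azure CLI, a laptop without a client secret or managed identity falls through to the CLI session. The `azure-identity` package offers a ready-made equivalent in `ChainedTokenCredential`, which can wrap `ClientSecretCredential`, `ManagedIdentityCredential`, and `AzureCliCredential` in that order; whether this app uses it is not stated here.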


Troubleshooting

| Symptom | Fix |
| --- | --- |
| "Fabric capacity not active" | Resume the capacity in the Fabric Admin Portal → Capacity settings → Resume |
| Authentication errors | Run az login, verify .env values, check App Registration API permissions |
| Query timeouts | Complex joins can take 60–120 s. The agent retries automatically with exponential backoff. |
| Empty FABRIC_WORKSPACE_ID | Make sure src/.env exists and is populated; the app won't fall back to hardcoded IDs. |
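
For readers unfamiliar with the retry behaviour mentioned under "Query timeouts", exponential backoff looks roughly like this. It is a generic sketch: the helper name `retry_with_backoff`, the attempt count, and the 2 s starting delay are assumptions, not the values the app uses.

```python
import time


def retry_with_backoff(operation, attempts=4, base_delay=2.0, factor=2.0,
                       retry_on=(TimeoutError,), sleep=time.sleep):
    """Call `operation()` until it succeeds, doubling the wait after each failure.

    Only exceptions listed in `retry_on` trigger a retry; the last failure
    is re-raised once `attempts` calls have been made.
    """
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the original error
            sleep(delay)
            delay *= factor  # 2 s, 4 s, 8 s, ... with the defaults
```

With the defaults, a query that times out twice and then succeeds waits 2 s and 4 s between attempts.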

Key Components

| File | Purpose |
| --- | --- |
| src/Home.py | Landing page with stats, schema overview, and navigation |
| src/pages/01-Healthcare_Agent.py | Chat UI: handles auth, API calls, retries, upload, export |
| src/services/agent_provider.py | Azure AI Foundry async agent lifecycle |
| src/services/tool_provider.py | Initialises Fabric + Genie toolset |
| src/services/genie_functions.py | Databricks Genie NL-to-SQL bridge |
| deploy-azure.ps1 | Automated Azure Container Apps deployment |
| Dockerfile | Python 3.11-slim container with health check |

Acknowledgements


Built for Health and Life Sciences teams exploring AI-powered data access with Microsoft Fabric.
