DataStory

AI-augmented data storytelling for CSV and Excel datasets.

DataStory takes an uploaded dataset, runs it through a backend analysis pipeline, and returns:

dataset metadata and quality signals
suggested visualizations
audience-specific narrative stories
an optional preprocessing report
a rendered HTML report

The frontend provides the upload flow and live pipeline status. The backend performs file validation, preprocessing, analysis, chart generation, story generation, and report assembly.

Architecture

frontend/ - React + Vite UI
backend/ - FastAPI service and agent pipeline
WebSocket updates for live job progress
Optional AI provider support through Groq, Gemini, or Ollama

Features

Upload CSV and Excel datasets
Optional preprocessing before analysis
Live progress tracking across the pipeline
Dataset metadata summary
Automated chart selection and rendering
Narrative generation for executive, analyst, investor, or general audiences
Cleaned dataset download when preprocessing is enabled
Full HTML report output

Tech Stack

Frontend:

React 18
Vite
Tailwind CSS
Axios
Plotly

Backend:

FastAPI
Pydantic
Pandas
Plotly
Jinja2
Uvicorn

AI providers:

Groq
Gemini
Ollama

Project Flow

User uploads a dataset in the frontend.
The backend validates the file and creates a job.
The pipeline can optionally preprocess the data.
The backend analyzes structure, generates chart specs, renders charts, and creates narrative stories.
The final report and results are stored in the job store and shown in the dashboard.
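
The flow above can be sketched as an ordered list of stages driven by a small job record. This is only an illustration: the stage names, the Job class, the in-memory JOB_STORE, and the on_progress callback are all hypothetical stand-ins, not the backend's actual code.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional
import uuid

# Hypothetical stage names that mirror the flow above; the real pipeline may differ.
STAGES = ["validate", "preprocess", "analyze", "chart_specs",
          "render_charts", "stories", "report"]


@dataclass
class Job:
    job_id: str
    status: str = "queued"
    completed: list = field(default_factory=list)


# In-memory stand-in for the job store (assumption for this sketch).
JOB_STORE: dict = {}


def create_job() -> Job:
    """Create a job with a random ID and register it in the store."""
    job = Job(job_id=uuid.uuid4().hex)
    JOB_STORE[job.job_id] = job
    return job


def run_job(job: Job, preprocess: bool = False,
            on_progress: Optional[Callable[[str, str], None]] = None) -> Job:
    """Walk the stages in order, skipping preprocessing unless requested."""
    job.status = "running"
    for stage in STAGES:
        if stage == "preprocess" and not preprocess:
            continue
        # A real stage would do work here; this sketch only records progress.
        job.completed.append(stage)
        if on_progress is not None:
            on_progress(job.job_id, stage)
    job.status = "completed"
    return job
```

In a setup like this, the on_progress callback is where a WebSocket broadcast for live progress could be hooked in.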

Local Development

Backend

cd backend
python -m venv .venv
.venv\Scripts\activate
pip install -r requirements.txt
uvicorn main:app --reload --host 0.0.0.0 --port 8000

The activate command above is the Windows form. On macOS or Linux, run source .venv/bin/activate instead.

Frontend

cd frontend
npm install
npm run dev

If the backend is not served from the same origin as the frontend, set this variable in the frontend environment:

VITE_API_URL=http://localhost:8000

Environment Variables

Backend .env:

AI_PROVIDER=groq
GROQ_API_KEY=your_key_here
GEMINI_API_KEY=your_key_here
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=llama3.2
CORS_ORIGINS=http://localhost:5173
MAX_FILE_SIZE_MB=200
UPLOAD_DIR=./uploads

The backend supports these providers:

groq
gemini
ollama
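
As a rough sketch of how these settings might be read and validated at startup, the snippet below loads them from a mapping such as os.environ. The Settings class, the load_settings helper, the fallback defaults, and the comma-separated handling of CORS_ORIGINS are assumptions made for illustration; only the variable names and the three provider values come from this README.

```python
import os
from dataclasses import dataclass

# Provider names listed in this README.
SUPPORTED_PROVIDERS = ("groq", "gemini", "ollama")


@dataclass(frozen=True)
class Settings:
    ai_provider: str
    cors_origins: list
    max_file_size_mb: int
    upload_dir: str


def load_settings(env=None) -> Settings:
    """Build Settings from an environment-like mapping (defaults are assumed)."""
    env = os.environ if env is None else env
    provider = env.get("AI_PROVIDER", "groq").strip().lower()
    if provider not in SUPPORTED_PROVIDERS:
        raise ValueError(
            f"AI_PROVIDER must be one of {SUPPORTED_PROVIDERS}, got {provider!r}"
        )
    # Assumption: multiple origins are separated by commas.
    origins = [o.strip() for o in env.get("CORS_ORIGINS", "").split(",") if o.strip()]
    return Settings(
        ai_provider=provider,
        cors_origins=origins,
        max_file_size_mb=int(env.get("MAX_FILE_SIZE_MB", "200")),
        upload_dir=env.get("UPLOAD_DIR", "./uploads"),
    )
```

Failing fast on an unknown provider keeps a typo in .env from surfacing later as a confusing error in the middle of a job.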

API Endpoints

POST /api/upload - upload a dataset and start a job
GET /api/jobs/{job_id}/status - poll job status
GET /api/jobs/{job_id}/results - fetch completed results
GET /api/jobs/{job_id}/report - open the rendered HTML report
GET /api/jobs/{job_id}/cleaned - download the cleaned CSV
GET /api/jobs/{job_id}/preprocess-report - fetch preprocessing details
WS /api/ws/{job_id} - receive live progress updates
GET /health - health check
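
A client might use these endpoints roughly as follows: build the per-job URLs, then poll the status endpoint until the job finishes. The helper names, the "status" key, and the "completed" and "failed" values are assumptions, since the response schema is not documented here. The HTTP call is injected as a callable so the sketch stays independent of any particular HTTP library.

```python
import time
from typing import Callable


def job_urls(base_url: str, job_id: str) -> dict:
    """Return the per-job endpoint URLs listed above for a given backend base URL."""
    base = base_url.rstrip("/")
    job = f"{base}/api/jobs/{job_id}"
    return {
        "status": f"{job}/status",
        "results": f"{job}/results",
        "report": f"{job}/report",
        "cleaned": f"{job}/cleaned",
        "preprocess_report": f"{job}/preprocess-report",
        # http -> ws and https -> wss for the WebSocket endpoint.
        "ws": f"{base.replace('http', 'ws', 1)}/api/ws/{job_id}",
    }


def wait_for_job(fetch_status: Callable[[], dict], interval: float = 1.0,
                 max_polls: int = 60, sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll until the (assumed) status field reports a terminal state."""
    for _ in range(max_polls):
        payload = fetch_status()
        if payload.get("status") in ("completed", "failed"):
            return payload
        sleep(interval)
    raise TimeoutError(f"job did not finish after {max_polls} polls")
```

Here fetch_status would typically wrap a GET request to the status URL and return the parsed JSON body. The WebSocket endpoint avoids polling altogether when live updates are needed.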

Deployment

Frontend: Vercel
Backend: Docker or any FastAPI-compatible host
Set VITE_API_URL in the frontend deployment to point at the backend API

Repository Structure

datastory/
  frontend/
  backend/
  README.md

Notes

Dataset uploads are limited to CSV and Excel formats.
The backend is designed to be provider-agnostic, so you can switch AI providers through environment settings.
Chart and story generation are handled in the backend pipeline, not in the browser.
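
The format restriction, together with the MAX_FILE_SIZE_MB setting, could be enforced with a check along these lines. This is a minimal sketch, not the backend's actual validation: the validate_upload name and the exact set of accepted extensions (.csv, .xlsx, .xls) are assumptions.

```python
from pathlib import Path

# Assumed set of accepted extensions for "CSV and Excel".
ALLOWED_EXTENSIONS = {".csv", ".xlsx", ".xls"}


def validate_upload(filename: str, size_bytes: int, max_file_size_mb: int = 200) -> str:
    """Return the normalized extension, or raise ValueError if the upload is rejected."""
    ext = Path(filename).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported file type: {ext or '(none)'}")
    if size_bytes > max_file_size_mb * 1024 * 1024:
        raise ValueError(f"file exceeds {max_file_size_mb} MB limit")
    return ext
```

Checking the extension alone does not prove the content is valid, so a real pipeline would presumably still need to handle parse errors when Pandas reads the file.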
