allenjoshua16/datastory

DataStory

AI-augmented data storytelling for CSV and Excel datasets.

DataStory takes an uploaded dataset, runs it through a backend analysis pipeline, and returns:

  • dataset metadata and quality signals
  • suggested visualizations
  • audience-specific narrative stories
  • an optional preprocessing report
  • a rendered HTML report

The frontend provides the upload flow and live pipeline status. The backend performs file validation, preprocessing, analysis, chart generation, story generation, and report assembly.

Architecture

  • frontend/ - React + Vite UI
  • backend/ - FastAPI service and agent pipeline
  • WebSocket updates for live job progress
  • Optional AI provider support through Groq, Gemini, or Ollama

Features

  • Upload CSV and Excel datasets
  • Optional preprocessing before analysis
  • Live progress tracking across the pipeline
  • Dataset metadata summary
  • Automated chart selection and rendering
  • Narrative generation for executive, analyst, investor, or general audiences
  • Cleaned dataset download when preprocessing is enabled
  • Full HTML report output

Tech Stack

Frontend:

  • React 18
  • Vite
  • Tailwind CSS
  • Axios
  • Plotly

Backend:

  • FastAPI
  • Pydantic
  • Pandas
  • Plotly
  • Jinja2
  • Uvicorn

AI providers:

  • Groq
  • Gemini
  • Ollama

Project Flow

  1. The user uploads a dataset in the frontend.
  2. The backend validates the file and creates a job.
  3. The pipeline can optionally preprocess the data.
  4. The backend analyzes structure, generates chart specs, renders charts, and creates narrative stories.
  5. The final report and results are stored in the job store and shown in the dashboard.
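
The flow above can be sketched as a small staged pipeline with an in-memory job store. This is only an illustration under stated assumptions: the stage names, the `Job` class, `JOB_STORE`, `create_job`, and `run_pipeline` are hypothetical and are not taken from the repository. Each stage is a placeholder that records its completion rather than doing real work.

```python
import uuid
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical stage names that mirror the steps listed above.
STAGES = ["validate", "preprocess", "analyze", "charts", "stories", "report"]


@dataclass
class Job:
    job_id: str
    status: str = "queued"
    completed: List[str] = field(default_factory=list)
    results: Dict[str, object] = field(default_factory=dict)


# Hypothetical in-memory job store keyed by job id.
JOB_STORE: Dict[str, Job] = {}


def create_job() -> Job:
    """Register a new job in the store and return it."""
    job = Job(job_id=uuid.uuid4().hex)
    JOB_STORE[job.job_id] = job
    return job


def run_pipeline(
    job: Job,
    preprocess: bool = False,
    on_progress: Callable[[str, str], None] = lambda job_id, stage: None,
) -> Job:
    """Run every stage in order, skipping preprocessing unless requested."""
    job.status = "running"
    for stage in STAGES:
        if stage == "preprocess" and not preprocess:
            continue  # preprocessing is optional
        # Placeholder: a real stage would do its work here.
        job.results[stage] = f"{stage} done"
        job.completed.append(stage)
        on_progress(job.job_id, stage)  # e.g. forward to a progress channel
    job.status = "completed"
    return job
```

The `on_progress` callback is where a live update (for example over the WebSocket endpoint) could be emitted after each stage.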

Local Development

Backend

cd backend
python -m venv .venv
.venv\Scripts\activate        # Windows; on macOS/Linux use: source .venv/bin/activate
pip install -r requirements.txt
uvicorn main:app --reload --host 0.0.0.0 --port 8000

Frontend

cd frontend
npm install
npm run dev

If the backend is not served from the same origin as the frontend, set the following in the frontend environment:

VITE_API_URL=http://localhost:8000

Environment Variables

Backend .env:

AI_PROVIDER=groq
GROQ_API_KEY=your_key_here
GEMINI_API_KEY=your_key_here
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=llama3.2
CORS_ORIGINS=http://localhost:5173
MAX_FILE_SIZE_MB=200
UPLOAD_DIR=./uploads

The backend supports these providers:

  • groq
  • gemini
  • ollama
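
As a rough illustration of how these variables might be read and validated at startup, here is a minimal sketch using only the standard library. The `Settings` class and `load_settings` function are hypothetical names. The defaults and the comma-separated format for `CORS_ORIGINS` are assumptions, not behavior confirmed by the repository.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping

# Provider names as listed above.
SUPPORTED_PROVIDERS = ("groq", "gemini", "ollama")


@dataclass(frozen=True)
class Settings:
    ai_provider: str
    cors_origins: List[str]
    max_file_size_bytes: int
    upload_dir: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build settings from an environment mapping, rejecting unknown providers."""
    # Assumption: "groq" is used when AI_PROVIDER is unset.
    provider = env.get("AI_PROVIDER", "groq").strip().lower()
    if provider not in SUPPORTED_PROVIDERS:
        raise ValueError(
            f"AI_PROVIDER must be one of {SUPPORTED_PROVIDERS}, got {provider!r}"
        )
    # Assumption: multiple origins are separated by commas.
    origins = [o.strip() for o in env.get("CORS_ORIGINS", "").split(",") if o.strip()]
    # Assumption: 200 MB default, matching the example value above.
    max_mb = int(env.get("MAX_FILE_SIZE_MB", "200"))
    return Settings(
        ai_provider=provider,
        cors_origins=origins,
        max_file_size_bytes=max_mb * 1024 * 1024,
        upload_dir=env.get("UPLOAD_DIR", "./uploads"),
    )
```

Passing the environment in as a mapping keeps the function easy to exercise with a plain dictionary instead of the real process environment.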

API Endpoints

  • POST /api/upload - upload a dataset and start a job
  • GET /api/jobs/{job_id}/status - poll job status
  • GET /api/jobs/{job_id}/results - fetch completed results
  • GET /api/jobs/{job_id}/report - open the rendered HTML report
  • GET /api/jobs/{job_id}/cleaned - download the cleaned CSV
  • GET /api/jobs/{job_id}/preprocess-report - fetch preprocessing details
  • WS /api/ws/{job_id} - receive live progress updates
  • GET /health - health check
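
A client could poll the status endpoint until a job finishes. The sketch below uses only the standard library and the status route listed above. The shape of the response is an assumption: it presumes a JSON object with a `status` field whose terminal values are `completed` and `failed`, which the repository may define differently. `status_url`, `fetch_json`, and `wait_for_job` are hypothetical helper names.

```python
import json
import time
import urllib.request
from typing import Callable, Dict

# Matches the local uvicorn command in the Local Development section.
BASE_URL = "http://localhost:8000"


def status_url(job_id: str, base_url: str = BASE_URL) -> str:
    """Build the status URL for a job from the documented route."""
    return f"{base_url.rstrip('/')}/api/jobs/{job_id}/status"


def fetch_json(url: str) -> Dict:
    """GET a URL and decode the JSON body."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def wait_for_job(
    job_id: str,
    fetch: Callable[[str], Dict] = fetch_json,
    interval: float = 1.0,
    max_polls: int = 60,
) -> Dict:
    """Poll the status endpoint until the job reaches a terminal state."""
    for _ in range(max_polls):
        payload = fetch(status_url(job_id))
        # Assumption: terminal states are reported as "completed" or "failed".
        if payload.get("status") in ("completed", "failed"):
            return payload
        time.sleep(interval)
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")
```

For live updates without polling, the WebSocket route serves the same purpose. Polling is shown here only because it needs no third-party client library.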

Deployment

  • Frontend: Vercel
  • Backend: Docker or any FastAPI-compatible host
  • Set VITE_API_URL in the frontend deployment to point at the backend API

Repository Structure

datastory/
  frontend/
  backend/
  README.md

Notes

  • Dataset uploads are limited to CSV and Excel formats.
  • The backend is designed to be provider-agnostic, so you can switch AI providers through environment settings.
  • Chart and story generation are handled in the backend pipeline, not in the browser.
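
The format restriction in the first note could be enforced with a check along these lines. This is a sketch under assumptions: the extension set stands in for "CSV and Excel", the size limit reuses the `MAX_FILE_SIZE_MB` example value, and `validate_upload` is a hypothetical name rather than the backend's actual validator.

```python
from pathlib import Path

# Assumption: these extensions represent the accepted CSV and Excel formats.
ALLOWED_EXTENSIONS = {".csv", ".xlsx", ".xls"}


def validate_upload(filename: str, size_bytes: int, max_mb: int = 200) -> None:
    """Raise ValueError if the file type or size is not acceptable."""
    ext = Path(filename).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported file type: {ext or '(none)'}")
    if size_bytes > max_mb * 1024 * 1024:
        raise ValueError(f"file exceeds the {max_mb} MB limit")
```

An extension check alone does not prove the content is valid, so a real validator would likely also try to parse the file.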

About

AI-augmented data storytelling platform that transforms raw datasets into executive-ready narratives, visualizations, and insights using multi-agent LLM orchestration with optional intelligent preprocessing.
