
DecisioningAssistant

DecisioningAssistant is a local-first MLX project for macOS that ingests PDFs, Markdown files, and Webex threads, generates QA datasets, fine-tunes small instruction models, builds a hybrid local RAG index, and serves a Streamlit chat assistant with source-aware citations.

What It Does

  • Ingests PDF and Markdown documentation with structure-aware paragraph chunking.
  • Fetches Webex room history directly from the Webex REST API using rooms.json plus a YAML config.
  • Groups Webex data into thread-based chunks so each chunk starts with the thread root.
  • Generates English QA pairs locally with an MLX-loaded model.
  • Fine-tunes MLX-compatible models with LoRA.
  • Builds or updates a local Qdrant index from source chunks and optional QA pairs.
  • Runs a Streamlit RAG app with chat history, retrieval controls, reranking, answer selection, citations, and source popups.

Current Pipeline Highlights

  • PDF and Markdown ingestion are structure-aware: PDFs are packed from whole paragraphs, while Markdown keeps H1 chapters whole by default and preserves section metadata.
  • Webex ingestion is thread-aware: threads with fewer than 2 messages are skipped, and each thread chunk keeps room and thread metadata.
  • Webex metadata now includes a webexteams://... deep link to the parent/root message in the thread (an illustrative chunk record appears after this list).
  • Webex QA generation uses the thread start to generate the question and uses child messages as the answer.
  • Webex QA can be filtered to a specific user, keeping only threads where that user appears in child messages and only that user’s child messages in the answer.
  • RAG retrieval supports vector search plus reranking with cross_encoder, embedding_cosine, or none.
  • Answer generation supports Best-of-N answer selection with reranking. Default candidate count is 4.
  • The Streamlit source popup can show the retrieved text and the Webex parent-message link when available.
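
For illustration, a Webex thread chunk might carry metadata along these lines. The field names below are hypothetical placeholders, not the project's actual schema; only the presence of room/thread metadata and the webexteams:// parent link comes from the highlights above.

# Hypothetical Webex thread chunk metadata (illustrative field names only)
source_type: webex
room_title: "..."
thread_root_id: "..."
parent_message_link: "webexteams://..."
text: |
  <thread root message followed by its child messages>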

Repository Layout

configs/
  sources.yaml
  models.yaml
  qa_generation.yaml
  finetune.yaml
  rag.yaml
  webex_fetch.yaml

data/
  raw/pdf/
  raw/markdown/
  raw/webex/
  staging/documents/
  staging/chunks/
  qa/
  rag/vectordb/

pipelines/
  01_ingest.sh
  02_generate_qa.sh
  03_finetune.sh
  04_build_rag.sh
  05_eval.sh
  06_export_rag.sh
  07_import_rag.sh

src/
  common/
  decisioning_assistant/
  ingestion/
  qa/
  rag/
  training/

wiki/
  Home.md
  Overview.md
  Technical-Details.md
  Usage-and-Configuration.md

Requirements

  • macOS with Apple Silicon for MLX workflows.
  • Python >=3.10.
  • English-only source material and QA generation.

Installation

python3 -m venv .venv
source .venv/bin/activate
pip install -U pip
pip install -e .

The base install includes the test/lint tools, PDF and Markdown ingestion support, MLX-VLM model support with TorchVision image utilities, and the TurboQuant conversion/runtime support used by the project.
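
As an optional sanity check (not a project command) that MLX is importable and sees the expected device on this machine:

python3 -c "import mlx.core as mx; print(mx.default_device())"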

Main Configuration Files

  • configs/sources.yaml: PDF, Markdown, and Webex ingestion paths plus normalization/chunking settings.
  • configs/models.yaml: QA generator, answer model, and embedding model settings.
  • configs/qa_generation.yaml: QA generation, validation, split, and Webex-specific QA controls.
  • configs/finetune.yaml: MLX LoRA fine-tuning settings.
  • configs/rag.yaml: indexing, retrieval, reranking, answer selection, and prompt-budget settings.
  • configs/webex_fetch.yaml: direct Webex API fetch settings.

Named machine profiles are available alongside the defaults (one simple way to switch between them is sketched after this list):

  • configs/models.m3_24gb.yaml: explicit M3 24 GB MLX-LM generation profile.
  • configs/rag.m3_24gb.yaml: explicit M3 24 GB retrieval/context profile.
  • configs/qa_generation.m3_24gb.yaml: explicit M3 24 GB QA generation profile.
  • configs/models.m5_pro_64gb.yaml: larger MLX-LM generation profile.
  • configs/models.m5_pro_64gb.gemma4.yaml: Gemma 4 MLX-VLM generation profile.
  • configs/rag.m5_pro_64gb.yaml: larger retrieval/context profile.
  • configs/qa_generation.m5_pro_64gb.yaml: denser QA generation profile.
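
One simple way to activate a profile, assuming the CLI reads the default configs/models.yaml, configs/rag.yaml, and configs/qa_generation.yaml paths, is to copy the profile files over the defaults before running the pipeline:

# Example: switch to the M5 Pro 64 GB profiles (back up the defaults first if needed)
cp configs/models.m5_pro_64gb.yaml configs/models.yaml
cp configs/rag.m5_pro_64gb.yaml configs/rag.yaml
cp configs/qa_generation.m5_pro_64gb.yaml configs/qa_generation.yaml

The rag.chat_local example later in this README also shows passing profile files explicitly via --rag-config and --models-config.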

Typical End-to-End Workflow

  1. Fetch raw Webex spaces if needed.
  2. Put PDFs into data/raw/pdf/ and Markdown files into data/raw/markdown/.
  3. Run ingestion and chunking.
  4. Generate QA (optional step).
  5. Fine-tune model using QA dataset from step 4 (optional).
  6. Build or update the RAG index.
  7. Start the chat app.

Example:

# Required for Webex threads
decisioning-assistant webex-fetch \
  --rooms-json configs/rooms.json \
  --config configs/webex_fetch.yaml \
  --output-dir data/raw/webex

# Put PDFs into data/raw/pdf/ and Markdown files into data/raw/markdown/

decisioning-assistant ingest  # required
decisioning-assistant qa  # optional
decisioning-assistant finetune --finetune-config configs/finetune.yaml  # optional
decisioning-assistant rag-index --recreate
decisioning-assistant app --server-port 8501

CLI Commands

# Ingest PDF + Markdown + Webex + normalize
decisioning-assistant ingest

# Override Markdown chapter defaults for smaller chunks if needed
decisioning-assistant ingest --markdown-target-chars 900 --markdown-split-level 6

# Generate, validate, and split QA
decisioning-assistant qa

# Fine-tune with MLX LoRA
decisioning-assistant finetune --finetune-config configs/finetune.yaml

# Build or update the hybrid RAG index
decisioning-assistant rag-index

# Recreate the RAG collection from scratch
decisioning-assistant rag-index --recreate

# Export the RAG index
decisioning-assistant rag-export --output-dir data/rag/export

# Export only selected source types
decisioning-assistant rag-export --output-dir data/rag/export --source pdf

decisioning-assistant rag-export --output-dir data/rag/export --source markdown

decisioning-assistant rag-export --output-dir data/rag/export --source webex

# Import an exported RAG bundle
decisioning-assistant rag-import --input-dir data/rag/export --recreate

# Start the Streamlit app
decisioning-assistant app --server-port 8501

Direct Webex Fetch

Raw Webex exports can be created directly through the Webex API.

Example:

decisioning-assistant webex-fetch \
  --rooms-json /path/to/rooms.json \
  --config configs/webex_fetch.yaml \
  --output-dir data/raw/webex

Notes:

  • --room-type group is the default.
  • Output file names are derived from the room title and shortened to 80 characters.
  • The fetch config only uses token and max_total_messages (a minimal sketch follows this list).
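
A minimal configs/webex_fetch.yaml therefore only needs those two settings. The sketch below assumes flat top-level keys and uses placeholder values; keep the real token out of version control.

# Minimal sketch of configs/webex_fetch.yaml (placeholder values)
token: "YOUR_WEBEX_ACCESS_TOKEN"
max_total_messages: 2000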

QA Generation Notes

  • QA generation is local-only.
  • Short Webex chunks are skipped using min_webex_chunk_chars.
  • Webex thread QA uses generated questions plus child-message answers.
  • webex_user_name can restrict QA generation to replies from a specific user.
  • max_webex_thread_answer_chars sets a separate length cap for Webex thread answers (see the sketch after this list).
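
As an illustration of the Webex-specific QA controls above, a configs/qa_generation.yaml fragment could look like the sketch below; only the key names come from this README, and the nesting and values are placeholders.

# Illustrative Webex QA controls (placeholder values; see configs/qa_generation.yaml for the real layout)
min_webex_chunk_chars: 200
webex_user_name: "Jane Doe"
max_webex_thread_answer_chars: 1500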

RAG Notes

  • Qdrant runs locally on disk.
  • The index can include raw source chunks, QA pairs, or both.
  • Retrieval reranking and answer reranking are separate stages.
  • The default retrieval reranker is cross_encoder.
  • The default answer-selection candidate count is 4 (both defaults appear in the sketch after this list).
  • The Streamlit app exposes the main retrieval, reranking, and prompt-budget controls in the sidebar.
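
As a rough sketch of how the retrieval and answer-selection defaults above might appear in configs/rag.yaml (the key names and nesting here are assumptions; the shipped file is authoritative):

# Illustrative retrieval/answer-selection settings (key names are assumptions)
retrieval:
  reranker: cross_encoder          # cross_encoder | embedding_cosine | none
answer_selection:
  num_candidates: 4                # Best-of-N default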

Streamlit App

Run the app directly if needed:

PYTHONPATH=src streamlit run src/rag/assistant_app.py

The app provides:

  • session chat history,
  • configurable retrieval and prompt budgets,
  • answer Best-of-N selection,
  • source citations,
  • source popups with retrieved text,
  • Webex room timestamp display,
  • Webex parent-message deep links when available.

MLX-VLM and Gemma 4

Gemma 4 MLX checkpoints use mlx-vlm, so set provider: mlx_vlm in the relevant qa_generator or answer_model config block. The text-only MLX path remains provider: mlx or provider: mlx_lm.

Example:

answer_model:
  provider: mlx_vlm
  model: mlx-community/gemma-4-26b-a4b-it-mxfp4
  max_tokens: 2048
  temperature: 0.15
  trust_remote_code: true

The rag.chat_local CLI can also pass media to VLM models:

PYTHONPATH=src python3 -m rag.chat_local \
  "What does this screenshot show?" \
  --rag-config configs/rag.m5_pro_64gb.yaml \
  --models-config configs/models.m5_pro_64gb.gemma4.yaml \
  --image /path/to/screenshot.png

TurboQuant MLX

TurboQuant-compressed MLX models can be converted and used by the same QA, evaluation, local chat, and Streamlit app paths as standard MLX-LM models.

Convert the configured answer_model:

decisioning-assistant turboquant-convert \
  --models-config configs/models.yaml \
  --model-key answer_model \
  --mlx-path data/models/answer-model-tq3 \
  --bits 3 \
  --group-size 64

Or convert an explicit HuggingFace/local model:

decisioning-assistant turboquant-convert \
  --hf-path openai/gpt-oss-20b \
  --mlx-path data/models/gpt-oss-20b-tq3 \
  --bits 3 \
  --group-size 64

Point the app at the converted model with provider: turboquant_mlx:

answer_model:
  provider: turboquant_mlx
  model: data/models/answer-model-tq3
  max_tokens: 2048
  temperature: 0.15
  turboquant_kv_bits: 3
  turboquant_kv_group_size: 64

Set turboquant_kv_bits to 0 or omit it to use the TurboQuant weight-compressed model with the normal FP16 KV cache. Use turboquant_fast: true only for converted models that include QJL correction and where speed is preferred over the highest-quality decode.
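
For example, a weight-only setup that keeps the FP16 KV cache and opts into the faster decode path (appropriate only for conversions that include QJL correction, per the note above) could look like this:

answer_model:
  provider: turboquant_mlx
  model: data/models/answer-model-tq3
  max_tokens: 2048
  temperature: 0.15
  turboquant_kv_bits: 0      # 0 or omitted: weight compression only, FP16 KV cache
  turboquant_fast: true      # only for models converted with QJL correction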

Export and Import

Portable RAG bundles can be moved to another machine.

Example:

# Export
decisioning-assistant rag-export --output-dir data/rag/export

# Import on another machine
decisioning-assistant rag-import --input-dir data/rag/export --recreate

Notes

  • The defaults in configs/ were tuned for a MacBook Pro M3 with 24 GB of RAM, but the code is not hard-limited to that hardware.
  • Larger future Apple Silicon systems can increase model size, retrieval depth, and prompt budgets through config.
  • After changing Webex ingestion metadata, rerun ingestion, QA generation, and RAG indexing so new metadata reaches the app.
  • PyMuPDF uses a dual AGPL/commercial license. Check fit for your usage.

Documentation

See the wiki pages in wiki/ for a fuller walkthrough:

  • wiki/Overview.md
  • wiki/Technical-Details.md
  • wiki/Usage-and-Configuration.md
