DecisioningAssistant is a local-first MLX project for macOS that ingests PDFs, Markdown files, and Webex threads, generates QA datasets, fine-tunes small instruction models, builds a hybrid local RAG index, and serves a Streamlit chat assistant with source-aware citations.
- Ingests PDF and Markdown documentation with structure-aware paragraph chunking.
- Fetches Webex room history directly from the Webex REST API using `rooms.json` plus a YAML config.
- Groups Webex data into thread-based chunks so each chunk starts with the thread root.
- Generates English QA pairs locally with an MLX-loaded model.
- Fine-tunes MLX-compatible models with LoRA.
- Builds or updates a local Qdrant index from source chunks and optional QA pairs.
- Runs a Streamlit RAG app with chat history, retrieval controls, reranking, answer selection, citations, and source popups.
- PDF and Markdown ingestion are structure-aware: PDFs are packed from whole paragraphs, while Markdown keeps H1 chapters whole by default and preserves section metadata.
- Webex ingestion is thread-aware: threads with fewer than 2 messages are skipped, and each thread chunk keeps room and thread metadata.
- Webex metadata now includes a `webexteams://...` deep link to the parent/root message in the thread (see the metadata sketch after this list).
- Webex QA generation uses the thread start to generate the question and uses child messages as the answer.
- Webex QA can be filtered to a specific user, keeping only threads where that user appears in child messages and only that user’s child messages in the answer.
- RAG retrieval supports vector search plus reranking with `cross_encoder`, `embedding_cosine`, or `none`.
- Answer generation supports Best-of-N answer selection with reranking. The default candidate count is `4`.
- The Streamlit source popup can show the retrieved text and the Webex parent-message link when available.
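As an illustration, the metadata kept on a Webex thread chunk could look like the sketch below; the field names and values are hypothetical, with only the room/thread metadata and the `webexteams://...` deep link confirmed by the notes above:

```yaml
# Hypothetical Webex thread-chunk metadata; field names are illustrative,
# not the project's actual schema.
source_type: webex
room_title: "Decisioning Support"         # room metadata kept on the chunk
thread_root_id: "ROOT_MESSAGE_ID"         # thread metadata kept on the chunk
parent_message_link: "webexteams://..."   # deep link to the parent/root message
```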
```
configs/
  sources.yaml
  models.yaml
  qa_generation.yaml
  finetune.yaml
  rag.yaml
  webex_fetch.yaml
data/
  raw/pdf/
  raw/markdown/
  raw/webex/
  staging/documents/
  staging/chunks/
  qa/
  rag/vectordb/
pipelines/
  01_ingest.sh
  02_generate_qa.sh
  03_finetune.sh
  04_build_rag.sh
  05_eval.sh
  06_export_rag.sh
  07_import_rag.sh
src/
  common/
  decisioning_assistant/
  ingestion/
  qa/
  rag/
  training/
wiki/
  Home.md
  Overview.md
  Technical-Details.md
  Usage-and-Configuration.md
```
- macOS with Apple Silicon for MLX workflows.
- Python `>=3.10`.
- English-only source material and QA generation.
```bash
python3 -m venv .venv
source .venv/bin/activate
pip install -U pip
pip install -e .
```

The base install includes the test/lint tools, PDF/Markdown support, MLX-VLM model support with TorchVision image utilities, and TurboQuant conversion/runtime support used by the project.
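After installation, the `decisioning-assistant` console entry point used throughout this README should be on your `PATH`; a quick smoke test, assuming the CLI exposes the conventional `--help` flag:

```bash
# Verify the editable install registered the CLI entry point.
decisioning-assistant --help
```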
- `configs/sources.yaml`: PDF, Markdown, and Webex ingestion paths plus normalization/chunking settings.
- `configs/models.yaml`: QA generator, answer model, and embedding model settings.
- `configs/qa_generation.yaml`: QA generation, validation, split, and Webex-specific QA controls.
- `configs/finetune.yaml`: MLX LoRA fine-tuning settings.
- `configs/rag.yaml`: indexing, retrieval, reranking, answer selection, and prompt-budget settings.
- `configs/webex_fetch.yaml`: direct Webex API fetch settings.
Named machine profiles are available alongside the defaults:
- `configs/models.m3_24gb.yaml`: explicit M3 24 GB MLX-LM generation profile.
- `configs/rag.m3_24gb.yaml`: explicit M3 24 GB retrieval/context profile.
- `configs/qa_generation.m3_24gb.yaml`: explicit M3 24 GB QA generation profile.
- `configs/models.m5_pro_64gb.yaml`: larger MLX-LM generation profile.
- `configs/models.m5_pro_64gb.gemma4.yaml`: Gemma 4 MLX-VLM generation profile.
- `configs/rag.m5_pro_64gb.yaml`: larger retrieval/context profile.
- `configs/qa_generation.m5_pro_64gb.yaml`: denser QA generation profile.
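One confirmed consumer of these profiles is the `rag.chat_local` CLI shown later in this README; for example, to run it against the M3 24 GB profiles (the question string is illustrative):

```bash
# Point the local chat CLI at the M3 24 GB machine profiles.
PYTHONPATH=src python3 -m rag.chat_local \
  "How are Webex threads chunked?" \
  --rag-config configs/rag.m3_24gb.yaml \
  --models-config configs/models.m3_24gb.yaml
```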
1. Fetch raw Webex spaces if needed.
2. Put PDFs into `data/raw/pdf/` and Markdown files into `data/raw/markdown/`.
3. Run ingestion and chunking.
4. Generate QA (optional step).
5. Fine-tune a model using the QA dataset from step 4 (optional).
6. Build or update the RAG index.
7. Start the chat app.
Example:
```bash
# Required for Webex threads
decisioning-assistant webex-fetch \
  --rooms-json configs/rooms.json \
  --config configs/webex_fetch.yaml \
  --output-dir data/raw/webex

# Put any PDF into the pdf dir and any Markdown file into the markdown dir
decisioning-assistant ingest                                              # required step
decisioning-assistant qa                                                  # optional
decisioning-assistant finetune --finetune-config configs/finetune.yaml   # optional
decisioning-assistant rag-index --recreate
decisioning-assistant app --server-port 8501
```

```bash
# Ingest PDF + Markdown + Webex + normalize
decisioning-assistant ingest

# Override Markdown chapter defaults for smaller chunks if needed
decisioning-assistant ingest --markdown-target-chars 900 --markdown-split-level 6

# Generate, validate, and split QA
decisioning-assistant qa

# Fine-tune with MLX LoRA
decisioning-assistant finetune --finetune-config configs/finetune.yaml

# Build or update the hybrid RAG index
decisioning-assistant rag-index

# Recreate the RAG collection from scratch
decisioning-assistant rag-index --recreate

# Export the RAG index
decisioning-assistant rag-export --output-dir data/rag/export

# Export only selected source types
decisioning-assistant rag-export --output-dir data/rag/export --source pdf
decisioning-assistant rag-export --output-dir data/rag/export --source markdown
decisioning-assistant rag-export --output-dir data/rag/export --source webex

# Import an exported RAG bundle
decisioning-assistant rag-import --input-dir data/rag/export --recreate

# Start the Streamlit app
decisioning-assistant app --server-port 8501
```

Raw Webex exports can be created directly through the Webex API.
Example:
```bash
decisioning-assistant webex-fetch \
  --rooms-json /path/to/rooms.json \
  --config configs/webex_fetch.yaml \
  --output-dir data/raw/webex
```

Notes:
- `--room-type group` is the default.
- Output file names are derived from the room title and shortened to 80 characters.
- The fetch config only uses `token` and `max_total_messages`.
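Since only those two keys are read, a minimal `configs/webex_fetch.yaml` can be as small as the sketch below (both values are placeholders):

```yaml
# Minimal fetch config sketch; only these two keys are read.
token: "YOUR_WEBEX_API_TOKEN"   # placeholder; supply a real Webex bearer token
max_total_messages: 5000        # illustrative cap on fetched messages
```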
- QA generation is local-only.
- Short Webex chunks are skipped using `min_webex_chunk_chars`.
- Webex thread QA uses generated questions plus child-message answers.
- `webex_user_name` can restrict QA generation to replies from a specific user.
- `max_webex_thread_answer_chars` controls the separate answer cap for Webex thread answers.
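A sketch of the corresponding `configs/qa_generation.yaml` excerpt, using the key names from the notes above with illustrative values (the flat layout is an assumption):

```yaml
# Webex-specific QA controls; key names from the notes above, values illustrative.
min_webex_chunk_chars: 200            # skip Webex chunks shorter than this
webex_user_name: "Jane Doe"           # optional: keep only this user's replies
max_webex_thread_answer_chars: 1500   # separate answer cap for thread answers
```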
- Qdrant runs locally on disk.
- The index can include raw source chunks, QA pairs, or both.
- Retrieval reranking and answer reranking are separate stages.
- The default retrieval reranker is `cross_encoder`.
- The default answer-selection candidate count is `4`.
- The Streamlit app exposes the main retrieval, reranking, and prompt-budget controls in the sidebar.
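For orientation, a `configs/rag.yaml` excerpt might look like the sketch below; the key names and nesting are assumptions, but the values reflect the defaults stated above:

```yaml
# Illustrative retrieval and answer-selection settings; key names are assumed.
retrieval:
  reranker: cross_encoder   # alternatives noted above: embedding_cosine, none
answer_selection:
  candidates: 4             # Best-of-N default candidate count
```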
Run the app directly if needed:
```bash
PYTHONPATH=src streamlit run src/rag/assistant_app.py
```

The app provides:
- session chat history,
- configurable retrieval and prompt budgets,
- answer Best-of-N selection,
- source citations,
- source popups with retrieved text,
- Webex room timestamp display,
- Webex parent-message deep links when available.
Gemma 4 MLX checkpoints use mlx-vlm, so set `provider: mlx_vlm` in the relevant `qa_generator` or `answer_model` config block. The text-only MLX path remains `provider: mlx` or `provider: mlx_lm`.
Example:
```yaml
answer_model:
  provider: mlx_vlm
  model: mlx-community/gemma-4-26b-a4b-it-mxfp4
  max_tokens: 2048
  temperature: 0.15
  trust_remote_code: true
```

The `rag.chat_local` CLI can also pass media to VLM models:
```bash
PYTHONPATH=src python3 -m rag.chat_local \
  "What does this screenshot show?" \
  --rag-config configs/rag.m5_pro_64gb.yaml \
  --models-config configs/models.m5_pro_64gb.gemma4.yaml \
  --image /path/to/screenshot.png
```

TurboQuant-compressed MLX models can be converted and used by the same QA, evaluation, local chat, and Streamlit app paths as standard MLX-LM models.
Convert the configured `answer_model`:

```bash
decisioning-assistant turboquant-convert \
  --models-config configs/models.yaml \
  --model-key answer_model \
  --mlx-path data/models/answer-model-tq3 \
  --bits 3 \
  --group-size 64
```

Or convert an explicit HuggingFace/local model:
```bash
decisioning-assistant turboquant-convert \
  --hf-path openai/gpt-oss-20b \
  --mlx-path data/models/gpt-oss-20b-tq3 \
  --bits 3 \
  --group-size 64
```

Point the app at the converted model with `provider: turboquant_mlx`:
```yaml
answer_model:
  provider: turboquant_mlx
  model: data/models/answer-model-tq3
  max_tokens: 2048
  temperature: 0.15
  turboquant_kv_bits: 3
  turboquant_kv_group_size: 64
```

Set `turboquant_kv_bits` to 0 or omit it to use the TurboQuant weight-compressed model with the normal FP16 KV cache. Use `turboquant_fast: true` only for converted models that include QJL correction and where speed is preferred over the highest-quality decode.
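A sketch of that FP16-KV-cache variant, assuming the same block structure as the example above:

```yaml
# TurboQuant weights with the normal FP16 KV cache; values illustrative.
answer_model:
  provider: turboquant_mlx
  model: data/models/answer-model-tq3
  turboquant_kv_bits: 0   # 0 (or omitting the key) keeps the FP16 KV cache
  turboquant_fast: true   # only for models converted with QJL correction
```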
Portable RAG bundles can be moved to another machine.
Example:
```bash
# Export
decisioning-assistant rag-export --output-dir data/rag/export

# Import on another machine
decisioning-assistant rag-import --input-dir data/rag/export --recreate
```

- The defaults in `configs/` were tuned for a MacBook Pro M3 with 24 GB RAM, but the code is not hard-limited to that hardware.
- Larger future Apple Silicon systems can increase model size, retrieval depth, and prompt budgets through config.
- After changing Webex ingestion metadata, rerun ingestion, QA generation, and RAG indexing so new metadata reaches the app.
- PyMuPDF uses a dual AGPL/commercial license; check that it fits your usage.
See the wiki pages in `wiki/` for a fuller walkthrough:

- `wiki/Overview.md`
- `wiki/Technical-Details.md`
- `wiki/Usage-and-Configuration.md`