Merged
26 commits
8913e1b
Fix settings framework to load defaults when JSON file missing (#31) …
BeckettFrey Jan 12, 2026
9422954
Add configurable first-launch startup script with loading dialog (#30…
BeckettFrey Jan 12, 2026
78ee4e9
feat: download models on first startup for GOP instance
BeckettFrey Jan 12, 2026
70ad522
fix: remove os.environ.clear() (#38)
BeckettFrey Jan 13, 2026
854d6f9
Extract ModelSelectionPanel component to eliminate code duplication (…
Copilot Jan 13, 2026
22d733b
fix: w2tg model registation support
BeckettFrey Jan 13, 2026
2a595d2
Fix alignment refresh on dataset page navigation (#44)
Copilot Jan 13, 2026
f6a5ac3
Fix model list refresh after training and on tab navigation (#43)
Copilot Jan 13, 2026
8308845
fix: normalize alignment status values to lowercase (#42)
Copilot Jan 13, 2026
fb1ee34
feat: add YAML-based configuration system for flexible app customization
BeckettFrey Jan 15, 2026
8d76065
Initial plan
Copilot Jan 15, 2026
024e451
feat: Add CSV viewer dialog and Details button to dataset management
Copilot Jan 15, 2026
edd2f69
test: Add tests for CSV viewer dialog component
Copilot Jan 15, 2026
eae1b08
Add CSV viewer dialog for dataset analysis summaries (#49)
Copilot Jan 16, 2026
4add427
Fix pipeline stacker scrollability when content exceeds viewport (#47)
Copilot Jan 16, 2026
f110d58
fix: raise errors to propagate to gui
BeckettFrey Jan 16, 2026
a48e9ba
fix: popup blurs full window not sub container
BeckettFrey Jan 16, 2026
6095459
Prune (#50)
BeckettFrey Jan 26, 2026
9a6d710
refac: remove circular imports
BeckettFrey Jan 27, 2026
66af564
fix: add various deps in bundle
BeckettFrey Jan 27, 2026
c6a9ed9
docs: update root docstring
BeckettFrey Jan 27, 2026
448b926
docs: add analyzer module documentation
BeckettFrey Jan 28, 2026
a6cc132
docs: add engine module documentation
BeckettFrey Jan 28, 2026
d79b721
docs storage doctring update
BeckettFrey Jan 28, 2026
b09d3a1
docs: add gui module documentation
BeckettFrey Jan 28, 2026
0e9c1f1
add template researcher config and uncomment commented block
BeckettFrey Jan 28, 2026
6 changes: 6 additions & 0 deletions .gitignore
@@ -57,6 +57,12 @@ computed_likelihoods/
.idea/
*.iml

# Test files
test_imports.py
IMPLEMENTATION_SUMMARY.md
CODE_REVIEW_CHECKLIST.md
QUICKSTART_STARTUP_SCRIPT.md

# OS files
.DS_Store
Thumbs.db
45 changes: 37 additions & 8 deletions ARCHITECTURE.md
@@ -83,19 +83,48 @@ Abstraction layer for speech toolkit backends (MFA, W2TG, etc.) that perform ali

**Architecture:**
- **base.py**: `AlignmentEngine` abstract base class defining the contract for all engines
- **Concrete Engines**:
- `mfa_engine.py`: Montreal Forced Aligner integration
- `w2tg_engine.py`: Wav2TextGrid engine integration
- `whisperx_engine.py`: WhisperX engine (in development)
- **EngineManager**: Discovery, registration, and retrieval system for engines
- **mfa_engine.py**: Montreal Forced Aligner integration (align, train via adapt)
- **w2tg_engine.py**: Wav2TextGrid engine integration (align, train)
- **EngineManager**: Singleton manager for engine discovery and retrieval

**Storage Structure:**
```
~/.voxkit/{engine_id}/
├── aligner/
│   └── aligner_settings.json    # Alignment tool settings
├── train/
│   ├── trainer_settings.json    # Training tool settings
│   └── {model_id}/              # Trained models
│       ├── voxkit_model.json
│       └── entrypoint.model
└── ...
```

**API:**
- `ManageEngines.list_engines()` → List[str]: Registered engine IDs
- `ManageEngines.get_engine(id)` → AlignmentEngine: Retrieve by ID
- `ManageEngines.get_tool_providers(tool)` → dict: Engines providing a tool type

### 3.4 Analyzer Layer (`voxkit.analyzers`)
Extract structured metadata from datasets at registration time.

**Architecture:**
- **base.py**: Abstract base class
- **default_analyzer.py**: Built-in analyzer extracting file counts, speakers, duration
- **ManageAnalyzers**: Discovery and registration system for analyzers
- **base.py**: `DatasetAnalyzer` abstract base class defining the contract for all analyzers
- **default_analyzer.py**: Built-in analyzer extracting speaker counts and audio file counts
- **ManageAnalyzers**: Singleton manager for analyzer discovery and retrieval

**Output Structure:**
```
~/.voxkit/datasets/{dataset_id}/
├── voxkit_dataset.json # Dataset metadata
├── {analyzer_name}_summary.csv # Analyzer output (e.g., Default_summary.csv)
└── alignments/ # Alignment outputs
```

**API:**
- `ManageAnalyzers.list_analyzers()` → List[str]: Registered analyzer IDs
- `ManageAnalyzers.get_analyzer(id)` → DatasetAnalyzer: Retrieve by ID
- `ManageAnalyzers.get_analyzers()` → dict: All registered analyzers

**Purpose:**
Analyzers produce CSV summaries of datasets that can be visualized and analyzed within VoxKit without re-scanning the filesystem.
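
As a rough sketch of the idea, an analyzer could walk a dataset once and write its `{analyzer_name}_summary.csv`. The layout assumed here (one sub-folder per speaker), the `AUDIO_SUFFIXES` set, the column names, and the `write_summary` function are all assumptions made for illustration; the real `DatasetAnalyzer` contract is not shown in this diff.

```python
import csv
from pathlib import Path

# Assumption: which file extensions count as audio.
AUDIO_SUFFIXES = {".wav", ".flac", ".mp3"}


def write_summary(dataset_dir: Path, out_dir: Path, analyzer_name: str = "Default") -> Path:
    """Count audio files per speaker folder and write {analyzer_name}_summary.csv."""
    rows = []
    # Assumption: each immediate sub-folder of the dataset is one speaker.
    for speaker_dir in sorted(p for p in dataset_dir.iterdir() if p.is_dir()):
        n_audio = sum(
            1
            for f in speaker_dir.rglob("*")
            if f.is_file() and f.suffix.lower() in AUDIO_SUFFIXES
        )
        rows.append({"speaker": speaker_dir.name, "audio_files": n_audio})

    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / f"{analyzer_name}_summary.csv"
    with out_path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=["speaker", "audio_files"])
        writer.writeheader()
        writer.writerows(rows)
    return out_path
```

Because the summary is written once at registration time, later views only need to read the CSV rather than re-scan the audio folders.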
2 changes: 1 addition & 1 deletion Makefile
@@ -36,7 +36,7 @@ dev: ## Run the development server

build: clean ## Build standalone executable for current platform
@echo "$(BLUE)Building VoxKit for macOS...$(RESET)"
uv run --group installation python build.py build --entry main.py --name VoxKit --windowed
uv run --group installation python build.py build --entry main.py --name VoxKit --icon ./assets/voxkit.icns --windowed

build-info: ## Show information about the built app
@echo "$(BLUE)Checking build output...$(RESET)"
Binary file added assets/voxkit.icns
Binary file not shown.
Binary file added assets/voxkit.png
14 changes: 12 additions & 2 deletions build.py
@@ -135,15 +135,19 @@ def build(args):
        'g2p_en',
        'speechbrain',
        'speechbrain.utils',
        # Engine modules that need to be explicitly included
        'voxkit.engines._w2tg_engine',
        'voxkit.engines._whisperx_engine',
        'voxkit.engines.mfa_engine',
        'PyQt6.QtCore',
        'PyQt6.QtGui',
        'PyQt6.QtWidgets',
        'PyQt6.QtSvg',
        'PyQt6.QtSvgWidgets'
        'PyQt6.QtSvgWidgets',
        'rich._unicode_data.unicode13-0-0',
        'rich._unicode_data.unicode14-0-0',
        'rich._unicode_data.unicode15-0-0',
        'rich._unicode_data.unicode16-0-0',
        'rich._unicode_data.unicode17-0-0',
    ]
    for hi in default_hidden + args.hidden_import:
        opts.append(f'--hidden-import={hi}')
@@ -155,6 +159,12 @@ def build(args):

    # Add data
    sep = ';' if os.name == 'nt' else ':'
    # Add config folder if it exists
    config_dir = Path(__file__).parent / "config"
    if config_dir.exists() and config_dir.is_dir():
        print(f"[INFO] Adding config folder to build assets")
        opts.append(f'--add-data={config_dir}{sep}config')

    for ad in args.add_data:
        if sep in ad:
            opts.append(f'--add-data={ad}')
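
The `--add-data={config_dir}{sep}config` option in this hunk places the folder under `config` inside the bundle. Assuming the build is driven by PyInstaller (the flag names match its CLI, though this hunk does not name the tool), the application could locate that folder at runtime roughly as below. `bundled_config_dir` is a hypothetical helper, not a function from this PR.

```python
import sys
from pathlib import Path


def bundled_config_dir(source_root: Path) -> Path:
    """Return the config folder for both frozen and from-source runs."""
    # In a PyInstaller-frozen app, sys._MEIPASS points at the unpacked bundle
    # directory; when running from source the attribute is absent, so fall
    # back to the repository root passed in by the caller.
    base = Path(getattr(sys, "_MEIPASS", source_root))
    return base / "config"
```

A caller running from source might pass `Path(__file__).parent`, mirroring how `build.py` computes `config_dir`.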
54 changes: 54 additions & 0 deletions config/app_info.yaml
@@ -0,0 +1,54 @@
# Application Information Configuration
# This file contains metadata about the application version and purpose

app_name: "VoxKit"
version: "0.1.0"
description: "AI/ML Research -> Clinical Applications (Speech Pathology)"
help_url: "http://localhost:3000/help"

# Introduction text displayed to users
introduction: |
  VoxKit bridges advanced ML alignment tools and clinical speech pathology research.
  This toolkit enables rigorous phonetic analysis without requiring deep technical
  expertise in machine learning or command-line interfaces.

  Core Workflow:
  1. Register and analyze your speech datasets
  2. Train custom acoustic models or use pretrained engines (MFA, W2TG)
  3. Generate phoneme-level forced alignments with timing precision
  4. Extract Goodness of Pronunciation (GOP) scores for clinical assessment
  5. Export results with full provenance tracking for reproducible research

  Key Capabilities:
  - Multiple alignment engines (MFA, Wav2TextGrid, WhisperX in development)
  - Extensible analyzer system for custom metadata extraction
  - Complete provenance tracking for every operation
  - Post-build configuration via YAML (no coding required for workflow changes)

# Release information
release_date: "2026-01-14"
release_notes: |
  v0.1.0 - Initial Configurable Release
  - Declarative pipeline configuration via YAML
  - Support for MFA and W2TG alignment engines
  - Enhanced dataset analyzers with custom metadata extraction
  - Model management interface with version tracking
  - Startup routines for automated asset downloads

  Configuration Changes:
  - Researchers can now modify workflows by editing config/pipeline_definitions.yaml
  - No code changes required for common workflow adaptations
  - Versioned configuration files support reproducible research protocols

# Contact and support
contact_info:
  github_issues: "https://github.com/wisclab/voxkit/issues"
  email_support: "support@voxkit.org"
  documentation: "http://localhost:3000/help"

# Research context
research_context: |
  VoxKit was developed through collaboration between WISCLab and the Brain Behavior
  Analytics Lab to democratize access to state-of-the-art forced alignment tools.
  The platform is designed around established speech pathology research methodologies
  rather than generic audio processing workflows.
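
A small check like the one below could be run against the parsed contents of this file before distributing a build. It is only a sketch: the set of required keys and the `validate_app_info` name are assumptions, and the function takes an already-parsed mapping so that it does not depend on any particular YAML loader.

```python
from urllib.parse import urlparse

# Assumption: these are the keys the application cannot run without.
REQUIRED_KEYS = ("app_name", "version", "description", "help_url")
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}


def validate_app_info(data: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    for key in REQUIRED_KEYS:
        value = data.get(key)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"missing or empty key: {key}")

    # A help URL on a local host only works on the developer's own machine.
    help_url = data.get("help_url")
    if isinstance(help_url, str) and urlparse(help_url).hostname in LOCAL_HOSTS:
        problems.append(f"help_url points at a local host: {help_url}")
    return problems
```

Run against the values in this file, the check would flag `help_url`, which is set to `http://localhost:3000/help`.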
166 changes: 166 additions & 0 deletions config/pipeline_definitions.yaml
@@ -0,0 +1,166 @@
# Pipeline Configuration
# Define the stackers to include in the pipeline and their order
# This file can be modified post-build to adapt to custom workflows

pipeline:

# ======
# INTRODUCTION & ORIENTATION
# ======
  - id: "introduction"
    label: "Introduction"
    stacker_class: "MarkdownStacker"
    enabled: true
    markdown_content: |
      <h1><a href="https://voxkit-web.vercel.app">Welcome</a></h1>

      VoxKit separates developer-level code from researcher-level configuration:
      - **Developer level**: Implement stackers, engines, analyzers, startup_script (requires coding).
      - **Researcher level**: Edit YAML files in config folder post-build.

      ###

      On the left side, you will find the `pipeline workflow navigation` menu. Each item in this menu is called a `Stacker`. Each instance (version) of VoxKit comes with a set of Stackers baked in (see the available Stackers for this version below). These Stackers can be composed into Pipelines to implement research workflows or repetitive data processing tasks.
      The order and inclusion of Stackers in an app's Pipeline is configurable via the `config/pipeline_definitions.yaml` file (bundled with the app). You can access this file from the command line by navigating to the internal assets within the bundle. The file can be modified to adapt workflows without changing code or rebuilding the application.

      ###

      ## Researcher Level Configuration
      - Review the default pipeline steps in the left navigation menu
      - Verify startup routines have downloaded necessary models/datasets
      - Customize collapsible help sections for each stacker as needed by editing `config/pipeline_definitions.yaml`
      - Reorder and enable/disable stackers by editing `config/pipeline_definitions.yaml`
      - Distribute your customized config file or app bundle to the relevant parties

      ###

      ## Available Stackers
      - **TrainingStacker**: Train acoustic models on labeled datasets
      - **PredictionStacker**: Generate forced alignments using trained/pretrained models
      - **PLLRStacker**: Extract Goodness of Pronunciation (GOP) scores from alignments
      - **MarkdownStacker**: Display informational markdown content (this step)

      ###

      ## Need Help?
      - Visit the [Help Center](https://voxkit-web.vercel.app/help) for detailed guidance and overviews
      - Report issues via [GitHub](https://github.com/BrainBehaviorAnalyticsLab/voxkit-desktop/issues)

    collapsible_sections:
      "Key Terminology": |
        Engine: Low-level component providing a set of tools (MFA, W2TG, etc.).
        Analyzer: Passes over the data that run when a dataset is registered (e.g., dataset summary, one-time analyses).
        Stacker: Orchestration layer composing engines, analyzers, and GUI components into workflow steps.
        Pipeline: Named sequence of Stackers (workflow steps) with navigation.
        Artifact: Output files written to the host OS's local storage (TextGrids, CSVs, models, logs, etc.).
        startup_script: Code that runs once at application startup to prepare and retrieve assets (requires coding).

      "Configuration Best Practices": |
        - Use Unicode symbols (Ⓐ Ⓑ Ⓒ) in labels for visual hierarchy
        - Keep collapsible sections concise (1-3 sentences)
        - Set enabled: false to hide steps without deleting configuration
        - Test configuration changes in development before distributing
        - Version control your YAML files alongside research protocols

# ======
# PIPELINE STEP A: MODEL TRAINING
# ======
  - id: "training"
    label: "Ⓐ Train Aligners"
    stacker_class: "TrainingStacker"
    enabled: true
    collapsible_sections:
      "Collapsible Section 1": |
        To be, or not to be: that is the question: whether 'tis nobler in the mind to suffer the slings and arrows of outrageous fortune, or to take arms against a sea of troubles, and by opposing end them? To die: to sleep; no more; and, by a sleep to say we end the heart-ache and the thousand natural shocks that flesh is heir to, 'tis a consummation devoutly to be wish'd. To die, to sleep; to sleep: perchance to dream: ay, there's the rub; for in that sleep of death what dreams may come when we have shuffled off this mortal coil, must give us pause. There's the respect that makes calamity of so long a life; for who would bear the whips and scorns of time, the oppressor's wrong, the proud man's contumely, the pangs of dispriz'd love, the law's delay, the insolence of office, and the spurns that patient merit of the unworthy takes, when he himself might his quietus make with a bare bodkin? Who would fardels bear, to grunt and sweat under a weary life, but that the dread of something after death, the undiscover'd country from whose bourn no traveller returns, puzzles the will, and makes us rather bear those ills we have, than fly to others that we know not of? Thus conscience doth make cowards of us all; and thus the native hue of resolution is sicklied o'er with the pale cast of thought, and enterprises of great pith and moment with this regard their currents turn awry, and lose the name of action.

      "Collapsible Section 2": |
        - Bullet point one
        - Bullet point two
        - Bullet point three

      "Collapsible Section 3": |
        1. Numbered item one
        2. Numbered item two
        3. Numbered item three

# ======
# PIPELINE STEP B: FORCED ALIGNMENT
# ======
  - id: "prediction"
    label: "Ⓑ Generate Alignments"
    stacker_class: "PredictionStacker"
    enabled: true
    collapsible_sections:
      "Collapsible Section 1": |
        To be, or not to be: that is the question: whether 'tis nobler in the mind to suffer the slings and arrows of outrageous fortune, or to take arms against a sea of troubles, and by opposing end them? To die: to sleep; no more; and, by a sleep to say we end the heart-ache and the thousand natural shocks that flesh is heir to, 'tis a consummation devoutly to be wish'd. To die, to sleep; to sleep: perchance to dream: ay, there's the rub; for in that sleep of death what dreams may come when we have shuffled off this mortal coil, must give us pause. There's the respect that makes calamity of so long a life; for who would bear the whips and scorns of time, the oppressor's wrong, the proud man's contumely, the pangs of dispriz'd love, the law's delay, the insolence of office, and the spurns that patient merit of the unworthy takes, when he himself might his quietus make with a bare bodkin? Who would fardels bear, to grunt and sweat under a weary life, but that the dread of something after death, the undiscover'd country from whose bourn no traveller returns, puzzles the will, and makes us rather bear those ills we have, than fly to others that we know not of? Thus conscience doth make cowards of us all; and thus the native hue of resolution is sicklied o'er with the pale cast of thought, and enterprises of great pith and moment with this regard their currents turn awry, and lose the name of action.

      "Collapsible Section 2": |
        - Bullet point one
        - Bullet point two
        - Bullet point three

      "Collapsible Section 3": |
        1. Numbered item one
        2. Numbered item two
        3. Numbered item three


# ======
# PIPELINE STEP C: GOP EXTRACTION
# ======
  - id: "pllr"
    label: "Ⓒ Extract GOP Scoring"
    stacker_class: "PLLRStacker"
    enabled: true
    collapsible_sections:
      "Collapsible Section 1": |
        To be, or not to be: that is the question: whether 'tis nobler in the mind to suffer the slings and arrows of outrageous fortune, or to take arms against a sea of troubles, and by opposing end them? To die: to sleep; no more; and, by a sleep to say we end the heart-ache and the thousand natural shocks that flesh is heir to, 'tis a consummation devoutly to be wish'd. To die, to sleep; to sleep: perchance to dream: ay, there's the rub; for in that sleep of death what dreams may come when we have shuffled off this mortal coil, must give us pause. There's the respect that makes calamity of so long a life; for who would bear the whips and scorns of time, the oppressor's wrong, the proud man's contumely, the pangs of dispriz'd love, the law's delay, the insolence of office, and the spurns that patient merit of the unworthy takes, when he himself might his quietus make with a bare bodkin? Who would fardels bear, to grunt and sweat under a weary life, but that the dread of something after death, the undiscover'd country from whose bourn no traveller returns, puzzles the will, and makes us rather bear those ills we have, than fly to others that we know not of? Thus conscience doth make cowards of us all; and thus the native hue of resolution is sicklied o'er with the pale cast of thought, and enterprises of great pith and moment with this regard their currents turn awry, and lose the name of action.

      "Collapsible Section 2": |
        - Bullet point one
        - Bullet point two
        - Bullet point three

      "Collapsible Section 3": |
        1. Numbered item one
        2. Numbered item two
        3. Numbered item three

# ======
# UI CONFIGURATION
# ======
ui:
  menu_max_width: 500 # Maximum width in pixels for navigation menu
  animation_duration: 300 # Slide transition time in milliseconds
  content_spacing: 20 # Padding between content elements in pixels

# ======
# NOTES FOR RESEARCHERS
# ======
#
# Customizing This Configuration:
#
# 1. To add a new pipeline step:
# - Developer must first implement a new Stacker subclass in code
# - Build the updated application executable
#
# 2. To reorder workflow steps:
# - Simply change the order of entries in the pipeline configuration above
# - Steps show in the order defined here
#
# 3. To hide a Stacker:
# - Set enabled: false (rather than deleting the configuration)
#
# 4. To customize guidance for your study:
# - Add collapsible_sections to match your protocol
# - Reference your lab's standard operating procedures
#
# 5. Version control recommendations:
# - If modifying the code, fork the repo and track changes via Git
# - Tag config versions when distributing to research staff
# - Document any deviations from standard workflows
#
# For questions about configuration options, see:
# http://voxkit-web.vercel.app/help/configuration
#
# ======
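
The reorder and enable/disable behavior described in the notes above could be implemented along these lines. This is a sketch under stated assumptions, not the loader from this PR: `Step`, `KNOWN_STACKERS`, and `resolve_pipeline` are hypothetical names, the input is the already-parsed `pipeline` list, and treating a missing `enabled` key as `true` is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Step:
    id: str
    label: str
    stacker_class: str


# Stacker names listed in the introduction step of this file.
KNOWN_STACKERS = {"MarkdownStacker", "TrainingStacker", "PredictionStacker", "PLLRStacker"}


def resolve_pipeline(entries: list[dict]) -> list[Step]:
    """Keep enabled entries in file order and reject unknown or duplicate steps."""
    steps = []
    seen_ids = set()
    for entry in entries:
        # Assumption: an entry without an `enabled` key is treated as enabled.
        if not entry.get("enabled", True):
            continue
        if entry["stacker_class"] not in KNOWN_STACKERS:
            raise ValueError(f"unknown stacker_class: {entry['stacker_class']!r}")
        if entry["id"] in seen_ids:
            raise ValueError(f"duplicate step id: {entry['id']!r}")
        seen_ids.add(entry["id"])
        steps.append(Step(entry["id"], entry["label"], entry["stacker_class"]))
    return steps
```

Failing loudly on an unknown `stacker_class` matches note 1: a new step only works once a developer has shipped the corresponding Stacker in a rebuilt application.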
7 changes: 0 additions & 7 deletions docs/index.html

This file was deleted.

46 changes: 0 additions & 46 deletions docs/search.js

This file was deleted.
