Pull request overview
Adds a new post-processing pipeline step to ensure WhisperX transcripts always have word-level timing by interpolating missing start/end timestamps (notably for numeric/currency tokens), and wires it in as the first pipeline step so downstream processing and caption generation operate on monotonic timing data.
Changes:
- Introduce `fill-timing-gaps` step implementation + unit tests to interpolate missing word timestamps.
- Add `fillTimingGaps` configuration schema/defaults and wire the step into the Lambda pipeline as Step 1.
- Update documentation (README + CLAUDE.md) to reflect the new step and pipeline order.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| packages/config/index.ts | Adds FillTimingGapsConfigSchema and inserts fillTimingGaps into PipelineConfigSchema as Step 1. |
| packages/config/index.test.ts | Adds schema tests for FillTimingGapsConfigSchema and updates pipeline default expectations. |
| lambdas/pipeline/steps/fill-timing-gaps.ts | Implements gap detection + character-proportional interpolation to fill missing word timing. |
| lambdas/pipeline/steps/fill-timing-gaps.test.ts | Adds unit coverage for gap classification, padding, proportional distribution, and partial gaps. |
| lambdas/pipeline/index.ts | Wires the new step into the durable workflow before replacement/LLM/normalization. |
| README.md | Documents the new Step 1 and updates the architecture diagram and config docs. |
| CLAUDE.md | Updates repo architecture docs to include the new step. |
eoinsha reviewed and approved these changes on Feb 13, 2026.
Co-authored-by: Eoin Shanaghy <eoin.shanaghy@fourtheorem.com>
Summary
Adds a new fill timing gaps pipeline step that interpolates missing word-level timestamps in WhisperX transcripts. WhisperX's wav2vec2 alignment model silently drops `start`/`end` timing on numeric tokens (numbers, currency, percentages), which cascades into backwards timing jumps in generated captions. This step runs first in the post-processing pipeline and fills every gap using character-proportional interpolation, so all downstream steps receive clean timing data.

The Problem
WhisperX uses wav2vec2 for forced alignment after transcription. This model cannot align tokens it doesn't recognize, primarily numbers, currency, and percentages. These words come back with `start` and `end` completely absent from the JSON.

Real data from episode 151 (6,784 words across 401 raw segments):
[Table of affected numeric tokens; cell contents not recoverable]

20 words affected (0.29% of total); 100% are numeric tokens. Segment 248 was worst hit with 5 of 38 words missing timing.
A word with missing timing looks like this in the raw transcript JSON:

```jsonc
{
  "word": "128",
  "score": null
  // no "start", no "end": keys are completely absent
}
```

Impact on Captions
Missing word timestamps cause two downstream problems:
- VTT backwards timing jumps: When a word has no `end` time, the caption generator falls back to `0` or the next available timestamp, creating cues where `startTime > endTime` of the previous cue. Episode 151 had 4 backwards jumps across 12,527 VTT cues, making captions display out of order in players.
- Segments with `end: 0`: When the last word in a segment has no timing, the segment's computed end time collapses to `0`, causing an entire segment to appear at timestamp 00:00:00.000 in captions.

The Solution
The fill-timing-gaps step runs as Step 1 in the pipeline (before replacement rules, LLM refinement, and normalization) and uses character-proportional interpolation:
1. Detect gaps: find runs of words missing both `start` and `end`
2. Classify each gap as `start` (beginning of segment), `end` (end of segment), `middle` (between timed words), or `entireSegment`
3. Find anchors: left anchor = previous word's `end` (or `segment.start`); right anchor = next word's `start` (or `segment.end`)
4. Compute the character rate: `segment duration / segment text length` gives seconds-per-character for that segment's speech rate
5. Pad by `charRate` seconds so interpolated words don't crowd their neighbors (skipped when interval is too tight)
6. Set `score: 0` on every filled word to distinguish from real alignments

Partial gaps (only `start` or only `end` missing) are handled separately using `charRate * word.length` to estimate the missing bound.

Before/After Examples
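As a worked illustration of the interpolation described under The Solution, here is a minimal sketch. It is not the code in `fill-timing-gaps.ts`: the `Word` and `Segment` shapes, the function names, and the "too tight" threshold (padding is skipped unless the interval exceeds `2 * charRate`) are all assumptions made for the example. Partial gaps are left out.

```typescript
// Hypothetical shapes; the real types in the pipeline may differ.
interface Word {
  word: string;
  start?: number;
  end?: number;
  score?: number | null;
}

interface Segment {
  start: number;
  end: number;
  text: string;
  words: Word[];
}

const isUntimed = (w: Word): boolean =>
  w.start === undefined && w.end === undefined;

// Fill one run of untimed words, words[from..to] inclusive, in place.
function fillGap(segment: Segment, from: number, to: number): void {
  const { words } = segment;

  // Anchors: neighbouring timed words, falling back to the segment bounds.
  let left = from > 0 ? (words[from - 1].end ?? segment.start) : segment.start;
  let right =
    to < words.length - 1 ? (words[to + 1].start ?? segment.end) : segment.end;

  // Seconds per character, derived from the segment's overall speech rate.
  const charRate =
    (segment.end - segment.start) / Math.max(segment.text.length, 1);

  // Pad both sides by charRate; skipped when the interval is too tight
  // (assumed threshold: the interval must exceed 2 * charRate).
  if (right - left > 2 * charRate) {
    left += charRate;
    right -= charRate;
  }

  // Split the interval between the gap words in proportion to their length.
  const gap = words.slice(from, to + 1);
  const totalChars = Math.max(
    gap.reduce((n, w) => n + w.word.length, 0),
    1,
  );
  let cursor = left;
  for (const w of gap) {
    w.start = cursor;
    w.end = cursor + ((right - left) * w.word.length) / totalChars;
    w.score = 0; // marks the timing as interpolated, not aligned
    cursor = w.end;
  }
}

// Scan a segment for runs of untimed words and fill each run.
function fillTimingGaps(segment: Segment): void {
  const { words } = segment;
  let i = 0;
  while (i < words.length) {
    if (!isUntimed(words[i])) {
      i++;
      continue;
    }
    let j = i;
    while (j + 1 < words.length && isUntimed(words[j + 1])) j++;
    fillGap(segment, i, j);
    i = j + 1;
  }
}

// Made-up segment (not episode data): 20 characters over 10 s gives
// charRate = 0.5 s per character.
const demo: Segment = {
  start: 10,
  end: 20,
  text: "it costs 100 dollars",
  words: [
    { word: "it", start: 10, end: 10.5, score: 0.9 },
    { word: "costs", start: 10.75, end: 11.5, score: 0.9 },
    { word: "100", score: null },
    { word: "dollars", start: 13.5, end: 14.5, score: 0.9 },
  ],
};
fillTimingGaps(demo);
console.log(demo.words[2].start, demo.words[2].end, demo.words[2].score);
// prints: 12 13 0
```

In the demo the anchors are 11.5 and 13.5; padding by 0.5 s on each side leaves the interval 12 to 13, which the single gap word takes in full.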
- Word `"100"` in segment 30 (segment: 78.627s–84.261s)
- Word `"$0.20"` in segment 233 (segment: 685.614s–691.898s)
- Word `"128"` in segment 197 (segment: 579.870s–585.894s)
- VTT timing (segment 248, worst affected)
Changes
- `lambdas/pipeline/steps/fill-timing-gaps.ts`
- `lambdas/pipeline/steps/fill-timing-gaps.test.ts`
- `lambdas/pipeline/index.ts`
- `packages/config/index.ts`: `FillTimingGapsConfigSchema` + renumber step comments
- `packages/config/index.test.ts`
- `CLAUDE.md`
- `README.md`

+803 / -13 lines across 7 files.
Pipeline Order
Test Coverage
18 test cases covering:
- End-of-segment gap (`segment.end` as right anchor)
- Start-of-segment gap (`segment.start` as left anchor)
- `enabled: false` returns early, words unchanged
- Partial gap: `start` present, `end` missing
- Partial gap: `end` present, `start` missing

Plus 2 config schema tests (`FillTimingGapsConfigSchema` defaults, override) and 1 `PipelineConfigSchema` integration test update.

All 294 tests pass, lint clean.
Verification Checklist
- Filled words carry `start`, `end`, and `score: 0`
- No `end: 0` in normalized output
- Interpolated words have `score: 0`, distinguishable from real alignments
- … (`fillTimingGaps.enabled: true`)
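
Checks like these can also be run mechanically over a processed transcript. The sketch below is one hedged way to do it, not part of this PR: `TimedWord`, `TimedSegment`, and `findTimingProblems` are hypothetical names, and the "backwards jump" rule (a word starting before the previous word's start) is an assumed definition.

```typescript
// Hypothetical shapes for a processed transcript; the real ones may differ.
interface TimedWord {
  word: string;
  start?: number;
  end?: number;
  score?: number | null;
}

interface TimedSegment {
  start: number;
  end: number;
  words: TimedWord[];
}

// Returns a human-readable list of timing problems; empty means clean.
function findTimingProblems(segments: TimedSegment[]): string[] {
  const problems: string[] = [];
  let prevStart = Number.NEGATIVE_INFINITY;

  segments.forEach((seg, s) => {
    if (seg.end === 0) {
      problems.push(`segment ${s}: end is 0`);
    }
    seg.words.forEach((w, i) => {
      const where = `segment ${s}, word ${i} ("${w.word}")`;
      if (w.start === undefined || w.end === undefined) {
        problems.push(`${where}: missing start or end`);
        return; // cannot check ordering without both bounds
      }
      if (w.end < w.start) {
        problems.push(`${where}: end before start`);
      }
      // Assumed rule: word starts must never move backwards.
      if (w.start < prevStart) {
        problems.push(`${where}: backwards jump`);
      }
      prevStart = w.start;
    });
  });

  return problems;
}
```

An empty result means every word has both bounds, no segment ends at `0`, and word starts never move backwards.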