
feat: fill timing gaps step#23

Merged
lmammino merged 3 commits into main from feat/fill-word-sync-gaps
Feb 17, 2026


@lmammino
Contributor

Summary

Adds a new fill timing gaps pipeline step that interpolates missing word-level timestamps in WhisperX transcripts. WhisperX's wav2vec2 alignment model silently drops start/end timing on numeric tokens (numbers, currency, percentages), which cascades into backwards timing jumps in generated captions. This step runs first in the post-processing pipeline and fills every gap using character-proportional interpolation, so all downstream steps receive clean timing data.

The Problem

WhisperX uses wav2vec2 for forced alignment after transcription. This model cannot align tokens it doesn't recognize — primarily numbers, currency, and percentages. These words come back with start and end completely absent from the JSON.

Real data from episode 151 (6,784 words across 401 raw segments):

Word     Segment   Context
2026.    4         "...in the year 2026. It's..."
100      30        "...gives you 100 concurrent..."
14       67        "...about 14 different..."
10       85        "...around 10 instances..."
32       131       "...up to 32 vCPUs..."
64       131       "...or 64 gigabytes..."
0        175       "...scaling from 0 to..."
1,       175       "...from 0 to 1, you..."
2,       175       "...from 1 to 2, that's..."
24       187       "...for 24 hours..."
128      197       "...up to 128 vCPUs..."
1        197       "...and 1 terabyte..."
$0.20    233       "...about $0.20 per..."
12%.     239       "...around 12%. So..."
15%      241       "...about 15% cheaper..."
10       243       "...around 10 percent..."
10       248       "...like 10 bucks..."
32       248       "...with 32 gigs..."
64       248       "...or 64 gigs..."
100      248       "...about 100 dollars..."

20 words affected (0.29% of total) — 100% are numeric tokens. Segment 248 was worst hit with 5 of 38 words missing timing.

A word with missing timing looks like this in the raw transcript JSON:

{
  "word": "128",
  "score": null
  // no "start", no "end" — keys are completely absent
}
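
As a minimal sketch of how such entries could be detected, the snippet below models a word with optional timing keys and tests for their absence. The `Word` interface and the helper names are assumptions made for illustration, not the step's actual types, and the neighbouring timestamps are invented.

```typescript
// Hypothetical shape of a WhisperX word entry; field names follow the JSON above.
interface Word {
  word: string;
  start?: number;
  end?: number;
  score?: number | null;
}

// True when both timing keys are absent (a full gap).
function isUntimed(w: Word): boolean {
  return w.start === undefined && w.end === undefined;
}

// True when exactly one of the two bounds is missing (a partial gap).
function isPartiallyTimed(w: Word): boolean {
  return (w.start === undefined) !== (w.end === undefined);
}

// Illustrative data only: the timestamps of the neighbours are made up.
const words: Word[] = [
  { word: "to", start: 581.5, end: 581.9, score: 0.91 },
  { word: "128", score: null },
  { word: "vCPUs", start: 582.7, end: 583.4, score: 0.88 },
];

const untimed = words.filter(isUntimed).map((w) => w.word);
console.log(untimed);
```

Checking for `undefined` rather than falsiness matters here, since a legitimate `start` of 0 must not be mistaken for a missing value.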

Impact on Captions

Missing word timestamps cause two downstream problems:

  1. VTT backwards timing jumps: When a word has no end time, the caption generator falls back to 0 or the next available timestamp, producing cues that start earlier than the previous cue ended. Episode 151 had 4 backwards jumps across 12,527 VTT cues, making captions display out of order in players.

  2. Segments with end: 0: When the last word in a segment has no timing, the segment's computed end time collapses to 0, causing the entire segment to appear at timestamp 00:00:00.000 in captions.
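
A check for the first problem can be sketched as follows. It assumes a backwards jump means a cue whose start is earlier than the previous cue's end; the `Cue` interface and `findBackwardsJumps` are hypothetical names, not part of the caption generator.

```typescript
// Hypothetical cue shape, times in seconds.
interface Cue {
  startTime: number;
  endTime: number;
}

// Returns the indices of cues that start before the previous cue has ended.
function findBackwardsJumps(cues: Cue[]): number[] {
  const jumps: number[] = [];
  for (let i = 1; i < cues.length; i++) {
    if (cues[i].startTime < cues[i - 1].endTime) {
      jumps.push(i);
    }
  }
  return jumps;
}

// Illustrative cues: the third one fell back to 0 because its word had no timing.
const cues: Cue[] = [
  { startTime: 0, endTime: 1 },
  { startTime: 1, endTime: 2 },
  { startTime: 0, endTime: 0.5 },
  { startTime: 2.5, endTime: 3 },
];

const jumps = findBackwardsJumps(cues);
console.log(jumps);
```

A count of zero from a check like this corresponds to the "all cues chronological" condition used in the verification checklist.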

The Solution

The fill-timing-gaps step runs as Step 1 in the pipeline (before replacement rules, LLM refinement, and normalization) and uses character-proportional interpolation:

  1. Find gaps: Scan each segment's word array for consecutive runs of words missing both start and end
  2. Classify: Categorize each gap as start (beginning of segment), end (end of segment), middle (between timed words), or entireSegment
  3. Resolve anchors: Left anchor = previous word's end (or segment.start); right anchor = next word's start (or segment.end)
  4. Compute character rate: segment duration / segment text length gives seconds per character, an estimate of that segment's speech rate
  5. Apply padding: Offset from anchors by charRate seconds so interpolated words don't crowd their neighbors (skipped when interval is too tight)
  6. Distribute proportionally: Divide the padded interval among gap words proportional to each word's character count
  7. Mark as interpolated: Set score: 0 on every filled word to distinguish from real alignments
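
The seven steps above can be sketched roughly as below. This is not the code in fill-timing-gaps.ts: the `Word` and `Segment` interfaces, the function names, and in particular the "room for padding" threshold of four character-widths are assumptions chosen for illustration.

```typescript
interface Word { word: string; start?: number; end?: number; score?: number | null; }
interface Segment { start: number; end: number; text: string; words: Word[]; }
type GapKind = "start" | "end" | "middle" | "entireSegment";

const isUntimed = (w: Word): boolean => w.start === undefined && w.end === undefined;

// Step 2: classify a run by where it sits in the segment's word array.
function classifyGap(first: number, last: number, wordCount: number): GapKind {
  const atStart = first === 0;
  const atEnd = last === wordCount - 1;
  if (atStart && atEnd) return "entireSegment";
  if (atStart) return "start";
  if (atEnd) return "end";
  return "middle";
}

// Fills runs of fully untimed words in place and returns how many were filled.
function fillSegmentGaps(seg: Segment): number {
  // Step 4: seconds per character for this segment.
  const charRate = seg.text.length > 0 ? (seg.end - seg.start) / seg.text.length : 0;
  let filled = 0;
  let i = 0;
  while (i < seg.words.length) {
    if (!isUntimed(seg.words[i])) { i++; continue; }
    // Step 1: extend the run over consecutive untimed words.
    let j = i;
    while (j + 1 < seg.words.length && isUntimed(seg.words[j + 1])) j++;
    // Step 3: anchors come from the neighbours, or from the segment bounds.
    let left = i > 0 ? (seg.words[i - 1].end ?? seg.start) : seg.start;
    let right = j + 1 < seg.words.length ? (seg.words[j + 1].start ?? seg.end) : seg.end;
    // Step 5: pad only when there is room (the 4x threshold is an assumption).
    if (right - left > 4 * charRate) { left += charRate; right -= charRate; }
    // Step 6: split the interval in proportion to character counts.
    const gap = seg.words.slice(i, j + 1);
    const totalChars = gap.reduce((n, w) => n + w.word.length, 0);
    const span = Math.max(0, right - left);
    let cursor = left;
    for (const w of gap) {
      const share = totalChars > 0 ? (w.word.length / totalChars) * span : 0;
      w.start = cursor;
      w.end = cursor + share;
      w.score = 0; // Step 7: mark as interpolated.
      cursor += share;
      filled++;
    }
    i = j + 1;
  }
  return filled;
}

// Illustrative segment: 24 characters over 12 seconds gives charRate = 0.5.
const seg: Segment = {
  start: 0, end: 12, text: "gives you 100 concurrent",
  words: [
    { word: "gives", start: 0, end: 2, score: 0.9 },
    { word: "you", start: 2.5, end: 4, score: 0.9 },
    { word: "100" },
    { word: "concurrent", start: 8, end: 12, score: 0.9 },
  ],
};
const filledCount = fillSegmentGaps(seg);
console.log(filledCount, seg.words[2]);
```

In this example the anchors are 4 and 8, padding moves them to 4.5 and 7.5, and the single gap word takes the whole padded interval.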

Partial gaps (only start or only end missing) are handled separately using charRate * word.length to estimate the missing bound.
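
A partial gap could be handled along these lines. The sketch assumes only the `charRate * word.length` estimate described above; the function name, the clamp at 0, and the choice to leave `score` untouched are assumptions, since the description does not say how the real step treats them.

```typescript
interface Word { word: string; start?: number; end?: number; score?: number | null; }

// Estimates the missing bound of a partially timed word from its length.
function fillPartialGap(w: Word, charRate: number): Word {
  const estimate = charRate * w.word.length;
  if (w.start !== undefined && w.end === undefined) {
    return { ...w, end: w.start + estimate };
  }
  if (w.start === undefined && w.end !== undefined) {
    // Clamping at 0 is an assumption to avoid negative timestamps.
    return { ...w, start: Math.max(0, w.end - estimate) };
  }
  return w; // Fully timed or fully untimed words are left to other logic.
}

// Illustrative values: 3 characters at 0.25 s per character is 0.75 s.
const startOnly = fillPartialGap({ word: "15%", start: 100 }, 0.25);
const endOnly = fillPartialGap({ word: "15%", end: 100 }, 0.25);
console.log(startOnly, endOnly);
```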

Before/After Examples

Word "100" in segment 30 (segment: 78.627s–84.261s):

Before: { "word": "100" }                          // no timing at all
After:  { "word": "100", "start": 80.473, "end": 81.098, "score": 0 }

Word "$0.20" in segment 233 (segment: 685.614s–691.898s):

Before: { "word": "$0.20" }
After:  { "word": "$0.20", "start": 687.843, "end": 688.891, "score": 0 }

Word "128" in segment 197 (segment: 579.870s–585.894s):

Before: { "word": "128" }
After:  { "word": "128", "start": 582.066, "end": 582.624, "score": 0 }

VTT timing (segment 248 — worst affected):

Before: 5 untimed words caused backwards jumps in 4 consecutive cues
After:  All cues monotonically increasing — 12,527/12,527 cues chronological

Changes

File                                               Change
lambdas/pipeline/steps/fill-timing-gaps.ts         New step implementation (247 lines)
lambdas/pipeline/steps/fill-timing-gaps.test.ts    18 test cases (445 lines)
lambdas/pipeline/index.ts                          Wire step into pipeline as Step 1
packages/config/index.ts                           FillTimingGapsConfigSchema + renumber step comments
packages/config/index.test.ts                      Config schema tests (defaults, override)
CLAUDE.md                                          Update project structure and architecture docs
README.md                                          Add Step 1 docs, config table, update Mermaid diagram

+803 / -13 lines across 7 files.

Pipeline Order

Before                          After
──────                          ─────
0. Transcription (GPU)          0. Transcription (GPU)
                                1. Fill timing gaps        ← NEW
1. Replacement rules            2. Replacement rules
2. LLM refinement               3. LLM refinement
3. Segments normalization        4. Segments normalization
4. Caption generation            5. Caption generation
5. Notification                  6. Notification
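
To illustrate why the ordering matters, here is a minimal sketch of steps applied in sequence, with each step receiving the previous step's output. The `Transcript` and `Step` types, `runPipeline`, and the passthrough steps are all hypothetical; the real wiring in lambdas/pipeline/index.ts is not reproduced here.

```typescript
// Hypothetical, simplified types for illustration only.
type Transcript = { segments: unknown[] };
type Step = { name: string; run: (t: Transcript) => Transcript };

// Applies steps in array order and records the order in which they ran.
function runPipeline(steps: Step[], input: Transcript, log: string[]): Transcript {
  return steps.reduce((t, step) => {
    log.push(step.name);
    return step.run(t);
  }, input);
}

// Placeholder step that returns its input unchanged.
const passthrough = (name: string): Step => ({ name, run: (t) => t });

// Post-processing order from the "After" column above.
const postProcessing: Step[] = [
  passthrough("fill-timing-gaps"),
  passthrough("replacement-rules"),
  passthrough("llm-refinement"),
  passthrough("segments-normalization"),
  passthrough("caption-generation"),
];

const executed: string[] = [];
runPipeline(postProcessing, { segments: [] }, executed);
console.log(executed.join(" -> "));
```

Placing the gap-filling step first means every later step can assume each word carries both start and end.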

Test Coverage

18 test cases covering:

# Test case
1 Single missing word between two timed words
2 Multiple separate gaps in one segment
3 No gaps — returns zero stats, leaves words unchanged
4 Gap at end of segment (uses segment.end as right anchor)
5 Gap at start of segment (uses segment.start as left anchor)
6 3 consecutive missing words — proportional distribution
7 Entire segment has missing timing
8 Character-proportional allocation (2-char vs 12-char word)
9 Padding applied when space allows
10 Padding skipped when interval is tight
11 Zero-duration gap (anchors identical)
12 Score set to 0 for filled words, existing scores untouched
13 Segments without words array skipped gracefully
14 Empty transcript returns zero stats
15 Real-world pattern: numbers and currency words
16 Config enabled: false returns early, words unchanged
17 Partial gap: start present, end missing
18 Partial gap: end present, start missing

Plus 2 config schema tests (FillTimingGapsConfigSchema defaults, override) and 1 PipelineConfigSchema integration test update.

All 294 tests pass, lint clean.

Verification Checklist

  • All 20 affected words in episode 151 now have interpolated start, end, and score: 0
  • VTT output has 0 backwards timing jumps (was 4)
  • All 12,527 VTT cues are chronologically ordered
  • No segments with end: 0 in normalized output
  • Filled words marked with score: 0 — distinguishable from real alignments
  • Step is enabled by default (fillTimingGaps.enabled: true)
  • Step can be disabled via config without affecting other steps
  • Runs before replacement rules so all downstream steps get clean timing
  • 18 unit tests + config tests all passing
  • Lint clean (Biome)
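
The enabled-by-default and early-return behaviour in the checklist can be sketched with a plain object standing in for the schema. Only the `enabled` flag comes from the description; `resolveConfig`, `runStep`, and the `Stats` shape are assumptions, and the real `FillTimingGapsConfigSchema` is not shown here.

```typescript
// Hypothetical stand-in for the parsed fillTimingGaps config.
interface FillTimingGapsConfig {
  enabled: boolean;
}

const defaults: FillTimingGapsConfig = { enabled: true };

// Merges user overrides over the defaults.
function resolveConfig(override: Partial<FillTimingGapsConfig> = {}): FillTimingGapsConfig {
  return { ...defaults, ...override };
}

// Hypothetical stats shape returned by the step.
interface Stats {
  wordsFilled: number;
}

// Returns zero stats without touching the transcript when the step is disabled.
function runStep<T>(transcript: T, cfg: FillTimingGapsConfig, fill: (t: T) => number): Stats {
  if (!cfg.enabled) return { wordsFilled: 0 };
  return { wordsFilled: fill(transcript) };
}

let fillCalls = 0;
const fakeFill = (_t: string[]): number => { fillCalls++; return 3; };

const onByDefault = runStep(["a"], resolveConfig(), fakeFill);
const switchedOff = runStep(["a"], resolveConfig({ enabled: false }), fakeFill);
console.log(onByDefault, switchedOff, fillCalls);
```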


Copilot AI left a comment


Pull request overview

Adds a new post-processing pipeline step to ensure WhisperX transcripts always have word-level timing by interpolating missing start/end timestamps (notably for numeric/currency tokens), and wires it in as the first pipeline step so downstream processing and caption generation operate on monotonic timing data.

Changes:

  • Introduce fill-timing-gaps step implementation + unit tests to interpolate missing word timestamps.
  • Add fillTimingGaps configuration schema/defaults and wire the step into the Lambda pipeline as Step 1.
  • Update documentation (README + CLAUDE.md) to reflect the new step and pipeline order.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.

File                                               Description
packages/config/index.ts                           Adds FillTimingGapsConfigSchema and inserts fillTimingGaps into PipelineConfigSchema as Step 1.
packages/config/index.test.ts                      Adds schema tests for FillTimingGapsConfigSchema and updates pipeline default expectations.
lambdas/pipeline/steps/fill-timing-gaps.ts         Implements gap detection + character-proportional interpolation to fill missing word timing.
lambdas/pipeline/steps/fill-timing-gaps.test.ts    Adds unit coverage for gap classification, padding, proportional distribution, and partial gaps.
lambdas/pipeline/index.ts                          Wires the new step into the durable workflow before replacement/LLM/normalization.
README.md                                          Documents the new Step 1 and updates the architecture diagram and config docs.
CLAUDE.md                                          Updates repo architecture docs to include the new step.


Comment thread README.md Outdated
Comment thread lambdas/pipeline/steps/fill-timing-gaps.ts Outdated
Comment thread lambdas/pipeline/steps/fill-timing-gaps.ts
Comment thread lambdas/pipeline/steps/fill-timing-gaps.ts
Comment thread lambdas/pipeline/steps/fill-timing-gaps.test.ts Outdated
Co-authored-by: Eoin Shanaghy <eoin.shanaghy@fourtheorem.com>
@lmammino lmammino merged commit 1ac60db into main Feb 17, 2026
4 checks passed
@lmammino lmammino deleted the feat/fill-word-sync-gaps branch February 17, 2026 17:33