feat: fill timing gaps step by lmammino · Pull Request #23 · fourTheorem/podwhisperer

lmammino · 2026-02-12T12:13:51Z

Summary

Adds a new fill timing gaps pipeline step that interpolates missing word-level timestamps in WhisperX transcripts. WhisperX's wav2vec2 alignment model silently drops start/end timing on numeric tokens (numbers, currency, percentages), which cascades into backwards timing jumps in generated captions. This step runs first in the post-processing pipeline and fills every gap using character-proportional interpolation, so all downstream steps receive clean timing data.

The Problem

WhisperX uses wav2vec2 for forced alignment after transcription. This model cannot align tokens it doesn't recognize — primarily numbers, currency, and percentages. These words come back with start and end completely absent from the JSON.

Real data from episode 151 (6,784 words across 401 raw segments):

| Word | Segment | Context |
|---|---|---|
| `2026.` | 4 | "...in the year 2026. It's..." |
| `100` | 30 | "...gives you 100 concurrent..." |
| `14` | 67 | "...about 14 different..." |
| `10` | 85 | "...around 10 instances..." |
| `32` | 131 | "...up to 32 vCPUs..." |
| `64` | 131 | "...or 64 gigabytes..." |
| `0` | 175 | "...scaling from 0 to..." |
| `1,` | 175 | "...from 0 to 1, you..." |
| `2,` | 175 | "...from 1 to 2, that's..." |
| `24` | 187 | "...for 24 hours..." |
| `128` | 197 | "...up to 128 vCPUs..." |
| `1` | 197 | "...and 1 terabyte..." |
| `$0.20` | 233 | "...about $0.20 per..." |
| `12%.` | 239 | "...around 12%. So..." |
| `15%` | 241 | "...about 15% cheaper..." |
| `10` | 243 | "...around 10 percent..." |
| `10` | 248 | "...like 10 bucks..." |
| `32` | 248 | "...with 32 gigs..." |
| `64` | 248 | "...or 64 gigs..." |
| `100` | 248 | "...about 100 dollars..." |

20 words affected (0.29% of total) — 100% are numeric tokens. Segment 248 was worst hit with 5 of 38 words missing timing.

A word with missing timing looks like this in the raw transcript JSON:

```jsonc
{
  "word": "128",
  "score": null
  // no "start", no "end": keys are completely absent
}
```
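To make the failure mode concrete, the sketch below models a transcript word with optional timing keys and classifies it by which keys are present. The `Word` interface and the `timingStatus` helper are hypothetical names for illustration, not the types used in `fill-timing-gaps.ts`.

```typescript
// Hypothetical word shape mirroring the JSON above; timing keys may be absent.
interface Word {
  word: string;
  start?: number;
  end?: number;
  score?: number | null;
}

type TimingStatus = "timed" | "missing" | "missingStart" | "missingEnd";

// Classify a word by which timing keys are actually present.
function timingStatus(w: Word): TimingStatus {
  const hasStart = typeof w.start === "number";
  const hasEnd = typeof w.end === "number";
  if (hasStart && hasEnd) return "timed";
  if (!hasStart && !hasEnd) return "missing";
  return hasStart ? "missingEnd" : "missingStart";
}

const untimed: Word = JSON.parse('{ "word": "128", "score": null }');
console.log(timingStatus(untimed)); // → missing
console.log("start" in untimed); // → false
```

The two partial cases (`missingStart`, `missingEnd`) correspond to the partial gaps handled separately from fully untimed runs.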

Impact on Captions

Missing word timestamps cause two downstream problems:

- VTT backwards timing jumps: when a word has no end time, the caption generator falls back to 0 or the next available timestamp, creating cues whose start time falls before the end time of the previous cue. Episode 151 had 4 backwards jumps across 12,527 VTT cues, making captions display out of order in players.
- Segments with `end: 0`: when the last word in a segment has no timing, the segment's computed end time collapses to 0, causing the entire segment to appear at timestamp 00:00:00.000 in captions.
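The backwards-jump symptom can be detected mechanically. This minimal sketch assumes cues are plain objects with `start`/`end` in seconds (a hypothetical `Cue` type, not the project's caption model) and counts cues that start before the previous cue has ended, which is exactly what a fallback to `0` produces.

```typescript
// Hypothetical cue shape; times in seconds.
interface Cue {
  start: number;
  end: number;
  text: string;
}

// Count cues that start before the previous cue ends.
// A word whose missing timing fell back to 0 shows up here.
function countBackwardsJumps(cues: Cue[]): number {
  let jumps = 0;
  for (let i = 1; i < cues.length; i++) {
    if (cues[i].start < cues[i - 1].end) jumps++;
  }
  return jumps;
}

const cues: Cue[] = [
  { start: 10.0, end: 10.4, text: "up" },
  { start: 10.4, end: 10.9, text: "to" },
  { start: 0, end: 0, text: "128" }, // untimed word that fell back to 0
  { start: 11.3, end: 11.6, text: "vCPUs" },
];
console.log(countBackwardsJumps(cues)); // → 1
```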

The Solution

The fill-timing-gaps step runs as Step 1 in the pipeline (before replacement rules, LLM refinement, and normalization) and uses character-proportional interpolation:

1. Find gaps: scan each segment's word array for consecutive runs of words missing both `start` and `end`.
2. Classify: categorize each gap as `start` (beginning of segment), `end` (end of segment), `middle` (between timed words), or `entireSegment`.
3. Resolve anchors: left anchor = previous word's `end` (or `segment.start`); right anchor = next word's `start` (or `segment.end`).
4. Compute character rate: segment duration / segment text length gives seconds per character, i.e. that segment's speech rate.
5. Apply padding: move each anchor inward by `charRate` seconds (one character's duration) so interpolated words don't crowd their neighbors; padding is skipped when the interval is too tight.
6. Distribute proportionally: divide the padded interval among the gap words in proportion to each word's character count.
7. Mark as interpolated: set `score: 0` on every filled word to distinguish it from real alignments.
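The seven steps above can be sketched roughly as follows. This is a minimal illustration, not the code in `fill-timing-gaps.ts`: the `Word`/`Segment` shapes, the `findGaps` and `fillSegmentGaps` names, and the exact rule for skipping padding (here: pad only when the interval is wider than two characters' worth of time) are assumptions.

```typescript
// Hypothetical shapes; field names follow the WhisperX JSON shown earlier.
interface Word { word: string; start?: number; end?: number; score?: number | null }
interface Segment { start: number; end: number; text: string; words?: Word[] }

type GapKind = "start" | "end" | "middle" | "entireSegment";
interface Gap { from: number; to: number; kind: GapKind } // inclusive word indices

const isUntimed = (w: Word) => w.start === undefined && w.end === undefined;

// Steps 1-2: find runs of words missing both bounds and classify each run.
function findGaps(words: Word[]): Gap[] {
  const gaps: Gap[] = [];
  let i = 0;
  while (i < words.length) {
    if (!isUntimed(words[i])) { i++; continue; }
    const from = i;
    while (i < words.length && isUntimed(words[i])) i++;
    const to = i - 1;
    const atStart = from === 0;
    const atEnd = to === words.length - 1;
    const kind: GapKind =
      atStart && atEnd ? "entireSegment" : atStart ? "start" : atEnd ? "end" : "middle";
    gaps.push({ from, to, kind });
  }
  return gaps;
}

// Steps 3-7: fill every gap in place; returns the number of words filled.
function fillSegmentGaps(seg: Segment): number {
  const words = seg.words;
  if (!words || words.length === 0) return 0;
  // Step 4: seconds per character for this segment's speech rate.
  const charRate = seg.text.length > 0 ? (seg.end - seg.start) / seg.text.length : 0;
  let filled = 0;
  for (const gap of findGaps(words)) {
    // Step 3: anchors from neighbouring words, falling back to segment bounds.
    const prev = gap.from > 0 ? words[gap.from - 1] : undefined;
    const next = gap.to < words.length - 1 ? words[gap.to + 1] : undefined;
    let left = prev ? (prev.end ?? prev.start ?? seg.start) : seg.start;
    let right = next ? (next.start ?? next.end ?? seg.end) : seg.end;
    // Step 5: pad inward by one character's duration unless too tight (assumed rule).
    if (right - left > 2 * charRate) {
      left += charRate;
      right -= charRate;
    }
    // Step 6: split the interval in proportion to each word's character count.
    const gapWords = words.slice(gap.from, gap.to + 1);
    const totalChars = gapWords.reduce((n, w) => n + w.word.length, 0);
    const span = Math.max(0, right - left); // zero-duration if anchors coincide or cross
    let cursor = left;
    for (const w of gapWords) {
      const share = totalChars > 0 ? (w.word.length / totalChars) * span : span / gapWords.length;
      w.start = cursor;
      w.end = cursor + share;
      w.score = 0; // Step 7: mark as interpolated
      cursor = w.end;
      filled++;
    }
  }
  return filled;
}
```

Computing all gaps before filling any of them keeps the anchors pointing at real alignments, since two gaps are always separated by at least one timed word.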

Partial gaps (only `start` or only `end` missing) are handled separately: the missing bound is estimated as `charRate * word.length` seconds away from the known one.
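A minimal sketch of that partial-gap estimate, using a hypothetical `fillPartialGap` helper. Whether partially filled words also receive `score: 0` isn't stated here, so this sketch leaves the score alone, and it does no clamping to segment bounds.

```typescript
// Hypothetical word shape, as in the JSON above.
interface Word { word: string; start?: number; end?: number; score?: number | null }

// Estimate the one missing bound as charRate * word.length seconds from the known one.
// Returns true if a bound was filled; the score is left untouched (assumption).
function fillPartialGap(w: Word, charRate: number): boolean {
  const estimate = charRate * w.word.length;
  if (w.start !== undefined && w.end === undefined) {
    w.end = w.start + estimate;
    return true;
  }
  if (w.end !== undefined && w.start === undefined) {
    w.start = w.end - estimate;
    return true;
  }
  return false; // fully timed or fully untimed: not a partial gap
}

const onlyStart: Word = { word: "1,", start: 42.0 };
fillPartialGap(onlyStart, 0.1);
console.log(onlyStart.end); // ≈ 42.2
```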

Before/After Examples

Word "100" in segment 30 (segment: 78.627s–84.261s):

```
Before: { "word": "100" }                          // no timing at all
After:  { "word": "100", "start": 80.473, "end": 81.098, "score": 0 }
```

Word "$0.20" in segment 233 (segment: 685.614s–691.898s):

```
Before: { "word": "$0.20" }
After:  { "word": "$0.20", "start": 687.843, "end": 688.891, "score": 0 }
```

Word "128" in segment 197 (segment: 579.870s–585.894s):

```
Before: { "word": "128" }
After:  { "word": "128", "start": 582.066, "end": 582.624, "score": 0 }
```
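As a quick sanity check on the three examples above, this sketch (hypothetical `isPlausibleFill` helper) verifies that each filled word is ordered, lies within its segment's bounds, and is marked with `score: 0`. It does not reproduce the exact interpolated values, which depend on neighbouring words not shown here.

```typescript
// Hypothetical shapes for the before/after examples.
interface FilledWord { word: string; start: number; end: number; score: number }
interface SegmentBounds { start: number; end: number }

// An interpolated word should be ordered, sit inside its segment, and carry score 0.
function isPlausibleFill(w: FilledWord, seg: SegmentBounds): boolean {
  return w.score === 0 && w.start <= w.end && w.start >= seg.start && w.end <= seg.end;
}

// Values copied from the examples above.
const examples: Array<[FilledWord, SegmentBounds]> = [
  [{ word: "100", start: 80.473, end: 81.098, score: 0 }, { start: 78.627, end: 84.261 }],
  [{ word: "$0.20", start: 687.843, end: 688.891, score: 0 }, { start: 685.614, end: 691.898 }],
  [{ word: "128", start: 582.066, end: 582.624, score: 0 }, { start: 579.87, end: 585.894 }],
];
console.log(examples.every(([w, s]) => isPlausibleFill(w, s))); // → true
```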

VTT timing (segment 248 — worst affected):

Before: 5 untimed words caused backwards jumps in 4 consecutive cues
After:  All cues monotonically increasing — 12,527/12,527 cues chronological

Changes

| File | Change |
|---|---|
| `lambdas/pipeline/steps/fill-timing-gaps.ts` | New step implementation (247 lines) |
| `lambdas/pipeline/steps/fill-timing-gaps.test.ts` | 18 test cases (445 lines) |
| `lambdas/pipeline/index.ts` | Wire step into pipeline as Step 1 |
| `packages/config/index.ts` | `FillTimingGapsConfigSchema` + renumber step comments |
| `packages/config/index.test.ts` | Config schema tests (defaults, override) |
| `CLAUDE.md` | Update project structure and architecture docs |
| `README.md` | Add Step 1 docs, config table, update Mermaid diagram |

+803 / -13 lines across 7 files.

Pipeline Order

```
Before                          After
──────                          ─────
0. Transcription (GPU)          0. Transcription (GPU)
                                1. Fill timing gaps        ← NEW
1. Replacement rules            2. Replacement rules
2. LLM refinement               3. LLM refinement
3. Segments normalization       4. Segments normalization
4. Caption generation           5. Caption generation
5. Notification                 6. Notification
```
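The point of moving the step to the front is that every later step sees the repaired timing. A minimal sketch of strictly ordered step execution follows; the `Step` type and `runPipeline` function are hypothetical and synchronous for brevity, and only `fillTimingGaps` is a name taken from this PR (the config key), the other step names being placeholders.

```typescript
// Hypothetical step type: each step takes the transcript and returns a new one.
// Synchronous for brevity; the real durable workflow in lambdas/pipeline/index.ts is not shown.
type Step<T> = { name: string; run: (input: T) => T };

// Run steps strictly in order so each one receives the previous step's output.
function runPipeline<T>(input: T, steps: Step<T>[]): T {
  return steps.reduce((acc, step) => step.run(acc), input);
}

// Each placeholder step appends its name, which records the execution order.
const tag = (name: string): Step<string[]> => ({ name, run: (log) => [...log, name] });

const result = runPipeline<string[]>([], [
  tag("fillTimingGaps"),
  tag("replacementRules"),
  tag("llmRefinement"),
  tag("segmentsNormalization"),
]);
console.log(result.join(" → "));
```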

Test Coverage

18 test cases covering:

| # | Test case |
|---|---|
| 1 | Single missing word between two timed words |
| 2 | Multiple separate gaps in one segment |
| 3 | No gaps: returns zero stats, leaves words unchanged |
| 4 | Gap at end of segment (uses `segment.end` as right anchor) |
| 5 | Gap at start of segment (uses `segment.start` as left anchor) |
| 6 | 3 consecutive missing words: proportional distribution |
| 7 | Entire segment has missing timing |
| 8 | Character-proportional allocation (2-char vs 12-char word) |
| 9 | Padding applied when space allows |
| 10 | Padding skipped when interval is tight |
| 11 | Zero-duration gap (anchors identical) |
| 12 | Score set to 0 for filled words, existing scores untouched |
| 13 | Segments without words array skipped gracefully |
| 14 | Empty transcript returns zero stats |
| 15 | Real-world pattern: numbers and currency words |
| 16 | Config `enabled: false` returns early, words unchanged |
| 17 | Partial gap: `start` present, `end` missing |
| 18 | Partial gap: `end` present, `start` missing |

Plus 2 config schema tests (`FillTimingGapsConfigSchema` defaults, override) and 1 `PipelineConfigSchema` integration test update.

All 294 tests pass, lint clean.

Verification Checklist

Copilot

Pull request overview

Adds a new post-processing pipeline step to ensure WhisperX transcripts always have word-level timing by interpolating missing start/end timestamps (notably for numeric/currency tokens), and wires it in as the first pipeline step so downstream processing and caption generation operate on monotonic timing data.

Changes:

- Introduce the `fill-timing-gaps` step implementation and unit tests to interpolate missing word timestamps.
- Add the `fillTimingGaps` configuration schema/defaults and wire the step into the Lambda pipeline as Step 1.
- Update documentation (README + CLAUDE.md) to reflect the new step and pipeline order.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.


| File | Description |
|---|---|
| `packages/config/index.ts` | Adds `FillTimingGapsConfigSchema` and inserts `fillTimingGaps` into `PipelineConfigSchema` as Step 1. |
| `packages/config/index.test.ts` | Adds schema tests for `FillTimingGapsConfigSchema` and updates pipeline default expectations. |
| `lambdas/pipeline/steps/fill-timing-gaps.ts` | Implements gap detection + character-proportional interpolation to fill missing word timing. |
| `lambdas/pipeline/steps/fill-timing-gaps.test.ts` | Adds unit coverage for gap classification, padding, proportional distribution, and partial gaps. |
| `lambdas/pipeline/index.ts` | Wires the new step into the durable workflow before replacement/LLM/normalization. |
| `README.md` | Documents the new Step 1 and updates the architecture diagram and config docs. |
| `CLAUDE.md` | Updates repo architecture docs to include the new step. |


Co-authored-by: Eoin Shanaghy <eoin.shanaghy@fourtheorem.com>

feat: fill timing gaps step (409cf28)

lmammino requested review from Copilot and eoinsha February 12, 2026 12:13

Copilot started reviewing on behalf of lmammino February 12, 2026 12:14

Copilot AI reviewed Feb 12, 2026


Comment thread on `README.md` (outdated)

Comment thread on `lambdas/pipeline/steps/fill-timing-gaps.ts` (outdated)

Comment thread on `lambdas/pipeline/steps/fill-timing-gaps.ts`

Comment thread on `lambdas/pipeline/steps/fill-timing-gaps.ts`

lmammino mentioned this pull request Feb 12, 2026 in awsbites/aws-bites-site#229, "add episode 151_v2 transcript and captions" (closed)

chore: improvements after review (3241590)

eoinsha reviewed Feb 13, 2026


Comment thread on `lambdas/pipeline/steps/fill-timing-gaps.test.ts` (outdated)

eoinsha approved these changes Feb 13, 2026


Update lambdas/pipeline/steps/fill-timing-gaps.test.ts (2a9eb2a)

Co-authored-by: Eoin Shanaghy <eoin.shanaghy@fourtheorem.com>

lmammino merged commit 1ac60db into main Feb 17, 2026
4 checks passed

lmammino deleted the feat/fill-word-sync-gaps branch February 17, 2026 17:33

lmammino merged 3 commits into `main` from `feat/fill-word-sync-gaps`


3 participants
