feat: timeline extraction from meeting transcripts by garrytan-agents · Pull Request #1381 · garrytan/gbrain

garrytan-agents · 2026-05-24T20:16:40Z

Proposal: Timeline Extraction from Meeting Transcripts

Problem

In a production brain with 165K+ pages, 31% of entities have no timeline entries at all. Meeting transcripts and notes discuss entities extensively — milestones, decisions, status changes — but none of this becomes structured timeline data on the entity pages.

The timeline system exists and works well when populated manually, but there's no automated path from "we discussed Acme's launch date" in a meeting transcript to a timeline entry on Acme's page.

Scale of Impact

Metric	Value
Total pages	~165,000
Timeline coverage (entities with ≥1 entry)	~69%
Entities with zero timeline entries	~31%
Meeting/transcript pages (estimated)	~5,000+

Proposed Solution

Meeting-to-Timeline Extraction

Add a timeline extraction pass to the dream cycle that processes meeting transcripts and creates timeline entries on discussed entity pages.

How It Works

1. Identify meeting pages: pages with type meeting, transcript, or note that have a date in frontmatter.
2. Extract entity mentions: find references to known entities (people, companies) in the meeting text.
3. Identify timeline-worthy events: look for temporal markers and significant events:
   - Milestones: "launched", "raised Series A", "hit $1M ARR"
   - Decisions: "decided to pivot", "chose to expand to Europe"
   - Status changes: "promoted to CTO", "left the company"
   - Plans: "planning to launch in Q3", "targeting 100 customers by EOY"
4. Create timeline entries on the entity pages with:
   - Date (from meeting date or extracted temporal reference)
   - Event description
   - Source link back to the meeting page
   - Confidence score
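The four steps above can be sketched as a single pass over one page. This is a minimal, regex-based sketch, not gbrain's implementation: `MeetingPage`, `TimelineEntry`, `extractTimeline`, the keyword cues, and the confidence values are all hypothetical, and a production pass would need far more robust entity and event detection than substring and keyword matching.

```typescript
// Hypothetical shapes; gbrain's real page and timeline types may differ.
interface MeetingPage {
  slug: string;
  type: string;
  date?: string; // ISO date from frontmatter, e.g. "2024-03-15"
  body: string;
}

interface TimelineEntry {
  entity: string; // entity page the entry would be written to
  date: string;
  event: string;
  source: string; // slug of the meeting page
  extracted: true;
  confidence: number;
}

const MEETING_TYPES = new Set(["meeting", "transcript", "note"]);

// Keyword cues per category, taken from the examples in the proposal.
const EVENT_CUES: Array<[string, RegExp]> = [
  ["milestone", /\b(launched|raised|hit)\b/i],
  ["decision", /\b(decided|chose)\b/i],
  ["status", /\b(promoted|left the company)\b/i],
  ["plan", /\b(planning|targeting)\b/i],
];

const ISO_DATE = /\b\d{4}-\d{2}-\d{2}\b/;

function extractTimeline(page: MeetingPage, knownEntities: string[]): TimelineEntry[] {
  // Step 1: only dated, meeting-like pages qualify.
  const meetingDate = page.date;
  if (!MEETING_TYPES.has(page.type) || !meetingDate) return [];

  const entries: TimelineEntry[] = [];
  // Naive sentence split: terminal punctuation followed by whitespace.
  for (const raw of page.body.split(/[.!?]+\s+/)) {
    const sentence = raw.trim();
    // Step 3: skip sentences without any event cue.
    if (!EVENT_CUES.some(([, cue]) => cue.test(sentence))) continue;
    // Step 2: attach the event to every known entity named in the sentence.
    for (const entity of knownEntities) {
      if (!sentence.includes(entity)) continue;
      // Step 4: prefer an explicit ISO date, else fall back to the meeting date.
      const explicit = sentence.match(ISO_DATE);
      entries.push({
        entity,
        date: explicit ? explicit[0] : meetingDate,
        event: sentence,
        source: page.slug,
        extracted: true,
        // Placeholder scores: explicit dates are treated as more reliable.
        confidence: explicit ? 0.9 : 0.6,
      });
    }
  }
  return entries;
}
```

Falling back to the meeting date when no explicit date is found matches the "Default to meeting date" mitigation listed under Risks & Mitigations.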

CLI Interface

# Extract timeline entries from all meeting pages
gbrain extract timeline --from-meetings

# Process only recent meetings
gbrain extract timeline --from-meetings --since 2024-01-01

# Dry run to preview extractions
gbrain extract timeline --from-meetings --dry-run

# Process specific meeting pages
gbrain extract timeline --from-meetings --page "meetings/2024-03-15-acme-oh"
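One way the flags above could narrow the input set, assuming zero-padded ISO `YYYY-MM-DD` dates in frontmatter and an inclusive `--since`. `PageMeta`, `ExtractOptions`, and `selectMeetingPages` are hypothetical names. `--dry-run` presumably does not change which pages are selected, only whether the resulting entries are written, so it is left out.

```typescript
// Hypothetical shapes mirroring the proposed flags.
interface PageMeta {
  slug: string;
  type: string;
  date?: string; // ISO YYYY-MM-DD from frontmatter
}

interface ExtractOptions {
  since?: string; // --since YYYY-MM-DD, treated as inclusive here
  page?: string;  // --page <slug>
}

const MEETING_TYPES = new Set(["meeting", "transcript", "note"]);

function selectMeetingPages(pages: PageMeta[], opts: ExtractOptions): PageMeta[] {
  return pages.filter((p) => {
    // --from-meetings: dated pages of a meeting-like type only.
    if (!MEETING_TYPES.has(p.type) || !p.date) return false;
    // Zero-padded ISO dates sort lexicographically, so string comparison works.
    if (opts.since !== undefined && p.date < opts.since) return false;
    // --page restricts the run to one named meeting page.
    if (opts.page !== undefined && p.slug !== opts.page) return false;
    return true;
  });
}
```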

Timeline Entry Format

timeline:
  - date: 2024-03-15
    event: "Launched v2.0 of their product"
    source: "meetings/2024-03-15-weekly-review"
    extracted: true
    confidence: 0.85
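A sketch of how extracted entries could be serialized into exactly this frontmatter shape. The `TimelineEntry` interface and `renderTimeline` function are hypothetical. Quoting strings via `JSON.stringify` relies on YAML 1.2 accepting JSON-style double-quoted strings, so event text containing colons or quotes stays a single scalar.

```typescript
// Hypothetical in-memory form of one entry in an entity page's timeline.
interface TimelineEntry {
  date: string;        // ISO YYYY-MM-DD, written unquoted like the example
  event: string;
  source: string;      // slug of the meeting page the entry came from
  extracted: boolean;  // true for machine-extracted entries
  confidence: number;  // 0..1
}

// Renders entries as a `timeline:` frontmatter block.
// Strings go through JSON.stringify, which escapes quotes and backslashes;
// YAML 1.2 accepts the result as a double-quoted scalar.
function renderTimeline(entries: TimelineEntry[]): string {
  const lines: string[] = ["timeline:"];
  for (const e of entries) {
    lines.push(`  - date: ${e.date}`);
    lines.push(`    event: ${JSON.stringify(e.event)}`);
    lines.push(`    source: ${JSON.stringify(e.source)}`);
    lines.push(`    extracted: ${e.extracted}`);
    lines.push(`    confidence: ${e.confidence}`);
  }
  return lines.join("\n");
}
```

Rendering the entry from the example reproduces the block shown above line for line.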

Dream Cycle Integration

The dream cycle should include a timeline extraction step for recently synced meeting pages:

dream cycle:
  1. sync
  2. extract links
  3. extract timeline (new)
  4. embed
  5. score
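The ordering above could be modeled as a list of phases sharing one context, with the new step reading only the pages that this cycle's sync touched. Everything here (`PageRef`, `CycleContext`, `phases`, `runDreamCycle`, and the log strings) is a hypothetical stand-in for gbrain's actual dream-cycle code. Running the step before `embed` presumably lets newly written timeline entries be embedded in the same cycle.

```typescript
// Hypothetical stand-ins for the dream cycle's inputs and phases.
interface PageRef {
  slug: string;
  type: string;
}

interface CycleContext {
  synced: PageRef[]; // pages written or updated by this cycle's sync phase
  log: string[];
}

interface Phase {
  name: string;
  run: (ctx: CycleContext) => void;
}

const MEETING_TYPES = new Set(["meeting", "transcript", "note"]);

const phases: Phase[] = [
  { name: "sync", run: (ctx) => { ctx.log.push(`sync: ${ctx.synced.length} pages`); } },
  { name: "extract links", run: (ctx) => { ctx.log.push("extract links"); } },
  {
    name: "extract timeline",
    // New step: only meeting-like pages from this cycle's sync are reprocessed.
    run: (ctx) => {
      const meetings = ctx.synced.filter((p) => MEETING_TYPES.has(p.type));
      ctx.log.push(`extract timeline: ${meetings.map((p) => p.slug).join(", ")}`);
    },
  },
  { name: "embed", run: (ctx) => { ctx.log.push("embed"); } },
  { name: "score", run: (ctx) => { ctx.log.push("score"); } },
];

// Runs every phase in order against a shared context and returns its log.
function runDreamCycle(synced: PageRef[]): string[] {
  const ctx: CycleContext = { synced, log: [] };
  for (const phase of phases) phase.run(ctx);
  return ctx.log;
}
```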

Agent Onboarding

Doctor Detection

gbrain doctor should detect low timeline coverage:

⚠ Timeline coverage: 69%
  31% of entities have no timeline entries.
  You have ~5,000 meeting pages that could provide timeline data.
  Run `gbrain extract timeline --from-meetings` to backfill.
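The check behind this warning amounts to the ratio of entities with at least one timeline entry to all entities. A minimal sketch follows; `EntityStats`, `timelineCoverageWarning`, and the 80% threshold are assumptions. Suppressing the hint when there are no meeting pages is a design choice, since the suggested backfill would then have nothing to read.

```typescript
// Hypothetical counts a doctor check might gather from the brain.
interface EntityStats {
  entityCount: number;
  entitiesWithTimeline: number; // entities with at least one timeline entry
  meetingPageCount: number;
}

// Returns the warning text, or null when no warning applies.
// The default threshold is an assumption, not a documented gbrain value.
function timelineCoverageWarning(s: EntityStats, threshold = 0.8): string | null {
  if (s.entityCount === 0 || s.meetingPageCount === 0) return null;
  const covered = s.entitiesWithTimeline / s.entityCount;
  if (covered >= threshold) return null;
  const missing = (s.entityCount - s.entitiesWithTimeline) / s.entityCount;
  const pct = (x: number): string => `${Math.round(x * 100)}%`;
  return [
    `⚠ Timeline coverage: ${pct(covered)}`,
    `  ${pct(missing)} of entities have no timeline entries.`,
    `  You have ~${s.meetingPageCount.toLocaleString("en-US")} meeting pages that could provide timeline data.`,
    "  Run `gbrain extract timeline --from-meetings` to backfill.",
  ].join("\n");
}
```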

Migration Prompt

Timeline coverage is 69%. 
Run `gbrain extract timeline --from-meetings` to backfill 
timeline entries from meeting transcripts? [y/N]

Evidence

The production brain has thousands of meeting transcripts spanning months of operation. Each meeting discusses multiple entities — companies, people, deals — with temporal context. This information exists but is locked in unstructured text. Meanwhile, entity pages have empty timeline sections that could be rich with history if extraction existed.

Risks & Mitigations

Risk	Mitigation
Incorrect date extraction	Default to meeting date, flag uncertain dates
Duplicate timeline entries	Dedup by date + entity + event similarity
Low-quality extractions from noisy transcripts	Confidence threshold, dry-run preview
Performance with many meetings	`--since` flag for incremental processing
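The dedup and confidence mitigations could combine into one filtering pass. This sketch swaps in token-level Jaccard similarity as the "event similarity" measure; that choice, the `Candidate` shape, the function names, and both thresholds (0.5 minimum confidence, 0.6 similarity) are assumptions rather than anything the proposal specifies.

```typescript
// Hypothetical candidate entry, before it is written to an entity page.
interface Candidate {
  entity: string;
  date: string;
  event: string;
  confidence: number;
}

// Existing entries (possibly hand-written) need not carry a confidence score.
type Existing = Pick<Candidate, "entity" | "date" | "event">;

// Lower-cased alphanumeric tokens of an event description.
function tokens(s: string): Set<string> {
  return new Set(s.toLowerCase().match(/[a-z0-9$]+/g) ?? []);
}

// Jaccard similarity |A ∩ B| / |A ∪ B| of two token sets.
function jaccard(a: Set<string>, b: Set<string>): number {
  const shared = Array.from(a).filter((t) => b.has(t)).length;
  const union = a.size + b.size - shared;
  return union === 0 ? 1 : shared / union;
}

function filterAndDedup(
  candidates: Candidate[],
  existing: Existing[],
  minConfidence = 0.5,
  similarityThreshold = 0.6,
): Candidate[] {
  const seen: Existing[] = [...existing];
  const accepted: Candidate[] = [];
  // Visit the most confident candidates first so they win over near-duplicates.
  const ordered = [...candidates].sort((x, y) => y.confidence - x.confidence);
  for (const c of ordered) {
    if (c.confidence < minConfidence) continue;
    // Same entity + same date + similar wording counts as a duplicate.
    const isDup = seen.some(
      (k) =>
        k.entity === c.entity &&
        k.date === c.date &&
        jaccard(tokens(k.event), tokens(c.event)) >= similarityThreshold,
    );
    if (isDup) continue;
    seen.push(c);
    accepted.push(c);
  }
  return accepted;
}
```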

The judgeSignificance trimming (slice at 4000 chars) could split a UTF-16 surrogate pair when an emoji sits exactly at the boundary, producing a lone high surrogate that Anthropic's JSON parser rejects with 'no low surrogate in string'.

Add safeSliceEnd() helper that backs up by one char when the cut lands between a high and low surrogate. Apply to:
- judgeSignificance transcript trimming (the direct cause)
- findBoundary hard-split fallback (defense-in-depth)

Fixes: dream cycle SYNTH_PHASE_FAIL on 2026-05-24 caused by 🤖 emoji at pos 3999 in telegram/2026-05-20-topic-1-topic-1.md
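The helper described in this commit message could look like the following. The name `safeSliceEnd` comes from the commit, but the signature and body here are a reconstruction from its description, not the committed code; `trimTranscript` and `MAX_CHARS` are hypothetical.

```typescript
// Returns a cut index that never separates a UTF-16 high surrogate from the
// low surrogate that follows it, so s.slice(0, result) ends on a whole code point.
// Reconstructed from the commit description; the real signature may differ.
function safeSliceEnd(s: string, end: number): number {
  if (end <= 0) return 0;
  if (end >= s.length) return s.length;
  const before = s.charCodeAt(end - 1); // last code unit that would be kept
  const after = s.charCodeAt(end);      // first code unit that would be dropped
  const splitsPair =
    before >= 0xd800 && before <= 0xdbff && // high surrogate
    after >= 0xdc00 && after <= 0xdfff;     // low surrogate
  // Back up by one code unit so the whole pair is dropped instead of split.
  return splitsPair ? end - 1 : end;
}

// Example: trimming a transcript to 4000 UTF-16 code units.
const MAX_CHARS = 4000;
function trimTranscript(text: string): string {
  return text.slice(0, safeSliceEnd(text, MAX_CHARS));
}
```

With the 🤖 high surrogate at index 3999, a plain `slice(0, 4000)` ends in a lone high surrogate, while `trimTranscript` stops at 3999 and drops the emoji entirely. On runtimes with ES2024 support, `String.prototype.isWellFormed()` can confirm a trimmed string has no lone surrogates.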

Add proposal for extracting timeline entries from meeting pages and backfilling entity timelines. 31% of entities have no timeline data despite thousands of meeting transcripts.

root and others added 3 commits May 24, 2026 09:16

Merge branch 'garrytan:master' into master

5fbb0a7

feat: timeline extraction from meeting transcripts

5b472b6


garrytan-agents mentioned this pull request May 24, 2026

feat: gbrain onboard — guided agent onboarding with migration prompts #1383

Open

garrytan-agents wants to merge 3 commits into garrytan:master from garrytan-agents:feat/meeting-timeline-extraction
