Build a local, always-on recording system for a shared office room that captures full-day audio, splits it into manageable segments, transcribes locally, batches conversations, and delivers a daily summary of topics, decisions, and action items. Control must be available via a local web UI and via OpenClaw text commands ("start recording", "stop recording").
Recommendation choices
- Platform: Local web app on the Mac Studio (fastest build + easy access in any browser).
- Transcription: On-prem only (no cloud). Use faster-whisper (local) with optional upgrade to whisper.cpp/Metal.
- OpenClaw control: Skill + Exec tool calling local control script; optional webhook delivery for summaries.
- Start/Stop: Manual one-click start + optional schedule (not in v1, design-ready).
- Speaker handling: Start without diarization; add diarization (whisperX + pyannote) as Phase 2.
- Retention: Keep raw audio 30–90 days, transcripts/summaries longer; all local with configurable pruning.
- Compute split: Record on Mac Studio A (attached to mic interface), transcribe/summarize on Mac Studio B after hours to avoid daytime CPU load.
- Capture all in-room conversations during work hours with minimal friction.
- Provide reliable transcription and daily summaries without sending data off-prem.
- Enable quick control via UI + OpenClaw text commands.
- Keep the system stable for full-day sessions (8–12 hours).
- Real-time transcription/summary during the day (out of scope for v1).
- Full speaker identification with names (out of scope for v1; requires diarization + enrollment).
- Cloud storage or cloud LLM use (explicitly avoided in v1).
- Founders / partners: Need a daily digest of discussions, decisions, and action items.
- Visitors: Must be informed via consent/signage; expect recordings to be local and private.
- As a founder, I click “Start Recording” and the room records all day without mic fiddling.
- As a founder, I can text OpenClaw “start recording” and it starts immediately.
- As a founder, I get a daily summary with topics, decisions, and action items.
- As a founder, I can review the raw transcript per day if needed.
- Start/stop recording from UI and from OpenClaw command.
- Continuous recording for full day without file size issues (segment every N minutes).
- Store audio as WAV (16 kHz mono) or FLAC.
- Save session metadata (start time, device, format, sample rate).
- Batch transcription for a given day.
- Use local whisper model (faster-whisper).
- Store transcript per segment and a merged transcript.
- Group transcript segments into conversation blocks.
- Default rules: new block when silence gap > 7 minutes or block length exceeds max words.
- Produce JSON summary and Markdown summary per day.
- Daily summary includes: overview, top topics, decisions, action items (with owner/due if mentioned), questions, risks, follow-ups.
- Optional delivery via OpenClaw webhook.
- Provide a custom skill file to map “start recording / stop recording / status / summarize” to local API calls.
- Support sending daily summary via OpenClaw webhook (if configured).
- Environment variables for mic input, sample rate, segmentation, LLM endpoint, and retention.
- Optional retention job (Phase 2) to prune raw audio after X days.
- Local-only data: No external upload or cloud processing.
- Reliability: Must tolerate a full day of continuous capture.
- Performance: Recording must be lightweight; transcription can be scheduled after hours.
- Security: Data stored locally; optional disk encryption recommended.
- Recorder Service: ffmpeg-based audio capture with segmenting.
- API Server: FastAPI for control and pipeline orchestration.
- Transcriber: faster-whisper (local) producing segment JSON.
- Summarizer: Local LLM via OpenAI-compatible API (Ollama/LM Studio).
- UI: Local web dashboard (Start/Stop, pipeline buttons, summary view).
- OpenClaw Skill: Maps text commands to local control script.
- Start recording ? ffmpeg segments audio into
/OfficeRecorder/YYYY-MM-DD/audio. - Transcribe ? JSON per segment in
/transcripts. - Batch ? conversation blocks based on time gaps.
- Summarize ? daily JSON + Markdown in
/summaries. - Optional OpenClaw webhook ? summary overview text.
- All participants informed of recording with signage.
- Ensure clear internal policy for visitors.
- Local data only; define retention (e.g., 90 days audio, 1 year transcript).
- Recording uptime (hours recorded/day).
- Transcription completion time.
- Summary delivery success rate.
- “Missed decision” feedback from founders.
- Mic misconfigured: Provide device listing and test recording flow.
- Long audio files: Segment files every 5 minutes.
- Model drift: Allow model updates and prompt tuning.
- No diarization: Add Phase 2 diarization module.
- Phase 1 (MVP): Recording + transcription + summary + OpenClaw commands + UI.
- Phase 2: Diarization + auto-schedule + retention pruning + analytics.
- Start/stop recording from UI and OpenClaw text.
- Successfully record a full business day without failure.
- Generate a daily summary with topics and action items.
- All artifacts stored locally.