Skip to content

Add structured file logging with per-run correlation#263

Merged
wesm merged 10 commits intowesm:mainfrom
jesserobbins:jesse/structured-logging
Apr 15, 2026
Merged

Add structured file logging with per-run correlation#263
wesm merged 10 commits intowesm:mainfrom
jesserobbins:jesse/structured-logging

Conversation

@jesserobbins
Copy link
Copy Markdown
Contributor

@jesserobbins jesserobbins commented Apr 13, 2026

I had already added logging for my own work on the deduplicate and fixup features I've been working on, and so am proposing durable structured logging to address #129 and likely other development work. @wesm does this do what you want?

What changed

  • internal/logging: new package — fans log records to two sinks: human-readable text on stderr and a rotating JSON file under <data dir>/logs/. Every process gets a short run_id hex tag on every record for cross-invocation grep.
  • msgvault logs: new command to tail, filter, and follow the on-disk log files (-f, --run-id, --level, --grep, --all, --path).
  • root.go: installs the handler at startup; writes structured msgvault startup / msgvault exit lines; recovers panics to disk; new flags --log-file, --log-level, --no-log-file; new [log] config section.
  • Operations: search, stats, and list-accounts now emit structured start/done/fail lines with duration_ms.
  • TUI: loadData, loadMessages, and search operations emit structured lines; stderr logging is suppressed during alt-screen to prevent render corruption.
  • Store: new --log-sql and --log-sql-slow-ms flags; slow queries (>100 ms by default) emit WARN; every query optionally traced at INFO.

Why

Warnings and errors during sync workloads were silently dropped. The log file gives operators a reliable post-hoc record of what msgvault did and why it failed, without requiring stderr redirection.

jesserobbins and others added 4 commits April 13, 2026 09:11
Every msgvault invocation now leaves a durable, structured trail on
disk without the user having to redirect stderr. The default
destination is <data dir>/logs/msgvault-YYYY-MM-DD.log; stderr keeps
receiving the same human-readable text it always has.

internal/logging/logging.go:
  - BuildHandler fans records to two sinks: text on stderr + JSON
    on a daily log file. Per-process 6-byte hex run_id attached to
    every record for cross-invocation correlation.
  - Daily files rotate on size (50 MiB cap, 5 siblings kept).
  - Degrades to stderr-only if the log directory can't be prepared.

cmd/msgvault/cmd/root.go:
  - Installs the handler as slog default after config load.
  - Writes "msgvault startup" and "msgvault exit" structured lines
    with command, sanitized argv, version, os/arch, outcome.
  - Panic recovery logs stack trace before exit.
  - New persistent flags: --log-file, --log-level, --no-log-file.
  - Config gains a [log] section (dir / level / disabled).

cmd/msgvault/cmd/logs.go:
  - New 'msgvault logs' command: tail/filter the on-disk JSON logs.
  - Flags: -n, -f, --run-id, --level, --grep, --all, --path.

internal/tui/model.go:
  - loadAccounts logs a structured line on success/failure.

Addresses wesm#129.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add structured slog calls to search, stats, list-accounts, and the
TUI (loadData, loadMessages, loadSearchWithOffset). Each emits
start/done/fail lines with duration_ms so the daily log file gives
a full audit trail of what ran. Initialise the package-level logger
to a stderr text handler at declaration time so code paths that
bypass PersistentPreRunE (tests, library embeds) never hit a nil
pointer.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Wires the store's sql.DB through a thin logging adapter so every
query leaves a structured trace in the daily log file. Combined
with the run_id attribute, this lets you answer "what SQL did that
command run?" and "which transaction took 5 seconds?" without
reaching for an external profiler.

internal/store/db_logger.go:

  - loggedDB embeds *sql.DB and overrides Query, QueryContext,
    QueryRow, QueryRowContext, Exec, ExecContext so every call
    routes through logStmt/logStmtWith. Store methods that do
    s.db.Query(...) compile unchanged and automatically pick up
    the logging.
  - Exec records the rows_affected count so write sizes show up
    in the log alongside duration and stmt text.
  - Statement text is normalized (whitespace collapsed) and
    truncated to MaxStmtChars (default 300) so log lines stay
    readable and disk usage stays bounded.
  - Severity routing:
      * errors        -> WARN (always)
      * slow queries  -> WARN when duration >= SlowMs (default 100)
      * full trace    -> INFO for every query (--log-sql)
      * otherwise     -> DEBUG (only visible with --verbose)
  - Known-benign migration errors ("duplicate column name",
    "no such module: fts5") are downgraded to DEBUG so every
    startup doesn't spam expected ALTER TABLE failures.
  - ConfigureSQLLogging(opts) publishes SlowMs and FullTrace via
    process-wide atomics so the CLI entry point can update them
    once at startup without threading options through 32 store.Open
    call sites.

internal/store/store.go:

  - Store.db changed from *sql.DB to *loggedDB. DB() continues
    to return *sql.DB so external consumers (DuckDB sqlite_scan,
    for example) get the raw handle without a logging trampoline.
  - withTx() now logs transaction begin/commit/rollback with
    total duration. Transactions slower than SlowMs emit WARN;
    normal commits emit DEBUG. Rollbacks emit INFO with the
    triggering error's message in a 'reason' attribute.
  - queryInChunks / execInChunks generic helpers take a new
    chunkQuerier interface instead of *sql.DB so they accept
    either a raw sql.DB (tests) or the logging wrapper
    (production path). The interface covers exactly the two
    methods they call.

CLI wiring:

  - --log-sql persistent flag: enables FullTrace (every query
    at INFO level). Default off.
  - --log-sql-slow-ms persistent flag: override the slow-query
    threshold. Zero keeps the 100ms default.
  - [log].sql_trace and [log].sql_slow_ms in config.toml mirror
    the flags for per-install defaults.
  - PersistentPreRunE calls store.ConfigureSQLLogging(...) after
    building the handler so the first store.Open of the run
    already sees the configured values.

Tests cover Exec logging shape with rows_affected, slow-query
promotion to WARN with a synthetic duration, error always
logged, QueryRow emission, and normalizeStmt whitespace/truncation
behaviour. End-to-end smoke confirmed that the default mode is
quiet (no migration noise), --log-sql surfaces every query, and
--log-sql-slow-ms 1 warns on everything.
Bubble Tea takes over the terminal in alternate-screen mode. Any
slog write to stderr (e.g. the tui loadAccounts / loadData /
loadMessages diagnostic lines from the previous commits) corrupts
the render because Bubble Tea and slog end up racing for the same
file descriptor.

Fix: swap slog.Default() to a file-only logger for the duration
of the TUI run, restore it on return. The daily log file keeps
receiving every tui log line so 'msgvault logs -f' in another
pane still works exactly as before.

internal/logging/logging.go:

  - Expose the JSON file handler as Result.FileHandler alongside
    the existing multi-handler.
  - Add Result.FileOnlyLogger() which returns a logger bound to
    just the file sink, pre-attributed with the run_id so the
    file entries correlate with the rest of the run. Returns a
    discardHandler-backed logger when file logging is disabled
    so the caller's swap still suppresses stderr.
  - Add a minimal discardHandler type so FileOnlyLogger never
    hands back nil.

cmd/msgvault/cmd/tui.go:

  - Before p.Run() save slog.Default, swap to the file-only
    logger, defer the restore. Only reads logResult if it's
    non-nil so tests that bypass PersistentPreRunE still work.
@roborev-ci
Copy link
Copy Markdown

roborev-ci bot commented Apr 13, 2026

roborev: Combined Review (0f58222)

Verdict: Changes are mostly solid, but there are 3 findings to address before merge.

High

  • cmd/msgvault/cmd/logs.go (findLogFiles, around line 135)
    Rotated log files are sorted lexicographically via sort.Strings(), which orders msgvault-2026-04-11.log before msgvault-2026-04-11.log.1/.2 even though the rotated files are older. This can make msgvault logs --all print entries out of chronological order and can also cause tail/ring-buffer logic to return the wrong “last N” records.
    Fix: Replace the lexical sort with ordering based on parsed log date plus rotation suffix/index, with older rotations first and the base .log last for a given day.

Medium

  • cmd/msgvault/cmd/root.go (PersistentPreRunE, --log-file handling)
    --log-file is documented as overriding the full log file path, but the implementation only passes filepath.Dir(logFile) into logging.BuildHandler. The actual filename still gets forced to msgvault-YYYY-MM-DD.log, so the flag does not match its documented behavior.
    Fix: Thread the full file path through logging.Options/openDailyLogFile, or change the flag/help text to explicitly mean “override log directory”.

  • cmd/msgvault/cmd/logs.go (followLogFile, around line 214)
    While tailing, ReadBytes('\n') can return a partial line together with io.EOF if a log entry is read mid-write. The current code attempts to parse that fragment, fails, and discards it; when the remainder arrives later, that chunk is also parsed independently and dropped. Live log lines can therefore be silently lost.
    Fix: Keep a buffer for partial lines when err == io.EOF and no trailing newline is present, then append subsequent bytes until a full line is available before parsing.


Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

jesserobbins and others added 2 commits April 13, 2026 10:17
- findLogFiles: sort log files chronologically instead of
  lexicographically so rotated files (.log.1, .log.2) appear before
  the active .log for the same date, fixing --all ordering and
  ring-buffer tail correctness.

- --log-file: thread the full path through logging.Options.FilePath
  so the flag actually overrides the log file path as documented,
  not just the directory.

- followLogFile: buffer partial lines when ReadBytes returns a
  fragment at EOF mid-write, preventing silent loss of live log
  lines that span two reads.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Remove unused currentRun variable, add explicit _ discards for
fmt.Fprint* return values, and apply linter auto-fixes for
De Morgan's law and Sprintf-in-WriteString patterns.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@roborev-ci
Copy link
Copy Markdown

roborev-ci bot commented Apr 13, 2026

roborev: Combined Review (6de9fc2)

Verdict: Changes introduce one high-severity reliability regression and several medium-severity logging/privacy issues that should be addressed before merge.

High

  • Highcmd/msgvault/cmd/root.go:178-196, cmd/msgvault/cmd/root.go:208-214
    Panic logging can be lost because deferred cleanup closes logResult before recoverAndLogPanic() runs. On panic, the logger may write to an already-closed file descriptor, so the crash never reaches the on-disk JSON log.
    Fix: Reverse the defer order so panic recovery logs before any file-backed handlers are closed.

Medium

  • Mediumcmd/msgvault/cmd/search.go:149-176, internal/tui/model.go:312-326, internal/tui/model.go:337-381, internal/tui/model.go:529-571, internal/tui/model.go:680-703
    Raw search text and account identifiers are now written to durable on-disk logs at default info level. In this product, those values can contain sensitive user data, and the TUI path can log them repeatedly.
    Fix: Do not log raw queries or account identifiers by default. Log coarse metadata instead, and gate any raw-value logging behind explicit debug opt-in.

  • Mediumcmd/msgvault/cmd/root.go:96-115, cmd/msgvault/cmd/root.go:122-149
    Startup logging persists raw args, but redaction only covers a narrow set of secret-looking flags. Sensitive positional input such as search <query>, email addresses, and message/account IDs still gets written to disk by default.
    Fix: Avoid logging raw argv. Prefer command path plus allowlisted metadata, or redact/suppress positional arguments for commands that accept user data.

  • Mediumcmd/msgvault/cmd/root.go:39,54-58,221-228
    Commands intentionally excluded from logging setup (version, completion, shell completion requests, etc.) still emit msgvault exit on stderr because ExecuteContext logs unconditionally through a preinitialized logger. That pollutes command output and can break completion-oriented integrations.
    Fix: Only emit lifecycle logs when full logging was initialized for the run, or make the default logger discard output for skipped commands.

  • Mediumcmd/msgvault/cmd/logs.go:139-155
    Rotated log files are sorted in the wrong order (.1, .2, .3 ascending), even though higher rotation numbers are older files. As a result, msgvault logs --all can read same-day files out of chronology and omit newer entries from the effective “last N lines.”
    Fix: Sort by date, then order rotated siblings oldest-to-newest (.N ... .2 .1 .log) so merged reads stay chronological.


Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

…mp dirs

The log file handle was only closed in ExecuteContext's defer, but
tests invoke commands directly via rootCmd.Execute() — leaving the
file open. On Windows this blocks TempDir cleanup ("file is being
used by another process"); in Docker the root-owned logs dir can't
be removed by the host runner.

- Add PersistentPostRunE that closes logResult after every command.
- Close any previous logResult at the top of PersistentPreRunE so
  repeated test invocations don't leak handles.
- Change log directory permissions from 0700 to 0755 so Docker
  host cleanup can traverse directories created by the container.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@roborev-ci
Copy link
Copy Markdown

roborev-ci bot commented Apr 13, 2026

roborev: Combined Review (cf80b82)

PR is not ready to merge: 3 medium-severity issues remain.

Medium

  1. Persistent exit logging is dropped on successful runs
    Location: cmd/msgvault/cmd/root.go (PersistentPostRunE; ExecuteContext), cmd/msgvault/cmd/root.go:126-128
    PersistentPostRunE closes logResult before ExecuteContext emits the "msgvault exit" record. That means successful executions can miss the per-run outcome line in the structured log because the file sink has already been closed.
    Fix: Emit the exit record before closing handlers, or move shutdown out of PersistentPostRunE so it happens only after the exit log is written.

  2. Structured logs persist sensitive user input and account identifiers
    Location: cmd/msgvault/cmd/root.go:126-128, cmd/msgvault/cmd/search.go:157-180, internal/tui/model.go:340-382, internal/tui/model.go:548-568
    The new logging path records raw search text and account identifiers to disk. msgvault startup logs positional args verbatim, and the search/TUI logging duplicates that data via fields like query, search, scope, and account. Search strings can contain PII, ticket IDs, internal names, email addresses, or secrets, making the persistent log and msgvault logs a disclosure surface.
    Fix: Treat free-form input as sensitive by default. Log coarse metadata only in normal operation, and require explicit debug opt-in for any query/account logging with redaction or hashing.

  3. [log].dir path handling is inconsistent with other config paths
    Location: internal/config/config.go (new LogConfig / LogsDir() path handling)
    log.dir is returned verbatim; Load() does not expand ~ or resolve relative paths against the config/home directory. With an explicit config file, dir = "logs" writes under the current working directory instead of alongside the config/home directory, unlike the other path-based settings.
    Fix: Run expandPath on cfg.Log.Dir, and resolve relative log.dir against cfg.HomeDir when --config is explicitly provided.


Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

@jesserobbins
Copy link
Copy Markdown
Contributor Author

roborev: Combined Review (cf80b82)

PR is not ready to merge: 3 medium-severity issues remain.

Medium

  1. Persistent exit logging is dropped on successful runs
    Location: cmd/msgvault/cmd/root.go (PersistentPostRunE; ExecuteContext), cmd/msgvault/cmd/root.go:126-128
    PersistentPostRunE closes logResult before ExecuteContext emits the "msgvault exit" record. That means successful executions can miss the per-run outcome line in the structured log because the file sink has already been closed.
    Fix: Emit the exit record before closing handlers, or move shutdown out of PersistentPostRunE so it happens only after the exit log is written.
  2. Structured logs persist sensitive user input and account identifiers
    Location: cmd/msgvault/cmd/root.go:126-128, cmd/msgvault/cmd/search.go:157-180, internal/tui/model.go:340-382, internal/tui/model.go:548-568
    The new logging path records raw search text and account identifiers to disk. msgvault startup logs positional args verbatim, and the search/TUI logging duplicates that data via fields like query, search, scope, and account. Search strings can contain PII, ticket IDs, internal names, email addresses, or secrets, making the persistent log and msgvault logs a disclosure surface.
    Fix: Treat free-form input as sensitive by default. Log coarse metadata only in normal operation, and require explicit debug opt-in for any query/account logging with redaction or hashing.
  3. [log].dir path handling is inconsistent with other config paths
    Location: internal/config/config.go (new LogConfig / LogsDir() path handling)
    log.dir is returned verbatim; Load() does not expand ~ or resolve relative paths against the config/home directory. With an explicit config file, dir = "logs" writes under the current working directory instead of alongside the config/home directory, unlike the other path-based settings.
    Fix: Run expandPath on cfg.Log.Dir, and resolve relative log.dir against cfg.HomeDir when --config is explicitly provided.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

Given that a user accessing the logs have access to the actual databases... this feels overly cautious. However, it's not a bad standard to uphold so I'm taking a pass at it.

- Fix exit log dropped on success: remove PersistentPostRunE close
  that ran before ExecuteContext could emit the exit record; the
  deferred close in ExecuteContext already handles shutdown.

- Redact sensitive input from logs: positional args (emails, queries)
  logged as argc at info, full values at debug only. Search and TUI
  log query_len/has_search/has_account instead of raw text. Account
  scope uses "filtered" instead of the email address.

- Fix log.dir path handling: expand ~ and resolve relative paths
  for cfg.Log.Dir in config.Load(), consistent with data_dir and
  oauth paths.

Addresses: wesm#263 (comment)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@roborev-ci
Copy link
Copy Markdown

roborev-ci bot commented Apr 13, 2026

roborev: Combined Review (fb87bef)

Verdict: Changes are not ready to merge; there is 1 high-severity issue and 2 medium-severity issues to address.

High

  • Panic logging can be lost because the log file is closed before panic recovery runs
    Location: cmd/msgvault/cmd/root.go:223
    In ExecuteContext, the defer order is reversed for the intended behavior: logResult.Close() runs before recoverAndLogPanic(). If the command panics, the file handler may already be closed before the panic is written, which can silently drop the on-disk panic record.
    Fix: Defer panic recovery after the close is registered so it executes first, or otherwise ensure the file stays open until panic logging completes.

Medium

  • Persistent JSON file logging has no redaction layer for sensitive fields
    Location: cmd/msgvault/cmd/root.go:90, internal/logging/logging.go:153
    The new default logger now persists records to disk for most command runs, but there is no central redaction/suppression step for the file sink. Existing log fields that were previously only transient on stderr can now be stored at rest, including potentially sensitive user-derived values such as search queries, email addresses, and filenames. 0600 helps, but it does not remove the underlying data exposure risk.
    Fix: Add redaction for the file sink, or make durable logging opt-in. At minimum, suppress or hash clearly sensitive user/content-derived fields before they reach the JSON log handler.

  • Rotated log files are emitted in the wrong chronological order
    Location: cmd/msgvault/cmd/logs.go:134
    logFileSortKey sorts rotated files by ascending suffix (.1, .2, ...), but higher suffixes represent older files. That means msgvault logs --all can print same-day rotated chunks out of order, showing newer entries before older ones.
    Fix: Sort by date first, then sort rotation suffix in descending numeric order, with the active .log file last.


Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

… sort

- File logging is now opt-in by default. Enable via [log] enabled=true,
  [log] dir="...", or --log-file. Without any of these, msgvault only
  writes to stderr (the pre-existing behavior users expect). This
  eliminates the surprise disk-persistence surface for sensitive data.

- Fix panic logging loss: swap defer order in ExecuteContext so
  recoverAndLogPanic runs while the file handle is still open.
  Previously logResult.Close() ran first (LIFO), silently dropping
  the panic record.

- Fix rotated log chronological order: .log.5 (oldest) now sorts
  before .log.1 (newest), with the active .log file last. Previously
  the suffix sort was ascending, putting newer rotations before older.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@jesserobbins
Copy link
Copy Markdown
Contributor Author

roborev: Combined Review (fb87bef)

Verdict: Changes are not ready to merge; there is 1 high-severity issue and 2 medium-severity issues to address.

High

  • Panic logging can be lost because the log file is closed before panic recovery runs
    Location: cmd/msgvault/cmd/root.go:223
    In ExecuteContext, the defer order is reversed for the intended behavior: logResult.Close() runs before recoverAndLogPanic(). If the command panics, the file handler may already be closed before the panic is written, which can silently drop the on-disk panic record.
    Fix: Defer panic recovery after the close is registered so it executes first, or otherwise ensure the file stays open until panic logging completes.

I have not deeply examined the panic logging handling here. However, I don't think this scenario represents a "high severity issue" given that we are adding missing logging rather than somehow improving an existing function. Regardless, I'm open to the improvement in this scenario.

Medium

  • Persistent JSON file logging has no redaction layer for sensitive fields
    Location: cmd/msgvault/cmd/root.go:90, internal/logging/logging.go:153
    The new default logger now persists records to disk for most command runs, but there is no central redaction/suppression step for the file sink. Existing log fields that were previously only transient on stderr can now be stored at rest, including potentially sensitive user-derived values such as search queries, email addresses, and filenames. 0600 helps, but it does not remove the underlying data exposure risk.
    Fix: Add redaction for the file sink, or make durable logging opt-in. At minimum, suppress or hash clearly sensitive user/content-derived fields before they reach the JSON log handler.

This is turning into a feedback loop. I am going to set logging as being opt-in to address this feedback. Redaction approaches are always tricky, and I think what most people are looking for here is to understand what is happening. This is not a robust external facing service. This is a personal tool.

  • Rotated log files are emitted in the wrong chronological order
    Location: cmd/msgvault/cmd/logs.go:134
    logFileSortKey sorts rotated files by ascending suffix (.1, .2, ...), but higher suffixes represent older files. That means msgvault logs --all can print same-day rotated chunks out of order, showing newer entries before older ones.
    Fix: Sort by date first, then sort rotation suffix in descending numeric order, with the active .log file last.

Fine. Although if the next review comment is timezone related I am going to just make them UUIDs. :-)

@jesserobbins jesserobbins marked this pull request as ready for review April 13, 2026 23:11
@jesserobbins jesserobbins marked this pull request as draft April 13, 2026 23:26
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@jesserobbins jesserobbins marked this pull request as ready for review April 13, 2026 23:56
@roborev-ci
Copy link
Copy Markdown

roborev-ci bot commented Apr 13, 2026

roborev: Combined Review (21c9a22)

Verdict: No medium-or-higher findings; the reviewed changes look clean.

All reported issues were below the requested reporting threshold, so there are no Medium, High, or Critical findings to include.


Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

@jesserobbins
Copy link
Copy Markdown
Contributor Author

I also manually re-verified logging. I'm going to mark as ready.

@wesm wesm merged commit 82b1498 into wesm:main Apr 15, 2026
6 checks passed
@wesm
Copy link
Copy Markdown
Owner

wesm commented Apr 15, 2026

thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants