
Database backend abstraction: pluggable Dialect for PostgreSQL support #261

@webgress

Description

TL;DR: I want to add PostgreSQL as an opt-in database backend (SQLite stays the default). The approach: a thin Dialect interface that the existing Store delegates to for the ~50 database-specific call sites — FTS5 vs tsvector, datetime() vs NOW(), etc. The other ~90% of queries are standard SQL and only need Rebind(). I'd split it into two PRs: first extract SQLiteDialect (zero functional change), then add PostgreSQLDialect. Happy to do the work if you're open to it.

Motivation

msgvault is excellent as a local email archive. I'm building a semantic search
and classification layer on top of it (pgvector embeddings, LLM-based labeling,
cross-entity graph connecting emails/documents/messages). The natural home for
that layer is PostgreSQL — and having email metadata in SQLite while embeddings
and intelligence live in PostgreSQL means cross-database joins that can't be
optimized by either query planner.

I suspect others building on msgvault's MCP server or HTTP API will hit similar
friction: the moment you want to enrich the archive with computed data
(embeddings, classifications, entity resolution, analytics beyond what
DuckDB/Parquet covers), you need the email metadata co-located with that data.

Co-locating everything in PostgreSQL removes that friction and keeps those joins inside a single query planner.

SQLite should absolutely remain the default — it's the right choice for the
single-user, local-first use case. This proposal adds PostgreSQL as an
opt-in alternative, not a replacement.

Prior art in the codebase

The codebase already anticipates this:

  1. Rebind() method (store.go:286) with an explicit TODO:

    // When PostgreSQL support is added, this will convert ? to $1, $2, etc.

    24 call sites already use Rebind().

  2. Split schema files: schema.sql (portable DDL) is already separated
    from schema_sqlite.sql (FTS5 virtual table). The base schema is
    valid PostgreSQL with minor adjustments.

  3. Standard SQL for most queries: The vast majority of the 125+ queries
    use standard SQL. SQLite-specific constructs are concentrated in a
    well-defined set of locations.
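To make the Rebind() TODO concrete, here's a minimal sketch of the PostgreSQL-side rewrite — ? placeholders become $1, $2, ... The function name is hypothetical, and a production version would also need to skip ? inside quoted string literals:

```go
package main

import (
	"fmt"
	"strings"
)

// rebindPg rewrites each ? placeholder to $1, $2, ... — a sketch of
// what Rebind could do for a PostgreSQL dialect. (It does not yet
// handle ? inside quoted literals.)
func rebindPg(query string) string {
	var b strings.Builder
	n := 0
	for _, r := range query {
		if r == '?' {
			n++
			fmt.Fprintf(&b, "$%d", n)
		} else {
			b.WriteRune(r)
		}
	}
	return b.String()
}

func main() {
	fmt.Println(rebindPg("SELECT id FROM messages WHERE account_id = ? AND date > ?"))
	// SELECT id FROM messages WHERE account_id = $1 AND date > $2
}
```

Since the 24 call sites already route through Rebind(), swapping this in is the whole placeholder story for standard queries.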

Inventory of SQLite-specific constructs

Scanning through the internal/ tree, here's what I see:

| Construct | Count | PostgreSQL equivalent | Complexity |
| --- | --- | --- | --- |
| FTS5 (messages_fts, MATCH, rowid joins) | 51 refs, 6 files | tsvector + GIN index + ts_rank | High (but well-contained) |
| datetime('now') | 35 refs | NOW() / CURRENT_TIMESTAMP | Mechanical |
| GROUP_CONCAT(col, sep) | 9 refs | STRING_AGG(col, sep) | Mechanical |
| INSERT OR IGNORE | 6 refs | ON CONFLICT DO NOTHING | Mechanical |
| INSERT OR REPLACE | 3 refs | FTS-specific; obviated by generated tsvector column | Goes away |
| PRAGMA statements | 11 refs | Connection pool config / SET | Straightforward |
| isSQLiteError() | 11 refs | Driver-agnostic error type | Straightforward |
| rowid references | 13 refs | Standard PK (mostly FTS-related) | Goes away with tsvector |
| ? placeholders | All queries | $1, $2, ... via Rebind() | Already stubbed |

~90% of queries need no changes beyond Rebind(). The SQLite-specific
surface area is concentrated, not scattered.

Proposed approach: Dialect interface

Rather than two separate Store implementations (high duplication, hard to
maintain), I propose a thin Dialect interface that the single Store
delegates to for database-specific behavior:

// Dialect abstracts database-specific SQL generation and behavior.
type Dialect interface {
    // Query rewriting
    Rebind(query string) string           // ? -> $1,$2,... for PostgreSQL
    Now() string                          // datetime('now') vs NOW()
    GroupConcat(col, sep string) string    // GROUP_CONCAT vs STRING_AGG

    // Full-text search
    FTSInsert(messageID int64, subject, body, from, to, cc string) string
    FTSSearch(term string) (joinClause, whereClause string)
    FTSDelete(messageID int64) string
    SchemaFTS() string

    // Upsert patterns
    UpsertIgnore(table string, cols []string, conflictCols []string) string

    // Connection lifecycle
    InitConn(db *sql.DB) error            // PRAGMAs vs SET/pool config
    SchemaSQL() []string                  // DDL statements for init

    // Error handling
    IsConflictError(err error) bool
    IsDuplicateColumnError(err error) bool
}
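To illustrate the delegation pattern, here's a compilable sketch using a two-method slice of the interface. The query text and struct names are illustrative, not the actual Store code:

```go
package main

import "fmt"

// Minimal slice of the proposed interface — just the methods this
// sketch exercises.
type Dialect interface {
	Now() string
	GroupConcat(col, sep string) string
}

type SQLiteDialect struct{}

func (SQLiteDialect) Now() string { return "datetime('now')" }
func (SQLiteDialect) GroupConcat(col, sep string) string {
	return fmt.Sprintf("GROUP_CONCAT(%s, '%s')", col, sep)
}

type PostgreSQLDialect struct{}

func (PostgreSQLDialect) Now() string { return "NOW()" }
func (PostgreSQLDialect) GroupConcat(col, sep string) string {
	return fmt.Sprintf("STRING_AGG(%s, '%s')", col, sep)
}

// buildQuery shows how a Store method stays dialect-agnostic: the SQL
// skeleton is shared, only dialect-specific fragments are injected.
func buildQuery(d Dialect) string {
	return fmt.Sprintf("SELECT %s, %s FROM labels GROUP BY message_id",
		d.GroupConcat("name", ", "), d.Now())
}

func main() {
	fmt.Println(buildQuery(SQLiteDialect{}))
	fmt.Println(buildQuery(PostgreSQLDialect{}))
}
```

The point is that ~90% of Store methods would call the dialect only at these injection points, leaving the surrounding SQL untouched.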

Phase 1: Extract SQLiteDialect — move all current SQLite-specific
behavior behind the interface. Zero functional change. All existing tests
pass unmodified.

Phase 2: Implement PostgreSQLDialect — schema_pg.sql with tsvector
columns + GIN index, pgx driver, $N placeholders, STRING_AGG, etc.

Phase 3: Config-driven selection — config.toml gains a
database_url option:

[data]
# SQLite (default, unchanged):
# database_url = "sqlite:///home/user/.msgvault/msgvault.db"

# PostgreSQL (opt-in):
database_url = "postgres://user:pass@localhost:5432/msgvault"

Driver is inferred from the URL scheme. No behavioral change for existing
SQLite users.
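The scheme-based inference could be as simple as the following sketch. The driver names ("sqlite", "pgx") and the empty-URL defaulting rule are assumptions, not settled API:

```go
package main

import (
	"fmt"
	"strings"
)

// driverForURL maps a database_url scheme to a database/sql driver
// name. Sketch only: driver names and defaulting are assumptions.
func driverForURL(databaseURL string) (string, error) {
	switch {
	case databaseURL == "", strings.HasPrefix(databaseURL, "sqlite:"):
		return "sqlite", nil // default backend, unchanged
	case strings.HasPrefix(databaseURL, "postgres:"),
		strings.HasPrefix(databaseURL, "postgresql:"):
		return "pgx", nil
	default:
		return "", fmt.Errorf("unsupported database_url scheme: %q", databaseURL)
	}
}

func main() {
	d, _ := driverForURL("postgres://user:pass@localhost:5432/msgvault")
	fmt.Println(d) // pgx
}
```

An unset database_url falls through to SQLite, which is what guarantees zero behavioral change for existing installs.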

FTS5 -> tsvector migration detail

This is the interesting part, so here's the concrete mapping:

SQLite FTS5:

CREATE VIRTUAL TABLE messages_fts USING fts5(
    message_id UNINDEXED,
    subject, body, from_addr, to_addr, cc_addr,
    tokenize='unicode61 remove_diacritics 1'
);

-- Search:
SELECT m.* FROM messages_fts fts
JOIN messages m ON m.id = fts.rowid
WHERE messages_fts MATCH ?
ORDER BY rank;

-- Insert:
INSERT OR REPLACE INTO messages_fts(rowid, message_id, subject, body, ...)
VALUES (?, ?, ?, ?, ...);

PostgreSQL tsvector equivalent:

-- Column on messages table (or message_bodies):
ALTER TABLE messages ADD COLUMN search_fts TSVECTOR
    GENERATED ALWAYS AS (
        setweight(to_tsvector('simple', COALESCE(subject, '')), 'A') ||
        setweight(to_tsvector('simple', COALESCE(from_addr, '')), 'B') ||
        to_tsvector('simple', COALESCE(body_text, ''))
    ) STORED;

CREATE INDEX messages_search_fts_idx ON messages USING GIN (search_fts);

-- Search:
SELECT m.* FROM messages m
WHERE m.search_fts @@ plainto_tsquery('simple', $1)
ORDER BY ts_rank(m.search_fts, plainto_tsquery('simple', $1)) DESC;

-- Insert: automatic (generated column), no separate INSERT needed.

The PostgreSQL approach is actually simpler — no separate FTS table to maintain,
no INSERT OR REPLACE, no rowid joins. The GENERATED ALWAYS AS column
auto-updates when the source columns change.

What I'm offering

I'm prepared to implement this end-to-end and submit a PR. I'd like to
align on the approach before writing code:

  1. Is a Dialect interface the right abstraction level, or would you prefer
    a different pattern?
  2. Should schema_pg.sql live alongside schema_sqlite.sql in
    internal/store/, or in a separate package?
  3. Any preferences on the PostgreSQL driver? I'd default to pgx (most
    popular, pure Go, good performance).
  4. Should the DuckDB analytics layer remain SQLite-only (reading from
    Parquet), or should it also support reading from PostgreSQL? I'd suggest
    keeping DuckDB as-is initially — it reads from Parquet anyway, and
    build-cache can export from either backend.

Scope boundary

The two PRs would cover:

  • PR 1
    • Dialect interface + SQLiteDialect (extract, no functional change)
  • PR 2
    • PostgreSQLDialect implementation
    • schema_pg.sql
    • Config-driven backend selection
    • Tests passing against both backends (CI can use a PostgreSQL service
      container)

These PRs would NOT cover:

  • pgvector / semantic search (separate feature, enabled by this foundation)
  • Migration tooling (SQLite -> PostgreSQL data migration)
  • Changes to DuckDB/Parquet analytics layer
  • Changes to the TUI, MCP server, or HTTP API
