**TL;DR:** I want to add PostgreSQL as an opt-in database backend (SQLite stays the default). The approach: a thin `Dialect` interface that the existing `Store` delegates to for the ~50 database-specific call sites — FTS5 vs tsvector, `datetime()` vs `NOW()`, etc. The other ~90% of queries are standard SQL and only need `Rebind()`. I'd split it into two PRs: first extract `SQLiteDialect` (zero functional change), then add `PostgreSQLDialect`. Happy to do the work if you're open to it.
## Motivation
msgvault is excellent as a local email archive. I'm building a semantic search
and classification layer on top of it (pgvector embeddings, LLM-based labeling,
cross-entity graph connecting emails/documents/messages). The natural home for
that layer is PostgreSQL — and having email metadata in SQLite while embeddings
and intelligence live in PostgreSQL means cross-database joins that can't be
optimized by either query planner.
I suspect others building on msgvault's MCP server or HTTP API will hit similar
friction: the moment you want to enrich the archive with computed data
(embeddings, classifications, entity resolution, analytics beyond what
DuckDB/Parquet covers), you need the email metadata co-located with that data.
PostgreSQL also unlocks:

- `msgvault serve` with concurrent writers (SQLite WAL allows one writer at a time) and multiple clients connected (cf. NAS/Docker Deployment: CI/CD, Headless OAuth Docs, Setup Guide #116)
SQLite should absolutely remain the default — it's the right choice for the
single-user, local-first use case. This proposal adds PostgreSQL as an
opt-in alternative, not a replacement.
## Prior art in the codebase

The codebase already anticipates this:
- **`Rebind()` method** (store.go:286) with an explicit TODO:
  `// When PostgreSQL support is added, this will convert ? to $1, $2, etc.`
  24 call sites already use `Rebind()`.

- **Split schema files:** `schema.sql` (portable DDL) is already separated
  from `schema_sqlite.sql` (FTS5 virtual table). The base schema is
  valid PostgreSQL with minor adjustments.

- **Standard SQL for most queries:** The vast majority of the 125+ queries
  use standard SQL. SQLite-specific constructs are concentrated in a
  well-defined set of locations.
## Inventory of SQLite-specific constructs

Scanning through the `internal/` tree, here's what I see:
| Construct | Count | PostgreSQL equivalent | Complexity |
|---|---|---|---|
| FTS5 (`messages_fts`, `MATCH`, `rowid` joins) | 51 refs, 6 files | `tsvector` + GIN index + `ts_rank` | High (but well-contained) |
| `datetime('now')` | 35 refs | `NOW()` / `CURRENT_TIMESTAMP` | Mechanical |
| `GROUP_CONCAT(col, sep)` | 9 refs | `STRING_AGG(col, sep)` | Mechanical |
| `INSERT OR IGNORE` | 6 refs | `ON CONFLICT DO NOTHING` | Mechanical |
| `INSERT OR REPLACE` | 3 refs | FTS-specific, replaced by tsvector trigger | Goes away |
| `PRAGMA` statements | 11 refs | Connection pool config / `SET` | Straightforward |
| `isSQLiteError()` | 11 refs | Driver-agnostic error type | Straightforward |
| `rowid` references | 13 refs | Standard PK (mostly FTS-related) | Goes away with tsvector |
| `?` placeholders | All queries | `$1, $2, ...` via `Rebind()` | Already stubbed |
**~90% of queries need no changes beyond `Rebind()`.** The SQLite-specific
surface area is concentrated, not scattered.
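To make the placeholder conversion concrete, here's a minimal sketch of what the stubbed `Rebind()` could grow into (a standalone function for illustration, not the repo's actual method; a real implementation would also need to skip `?` inside string literals):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// rebind converts SQLite-style ? placeholders to PostgreSQL's $1, $2, ...
// Naive sketch: it does not handle ? appearing inside quoted literals.
func rebind(query string) string {
	var b strings.Builder
	n := 0
	for _, r := range query {
		if r == '?' {
			n++
			b.WriteByte('$')
			b.WriteString(strconv.Itoa(n))
		} else {
			b.WriteRune(r)
		}
	}
	return b.String()
}

func main() {
	fmt.Println(rebind("SELECT * FROM messages WHERE id = ? AND folder = ?"))
	// → SELECT * FROM messages WHERE id = $1 AND folder = $2
}
```

The SQLite dialect's `Rebind()` stays a no-op, so existing behavior is untouched.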
## Proposed approach: `Dialect` interface

Rather than two separate `Store` implementations (high duplication, hard to
maintain), I propose a thin `Dialect` interface that the single `Store`
delegates to for database-specific behavior:
```go
// Dialect abstracts database-specific SQL generation and behavior.
type Dialect interface {
	// Query rewriting
	Rebind(query string) string         // ? -> $1,$2,... for PostgreSQL
	Now() string                        // datetime('now') vs NOW()
	GroupConcat(col, sep string) string // GROUP_CONCAT vs STRING_AGG

	// Full-text search
	FTSInsert(messageID int64, subject, body, from, to, cc string) string
	FTSSearch(term string) (joinClause, whereClause string)
	FTSDelete(messageID int64) string
	SchemaFTS() string

	// Upsert patterns
	UpsertIgnore(table string, cols []string, conflictCols []string) string

	// Connection lifecycle
	InitConn(db *sql.DB) error // PRAGMAs vs SET/pool config
	SchemaSQL() []string       // DDL statements for init

	// Error handling
	IsConflictError(err error) bool
	IsDuplicateColumnError(err error) bool
}
```
**Phase 1: Extract `SQLiteDialect`** — move all current SQLite-specific
behavior behind the interface. Zero functional change. All existing tests
pass unmodified.
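As a sketch of what the Phase 1 extraction looks like, here's a `SQLiteDialect` satisfying a few of the proposed methods (method set abbreviated and bodies illustrative, not the repo's actual code):

```go
package main

import "fmt"

// SQLiteDialect sketches the Phase 1 extraction: current SQLite behavior
// moved behind the interface with no functional change.
type SQLiteDialect struct{}

// Rebind is a no-op: SQLite uses ? placeholders natively.
func (SQLiteDialect) Rebind(query string) string { return query }

// Now returns SQLite's current-timestamp expression.
func (SQLiteDialect) Now() string { return "datetime('now')" }

// GroupConcat renders SQLite's string-aggregation call.
func (SQLiteDialect) GroupConcat(col, sep string) string {
	return fmt.Sprintf("GROUP_CONCAT(%s, '%s')", col, sep)
}

func main() {
	d := SQLiteDialect{}
	fmt.Println(d.Rebind("SELECT id FROM messages WHERE id = ?"))
	fmt.Println(d.Now())              // → datetime('now')
	fmt.Println(d.GroupConcat("label", ",")) // → GROUP_CONCAT(label, ',')
}
```

Because every method just returns what the queries already contain today, the extraction is verifiable by the existing test suite alone.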
**Phase 2: Implement `PostgreSQLDialect`** — `schema_pg.sql` with `tsvector`
columns + GIN index, `pgx` driver, `$N` placeholders, `STRING_AGG`, etc.
**Phase 3: Config-driven selection** — `config.toml` gains a
`database_url` option:
```toml
[data]
# SQLite (default, unchanged):
# database_url = "sqlite:///home/user/.msgvault/msgvault.db"

# PostgreSQL (opt-in):
database_url = "postgres://user:pass@localhost:5432/msgvault"
```
Driver is inferred from the URL scheme. No behavioral change for existing
SQLite users.
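The scheme-based inference could be as simple as the following sketch (the driver names `sqlite3` and `pgx` are assumptions; the real values depend on which drivers the project registers):

```go
package main

import (
	"fmt"
	"strings"
)

// driverForURL infers the database driver from the database_url scheme.
// Driver names here are illustrative, not the project's actual choices.
func driverForURL(databaseURL string) (string, error) {
	switch {
	case strings.HasPrefix(databaseURL, "sqlite:"):
		return "sqlite3", nil
	case strings.HasPrefix(databaseURL, "postgres://"),
		strings.HasPrefix(databaseURL, "postgresql://"):
		return "pgx", nil
	default:
		return "", fmt.Errorf("unsupported database_url scheme: %q", databaseURL)
	}
}

func main() {
	driver, err := driverForURL("postgres://user:pass@localhost:5432/msgvault")
	if err != nil {
		panic(err)
	}
	fmt.Println(driver) // → pgx
}
```

An unrecognized scheme fails fast at startup rather than at first query, which keeps misconfiguration obvious.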
## FTS5 -> tsvector migration detail

This is the interesting part, so here's the concrete mapping:
**SQLite FTS5:**

```sql
CREATE VIRTUAL TABLE messages_fts USING fts5(
    message_id UNINDEXED,
    subject, body, from_addr, to_addr, cc_addr,
    tokenize='unicode61 remove_diacritics 1'
);

-- Search:
SELECT m.* FROM messages_fts fts
JOIN messages m ON m.id = fts.rowid
WHERE messages_fts MATCH ?
ORDER BY rank;

-- Insert:
INSERT OR REPLACE INTO messages_fts(rowid, message_id, subject, body, ...)
VALUES (?, ?, ?, ?, ...);
```
**PostgreSQL tsvector equivalent:**

```sql
-- Column on messages table (or message_bodies):
ALTER TABLE messages ADD COLUMN search_fts TSVECTOR
    GENERATED ALWAYS AS (
        setweight(to_tsvector('simple', COALESCE(subject, '')), 'A') ||
        setweight(to_tsvector('simple', COALESCE(from_addr, '')), 'B') ||
        to_tsvector('simple', COALESCE(body_text, ''))
    ) STORED;

CREATE INDEX messages_search_fts_idx ON messages USING GIN (search_fts);

-- Search:
SELECT m.* FROM messages m
WHERE m.search_fts @@ plainto_tsquery('simple', $1)
ORDER BY ts_rank(m.search_fts, plainto_tsquery('simple', $1)) DESC;

-- Insert: automatic (generated column), no separate INSERT needed.
```
The PostgreSQL approach is actually simpler — no separate FTS table to maintain,
no `INSERT OR REPLACE`, no `rowid` joins. The `GENERATED ALWAYS AS` column
auto-updates when the source columns change.
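This asymmetry falls out cleanly in the proposed `FTSSearch(term) (joinClause, whereClause string)` method: the SQLite dialect emits a join plus a `MATCH` predicate, while the PostgreSQL dialect needs only a `WHERE` clause. A sketch, with illustrative clause strings:

```go
package main

import "fmt"

// Sketch of how the two dialects would satisfy the proposed FTSSearch
// method. The SQL fragments are illustrative, not the repo's actual queries.
type SQLiteDialect struct{}
type PostgreSQLDialect struct{}

// SQLite: join the FTS5 virtual table via rowid and match against it.
// The search term is bound later as a ? parameter.
func (SQLiteDialect) FTSSearch(_ string) (joinClause, whereClause string) {
	return "JOIN messages_fts fts ON fts.rowid = m.id", "messages_fts MATCH ?"
}

// PostgreSQL: the generated search_fts column lives on messages itself,
// so no join is needed; the term is bound as $1.
func (PostgreSQLDialect) FTSSearch(_ string) (joinClause, whereClause string) {
	return "", "m.search_fts @@ plainto_tsquery('simple', $1)"
}

func main() {
	_, where := (PostgreSQLDialect{}).FTSSearch("invoice")
	fmt.Println(where) // → m.search_fts @@ plainto_tsquery('simple', $1)
}
```

The query builder just concatenates whichever clauses the dialect returns, so the calling code stays backend-agnostic.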
## What I'm offering

I'm prepared to implement this end-to-end and submit a PR. I'd like to
align on the approach before writing code:
- Is a `Dialect` interface the right abstraction level, or would you prefer
  a different pattern?
- Should `schema_pg.sql` live alongside `schema_sqlite.sql` in
  `internal/store/`, or in a separate package?
- Any preferences on the PostgreSQL driver? I'd default to `pgx` (most
  popular, pure Go, good performance).
- Should the DuckDB analytics layer remain SQLite-only (reading from
  Parquet), or should it also support reading from PostgreSQL? I'd suggest
  keeping DuckDB as-is initially — it reads from Parquet anyway, and
  `build-cache` can export from either backend.
## Scope boundary

These two PRs would cover:
- **PR 1:** `Dialect` interface + `SQLiteDialect` (extract, no functional change)
- **PR 2:**
  - `PostgreSQLDialect` implementation
  - `schema_pg.sql`
  - Config-driven backend selection
  - Tests passing against both backends (CI can use a PostgreSQL service
    container)
These PRs would NOT cover:
- pgvector / semantic search (separate feature, enabled by this foundation)
- Migration tooling (SQLite -> PostgreSQL data migration)
- Changes to DuckDB/Parquet analytics layer
- Changes to the TUI, MCP server, or HTTP API