## Context
DuckLake supports SQLite as a catalog database, which is great for single-user setups. However, when data is stored in cloud storage (R2, S3, GCS), the catalog is still a local file — meaning you need to keep track of it and it's not portable across machines.
## Proposal
Add a `LEA_DUCKLAKE_CATALOG_URL` env var that points to a remote location (e.g., `r2://my-bucket/catalog`). When set, lea would:
- Download the SQLite catalog from the remote URL to a local temp file before running
- Attach using `ducklake:sqlite:<local_temp_file>`
- Run the session as normal
- Upload the updated catalog back to the remote URL
This gives single users a fully serverless DuckLake setup with no local state — both catalog and data live in the same bucket.
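A minimal sketch of that flow, assuming DuckDB's `httpfs`, `sqlite`, and `ducklake` extensions are available and an R2 secret has already been created; the attachment name, `DATA_PATH`, and the step-3 placeholder are illustrative, not lea's actual implementation:

```python
import os
import tempfile

import duckdb

catalog_url = os.environ["LEA_DUCKLAKE_CATALOG_URL"]  # e.g. r2://my-bucket/catalog
local_catalog = os.path.join(tempfile.mkdtemp(), "catalog.sqlite")

con = duckdb.connect()
# Assumes something like this was run beforehand:
#   CREATE SECRET (TYPE r2, KEY_ID '...', SECRET '...', ACCOUNT_ID '...')

# 1. Download the catalog, stored as a parquet-wrapped blob (see proof of concept)
try:
    blob = con.execute(
        f"SELECT content FROM read_parquet('{catalog_url}')"
    ).fetchone()[0]
    with open(local_catalog, "wb") as f:
        f.write(bytes(blob))
except duckdb.Error:
    pass  # first run: no remote catalog yet, DuckLake starts fresh

# 2. Attach using the local temp copy; table data stays in the bucket
con.execute(
    f"ATTACH 'ducklake:sqlite:{local_catalog}' AS lake (DATA_PATH 'r2://my-bucket/data')"
)

# 3. Run the session as normal (placeholder for lea's script execution)
con.execute("CREATE TABLE lake.example AS SELECT 42 AS answer")

# 4. Upload the updated catalog back to the remote URL
con.execute("DETACH lake")
with open(local_catalog, "rb") as f:
    updated = f.read()
con.execute("CREATE TEMP TABLE catalog_blob (content BLOB)")
con.execute("INSERT INTO catalog_blob VALUES (?)", [updated])
con.execute(f"COPY catalog_blob TO '{catalog_url}' (FORMAT PARQUET)")
```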
## Proof of concept
Tested and confirmed working with R2:
- Download via `read_parquet()` (catalog stored as a blob in a parquet file)
- Upload via `COPY ... TO ... (FORMAT PARQUET)` with a temp table containing the blob
- Full round-trip preserves the SQLite file perfectly
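Concretely, the round-trip that was tested looks something like this; the bucket name and paths are made up, and the secret values are placeholders:

```python
import duckdb

con = duckdb.connect()
# httpfs understands r2:// URLs once an R2 secret exists
con.execute(
    "CREATE SECRET r2_secret (TYPE r2, KEY_ID '...', SECRET '...', ACCOUNT_ID '...')"
)

original = open("catalog.sqlite", "rb").read()

# Upload: wrap the SQLite file in a single-row, single-column parquet blob
con.execute("CREATE TEMP TABLE catalog_blob (content BLOB)")
con.execute("INSERT INTO catalog_blob VALUES (?)", [original])
con.execute("COPY catalog_blob TO 'r2://my-bucket/catalog' (FORMAT PARQUET)")

# Download: pull the blob back out of the parquet file
restored = con.execute(
    "SELECT content FROM read_parquet('r2://my-bucket/catalog')"
).fetchone()[0]

assert bytes(restored) == original  # byte-for-byte identical
```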
## Considerations
- No concurrent access — SQLite is single-writer, so this is only suitable for single-user setups. For multi-user, PostgreSQL catalog is the right choice.
- Locking — could add advisory locking (e.g., a lock file in the bucket) to prevent accidental concurrent runs; a sketch follows this list.
- First run — when no remote catalog exists yet, start fresh.
- Upload on failure — should probably still upload even if some scripts failed, to preserve partial state (audit tables get cleaned up on next run anyway).
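For the locking idea, a minimal advisory-lock sketch against R2's S3-compatible endpoint, using boto3 with hypothetical bucket and key names; note that check-then-write on object storage is not atomic, so this guards against accidental overlap rather than true races:

```python
import boto3
from botocore.exceptions import ClientError

# R2 exposes an S3-compatible API; endpoint and names below are placeholders
s3 = boto3.client("s3", endpoint_url="https://<account_id>.r2.cloudflarestorage.com")
BUCKET, LOCK_KEY = "my-bucket", "catalog.lock"

def acquire_lock() -> bool:
    """Refuse to run if a lock file already exists in the bucket."""
    try:
        s3.head_object(Bucket=BUCKET, Key=LOCK_KEY)
        return False  # another run holds (or abandoned) the lock
    except ClientError:
        # Treats any miss as "unlocked" for this sketch; real code should
        # distinguish 404s from other errors
        s3.put_object(Bucket=BUCKET, Key=LOCK_KEY, Body=b"locked")
        return True

def release_lock() -> None:
    s3.delete_object(Bucket=BUCKET, Key=LOCK_KEY)
```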
🤖 Generated with Claude Code