WAL and Compaction
Every write (upsert or delete) goes through the WAL before being indexed. WAL fragments are immutable JSON files stored on S3.
Each fragment is a JSON file with an xxHash checksum:
{
"id": "01HXYZ...",
"namespace": "my_namespace",
"vectors": [
{
"id": "vec-1",
"values": [0.1, 0.2, 0.3],
"attributes": {"color": "red"}
}
],
"deletes": ["vec-old-1", "vec-old-2"],
"checksum": 12345678901234,
"created_at": "2026-01-15T10:00:00Z"
}
- S3 key: <namespace>/wal/<ulid>.fragment.json
- ID: ULID (time-sortable, unique)
- Checksum: xxHash-64 over the serialized vectors + deletes
- Immutable: Never modified after write
- Write: Client upsert/delete → create fragment → PUT to S3
- Read: Strong-consistency queries scan all uncompacted fragments
- Compact: Compaction merges fragments into an indexed segment
- Delete: Compacted fragments are removed after successful CAS
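
The read path above can be sketched as a replay over the uncompacted fragments. This is a minimal Python sketch, not Zeppelin's implementation: `replay_fragments` is a hypothetical helper, and it assumes fragments are applied in ascending ULID order and that a fragment's deletes are applied before its upserts. The text above does not specify either ordering.

```python
from typing import Any


def replay_fragments(fragments: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Return the live vectors implied by a set of uncompacted WAL fragments."""
    live: dict[str, dict[str, Any]] = {}
    # ULIDs sort lexicographically by creation time, so sorting on "id"
    # approximates write order.
    for fragment in sorted(fragments, key=lambda f: f["id"]):
        # Assumption: deletes in a fragment are applied before its upserts.
        for vector_id in fragment.get("deletes", []):
            live.pop(vector_id, None)
        for vector in fragment.get("vectors", []):
            live[vector["id"]] = vector
    return live


# Two fragments listed out of order: the later one ("01B") deletes vec-1.
fragments = [
    {"id": "01B", "vectors": [], "deletes": ["vec-1"]},
    {
        "id": "01A",
        "vectors": [
            {"id": "vec-1", "values": [0.1, 0.2, 0.3]},
            {"id": "vec-2", "values": [0.4, 0.5, 0.6]},
        ],
        "deletes": [],
    },
]
live = replay_fragments(fragments)
print(sorted(live))
```

Only `vec-2` survives the replay, because the delete in the later fragment overrides the earlier upsert.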
The manifest (manifest.json) is the authoritative record of what data exists in a namespace. It tracks WAL fragments, segments, pending deletes, and the fencing token.
{
"fragments": [
{
"id": "01HXYZ...",
"key": "my_ns/wal/01HXYZ.fragment.json",
"vector_count": 100,
"delete_count": 2,
"created_at": "2026-01-15T10:00:00Z"
}
],
"segments": [
{
"id": "seg-abc",
"key_prefix": "my_ns/segments/seg-abc/",
"vector_count": 5000,
"centroid_count": 32,
"created_at": "2026-01-15T12:00:00Z",
"quantization": null,
"bitmap_fields": ["color", "price"],
"fts_fields": ["content"]
}
],
"pending_deletes": ["vec-old-1"],
"fencing_token": 42,
"updated_at": "2026-01-15T12:00:00Z"
}

| Field | Description |
|---|---|
| fragments | List of FragmentRef: uncompacted WAL fragments |
| segments | List of SegmentRef: indexed segments |
| pending_deletes | Vector IDs to exclude from query results |
| fencing_token | Monotonic counter for multi-writer lease protocol |
| updated_at | Last modification timestamp |

Each SegmentRef entry in segments has the following fields:

| Field | Description |
|---|---|
| id | Segment identifier |
| key_prefix | S3 key prefix for all segment artifacts |
| vector_count | Total vectors in this segment |
| centroid_count | Number of IVF centroids |
| quantization | Quantization type (null, "scalar", "product") |
| bitmap_fields | Fields with bitmap indexes |
| fts_fields | Fields with inverted indexes |

The manifest is updated atomically using ETag-based conditional PUTs. This prevents concurrent writers from corrupting the manifest.
1. GET manifest.json → (data, etag)
2. Modify data
3. PUT manifest.json with If-Match: etag
→ Success: commit
→ 412 Precondition Failed: retry from step 1
This requires S3ConditionalPut::ETagMatch to be enabled in the object_store builder.
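
The read-modify-write loop above can be illustrated against an in-memory stand-in for the object store. Everything here is hypothetical scaffolding (`InMemoryStore`, `PreconditionFailed`, `update_manifest`, the version-counter ETag); a real deployment would rely on S3's `If-Match` semantics rather than this class. The demo injects one competing write to show the retry path.

```python
import json


class PreconditionFailed(Exception):
    """Stands in for an HTTP 412 Precondition Failed response."""


class InMemoryStore:
    """Hypothetical object store holding one object with a version-based ETag."""

    def __init__(self, body: bytes):
        self._body = body
        self._version = 0

    def get(self):
        return self._body, str(self._version)

    def put_if_match(self, body: bytes, etag: str) -> None:
        # Reject the write if the object changed since the caller's GET.
        if etag != str(self._version):
            raise PreconditionFailed()
        self._body = body
        self._version += 1


def update_manifest(store, mutate, max_attempts=5):
    """GET, modify, conditional PUT; retry from the GET on a 412."""
    for _ in range(max_attempts):
        body, etag = store.get()
        manifest = json.loads(body)
        mutate(manifest)
        try:
            store.put_if_match(json.dumps(manifest).encode(), etag)
            return manifest
        except PreconditionFailed:
            continue
    raise RuntimeError("manifest CAS did not succeed")


store = InMemoryStore(json.dumps(
    {"fragments": [], "updated_at": "2026-01-15T12:00:00Z"}).encode())
attempts = []


def add_fragment(manifest):
    if not attempts:
        # Simulate a competing writer committing between our GET and PUT.
        body, etag = store.get()
        other = json.loads(body)
        other["updated_at"] = "2026-01-15T12:00:01Z"
        store.put_if_match(json.dumps(other).encode(), etag)
    attempts.append(1)
    manifest["fragments"].append({"id": "01HXYZ"})


result = update_manifest(store, add_fragment)
print(len(attempts), result["fragments"])
```

The first attempt loses the race and is retried from the GET, so the committed manifest contains both the competing writer's change and the new fragment.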
Compaction merges WAL fragments into indexed segments. It runs on a configurable interval (default: 30 seconds).
1. Read manifest: Load current fragments and segments
2. Acquire lease: Obtain fencing token for exclusive write access
3. Load fragments: Download and deserialize all WAL fragments
4. Merge data: Combine fragment vectors with existing segment data
5. Apply deletes: Remove vectors in the delete set
6. Train centroids: Run k-means to find cluster centers
7. Assign vectors: Assign each vector to its nearest centroid
8. Write artifacts: Upload cluster data, bitmaps, inverted indexes to S3
9. CAS manifest: Atomically update manifest: add new SegmentRef, clear processed FragmentRefs
10. Deferred deletion: Delete old segment artifacts and compacted fragment files
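
The "Apply deletes" and "Assign vectors" steps can be sketched in a few lines. This is an illustration under assumptions, not the compactor's code: `assign_to_centroids` is a hypothetical name, centroids are taken as already trained, and squared Euclidean distance is assumed because the distance metric is not stated here.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign_to_centroids(vectors, deletes, centroids):
    """Drop deleted IDs, then map each remaining vector ID to its nearest centroid."""
    clusters = {i: [] for i in range(len(centroids))}
    for vector_id, values in vectors.items():
        if vector_id in deletes:
            continue  # vector is in the delete set, so it is not indexed
        nearest = min(
            range(len(centroids)),
            key=lambda i: squared_distance(values, centroids[i]),
        )
        clusters[nearest].append(vector_id)
    return clusters


centroids = [[0.0, 0.0], [1.0, 1.0]]
vectors = {
    "vec-1": [0.1, 0.2],
    "vec-2": [0.9, 0.8],
    "vec-3": [0.2, 0.1],
    "vec-old-1": [0.5, 0.5],
}
clusters = assign_to_centroids(vectors, {"vec-old-1"}, centroids)
print(clusters)
```

Here `vec-1` and `vec-3` land in cluster 0, `vec-2` in cluster 1, and `vec-old-1` is dropped before assignment.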
Old artifacts are deleted after the new manifest is committed. This ensures that concurrent readers using the old manifest can still read old data. The sequence is:
Write new segment → CAS manifest → Delete old segment → Delete old fragments
If deletion fails, it's safe — the new manifest doesn't reference old artifacts, so they're just orphaned storage. A cleanup job can reclaim them later.
If cluster sizes become highly imbalanced (largest cluster / smallest cluster > retrain_imbalance_threshold), the compactor retrains centroids from scratch instead of incrementally updating.
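
The imbalance test amounts to a ratio check. In this sketch `needs_retrain` is a hypothetical helper, the threshold of 4.0 is purely illustrative, and the handling of an empty cluster (treated as imbalanced) is an assumption, since the text does not say how a zero-sized cluster is handled.

```python
def needs_retrain(cluster_sizes, retrain_imbalance_threshold):
    """Return True when largest / smallest cluster size exceeds the threshold."""
    if not cluster_sizes:
        return False
    smallest = min(cluster_sizes)
    largest = max(cluster_sizes)
    if smallest == 0:
        # Assumption: an empty cluster next to a non-empty one counts as imbalanced.
        return largest > 0
    return largest / smallest > retrain_imbalance_threshold


balanced = needs_retrain([100, 120, 90], 4.0)  # ratio is 120 / 90
skewed = needs_retrain([500, 20, 100], 4.0)    # ratio is 500 / 20
print(balanced, skewed)
```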
Zeppelin supports a lease-based protocol for multi-writer safety.
Stored at <namespace>/lease.json:
{
"holder": "node-abc",
"fencing_token": 42,
"acquired_at": "2026-01-15T10:00:00Z",
"expires_at": "2026-01-15T10:05:00Z"
}
Neither fencing alone nor CAS alone is sufficient to prevent zombie writes:
- Fencing check: Before writing, verify your fencing token matches the manifest's token. This catches most stale writers.
- CAS on manifest: Use ETag conditional PUT to atomically commit. This catches the TOCTOU gap between fencing check and write.
Both layers are required:
- Fencing without CAS: TOCTOU race between check and write
- CAS without fencing: A stale writer that reads the manifest can still win the CAS race
- Leases have a TTL (default: 5 minutes)
- A lease is never deleted — it is released by marking it expired
- If a lease expires and is acquired by another writer, the old writer's subsequent writes fail at the fencing check
- Release is best-effort: if it fails, the lease will expire naturally
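
The lease rules above can be modeled as pure functions over the lease record. This is a sketch under assumptions rather than Zeppelin's protocol: `try_acquire`, `release`, and `fencing_check` are hypothetical names, each acquisition is assumed to increment the fencing token by one, any unexpired lease is assumed to block acquisition, and the manifest is assumed to carry the newest holder's token.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

LEASE_TTL = timedelta(minutes=5)  # default TTL stated above


@dataclass(frozen=True)
class Lease:
    holder: str
    fencing_token: int
    acquired_at: datetime
    expires_at: datetime


def try_acquire(current: Optional[Lease], holder: str, now: datetime) -> Optional[Lease]:
    """Return a new lease, or None while an existing lease is still live."""
    if current is not None and current.expires_at > now:
        return None
    # Assumption: the token grows by one per acquisition, so it stays monotonic.
    next_token = 1 if current is None else current.fencing_token + 1
    return Lease(holder, next_token, now, now + LEASE_TTL)


def release(lease: Lease, now: datetime) -> Lease:
    """Leases are never deleted; releasing marks the record as expired."""
    return Lease(lease.holder, lease.fencing_token, lease.acquired_at, now)


def fencing_check(writer_token: int, manifest_token: int) -> bool:
    """A writer may proceed only if its token matches the manifest's token."""
    return writer_token == manifest_token


t0 = datetime(2026, 1, 15, 10, 0, tzinfo=timezone.utc)
first = try_acquire(None, "node-abc", t0)
blocked = try_acquire(first, "node-xyz", t0 + timedelta(minutes=1))
second = try_acquire(first, "node-xyz", t0 + timedelta(minutes=6))

# Assume the manifest now carries the second holder's token.
stale_ok = fencing_check(first.fencing_token, second.fencing_token)
fresh_ok = fencing_check(second.fencing_token, second.fencing_token)
print(blocked, stale_ok, fresh_ok)
```

The old holder's token no longer matches after the takeover, so its writes stop at the fencing check; the CAS on the manifest still has to cover the gap between that check and the actual write.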