Commit d872b76

Author: Klas Kalaß
Message: more thoughts
1 parent 5e7bed3 commit d872b76

3 files changed

Lines changed: 199 additions & 0 deletions

proposed-changes/024-three-phase-sync-architecture.md

Lines changed: 12 additions & 0 deletions
@@ -4,6 +4,18 @@
**Created**: 2026-03-01
**Context**: Initial sync takes ~48–54s for Chat Essence app (2015 messages, 62 group shards × 2 types = 124 shard operations, 263 files total). Root cause is sequential per-shard processing where download, merge, upload, and DB commit are interleaved.

## Decision Alignment (026)

This proposal is the mandatory baseline in the performance-first direction defined in `026-recap-sync-direction.md`.

- 024 is required regardless of storage mode.
- It improves execution order and batching without forcing a storage-format decision.
- It applies to both profiles:
  - Dataset/Flat mode (Dir, GDrive default profile).
  - Linked-Data mode (Solid/interoperability profile).

In short: 024 is phase A of the new strategy, not an optional optimization.

---

## Problem Statement

proposed-changes/025-flat-file-storage-architecture.md

Lines changed: 11 additions & 0 deletions
@@ -5,6 +5,17 @@
**Depends on**: 024 (Three-Phase Sync Architecture)
**Context**: Even with three-phase sync (024), the fundamental problem of 263 files for ~2000 documents remains. This proposal reduces file count by consolidating resources into type-level files.

## Decision Alignment (026)

This proposal is aligned with `026-recap-sync-direction.md` as the structural optimization for the Dataset/Flat profile.

- 025 is not a global replacement for all modes.
- 025 is the default path for performance-first backends (Dir, GDrive).
- Linked-Data mode remains supported for Solid/interoperability-sensitive use cases.
- Fetch-policy implication remains explicit: flat mode prioritizes prefetch-style sync and does not target fine-grained `onRequest` semantics.

In short: 025 is phase B of the new strategy for the flat profile, not a full strategic retreat.

---

## Problem Statement
Lines changed: 176 additions & 0 deletions
@@ -0,0 +1,176 @@

# 026 - Recap Sync Direction

**Status**: Decision proposal
**Created**: 2026-03-09
**Related**: 024 (Three-Phase Sync Architecture), 025 (Flat File Storage Architecture)

## Problem Statement

Locorda has reached a strategic turning point:

- The API and core sync semantics are good.
- The current shard/index-heavy execution is too slow for real use.
- This holds even on the local directory backend with chat-essence-scale data.

So performance is not an optional improvement. It is a product requirement.

## Decision

Adopt a **performance-first architecture with one canonical core model and two storage profiles**.

1. Make fast sync the default direction for the coming months.
2. Keep one internal model and expose two sync/storage profiles as projections:
   - **Dataset/Flat mode** (few files, type-level datasets, manifest-driven change detection) for `Dir` and `GDrive` by default.
   - **Linked-Data mode** (resource/shard-oriented, discoverability-oriented) for `Solid` and interoperability-sensitive deployments.
3. Implement in sequence:
   - First: 024 execution-order improvements (three-phase sync).
   - Then: 025 structural file-count reduction (flat files/chunks).
4. Define backend/profile switching semantics explicitly:
   - Same-profile switch: cheap.
   - Cross-profile switch: explicit projection rebuild/migration step.
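
The switching rule in item 4 can be sketched as a small planning function. This is an illustration only, not Locorda's API: `Profile`, `DEFAULT_PROFILE`, and `plan_switch` are hypothetical names, and only the backend names and their default profiles are taken from the decision above.

```python
from enum import Enum


class Profile(Enum):
    FLAT = "dataset/flat"
    LINKED = "linked-data"


# Default profile per backend, as stated in the decision (table name is hypothetical).
DEFAULT_PROFILE = {
    "Dir": Profile.FLAT,
    "GDrive": Profile.FLAT,
    "Solid": Profile.LINKED,
}


def plan_switch(source_backend: str, target_backend: str) -> str:
    """Classify a backend switch, assuming both backends use their default profile."""
    source = DEFAULT_PROFILE[source_backend]
    target = DEFAULT_PROFILE[target_backend]
    if source is target:
        # Same remote representation on both sides: the cheap case.
        return "same-profile"
    # Different representation: an explicit projection rebuild/migration is required.
    return "cross-profile-rebuild"


print(plan_switch("Dir", "GDrive"))
print(plan_switch("GDrive", "Solid"))
```

Keeping the classification in one place would let tooling refuse a silent cross-profile switch and route it to the migration step instead.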

This preserves developer simplicity (one API, one mental model) while making backend tradeoffs explicit where they belong.

## Why This Direction

### 1. Two independent bottlenecks exist

- **Execution-order bottleneck**: sequential per-shard download/merge/upload/commit loops.
- **File-count bottleneck**: too many files, too much metadata and request overhead.

024 addresses the first. 025 addresses the second. Both are needed.
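
To make the execution-order point concrete, here is a minimal sketch of a three-phase pass over a set of shards. It is not the 024 implementation: `FakeRemote`, `FakeLocalDb`, and `three_phase_sync` are hypothetical in-memory stand-ins, and plain set union stands in for the real CRDT merge.

```python
from concurrent.futures import ThreadPoolExecutor


class FakeRemote:
    """In-memory stand-in for a storage backend (hypothetical)."""

    def __init__(self, shards):
        self.shards = {k: set(v) for k, v in shards.items()}

    def download(self, shard_id):
        return set(self.shards.get(shard_id, set()))

    def upload(self, shard_id, items):
        self.shards[shard_id] = set(items)


class FakeLocalDb:
    """In-memory stand-in for the local database (hypothetical)."""

    def __init__(self, shards):
        self.shards = {k: set(v) for k, v in shards.items()}
        self.commit_count = 0

    def bulk_commit(self, merged):
        self.shards.update(merged)
        self.commit_count += 1


def three_phase_sync(remote, db, shard_ids, max_workers=4):
    # Phase 1: download every shard with bounded concurrency.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        downloaded = dict(zip(shard_ids, pool.map(remote.download, shard_ids)))

    # Phase 2: merge in memory; set union stands in for the real CRDT merge.
    merged = {sid: db.shards.get(sid, set()) | downloaded[sid] for sid in shard_ids}

    # Phase 3: upload only shards whose remote copy is stale, then commit once.
    changed = [sid for sid in shard_ids if merged[sid] != downloaded[sid]]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        list(pool.map(lambda sid: remote.upload(sid, merged[sid]), changed))
    db.bulk_commit(merged)
    return changed


remote = FakeRemote({"g1": {"m1"}, "g2": {"m2"}})
db = FakeLocalDb({"g1": {"m1", "m3"}})
changed = three_phase_sync(remote, db, ["g1", "g2"])
print(changed, db.commit_count)
```

The point of the sketch is the shape: the local database sees one commit per sync rather than one per shard, and network work is grouped so it can run concurrently.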

### 2. "One universal storage representation" is the wrong optimization target

Trying to preserve every capability in one execution model is what keeps performance below acceptable levels.

The goal should be one **canonical sync model**, not one universal file layout.

### 3. Current codebase already points to profile separation

There is already a mode axis (`useShardDatasets`) and fetch-policy constraints that naturally separate:

- Prefetch-heavy dataset sync for throughput.
- Fine-grained linked-data sync for selective retrieval.

The missing piece is to document these as projection profiles over one core model, not as competing product concepts.

## Option Review

### Opt 1: Give up

Rejected. Does not match project goals.

### Opt 2: Radical state files + changes files only

Partially valid, but too absolute if applied globally.

- Good: maximum performance and simplified sync path.
- Risk: unnecessary strategic loss if it fully replaces linked-data mode.

### Opt 3: Refine current approach only

Necessary but not sufficient.

- 024 alone likely gives a meaningful speedup.
- But 024 alone does not remove file-cardinality overhead.

### Opt 4: Recommended hybrid (new)

**Performance-first unified core + projection profiles**:

- Apply Opt 3 first (024).
- Apply Opt 2-style structure where it brings clear value (025 for flat mode).
- Preserve linked-data mode where interoperability/discoverability matters.
- Avoid always maintaining both remote representations at runtime.

## Are Opt 2 Tradeoffs Inevitable?

Only if we insist on one universal storage mode.

At product level, they are **not** inevitable:

- In dataset/flat mode, we accept reduced fine-grained linked-data behavior to get required speed.
- In linked-data mode, we keep semantics and selective capabilities, with a known slower performance envelope.
- We keep one developer-facing API and one canonical internal semantics layer.

This makes tradeoffs explicit, testable, and backend-dependent rather than ideological.

## Capability Matrix (Target)

| Capability | Dataset/Flat mode (Dir, GDrive default) | Linked-Data mode (Solid default) |
|---|---|---|
| Initial sync speed | High | Medium/Low |
| Incremental sync speed | High | Medium |
| File count overhead | Low | High |
| `onRequest`/fine-grained partial fetch | Limited (chunk-level only) | Full |
| Linked-data discoverability | Limited | Full |
| Cross-app semantic interoperability | Limited/optional | Strong |
| Complexity in hot path | Lower | Higher |
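
The "chunk-level only" entry presupposes that every client assigns a given resource to the same chunk. The proposal does not specify a chunking scheme; one possible approach, sketched here under that assumption, is to hash the resource identifier with a stable hash. `chunk_for`, `group_by_chunk`, and the chunk count are hypothetical.

```python
import hashlib


def chunk_for(resource_id: str, chunk_count: int) -> int:
    """Map a resource ID to a chunk index that is stable across processes and machines."""
    # Python's built-in hash() is randomized per process for strings, so use SHA-256.
    digest = hashlib.sha256(resource_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % chunk_count


def group_by_chunk(resource_ids, chunk_count):
    """Group resource IDs by their chunk index, preserving input order within a chunk."""
    chunks = {}
    for rid in resource_ids:
        chunks.setdefault(chunk_for(rid, chunk_count), []).append(rid)
    return chunks


ids = [f"message-{n}" for n in range(10)]
print(group_by_chunk(ids, chunk_count=4))
```

A modulo scheme like this reshuffles most resources when `chunk_count` changes, so a real design would need either a fixed count or a migration step.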

## Consequences

### What we keep

- Offline-first CRDT sync.
- User-owned storage model.
- Interoperable linked-data path (in linked-data mode).
- One developer-facing API and one core domain model.

### What we accept

- Not every backend needs every feature in its default mode.
- Dataset mode prioritizes throughput over fine-grained remote structure.
- Cross-profile backend switching is a migration operation, not a free runtime toggle.

### What we avoid

- Forcing Solid-style linked-data constraints onto backends where this is mostly overhead.
- Forcing performance backends to always maintain shard/index remote artifacts they do not use.

## Simplicity Guardrails

To avoid "two products in one codebase", follow these guardrails:

1. **One API surface**: app developers configure profile defaults, not low-level index/shard behavior.
2. **One canonical state contract**: CRDT merge semantics and local persistence stay profile-agnostic.
3. **Projection adapters**: profile-specific remote serialization lives behind storage adapters.
4. **No mandatory dual-write**: do not write both profile representations on every sync.
5. **Explicit migration tooling**: if the profile changes, run a projection rebuild once with progress/reporting.
6. **Docs by profile**: keep linked-data details out of flat-profile quickstarts.
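
Guardrails 2 to 4 can be illustrated with a small adapter sketch: one canonical state, two projections, and only the active one written. All names here (`ProjectionAdapter`, `FlatAdapter`, `LinkedAdapter`, `sync_files`) are hypothetical, and JSON is used purely as a stand-in for whatever serialization each profile really uses.

```python
import json
from typing import Dict, Protocol

# Canonical state: type name -> resource ID -> resource fields (illustrative shape).
State = Dict[str, Dict[str, dict]]


class ProjectionAdapter(Protocol):
    def project(self, state: State) -> Dict[str, str]:
        """Return remote file path -> serialized content."""
        ...


class FlatAdapter:
    """Dataset/flat profile: one file per type."""

    def project(self, state: State) -> Dict[str, str]:
        return {
            f"{type_name}.json": json.dumps(resources, sort_keys=True)
            for type_name, resources in state.items()
        }


class LinkedAdapter:
    """Linked-data profile: one file per resource."""

    def project(self, state: State) -> Dict[str, str]:
        return {
            f"{type_name}/{rid}.json": json.dumps(fields, sort_keys=True)
            for type_name, resources in state.items()
            for rid, fields in resources.items()
        }


def sync_files(state: State, adapter: ProjectionAdapter) -> Dict[str, str]:
    # Only the active profile's adapter runs, so there is no dual-write.
    return adapter.project(state)


state: State = {"message": {"m1": {"text": "hi"}, "m2": {"text": "yo"}, "m3": {"text": "ok"}}}
print(len(sync_files(state, FlatAdapter())), len(sync_files(state, LinkedAdapter())))
```

The canonical `State` never changes shape between profiles; only the adapter decides how many remote files it becomes.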

## 90-Day Execution Plan

1. **Phase A (024)**: three-phase orchestration in production path.
   - Separate download, merge, upload/commit concerns.
   - Collapse per-shard commits into bulk commit where safe.
   - Introduce controlled backend concurrency with deterministic tests.
2. **Phase B (025)**: flat-file mode hardening.
   - Per-type datasets + manifest hashes.
   - Conflict-safe upload (ETag/version compare-and-swap + retry merge).
   - Benchmarks on Dir and GDrive.
3. **Phase C**: capability stabilization.
   - Keep linked-data mode for Solid with explicit performance expectations.
   - Add deterministic chunking for coarse partial sync in flat mode.
4. **Phase D**: product defaults.
   - Default backend presets: `Dir/GDrive => flat`, `Solid => linked`.
   - Document profile choice as a product decision, not a low-level tweak.
   - Provide explicit migration command/process for cross-profile backend switching.
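
The conflict-safe upload in Phase B follows a familiar compare-and-swap loop: read the current version, merge, write only if the version is unchanged, otherwise re-read and retry. The sketch below assumes an in-memory store with integer versions; `VersionedStore`, `PreconditionFailed`, and `upload_with_retry` are hypothetical names, and set union again stands in for the CRDT merge. On HTTP-style backends the version check would typically be an `If-Match` header carrying the ETag.

```python
class PreconditionFailed(Exception):
    """Raised when the remote version no longer matches the expected one."""


class VersionedStore:
    """In-memory stand-in for a backend with version-checked writes (hypothetical)."""

    def __init__(self):
        self._data = {}  # path -> (version, frozenset of items)

    def get(self, path):
        return self._data.get(path, (0, frozenset()))

    def put_if_version(self, path, items, expected_version):
        current_version, _ = self.get(path)
        if current_version != expected_version:
            raise PreconditionFailed(path)
        self._data[path] = (current_version + 1, frozenset(items))
        return current_version + 1


def upload_with_retry(store, path, local_items, max_attempts=5):
    """Merge local items into the remote dataset without losing concurrent writes."""
    for _ in range(max_attempts):
        version, remote_items = store.get(path)
        merged = remote_items | frozenset(local_items)  # stand-in for the CRDT merge
        try:
            store.put_if_version(path, merged, version)
            return merged
        except PreconditionFailed:
            continue  # someone else wrote first: re-read, re-merge, retry
    raise RuntimeError(f"gave up on {path} after {max_attempts} attempts")


store = VersionedStore()
store.put_if_version("messages", {"a"}, expected_version=0)  # another client's write
print(sorted(upload_with_retry(store, "messages", {"b"})))
```

Because the merge is re-run against the freshly read remote state on every attempt, a lost race costs one extra round trip rather than a lost update.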

## Benchmark Gates (Required)

Before declaring success, define and enforce targets:

1. Initial sync budget (chat-essence-scale dataset) per backend profile.
2. Incremental sync budget (small daily changes) per backend profile.
3. Conflict-recovery correctness for concurrent writes to the same type dataset.
4. CRDT convergence checks after partial upload failures and retries.
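
Gate 4 can be approximated with a randomized replay test: deliver the same updates in shuffled orders, re-deliver some of them to mimic retried uploads, and require the same final state every time. The sketch below is one possible shape for such a check, not the project's test suite; `union_merge`, `replay`, and `converges` are hypothetical names, and a grow-only set stands in for the real CRDT.

```python
import random


def union_merge(a, b):
    """Grow-only set merge: commutative, associative, and idempotent."""
    return a | b


def replay(updates, order, merge_fn):
    """Apply updates in the given order, re-delivering every other one as a retry."""
    state = frozenset()
    for step, index in enumerate(order):
        state = merge_fn(state, updates[index])
        if step % 2 == 0:
            state = merge_fn(state, updates[index])  # duplicate delivery after a retry
    return state


def converges(updates, merge_fn, trials=50, seed=0):
    """Return True if every shuffled replay ends in the same state as the in-order one."""
    rng = random.Random(seed)
    order = list(range(len(updates)))
    expected = replay(updates, order, merge_fn)
    for _ in range(trials):
        rng.shuffle(order)
        if replay(updates, order, merge_fn) != expected:
            return False
    return True


updates = [frozenset({"m1"}), frozenset({"m2", "m3"}), frozenset({"m1", "m4"})]
print(converges(updates, union_merge))
```

The same harness flags order-dependent merges: passing a "last arrival wins" function such as `lambda a, b: b` makes the check fail.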

## Final Position

We should not abandon the original vision.

We should stop forcing one remote representation to serve incompatible goals.

**Direction**: performance-first, one canonical core model, projection-based storage profiles, and explicit defaults by backend/use case.
