Skip to content

feat(offline-first): SDK 2.0 — pull, push, write-path, and UX surface #36

Merged
dhwani-ankit merged 33 commits into
developfrom
om/offline_improvement
May 22, 2026
Merged

feat(offline-first): SDK 2.0 — pull, push, write-path, and UX surface #36
dhwani-ankit merged 33 commits into
developfrom
om/offline_improvement

Conversation

@Omprakash-48

@Omprakash-48 Omprakash-48 commented Apr 27, 2026

Copy link
Copy Markdown
Collaborator

Adds the offline-first data layer to the SDK. The SDK now maintains two parallel stores per document — the legacy documents blob (push source-of-truth) and per-doctype
normalized tables docs__<doctype> (offline read path) — and ships pull/push engines, a unified read path, conflict handling, and the supporting UX.

What's in this PR

Schema & migration

  • New system tables: outbox, pending_attachments, sdk_meta — DDL is idempotent (IF NOT EXISTS / OR IGNORE) so an _onUpgrade re-entry after a partial failure no longer
    corrupts the migration
  • Per-doctype docs__<doctype> parent and child schemas with __norm / __is_local columns
  • v1→v2 migration archives the legacy documents table behind a feature gate

Meta service & dependency graph

  • BulkWatermarkProbe, MetaDiffer, MetaMigration, MetaService.ensureUpToDate()
  • ClosureBuilder fans out level-by-level meta fetch in parallel
  • DependencyGraphBuilder for prefetch ordering

Pull engine

  • SyncService.pullSync — phase-tracked cursors (initial / resume / incremental), tie-skip on resume, look-ahead pagination (page N+1 only fires when page N was full)
  • SyncService.pullSyncMany — bounded-concurrency (4 workers) closure drain
  • pull_apply.dartserver-time-aware conflict detection; a normal offline edit no longer surfaces as a conflict on the next pull (compares stored vs. incoming server
    modified)

Push engine & conflict handling

  • PushEngine, PayloadAssembler, ResponseWriteback, ThreeWayMerge, TierComputer
  • UUID rewriter: rewrites local mobile_uuid references to server names before push (dependency-aware, parent-first)
  • bulkGetWithChildren replaces N+1 per-name GETs with a single mobile_sync.get_docs_with_children round-trip; graceful fallback to per-name (bounded at 20 concurrent) on
    404 from older mobile_control deployments
  • Outbox-driven retries with idempotency_strategy and attachment_pipeline

Write path & repository

  • LocalWriter mirrors saves into per-doctype tables in a single transaction; reuses mobile_uuid from payload or generates a fresh one; splits child tables into their own
    docs__<child> tables; silently handles tables that don't yet exist (initial sync hasn't run)
  • OfflineRepository delegates parent + child writes through LocalWriter; preserves mobile_uuid from payload (prevents UUID mismatch after offline → online)
  • UnifiedResolver is the single read path for offline queries; _translateParentFilters maps Frappe's virtual parent column to parent_uuid for child-table queries;
    LinkOptionService consumes the resolver instead of the API client

UX & lifecycle

  • Sync status bar, sync progress screen, sync errors screen, migration blocked screen
  • Force-logout confirm + logout-guard dialogs (warns when outbox has pending push)
  • Delete-cascade prompt
  • Document-list filter chip and offline indicator
  • FrappeSDK performs a single connectivity probe at boot; offline launch skips bootstrap to avoid the ~4-minute HTTP timeout cascade
  • FormScreen connectivity gate before save; treats serverId == null as INSERT (offline-created docs going online for the first time)

Other changes riding along

  • LinkFilterBuilder hook — row/parent-aware Link filter overrides wired SDK → FormBuilder → fields
  • LinkOptionService caching simplification + form field label handling
  • CI workflow refactor; .releaserc / update-version.py restored to develop state

Server dependency

bulkGetWithChildren calls mobile_sync.get_docs_with_children (whitelisted endpoint shipped in the mobile_control Frappe app). Older deployments fall back to per-name
/api/resource/<doctype>/<name> GETs automatically.

Test plan

Unit + integration tests added under test/ (~120 new test files):

  • flutter test test/database/ — schema, DAOs, table naming, normalize_for_search, v1→v2 migration
  • flutter test test/sync/ — pull_apply (server-advanced conflict check), pull_engine, pull_page_fetcher, push_engine, payload_assembler, three_way_merge, tier_computer,
    uuid_rewriter, attachment_pipeline, response_writeback
  • flutter test test/services/ — local_writer, offline_repository (per-doctype + LocalWriter wiring), bulk_watermark_probe, closure_builder, meta_differ, meta_migration,
    link_option_service offline path, session_user_service
  • flutter test test/query/ — filter_parser (basic, like, null parity, OR, safety, timespan), unified_resolver, link_decorator, frappe_timespan
  • flutter test test/concurrency/ — concurrency_pool, write_queue, isolate_parser, connectivity_watcher, device_tier
  • flutter test test/ui/ — sync_status_bar, sync_progress_screen, sync_errors_screen, delete_cascade_prompt, force_logout_confirm, logout_guard_dialog,
    document_list_filter_chip
  • flutter analyze --no-fatal-infos passes
  • Manual: offline launch is fast (no 4-minute timeout cascade); offline edit + reconnect → push fires once; conflict surfaces only on real server-advance

Migration notes for consumers

  • OfflineRepository constructor now requires a LocalWriter. SDK builds it internally from MetaService; consumers using the public FrappeSDK need no changes.
  • LinkOptionService constructor signature changed from (client) to (resolver, metaResolver). Internal — consumers using FrappeSDK.linkOptions are unaffected.
  • New required server endpoint: mobile_sync.get_docs_with_children (in mobile_control ≥ the version that ships this method). Falls back gracefully on 404 if absent.

Out of scope / follow-up

  • Export MigrationBlockedScreen from the public frappe_mobile_sdk.dart surface
  • SyncController.acceptDeleteCascade()
  • AuthService.logout() switch from clearAllData() to AtomicWipe.wipe()
  • ConnectivityWatcher instantiation in FrappeSDK
  • Dedup duplicate _initialMetaAndDataSync() between SDK and consumer apps
  • Fix dep_graph.linkEdgeCountByField() constant-1 return (affects prefetch priority order only)

omprakashk22 and others added 6 commits April 24, 2026 18:37
Complete design for offline-first data layer covering per-doctype
columnar schema, cursor-based pull engine, topologically-ordered push
with mobile_uuid idempotency and UUID->server-name rewrite,
unified read resolver with DB-is-source-of-truth semantics,
Frappe-parity filter parser, attachment sub-pipeline, pull/push
coordination gate, atomic logout wipe, and phased implementation plan.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Drop unused outbox.idempotency_key (mobile_uuid dedup sufficient)
- Drop unused outbox.depends_on (tiers computed in-memory per run)
- Add v1.1.0 -> v2 data migration algorithm (§12.1), covers
  docstatus transitions, deleted-row replay, orphaned-doctype
  handling, dataJson corruption fallback, preserves mobile_uuids
- Add sync_status <-> outbox.state correspondence table (§5.2)
- Define public types: QueryResult, Cursor, ConflictAction,
  OutboxRow, DeleteCascadePlan, SyncError, RowOrigin (§9.4)
- Tighten retry-all priority matrix to key on (state, error_code)
- Rename PERMISSION -> PERMISSION_DENIED, add TIMEOUT + UNKNOWN to
  error_code enum
- Cascade-delete ordering: dependents get earlier created_at so
  tiers dispatch them before the parent's DELETE is retried
- Document idempotency and dep-ordering choices in decisions log

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Remove hard dependency on a proprietary mobile_control Frappe app.
- SDK now targets stock Frappe v15+ using standard endpoints
  (login, /api/resource/*, /api/method/frappe.client.*, upload_file,
   GET DocType/<x> for meta-change detection).
- Idempotency rewritten as three mitigation layers (§5.7):
    L1 autoname=field:mobile_uuid -> natural idempotency
    L2 consumer before_insert hook -> zero-retry duplication
    L3 client pre-retry GET check -> works against stock Frappe
  SDK auto-selects per DocType; emits InitWarning if none apply.
- §10 rewritten: required stock-Frappe endpoints vs. optional
  consumer server-app capabilities (advertised via
  X-Mobile-Essentials-Version header).
- SDKConfig: drop mobileControlMinVersion; add
  IdempotencyStrategy override.
- P4 throughput gates revised (§11.4) to reflect SQLite realities:
  500 queued saves drained via max_inflight_network (2/4/8 tier) +
  per-doctype serialized writes + 20-50 batch size,
  0 duplicates + 0 lock errors + peak mem delta < 50 MB, plus a
  mid-drain kill-relaunch resilience check. Old "100 concurrent"
  framing was unrealistic for low-end Android.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Implement tests for response writeback handling in `response_writeback_test.dart`
- Add tests for sync state management in `sync_state_test.dart`
- Create tests for three-way merge logic in `three_way_merge_test.dart`
- Introduce tier computation tests in `tier_computer_test.dart`
- Add UUID rewriting tests in `uuid_rewriter_test.dart`
- Implement UI tests for delete cascade prompts in `delete_cascade_prompt_test.dart`
- Add tests for document list filter chips in `document_list_filter_chip_test.dart`
- Create tests for force logout confirmation dialog in `force_logout_confirm_test.dart`
- Implement tests for logout guard dialog in `logout_guard_dialog_test.dart`
- Add tests for sync errors screen in `sync_errors_screen_test.dart`
- Create tests for sync progress screen in `sync_progress_screen_test.dart`
- Implement tests for sync status bar in `sync_status_bar_test.dart`
…folding

- LocalWriter mirrors saves into per-doctype docs__<doctype> tables;
  OfflineRepository delegates parent + child writes through it
- SyncService.pullSync gains DoctypePullPhase, look-ahead pagination,
  tie-skip on resume, and bounded-concurrency pullSyncMany
- DoctypeService.bulkGetWithChildren replaces N+1 per-name GETs with a
  single mobile_sync.get_docs_with_children call; falls back per-name
  on 404 (older mobile_control), bounded at 20 concurrent GETs
- UnifiedResolver constructed and wired into LinkOptionService;
  closure builder fans out meta fetch level-by-level in parallel
- pull_apply: server-advanced check before flagging conflict, so
  normal offline edits don't surface as conflicts on the next pull
- system_tables: IF NOT EXISTS / OR IGNORE so onUpgrade re-entry after
  partial failure no longer raises duplicate-table errors
- FrappeSDK: single connectivity probe at boot; offline launch skips
  bootstrap to avoid the ~4-minute HTTP timeout cascade
- FormScreen: connectivity gate before save; treat serverId==null as
  INSERT (offline-created docs going online for the first time)
- Tests: LocalWriter, OfflineRepository LocalWriter wiring,
  per-doctype repository, closure builder, link-option offline path,
  pull-apply server-advanced behaviour
Tripped flutter analyze --no-fatal-infos in CI. LocalWriter writes
through raw sqflite transactions and never touched SchemaApplier.
@Omprakash-48 Omprakash-48 force-pushed the om/offline_improvement branch from 4bf7f01 to 92977d4 Compare April 27, 2026 15:12
@Omprakash-48 Omprakash-48 changed the title Om/offline improvement feat(offline-first): SDK 2.0 — pull, push, write-path, and UX surface Apr 27, 2026
omprakashk22 and others added 12 commits May 1, 2026 14:06
Frappe v16's File controller (frappe/core/doctype/file/file.py:151)
rejects File insert with attached_to_doctype set but attached_to_name
empty/NULL. The pre-existing 'new-<doctype>' sentinel docname produced
orphaned File rows that attach_files_to_document could never relink
— regardless of v16's __temporary_name 60-min window, which is too
short for offline retries anyway.

SDK now uploads each pending attachment fully unattached (no dt, no
dn). Once the parent INSERT/UPDATE commits:

  - Parent docs: Frappe's stock attach_files_to_document (registered
    on *.on_update) walks the doc's Attach/Attach Image fields and
    relinks unattached File rows by file_url.

  - Child rows: in v16 child rows save via raw db_update() and skip
    lifecycle hooks (frappe/model/document.py:613-648), so stock never
    reaches them. The mobile_control hook 'relink_mobile_files' fills
    that gap.

Schema delta on pending_attachments: adds top_parent_uuid +
top_parent_doctype columns (DAO-level non-null, nullable in schema
because SQLite ALTER ADD COLUMN can't add NOT NULL without DEFAULT)
and ix_attach_top_parent index, so the push pipeline can discover
all attachments for one outbox row (parent + children) in a single
query. Migration consolidated into a single v3 -> v4 step combining
the SIG-12 is_parent_with_children column work and the attachment
columns.

API:
  - PendingAttachmentDao.enqueue() now requires topParentUuid +
    topParentDoctype (set to parent's mobile_uuid for top-level
    attaches, or to the parent's mobile_uuid even when the field
    lives on a child row).
  - findPendingForParent -> findPendingForTopParent.
  - AttachmentPipeline.uploadPendingFor -> uploadPendingForTopParent.
  - Upload call drops the doctype + docname args.

Tests: 495/495 pass. New tests cover the migration (forward +
idempotent re-run), DAO discovery across parent + children, and the
upload shape (asserts no doctype, no docname).

Live verified on Frappe v16.13: child-row relink smoke against
Website Slideshow -> Website Slideshow Item.image PASS.

Docs: doc/OFFLINE_FIRST.md gains an Attachments section; spec
2026-04-24-offline-first-sdk-design.md §5.3 + §4 schema updated;
implementation plan added under docs/superpowers/plans.
Frappe v16's File controller (frappe/core/doctype/file/file.py:151)
rejects File insert with attached_to_doctype set but attached_to_name
empty/NULL. The pre-existing 'new-<doctype>' sentinel docname produced
orphaned File rows that attach_files_to_document could never relink
— regardless of v16's __temporary_name 60-min window, which is too
short for offline retries anyway.

SDK now uploads each pending attachment fully unattached (no dt, no
dn). Once the parent INSERT/UPDATE commits:

  - Parent docs: Frappe's stock attach_files_to_document (registered
    on *.on_update) walks the doc's Attach/Attach Image fields and
    relinks unattached File rows by file_url.

  - Child rows: in v16 child rows save via raw db_update() and skip
    lifecycle hooks (frappe/model/document.py:613-648), so stock never
    reaches them. The mobile_control hook 'relink_mobile_files' fills
    that gap.

Schema delta on pending_attachments: adds top_parent_uuid +
top_parent_doctype columns (DAO-level non-null, nullable in schema
because SQLite ALTER ADD COLUMN can't add NOT NULL without DEFAULT)
and ix_attach_top_parent index, so the push pipeline can discover
all attachments for one outbox row (parent + children) in a single
query. Migration consolidated into a single v3 -> v4 step combining
the SIG-12 is_parent_with_children column work and the attachment
columns.

API:
  - PendingAttachmentDao.enqueue() now requires topParentUuid +
    topParentDoctype (set to parent's mobile_uuid for top-level
    attaches, or to the parent's mobile_uuid even when the field
    lives on a child row).
  - findPendingForParent -> findPendingForTopParent.
  - AttachmentPipeline.uploadPendingFor -> uploadPendingForTopParent.
  - Upload call drops the doctype + docname args.

Tests: 495/495 pass. New tests cover the migration (forward +
idempotent re-run), DAO discovery across parent + children, and the
upload shape (asserts no doctype, no docname).

Live verified on Frappe v16.13: child-row relink smoke against
Website Slideshow -> Website Slideshow Item.image PASS.

Docs: doc/OFFLINE_FIRST.md gains an Attachments section; spec
2026-04-24-offline-first-sdk-design.md §5.3 + §4 schema updated;
implementation plan added under docs/superpowers/plans.
…-1/9/12 fixes

Concurrency / coordination
  - Replace SyncService's boolean _isSyncing flag with a SyncMutex
    (single-occupant, try-only). Concurrent push/pull callers get
    'Sync already in progress' instead of racing each other.
  - SIG-2: PullEngine.run() returns the set of doctypes it deferred
    because a push was active. SyncController re-runs a targeted pull
    (RunPullForFn) over that subset after push completes — closing the
    'pull skipped while pushing → stale local state' gap.
  - SIG-1: per-task isolation via SQLite savepoints in the writeback
    path so a single failing row doesn't roll back the whole batch.
  - Add FetchSingleDocFn / ApplySingleDocFn typedefs so a single
    server snapshot can flow through the same PullApply pipeline used
    by full pulls (used by SyncController).

Schema / migrations
  - SIG-12: persist is_parent_with_children on doctype_meta so
    OfflineRepository.doctypesWithChildren survives cold start.
  - SIG-9: tolerate conflicting cursor formats from older SyncService
    cursor persistence.
  - Idempotency tests for v2 -> v3 and v3 -> v4 migrations.

Tests
  - sync_mutex_test, frappe_sdk_dispose_test (resource cleanup),
    offline_repository_child_registry_test (SIG-12 cold-start path).
  - Existing sync/pull/push/cursor/payload tests brought in line with
    the new SyncController type signatures (RunPullFn now returns the
    deferred set; new Fetch/Apply single-doc fns).

Tooling
  - analysis_options.yaml: drop the '**/*.g.dart' exclude block (no
    codegen remains in the SDK).
  - frappe_sdk.dart: replace lambda-bound MetaService.getMeta wrappers
    with tear-offs; collapse double-underscore unused params.
Adds the persistence layer for the offline-mode toggle: an OfflineMode
value object, a single-row SdkMetaDao on sdk_meta, and a
_persistOfflineFlagFromLogin helper wired into login, verifyLoginOtp,
and _fetchUserInfoAndApply. Schema bumps v4 -> v5 with two new columns
on sdk_meta (offline_enabled, offline_enabled_set_at). Zero behavior
change in this phase — services do not yet branch on the flag. P2 wires
the flag into UnifiedResolver / OfflineRepository / SyncService /
LinkOptionService.

See spec: docs/superpowers/specs/2026-05-01-offline-mode-toggle-design.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Wires the persisted OfflineMode (added in P1) into the SDK's read/write
paths. When offline_enabled = false at boot:

- UnifiedResolver.resolve() short-circuits to a REST passthrough via
  client.doctype.list (extended to accept or_filters).
- OfflineRepository.create/update/delete route to FrappeClient instead
  of writing the outbox; getDirtyDocuments returns empty.
- SyncService.pushSync/pullSync/pullSyncMany/syncDoctype/getSyncStats
  return SyncResult.empty() — sync engine is fully no-op.
- _initialMetaAndDataSync skips the closure pull (meta + permissions +
  translations still run; doc tables stay absent).

FrappeSDK.initialize() reads sdk_meta.offline_enabled and resolves the
session-bound mode through _resolveBootMode, which carries a
conservative residue guard: if persisted=false but local docs__* /
outbox / pending_attachments exist, the session stays offline anyway
and prints a warning. P3 replaces this guard with a real drain/wipe
transition.

forTesting accepts an offlineMode parameter (default: offline) so
existing tests keep their behaviour.

See spec: docs/superpowers/specs/2026-05-01-offline-mode-toggle-design.md
Replaces P2's conservative residue guard with the real offline → online
transition. When initialize() observes persisted offline_enabled=false
AND on-disk residue exists (docs__* tables, outbox, or pending
attachments), it invokes OfflineTransitionService.runDrainAndWipe()
before _initialMetaAndDataSync.

OfflineTransitionService:
- Sealed state hierarchy: Idle / Draining / DrainFailed / WipingTables /
  Completed, emitted on a broadcast stream.
- runDrainAndWipe() loops drain → check residue → either wipe + complete
  or park in DrainFailed and await retry()/forceExit().
- forceExit() drops tables unconditionally (A1 fallback) and exits the
  loop.
- Drain runs against a transient SyncService constructed with
  OfflineMode(enabled: true) so its public methods actually push
  regardless of the session-bound mode.

OfflineTransitionScreen widget exposed for consumer apps to mount above
their router. PopScope(canPop: false) blocks the OS back button while
the transition is active.

AppDatabase.wipeOfflineDocumentTables() drops every docs__<doctype> and
clears outbox / pending_attachments / link_options. Preserves
doctype_meta, auth_tokens, doctype_permission, sdk_meta.

FrappeSDK.dispose() now also disposes the OfflineTransitionService's
broadcast stream controller.

See spec: docs/superpowers/specs/2026-05-01-offline-mode-toggle-design.md
P3 awaited runDrainAndWipe inside initialize() before runApp mounted
any UI. If drain hit DrainFailed, the user_action completer would never
be completed (no UI to call retry/forceExit) and init blocked
indefinitely on a frozen splash. Force-killing the app retried the
same loop.

Fix: kick off the transition with `unawaited(...)` from initialize().
Init returns; runApp mounts the widget tree; OfflineTransitionGuard
overlays OfflineTransitionScreen for as long as the service is in any
non-idle state. The user can now interact with retry/forceExit.

Adds:
- OfflineTransitionGuard widget — wraps a child and shows the screen
  whenever the SDK's offlineTransition stream emits Draining /
  WipingTables / DrainFailed.
- runOfflineTransitionIfPending() — public method consumers can call
  from their own boot widget if they prefer foreground orchestration
  (Spec §10.4(a) path) instead of the unawaited init kick-off.
- Top-level exports for OfflineMode, OfflineTransitionService and
  state classes, OfflineTransitionScreen, OfflineTransitionGuard.

This addresses the design's §10.4(b) trade-off without forcing
consumers into §10.4(a) — the guard pattern keeps the simple "block
init" UX for the success path but unblocks the failure path.
Public-facing documentation for the server-driven offline_enabled
toggle introduced in this release:

- doc/OFFLINE_MODE_TOGGLE.md — feature overview, server-side setup,
  consumer integration via OfflineTransitionGuard, transition flow
  (offline → online drain/wipe and online → offline bootstrap), what
  stays local in each mode, schema migration notes, testing seam,
  known limitations.
- doc/OFFLINE_FIRST.md — note + cross-reference at the top of the
  existing offline-first architecture doc, since most of it does not
  apply when the toggle is off.
- README.md — one-line mention of the toggle in the Overview feature
  list and a link in the Further Documentation section.
- CHANGELOG.md — 2.1.0 entry covering the new exports
  (OfflineMode / OfflineTransitionService + states /
  OfflineTransitionScreen / OfflineTransitionGuard), the schema bump
  v4 → v5, the additive constructor parameters on OfflineRepository /
  SyncService / UnifiedResolver / LinkOptionService, the new
  DoctypeService.list orFilters parameter, and upgrade notes.
- pubspec.yaml version 1.1.0 → 2.1.0.
…parens

Move home screen, document list, and doctype list reads off the legacy
`documents` table onto UnifiedResolver, so both online (`offline_enabled
= false`) and offline modes display records. Wire `frappe.client.get_count`
for cheap chip badges, fix the dependency evaluator's handling of
parenthesised expressions, and absorb the user-staged auth + repo
follow-ups.

Resolver-driven reads
- `DoctypeService.count(doctype, {filters})` wraps `frappe.client.get_count`.
- `UnifiedResolver.count(doctype, {dirtyOnly})` branches on offlineMode:
  online → server count (returns 0 for `dirtyOnly` since there is no
  outbox in online mode); offline → `SELECT COUNT(*)` against
  `docs__<doctype>`, dirtyOnly filters to dirty/sync_error/sync_blocked.
- `_onlinePassthrough` now requests `fields: ['*']` so list rows carry
  every top-level column (matches the offline `docs__*` shape).
- `Document.fromResolverRow(doctype, row)` adapts both shapes to a
  Document, falling back to `name` when `mobile_uuid` / `server_name`
  are absent (online rows).
- `LinkOptionService._rowsToEntities` falls back to `row['name']` so
  online passthrough rows surface as link options instead of being
  silently filtered out.
- `FrappeSDK.resolver` getter exposes the existing instance.
- `MobileHomeScreen`, `DocumentListScreen`, `DoctypeListScreen` consume
  the resolver; example app updated for the new constructor params.

depends_on parens fix
- `DependsOnEvaluator._splitOutsideBrackets` now tracks `( )` depth
  alongside `[ ]`, and a new `_stripOuterParens` removes balanced
  wrapping parens before evaluation. Without this, expressions like
  `(A && B) || (C && D)` were split on the inner `&&` first, leaking
  unmatched parens into the comparators and short-circuiting to false —
  hiding entire sections (the SNF Household Survey "Presence Check"
  block was a casualty).

Auth + repository follow-ups
- Drop the dead `AuthService.loginWithCredentials` path and route
  `logout` through `mobile_auth.logout`.
- `AuthService.loadConfiguration` unwraps the `/api/v2/method/*`
  `{data: ...}` envelope so role/permission hydration works on v2.
- `OfflineRepository._reconcileParentTableSchema` heals already-created
  `docs__<doctype>` parent tables whose schema has drifted from the
  current meta (e.g. a new `title_field` added server-side after first
  pull, leaving the `<field>__norm` column missing).

Tests
- New paren-handling cases in `depends_on_evaluator_test.dart`,
  including the verbatim Frappe `section_break_presence` regression.
- New `count()` group in `unified_resolver_test.dart` covering offline
  + dirtyOnly + missing-table + online short-circuit + missing-client
  paths.
- New `test/models/document_test.dart` covering both row shapes and
  `modified` parsing.
- `mobile_home_screen_test.dart` updated to await `resolver.count`
  instead of the removed `repository.getDocumentsByDoctype` path.
…plete\$ stream, system_columns/sqlite_utils DRY helpers

- Replace immutable _offlineMode field with OfflineModeNotifier threaded
  through OfflineRepository, SyncService, and UnifiedResolver so a
  post-login offline-enabled change is visible to all services without
  rebuilding them
- Add syncComplete\$ broadcast stream on FrappeSDK; home screen subscribes
  to refresh document counts after each batch pull
- Extract system_columns.dart as single source of truth for parent/child
  system column names (parent_schema, local_writer, pull_apply all import
  from here to prevent drift)
- Extract sqliteTableExists() into sqlite_utils.dart so the
  sqlite_master query lives in one place
- Add flipOfflineModeForTesting() seam on FrappeSDK for unit tests
- New tests: OfflineModeNotifier sharing, offline flag propagation,
  upgrade pull trigger, SyncService/OfflineRepository notifier wiring
@ankitDhwani ankitDhwani self-requested a review May 5, 2026 13:33
Initial sync drained doctypes serially: each pullSync call grabbed a
global SyncMutex, ran one doctype to completion, released, then the
next iteration started. Picker fetches scheduled while the loop ran
silently no-opped because tryProtect saw the mutex held. Result:
helper masters (State, District, Block, Village, ...) landed late and
Link pickers opened during sync stayed empty forever.

Three coordinated changes:

1. _runUpgradeClosurePull and forcePullAll now use SyncService.
   pullSyncMany(concurrency: 4) — one mutex hold for the whole closure
   instead of one-per-doctype. forcePullAll keeps a sequential cursor-
   clear pre-pass before the parallel batch. ~4x faster cold start.

2. LinkOptionService accepts a syncComplete$ stream wired from the
   SDK. _LinkFieldDropdownState subscribes; when its options are
   empty and a sync completes, it retries _loadOptions. The
   coordinator no longer caches empty results, so retries reach the
   resolver instead of returning a stale [].

3. SyncMutex grows a blocking protect() variant alongside tryProtect.
   SyncService.pullSyncWaiting uses it; the resolver's backgroundFetch
   now calls pullSyncWaiting so background-driven Link picker
   refreshes queue behind the batch instead of dropping silently.
   User-initiated pullSync callers (manual refresh, navigation,
   list-screen pull-to-refresh) keep the bail-fast tryProtect path so
   users still see "Sync already in progress" instead of hanging.

Tradeoff: the mutex is held for the entire pullSyncMany batch rather
than per-doctype, so user-triggered pushSync / pullSync during cold
start now consistently wait the whole window instead of slipping in
between doctypes. Push isn't lost — the outbox holds it. Cold start
finishing 4x faster more than offsets the longer single hold.

Tests: 6 new (4 SyncMutex.protect, 2 pullSyncMany short-circuits).
Total suite 564 passing.
…efactor

Database migration:
- AppDatabase._version 6 -> 3. Single _migrateV2ToV3 step replaces the
  v3-v4-v5-v6 ladder; no device in the wild was at any of those
  intermediate versions.
- Defensive duplicate-column guards via _safeAddColumn — applied only
  to ALTER TABLE ADD COLUMN. CREATE TABLE IF NOT EXISTS and
  DROP IF EXISTS unwrapped (already idempotent).
- Singleton sdk_meta upsert (INSERT OR REPLACE) recovers from missing
  or corrupted rows.
- AppDatabaseTestSeam exposes _onUpgrade / _version for migration tests.
- Tolerant login-response parsing: missing field, null, string "true",
  and integer 1 all leave offline_enabled at 0.

Sync engine refactor:
- SyncEngineBuilder + PayloadSerializer + UuidPattern split out of
  SyncService for testability.
- Outbox slimmed (drops payload / server_name / retry_count); per-doctype
  docs__* tables retain payload, error, and retry context.
- Tombstone PullApply support; delegate-style SyncService surface.

UI:
- New SyncErrorBanner widget.
- OfflineRepository reconcile + save split into focused methods.
- LinkField gating uses uuid_pattern for local-vs-server identity check.

Retire legacy:
- documents table + DocumentDao + Document model removed.
- migration_blocked_screen, v1_to_v2 + v3_to_v4 + v5 isolated tests
  deleted (their schema deltas are now part of _migrateV2ToV3).

Tests:
- 617 passing.
- Adds migration_v2_to_v3, migration_reentrant, fresh-vs-upgraded
  PRAGMA equivalence, post-migration gate, end-to-end push engine,
  payload serializer, uuid pattern, sync_error_banner, and 4 login-
  response tolerance cases.

pubspec:
- 1.1.0 -> 2.0.0 (semver-major: public API broke).

Spec: docs/superpowers/specs/2026-05-09-sdk-v2-v3-migration-design.md
Plan: docs/superpowers/plans/2026-05-09-sdk-v2-v3-migration.md
…uctor API cleanup

Add doc/release-2.0 suite (README, architecture, breaking-changes, migration,
schema-migration, limitations, what's-new). Consolidate three duplicate
ChildInfo structs into ChildTableInfo. Tighten constructor signatures for
OfflineRepository, UnifiedResolver, and SyncService to named parameters with
positional database. Extend test coverage across sync, services, concurrency,
and UI layers.
…ety fix

Resolved merge conflict in frappe_mobile_sdk.dart: kept link_option_entity.dart
export (new in develop); dropped document_entity.dart export (deleted in 2.0
refactor). Updated link_field_test.dart to use LinkOptionService.withoutResolver()
instead of the removed FrappeClient constructor.
PullEngine._runDoctype used `modified >= cursor.modified` (PullPageFetcher's
only filter) with a while-true loop that breaks only on an empty page. When all
rows in a page share the same modified timestamp — common for reference doctypes
bulk-imported on the server — the advanced cursor equals the input cursor and the
next request returns the same page forever, creating an infinite pull loop
observed in device logs (Block, Volunteer Hiring Stage polled with limit_start=0,
no delay, same cursor each time).

Fix: capture (priorModified, priorName) before advancing the cursor; break when
the new cursor matches, preventing the re-fetch loop. PullApply's
UPSERT-by-server_name idempotency keeps applied rows correct.

Also ships:
- Updated PullPageFetcher doc comment: replaces the misleading "bounded re-fetch"
  claim with an explicit stall-hazard warning and pointer to PullEngine's guard.
- Regression test: fetcher that always returns the same non-empty page; asserts
  exactly 2 fetches, rows applied, cursor persisted with complete:true.
- AC4 follow-up: runUpgradeClosurePullForTesting now returns Future<Set<String>>;
  two new wiring tests verify the no-op path and that an active outbox push
  propagates the deferred set through _runUpgradeClosurePull → PullEngine.run().
  Added injectPullEngineForTesting and overrideIsOnlineForTesting seams.
DocumentListScreen loads via _pullDocuments() in initState; the
initialDocuments pre-population was the only caller of the now-removed
method, so drop both the fetch and the parameter.
…ing throw

- ResponseWriteback: early-exit on OutboxOperation.delete — hard-deletes
  parent row, child rows, and outbox entry so confirmed deletes don't
  linger as ghost records
- FrappeSDK.logout: wrap clearAllData inside SyncMutex.protect() so a
  concurrent push cannot write to a table being dropped
- FrappeSDK.dispose: drain _pendingDrain before closing syncStateNotifier /
  syncCompleteController to prevent StateError on late notifier writes
- OfflineRepository.applyServerDocument: throw StateError on missing meta
  instead of silently returning; callers now correctly surface the failure
- OfflineTransitionService.runDrainAndWipe: add drainTimeout param (default
  5 min) and apply it to pushSync() so a hung drain cannot block forever
- MetaService: forward checkAndSyncDoctypes exceptions to onMetaSyncFailure
  callback instead of only printing
- QueryResult: add empty const for convenience
- Tests updated for StateError and DELETE writeback behaviour

@ankitDhwani ankitDhwani left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review — Final Closure Report

om/offline_improvementdevelop

Reviewer: Ankit + Claude Code (claude-sonnet-4-6)
Branch: om/offline_improvement
Base: develop
Final HEAD: c62a865
Review rounds: 3 (original · R2 after f029b7c · R3 after 33803c0 · patch c62a865)
Date closed: 2026-05-12
Test suite: 791 tests · all passing


Merge Verdict

APPROVED — unconditional. All tracked issues resolved, dismissed with evidence, or confirmed N/A.

No open items remain. The three issues contested by the dev team (L4, M8, M10) were independently re-verified against the current source and confirmed correctly dismissed.


Legend

Symbol Meaning
✅ FIXED Verified in code; test coverage confirmed where required
✅ BY DESIGN Intentional behaviour; documented in limitations.md or inline
⏭️ DISMISSED Re-verified against source — original concern was either a misread or has no practical impact against a real Frappe server. Evidence cited.
N/A Was never a real issue; confirmed before R1

Critical (5 / 5 resolved)

ID Status Verified at
C1 ✅ FIXED (R3) link_option_service.dart:140–170_normalizeFiltersForDoctype handles 3-tuples; length == 3 guard prepends doctype. 4 regression tests in test/services/link_option_service_offline_test.dart.
C2 ✅ FIXED (R2) write_queue.dart — outer-tx failure drains remaining completers with completeError; verified in write_queue_test.dart.
C3 ✅ FIXED (design) pull_apply.dart_locallyDirtyStatuses gate: ['dirty','failed','conflict','blocked']. Parent-shields-children invariant. Covered by AC3 test phase 5.
C4 ✅ FIXED (R2) documents table, DocumentDao, Document model deleted. Single docs__* per-field storage. _pullOneInternalapplyServerDocument.
C5 ✅ FIXED (R2) frappe_sdk.dart_initInFlight: Completer<void>? coalesces concurrent initialize() calls.

High (14 original / 1 post-review — all resolved)

ID Status Verified at
H1 ✅ FIXED (R2) _rewritePayload / _resolveLinkedUuids deleted; UuidRewriter.rewrite() via PayloadAssembler.
H2 ✅ FIXED (R2) meta_service.dartfetchAndStoreInDb calls clearDocTypeCache(doctype) after store.
H3 ✅ FIXED (R3) sdk_meta_dao.dart:30writeOfflineMode uses INSERT OR REPLACE INTO sdk_meta (true upsert).
H4 ✅ FIXED (R2) sync_service.dartsyncNow() try/catch per phase; exceptions logged, execution continues.
H5 ✅ FIXED (R2) frappe_sdk.dartdispose() nulls 24 fields.
H6 ✅ FIXED (R3) sync_service.dart:75isOnline() first line: if (!offlineMode.enabled) return false;
H7 ✅ FIXED (R2) offline_transition_service.dartTimer.periodic emits live drainedRecords count.
H8 ✅ FIXED (R2) response_writeback.dart — mobile_uuid-first match, position-index fallback.
H9 ✅ FIXED (R2) filter_parser.dart_validateOrderBy splits on , and validates each segment individually.
H10 ✅ BY DESIGN Children inherit parent status. Documented in limitations.md.
H11 ✅ FIXED (R2) frappe_sdk.dart_pendingDrain tracks in-flight upgrade pull; awaited before null-out.
H12 ✅ FIXED (R2) documents table eliminated; single write path via LocalWriter.
H13 ✅ FIXED (R2) Both pull paths (PullApply.applyPage + applyServerDocument) write exclusively to docs__*.
H14 ✅ FIXED (R2) Single storage backend; dual-write path fully eliminated.
POST-4 ✅ FIXED (c62a865) response_writeback.dart:50–65 — DELETE early-exit hard-deletes parent + cascades children + retires outbox row before the name/docname check. Covered by test/sync/response_writeback_test.dart.

Medium (22 M-series + 2 N-series + 2 AC-series + 3 POST-REVIEW — all resolved)

ID Status Verified at
M1 ✅ FIXED (R2) ChildTableInfo consolidated; _ChildInfo alias deleted; typedefs in child_table_info.dart.
M2 ✅ FIXED (R2) closure_builder.dartConcurrencyPool used in BFS level; pullSyncMany uses pool.
M3 ✅ BY DESIGN Parent-shields-children on conflict — same as C3; documented.
M4 ✅ FIXED (R3) link_field_coordinator.dart:220unawaited(_processQueue()); recursive finally call removed; while loop drains.
M5 ✅ FIXED (c62a865) meta_service.dart:260 — outer catch now calls onMetaSyncFailure?.call('checkAndSyncDoctypes', e). Silent outer failure path closed.
M6 ✅ FIXED (R2) _uuidToServerName and duplicate UUID helpers deleted.
M7 ✅ FIXED (R2) pullSyncMany uses await DeviceTier.detect() for concurrency sizing.
M8 ⏭️ DISMISSED client.doctype.list() (doctype_service.dart) returns [] for any response missing a message key — Frappe HTTP error bodies fall into this path. HTTP-level errors throw at the client layer before reaching the parser. The one remaining edge case (message: null cast error) is caught by PullEngine._runDoctype's generic catch, cursor is not advanced, and the doctype retries next cycle. No data loss, no stall. Against a real Frappe server this never fires in production.
M9 ✅ FIXED (R2) meta_service.dart — LRU cleared in fetchAndStoreInDb.
M10 ⏭️ DISMISSED DependencyGraphBuilder is a 49-line pure static helper: buildOutgoing(DocTypeMeta)DepGraph. No async, no BFS, no metaFetcher. It is called by ClosureBuilder inside the BFS loop as an edge extractor for one node. ClosureBuilder owns the traversal. These are complementary classes with distinct responsibilities, not duplicated BFS implementations. Original review was a misread.
M11 ✅ FIXED (R2) sync_state.dartSyncState implements operator ==.
M12 ✅ FIXED (R2) outbox_dao.dartmarkFailed + BlockedByUpstream on retry exhaustion.
M13 N/A LinkDecorator uses server_name/mobile_uuid; no dynamic SQL interpolation risk.
M14 ✅ FIXED (c62a865) offline_repository.dart:607 — missing meta now throws StateError instead of silent print + return. Callers see a proper failure; outbox row is marked failed rather than incorrectly marked done.
M15 ✅ FIXED (c62a865) bulk_watermark_probe.dart — class-level docstring added: "Best-effort contract: endpoint absence is not an error. detect catches all exceptions and marks the probe unavailable — callers must handle available: false."
M16 ✅ FIXED (R2) meta_migration.dartMetaMigration.apply wrapped in db.transaction().
M17 ✅ FIXED (R2) sdk_meta_dao.dartreadOfflineMode returns OfflineMode.fallback when set_at IS NULL.
M18 ✅ FIXED (R2) connectivity_watcher.dart — subscription keeps _cachedOnline live across reconnects.
M19 ✅ FIXED (R2) Both pull paths write the same cursor format; SIG-9 drift eliminated.
M20 ✅ FIXED (R2) tier_computer.dartTierComputer.compute + dependency scan enforces parent-before-child ordering.
M21 ✅ FIXED (R2) idempotency_strategy.dartIdempotencyStrategy L1/L2/L3 covers lost-ACK on INSERT.
M22 ✅ FIXED (R2) app_database.dartwipeOfflineDocumentTables deletes pending_attachments.
N1 ✅ FIXED (R3) frappe_sdk.dart:1081_runUpgradeClosurePull calls _pullEngine!.run(closure, allowedDoctypes: pullable) inside _syncService!.protect().
N2 ✅ FIXED (R3) frappe_sdk.dart:1094_runPullForController delegates to _runUpgradeClosurePull; real Set<String> of deferred doctypes returned for SIG-2 re-pull.
AC3 ✅ FIXED (R3) test/offline_cycle_integration_test.dart (425 lines) — 5-phase end-to-end: write → push → pull → local edit → conflict. Real SQLite + typed fake HTTP.
AC4 ✅ FIXED (R3) Same as N1 — PullEngine wired into production path; dead-code label removed.
POST-1 ✅ FIXED (c62a865) frappe_sdk.dart:1205–1223_pendingDrain now awaited before _syncStateNotifier?.close(). Comment added explaining producer/consumer ordering. SyncStateNotifier.value= has no isClosed guard (verified), so ordering matters.
POST-2 ✅ FIXED (c62a865) offline_transition_service.dart:88,133drainTimeout = const Duration(minutes: 5) parameter added; pushSync().timeout(drainTimeout, onTimeout: ...) applied.
POST-3 ✅ FIXED (c62a865) frappe_sdk.dart:623logout() now wraps _authService!.logout() in _syncService!.protect(...). SyncMutex acquired before clearAllData() runs.

Low (17 L-series — all resolved or dismissed)

ID Status Verified at
L1 ✅ FIXED (R2) frappe_sdk.dart_mobileUuid cached; _resolveMobileUuid() avoids repeat platform calls.
L2 ✅ FIXED (R2) sync_result.dartSyncResult.empty(status: SyncStatus.offlineModeDisabled) is distinguishable from other empty results.
L3 ✅ FIXED (R2) connectivity_watcher.dart_onEvent deduplicates same-state events.
L4 ⏭️ DISMISSED write_queue_test.dart verified: zero Timer or Future.delayed calls. Tests await sequential queue submissions only. fakeAsync controls fake clocks and zones — it cannot control real SQLite futures from sqflite_ffi's in-process execution. Real async is the correct pattern here and results are deterministic regardless of CI runner speed.
L5 N/A No hard-coded route literals found in codebase; already refactored before review.
L6 ✅ FIXED (R3) form_screen.dart:864,887ValueListenableBuilder<bool> scopes rebuilds to AppBar save button and spinner only; FrappeFormBuilder tree is not in the rebuild scope.
L7 ✅ FIXED (R3) doc_type_meta.dart:52 — per-field try/catch; failures log fieldname and fall back to DocField(fieldname:..., fieldtype:'Data').
L8 ✅ FIXED (R2) field_type_mapping.dart — 35+ field types mapped; Duration, Rating, Geolocation covered.
L9 ✅ FIXED (R2) normalise_for_search.dartnormalizeForSearch strips U+200D Zero Width Joiner.
L10 ✅ FIXED (R2) doc/release-2.0/ replaces prior misleading docs.
L11 ✅ FIXED (R2) CHANGELOG.md updated with 2.0.0 storage migration entry.
L12 ✅ FIXED (R2) SyncErrorsScreen has empty-state UI.
L13 ✅ FIXED (R2) Drain screen renders '${drainedRecords} of ${totalRecords}'.
L14 ✅ FIXED (R2) TODOs reduced from 14 to 1 (dep_graph.dart:61, acceptable — explains deferred incoming-edge wiring).
L15 ✅ FIXED (c62a865) query_result.dart:26static const empty = QueryResult<Map<String, dynamic>>(...) added.
L16 N/A LinkDecorator checks name == null || name.isEmpty — no crash path.
L17 ✅ FIXED (c62a865) depends_on_evaluator.dart:7–11 — trust boundary docstring added: "expressions come from server-side DocType meta configured by Frappe admins… NOT sandboxed — do not pass user-supplied strings to evaluate."

Summary

Category Total tracked Fixed / BY DESIGN Dismissed (evidence) N/A
Critical 5 5 0 0
High (incl. POST-4) 15 15 0 0
Medium (incl. N, AC, POST) 30 27 2 (M8, M10) 1 (M13)
Low 17 13 1 (L4) 2 (L5, L16)
Total 67 60 3 3

All 67 tracked IDs are resolved. Zero items remain open.


Dismissed Items — Evidence Summary

These were not fixed by the dev because they do not require fixing.

M8 — PullPageFetcher Non-List Response

doctype_service.dart:list() returns [] when the Frappe response has no message key (covers most error body shapes). HTTP-level errors (4xx/5xx) throw at the client before the parser runs — PullEngine._runDoctype catches, logs via debugPrint, skips cursor advance, retries next cycle. No data loss; no stall. Practically unreachable against a Frappe server that returns RFC-compliant HTTP error responses.

M10 — ClosureBuilder / DependencyGraphBuilder "Duplication"

DependencyGraphBuilder.buildOutgoing(DocTypeMeta) is a 49-line pure synchronous edge extractor called inside ClosureBuilder's BFS loop. It has no metaFetcher, no visited set, no traversal. The original review incorrectly described it as a parallel BFS implementation. The two classes are correctly separated: one orchestrates traversal, the other computes the edges for one node.

L4 — fakeAsync Inconsistency

write_queue_test.dart contains zero Timer or Future.delayed calls. Tests use await on queue submissions backed by sqflite_ffi (in-process, no platform channel). fakeAsync manipulates fake clocks and zones — it has no effect on real SQLite futures. Real async is appropriate and deterministic here.


New Issues Found This Review Cycle (None)

No new defects were found during the c62a865 verification pass. All changed files (10) addressed exactly the tracked items without introducing regressions.


Final Test Suite

Round Tests passing
R2 baseline ~750
R3 (HEAD 33803c0) 790
Patch (HEAD c62a865) 791

791 tests, 0 failures. The 1 net new test covers the ResponseWriteback DELETE path (POST-4).


Verified against git diff develop...om/offline_improvement at HEAD c62a865. All code citations checked against current branch. No source files were modified during this review — findings only.

@ankitDhwani ankitDhwani requested a review from deepak-dhwani May 12, 2026 10:10
…ovement

# Conflicts:
#	lib/src/services/link_option_service.dart
#	lib/src/ui/widgets/fields/link_field.dart
#	lib/src/ui/widgets/fields/table_multi_select_field.dart
#	test/services/link_option_service_resolve_filters_test.dart

@deepak-dhwani deepak-dhwani left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great effort on the scope of this PR — the architecture, test coverage, and documentation are thorough. A few issues need addressing before this merges, ranging
from correctness bugs to doc gaps.


Critical

SyncMutex.protect() — TOCTOU race (lib/src/concurrency/sync_mutex.dart)

When two concurrent callers are both awaiting _busy!.future, they both resume in the same microtask turn. The while re-check runs but neither has yet set _busy,
so both see null and both proceed — the mutex provides no protection. Additionally, the second caller overwrites the first's Completer, so any third waiter on
the abandoned Completer will hang forever.

tryProtect() is safe. Only protect() is affected.

Suggested fix — promise-chaining pattern:
Future _chain = Future.value();

Future protect(Future Function() body) {
final result = chain.then(() => body());
chain = result.then(() {}, onError: (_) {});
return result;
}


PushEngine — unbounded recursive merge loop (lib/src/sync/push_engine.dart)

The guard if (row.retryCount < 1) is meant to allow three-way merge exactly once then fall through to markConflict. But row.retryCount (the outbox field) is
never incremented on the merge path — only docs__.sync_attempts is. Since row.retryCount is always 0 on a freshly-dequeued row, the merge fires unconditionally
on every TimestampMismatchError. The recursive call to _process(updated) at line 521 also carries retryCount == 0, so this recurses without bound for any
persistently conflicting row.

Fix: gate on sync_attempts from docs__ (consistent with what _dispatchOnce increments), or increment outbox.retry_count inside _autoMergeAndRetry
before re-queuing.


SdkMetaDao.writeOfflineMode — INSERT OR REPLACE zeroes schema_version and bootstrap_done (lib/src/database/daos/sdk_meta_dao.dart)

INSERT OR REPLACE in SQLite is DELETE + INSERT. Only the three named columns survive — schema_version, bootstrap_done, and session_user_json revert to their
column defaults (0, 0, NULL). The first time a user toggles offline mode, bootstrap_done becomes 0 and schema_version becomes 0, causing any code gating on
these fields to behave incorrectly.

Fix:
await _db.rawUpdate(
'UPDATE sdk_meta SET offline_enabled = ?, offline_enabled_set_at = ? WHERE id = 1',
[enabled ? 1 : 0, setAtMs],
);
Worth adding a test that calls writeOfflineMode then reads back schema_version and bootstrap_done to prevent regression.


High

Non-WriteQueue merge path — torn state on crash (push_engine.dart)

When writeQueueResolver is null, the merged doc write and the outbox state update happen in two separate db.update calls. A crash between them leaves
docs__ with merged values but the outbox row still in_flight. On restart, resetInFlightToPending re-queues it; _dispatchOnce then builds the payload
from the already-merged row, so the next ThreeWayMerge treats the post-merge state as the local delta — silently discarding the user's original edit. The
WriteQueue path wraps both operations in a single transaction correctly; the non-WriteQueue path should do the same.


WriteQueue — SAVEPOINT failure leaves caller permanently suspended (lib/src/concurrency/write_queue.dart)

If txn.execute('SAVEPOINT $sp') throws, the inner catch attempts ROLLBACK TO $sp (which will also fail since the savepoint was never created), logs the rollback
error, and returns — but p.completer is never settled. The caller's await wq.submit(...) hangs forever. The outer catch only drains items still in _queue;
items already dequeued but not yet completed are silently abandoned.

Fix: unconditionally call p.completer.completeError(e, st) in the SAVEPOINT catch block before attempting the rollback.


ThreeWayMerge — null clear treated as "unchanged"; empty base discards all local edits (lib/src/sync/three_way_merge.dart)

Two related issues:

  1. _eq(null, null) returns true. If a user explicitly clears a field that was already null in base, localChanged evaluates to false and the server value wins —
    the intentional clear is silently lost.
  2. When push_base_payload is null (any row written before this feature was introduced), base is {}. base[k] returns null for every key, so on a document where
    many local fields are also null, localChanged = false across the board and all local edits are discarded in favour of server values.

Medium

OutboxDao.recordSave() — no transaction around multi-step query+mutate (lib/src/database/daos/outbox_dao.dart)

The SELECT (find collapsable rows) and the subsequent DELETE/INSERT/UPDATE are separate statements with no wrapping transaction. A concurrent recordSave for the
same (doctype, mobileUuid) can observe the same collapsable row between the first caller's read and delete, resulting in two live outbox rows for what should
be one collapsed update.

Fix: wrap the body of recordSave() in await (db as Database).transaction((txn) async { ... }).


resolveConflict(pullAndOverwriteLocal) — local mirror stuck in conflict permanently (lib/src/services/sync_controller.dart)

When serverName is empty (an INSERT that never reached the server), markDone clears the outbox row but the docs__ row remains with sync_status =
'conflict'. Since the outbox row is gone, no retry path will touch it — the document appears permanently broken with no recovery path.


Documentation gaps (will cause compile errors for consumers following the migration guide)

Wrong method names in breaking-changes.md and migrating-from-1.x.md:

  • Both documents reference offlineRepository.createDocument(doctype:, data:) — the actual method is saveDocument(doctype:, data:).
  • The migration table shows deleteDocument(localId: ...) — the actual signature is deleteDocument(doctype:, mobileUuid:).
  • updateDocumentData(localId: ...) also does not match the actual API.

OfflineModeNotifier is not exported from lib/frappe_mobile_sdk.dart despite being referenced in four constructor signatures in breaking-changes.md §2. Consumers
who wire services manually cannot type this parameter.

DocumentEntity removal is undocumented. It was a public type in 1.x (return type of all DocumentDao methods). Any consumer who typed List will
get a compile error with no hint in the migration guide.

SyncController constructor uses internal types (OutboxDao, RunPullFn, etc.) that are not exported. The docs should explicitly state that SyncController must be
obtained via sdk.syncController and not constructed directly.


Minor

  • pending_attachments.top_parent_uuid has no FOREIGN KEY ... ON DELETE CASCADE to outbox.mobile_uuid. Orphaned attachment rows accumulate when outbox entries
    are hard-deleted.
  • SchemaApplier.apply uses CREATE TABLE without IF NOT EXISTS. A concurrent isolate calling apply for the same doctype will fail with "table already exists".
    Adding IF NOT EXISTS costs nothing.
  • wipeOfflineDocumentTables drops all docs__* tables but does not reset bootstrap_done or offline_enabled in sdk_meta. After a wipe, bootstrap_done = 1 remains,
    and the next sync startup may skip re-bootstrap against empty tables.
  • No onDowngrade handler — if a user rolls back to a 1.x build, the app will hard-crash rather than show a user-visible error. A graceful handler with a clear
    message is safer for a publicly distributed SDK.
  • Both CI checks are green, but actions/checkout@v4 (Node.js 20) needs bumping before September 16 — same as PR #35.

The core feature design is solid and the test suite is extensive. Addressing the critical and high items (particularly the mutex race, the merge loop, and the
INSERT OR REPLACE bug) with targeted regression tests covering those specific paths will get this to merge-ready.

… UX polish

Port of om/logic_correction fixes to om/offline_improvement.

- DoctypeService.list: guard non-List Frappe message shape → treat as empty page
- RestHelper.get/getPublic: expose per-call timeout and maxRetries overrides so
  boot/splash callers can fast-fail instead of burning the default 30s × 3-retry budget
- PermissionService.syncFromApi: accept timeout param; remove silent catch so
  callers own error handling
- DoctypeMetaDao: add tableNameFor() helper — getTableName() ?? normalize fallback
- child_schema / parent_schema: IF NOT EXISTS on all CREATE TABLE/INDEX for idempotency
- MobileHomeScreen: suppress full-screen spinner on background syncComplete$ reloads —
  only show it on first paint when the list is empty
- FieldStyle: default showLabel/showDescription to false
- FrappeSDK: isInitialSync flag brackets boot pull; fast-fail permissions probe (10s);
  schema pre-create before closure pull; retryInitialMetaAndDataSync() public API;
  clearLastError() on logout; schemaReconciler wired into SyncEngineBuilder
- OfflineRepository: reconcileParentTableForMeta() public entrypoint
- SyncEngineBuilder: wire schemaReconciler; bump default pageSize 100→500; add limitStart
- Cursor: add start field for offset pagination; markComplete resets start=0
- PullEngine: SchemaReconcilerFn typedef; invoke reconciler per doctype; stall guard
  restricted to complete=true cursors only
- PullPageFetcher: dual-mode pagination — initial sync uses limit_start offset,
  incremental uses modified >= filter with limit_start=0
- SyncStateNotifier: recordLastError() / clearLastError() for boot error surfacing
- Tests: Cursor.start round-trip, initial-sync limit_start advance, incremental cursor
  assertions, stall guard pre-seeding
Push engine
- Break the merge-retry recursion loop: the slim outbox no longer
  carries `retry_count`, so the `row.retryCount < 1` guard always
  saw 0 and `_autoMergeAndRetry` recursed unbounded on any
  persistently conflicting row. Replace with an explicit
  `mergeAttempted` flag passed through `_process`.
- Wrap the non-WriteQueue merge writeback in a single
  `db.transaction` so a crash between the docs__ update and the
  outbox state flip can't tear state. The WriteQueue path was
  already transactional.
- Guard `_autoMergeAndRetry` against an empty `push_base_payload`.
  Without a baseline, ThreeWayMerge can't distinguish "user left
  null" from "user cleared", and a row with many null fields would
  silently lose local edits. Bail out to `markConflict` instead.

sdk_meta
- `writeOfflineMode` was `INSERT OR REPLACE`, which is DELETE +
  INSERT in SQLite — wiping `schema_version`, `bootstrap_done`,
  and `session_user_json` to defaults. Switch to a single-statement
  `INSERT ... ON CONFLICT(id) DO UPDATE SET` so the offline-mode
  columns are surgically upserted and the rest is preserved.
- `wipeOfflineDocumentTables` now resets `bootstrap_done = 0` in
  the same transaction (the per-doctype mirrors that bootstrap
  built are dropped by wipe). Other sdk_meta fields untouched.

Conflict resolution
- `resolveConflict(pullAndOverwriteLocal)` deleted the outbox row
  when there was no server snapshot to fetch but left
  `docs__.sync_status = 'conflict'` — the row was stuck in the
  errors UI with no recovery path. Add an optional
  `clearLocalConflict` callback on SyncController; wire it from
  SyncEngineBuilder to flip the docs__ row back to `dirty` (best-
  effort, logged on failure).

Attachments
- `deleteDocument` on a never-pushed doc cancels the INSERT and
  hard-deletes the docs__ row; pending_attachments with that
  top_parent_uuid would orphan and retry forever. Add a
  `deleteForTopParent` helper on PendingAttachmentDao and clear
  attachments inside the same delete transaction.

Public surface
- Export `OfflineModeNotifier` from the SDK barrel — referenced by
  four constructor signatures in the migration docs but previously
  not in `lib/frappe_mobile_sdk.dart`.

Migration docs (doc/release-2.0)
- breaking-changes.md / migrating-from-1.x.md: fix method-name
  references (`createDocument` / `updateDocumentData` /
  `deleteDocument(localId:)` -> `saveDocument` /
  `deleteDocument(doctype:, mobileUuid:)`).
- breaking-changes.md §1.3: document `DocumentEntity` removal
  with a 1.x -> 2.0 before/after.
- breaking-changes.md §2 and whats-new.md §6: note that
  SyncController is not user-constructable; obtain via
  `sdk.syncController` (constructor takes unexported internal
  types).

Tests (+6 regression cases, 814 passing)
- push_engine: persistent TimestampMismatch markConflict + empty
  base markConflict
- sdk_meta_dao: writeOfflineMode preserves sibling columns
- sync_controller: serverName=null branch calls clearLocalConflict
- offline_repository: deleteDocument clears pending_attachments
- wipe_offline_document_tables: resets bootstrap_done but keeps
  schema_version + session_user_json

Push-backs (no change):
- SyncMutex.protect() TOCTOU — Dart microtasks run serially;
  existing 3-waiter FIFO test passes.
- WriteQueue savepoint completer leak — completer IS settled at
  line 81 in the catch block.
- OutboxDao.recordSave no txn — all 4 call sites wrap with txn.
- SchemaApplier IF NOT EXISTS — already present.
- onDowngrade handler — downgrade not supported; hard-crash is
  intentional, not silent data loss.
- actions/checkout@v4 bump — CI scope, separate PR.

See docs/pr36_sr_review_1_summary.md for the full breakdown.
@Omprakash-48

Copy link
Copy Markdown
Collaborator Author

At a glance

Severity Total Fixed Pushed back Partial
Critical 3 2 1 0
High 3 1 1 1
Medium 2 1 1 0
Doc 4 4 0 0
Minor 5 1 3 1
Total 17 9 6 2

✅ Fixed (9 items + 2 partial)

Critical / High / Medium

#2 — PushEngine recursive merge loop

File: lib/src/sync/push_engine.dart

Bug: if (row.retryCount < 1) gated the auto-merge retry on a counter that's never incremented (the outbox.retry_count column was dropped when the outbox was slimmed in v2.0). Every TimestampMismatchError re-fired _autoMergeAndRetry → recursive _process → unbounded recursion that hung the engine on any persistently conflicting row.

Fix: Replaced the broken counter guard with an explicit mergeAttempted flag passed through _process. After one merge attempt, a second TimestampMismatchError falls through to markConflict and surfaces the row to the user.

Test: TimestampMismatch persists across merge retry → markConflict (no unbounded recursion) in test/sync/push_engine_test.dart.


#3writeOfflineMode wiped sibling columns

File: lib/src/database/daos/sdk_meta_dao.dart

Bug: INSERT OR REPLACE INTO sdk_meta (id, offline_enabled, offline_enabled_set_at) VALUES (1, ?, ?) is DELETE + INSERT in SQLite. Only the three named columns survived; schema_version, bootstrap_done, and session_user_json reverted to defaults the first time a user toggled offline mode.

Fix: Single-statement upsert — INSERT INTO sdk_meta ... ON CONFLICT(id) DO UPDATE SET offline_enabled = ?1, offline_enabled_set_at = ?2. One DB roundtrip; sibling columns untouched.

Test: writeOfflineMode preserves schema_version, bootstrap_done, session_user_json in test/database/sdk_meta_dao_test.dart.


#4 — Non-WriteQueue merge path: torn state on crash

File: lib/src/sync/push_engine.dart

Bug: When writeQueueResolver was null (defensive path), _autoMergeAndRetry did two separate db.update calls — one to write merged values to docs__<doctype>, one to flip the outbox row back to pending. A crash between them would leave docs__ merged but the outbox row stuck in_flight; on restart, resetInFlightToPending would re-queue it and the next payload assembly would be based on the already-merged row.

Fix: Wrapped both writes in a single db.transaction((txn) async {...}), mirroring the WriteQueue path. Sqflite guarantees atomic commit/rollback.

Note: Production wires writeQueueResolver unconditionally; the fallback path is currently only exercised by tests. Fix is defensive.


#8pullAndOverwriteLocal left docs__ stuck in conflict

Files: lib/src/services/sync_controller.dart, lib/src/services/sync_engine_builder.dart

Bug: When the user picked "discard local changes, use server version" on a conflict but the doc never reached the server (empty serverName), the SDK deleted the outbox row but forgot to clear docs__.sync_status = 'conflict'. The row appeared stuck in the SyncErrorsScreen with no recovery path.

Fix: Added an optional clearLocalConflict(doctype, mobileUuid) callback on SyncController. The empty-serverName branch of resolveConflict now invokes it (best-effort, logged on failure) before markDone. SyncEngineBuilder wires it to a function that runs UPDATE docs__<doctype> SET sync_status = 'dirty', sync_error = NULL WHERE mobile_uuid = ? AND sync_status = 'conflict'.

Test: resolveConflict pullAndOverwriteLocal: serverName=null calls clearLocalConflict in test/services/sync_controller_test.dart.


#6 (partial) — ThreeWayMerge with empty push_base_payload

File: lib/src/sync/push_engine.dart

Bug (claim 2 only): When push_base_payload was null or empty, base became {}. For any local null field, localChanged resolved to false and the server value won — a row written without parentMeta available could silently lose local edits.

Fix: _autoMergeAndRetry now bails out to markConflict when base is empty. Safer than silently picking server values without a reliable change-detection baseline. User then resolves manually.

Claim 1 (_eq(null, null) swallows an "explicit clear") rejected — that's the inherent ambiguity of 3-way merge without per-field touch tracking, not a bug.

Test: TimestampMismatch with empty push_base_payload → markConflict (no auto-merge) in test/sync/push_engine_test.dart.


Documentation

#9 — Wrong method names in migration docs

Files: doc/release-2.0/migrating-from-1.x.md, doc/release-2.0/breaking-changes.md

Bug: Docs referenced createDocument(doctype:, data:), updateDocumentData(localId:, ...), deleteDocument(localId:) — none of which match the actual 2.0 API.

Fix: Migration table now points to saveDocument({required String doctype, required Map<String, dynamic> data}) (the unified create+update entry point) and deleteDocument({required String doctype, required String mobileUuid}). Added a note explaining how mobile_uuid replaces the 1.x localId field.


#10OfflineModeNotifier not exported

File: lib/frappe_mobile_sdk.dart

Bug: Docs told consumers to pass OfflineModeNotifier? into four service constructors, but the type wasn't in the public barrel.

Fix: Added export 'src/models/offline_mode_notifier.dart' show OfflineModeNotifier;.


#11DocumentEntity removal undocumented

File: doc/release-2.0/breaking-changes.md

Bug: DocumentEntity was a public type in 1.1.0 (verified via git show v1.1.0:lib/frappe_mobile_sdk.dart). It's gone in 2.0 with no migration entry — consumers typed on List<DocumentEntity> would get compile errors with no hint.

Fix: Added §1.3 DocumentEntity — removed to breaking-changes.md with a before/after example showing how to switch to List<Map<String, Object?>> and how localId maps to the mobile_uuid column.


#12SyncController construction not user-facing

Files: doc/release-2.0/breaking-changes.md, doc/release-2.0/whats-new.md

Bug: SyncController is exported but its constructor requires several internal types (OutboxDao, RunPullFn, RunPullForFn, etc.) that are intentionally NOT exported. Docs didn't mention this — consumers attempting manual construction would hit unresolvable types.

Fix: Added a "SyncController is not user-constructable — obtain via sdk.syncController after FrappeSDK.initialize()" note in both breaking-changes.md §2 (where migrators look) and whats-new.md §6 (where the feature is introduced).


Minor

pending_attachments orphan cleanup

Files: lib/src/services/offline_repository.dart, lib/src/database/daos/pending_attachment_dao.dart

Bug: When a never-pushed doc was cancelled via deleteDocument, docs__ was hard-deleted but pending_attachments rows with that top_parent_uuid remained. The uploader would retry them forever against a non-existent parent.

Fix: Added deleteForTopParent(mobileUuid) helper on the DAO and called it from deleteDocument's cancelledLocally branch inside the same transaction.

Test: deleteDocument on never-pushed doc also clears its pending_attachments in test/services/offline_repository_save_test.dart.


wipe_offline_document_tables now resets bootstrap_done (partial)

File: lib/src/database/app_database.dart

Bug: A wipe dropped every docs__* table but left sdk_meta.bootstrap_done = 1. Today no code reads bootstrap_done, so the documented impact ("next sync skips re-bootstrap") doesn't manifest — but the column is wired for future use.

Fix: Wipe now runs UPDATE sdk_meta SET bootstrap_done = 0 WHERE id = 1 inside the same transaction. Preserves all other sdk_meta fields (schema_version, session_user_json).

Test: wipeOfflineDocumentTables resets bootstrap_done but keeps other sdk_meta in test/database/wipe_offline_document_tables_test.dart.


❌ Pushed back (6 items)

#1 — SyncMutex.protect() TOCTOU

File: lib/src/concurrency/sync_mutex.dart

Reviewer claim: Two concurrent waiters could both see _busy == null and both proceed.

Why rejected: Dart microtasks run serially, not concurrently. When c.complete() fires, waiter B's continuation runs synchronously from the await return through final c = Completer(); _busy = c; (no await between them) before yielding; waiter C's microtask then sees _busy = c_B and re-enters the loop. The existing multiple protect waiters serialize FIFO test in test/concurrency/sync_mutex_test.dart exercises this 3-waiter scenario and passes consistently.

The reviewer's suggested promise-chaining pattern is a cleaner idiom but not required for correctness.


#5 — WriteQueue savepoint completer leak

File: lib/src/concurrency/write_queue.dart

Reviewer claim: If SAVEPOINT throws, the catch block logs and returns without settling the completer, so the caller hangs forever.

Why rejected: Reviewer missed line 81. The outer catch (e, st) wraps the nested rollback try/catch AND a p.completer.completeError(e, st) call. After the nested rollback handler runs (success or log-and-continue), line 81 unconditionally settles the completer with the original error.


#7OutboxDao.recordSave no transaction

File: lib/src/database/daos/outbox_dao.dart

Reviewer claim: SELECT + DELETE/INSERT/UPDATE in recordSave run unprotected, so concurrent saves could see the same row and double-insert.

Why rejected: All 4 production call sites pass a Transaction executor:

  • offline_repository.dart:478, 488, 497 — inside the rawDatabase.transaction((txn) {...}) block at line 465
  • offline_repository.dart:531 — inside the transaction at line 530

The DAO field is typed DatabaseExecutor (accepts both Database and Transaction). The reviewer misread the call sites — the wrapping txn is one level up.


Minor — SchemaApplier.apply CREATE TABLE without IF NOT EXISTS

Files: lib/src/database/schema/parent_schema.dart, lib/src/database/schema/child_schema.dart, lib/src/database/schema_applier.dart

Reviewer claim: CREATE TABLE lacks IF NOT EXISTS; a concurrent isolate could race and one would fail with "table already exists".

Why rejected: Both DDL builders already use CREATE TABLE IF NOT EXISTS (parent_schema:67, child_schema:48). SchemaApplier.apply additionally pre-checks sqliteTableExists before running the DDL. Triple-guarded.


Minor — No onDowngrade handler

File: lib/src/database/app_database.dart

Reviewer claim: Installing an older build over newer data hard-crashes the app; suggested onDatabaseDowngradeDelete or a custom handler.

Why rejected (per user direction): Downgrade is not a supported flow for this project. Silently nuking local data via onDatabaseDowngradeDelete would lose user work without consent. The hard-crash surfaces the problem to the user/embedder; any downgrade-handling policy belongs in the app shell, not the library.


Minor — CI actions/checkout@v4 Node.js 20 deprecation

Files: .github/workflows/ci.yml, release.yml, semantic-commits.yml

Reviewer claim: GitHub will sunset Node.js 20 support on Sept 16, 2026; bump to @v5.

Why rejected (per user direction): .github/ changes are out of scope for this PR — CI/workflow bumps are tracked separately.


Files changed

Library code (7 files)

  • lib/frappe_mobile_sdk.dart — export OfflineModeNotifier
  • lib/src/sync/push_engine.dart — mergeAttempted flag, txn-wrapped fallback, empty-base guard
  • lib/src/database/daos/sdk_meta_dao.dart — upsert preserves siblings
  • lib/src/database/daos/pending_attachment_dao.dartdeleteForTopParent helper
  • lib/src/database/app_database.dart — reset bootstrap_done on wipe
  • lib/src/services/offline_repository.dart — orphan attachment cleanup on cancelled INSERT
  • lib/src/services/sync_controller.dartclearLocalConflict hook
  • lib/src/services/sync_engine_builder.dart — wire clearLocalConflict

Tests (4 files — all additions, no rewrites)

Docs (3 files)

  • doc/release-2.0/breaking-changes.md — method-name corrections, §1.3 DocumentEntity entry, SyncController note
  • doc/release-2.0/migrating-from-1.x.md — method-name corrections, cross-ref to §1.3
  • doc/release-2.0/whats-new.md — SyncController construction note

Tracking artifacts (this folder)

  • docs/pr36_sr_review_1.md — original review (unchanged)
  • docs/pr36_sr_review_1_checklist.md — per-item verdict + technical detail
  • docs/pr36_sr_review_1_summary.md — this file

Verification

flutter test          # 814 passing
flutter analyze       # No issues found

P0–P5 coverage uplift driven by docs/test_coverage_checklist.md.
Baseline 814 → 1,274 tests. flutter analyze clean.

Coverage highlights (measured via `flutter test --coverage`):
- 50+ files at 100% (api/auth, api/client, api/document_service,
  api/attachment_service, api/exceptions, api/frappe_document, 3 new DAOs,
  permission_service, app_status_service, sync_state_notifier, plus
  text/password/read_only/rating/button widgets, etc.)
- 30+ files at ≥80% (push_engine 88.79%, rest_helper 88.74%,
  link_option_service 88.79%, unified_resolver 80.77%,
  doctype_service 82.61%, workflow_service 83.33%,
  translation_service 96.77%, link_field_coordinator 99.43%)

Layers covered:
- api/*: rest_helper, doctype_service, document_service, auth,
  attachment_service, client, exceptions, frappe_document
- services/*: link_field_coordinator, workflow, translation, permission,
  app_status, sync_controller (cascade), sync_engine_builder
  (clearLocalConflict), meta_service (cache), offline_repository (lookups),
  sync_service (pull phases)
- database/daos/*: link_option, auth_token, doctype_permission
- sync/*: push_engine (SUBMIT/CANCEL/DELETE/WriteQueue), sync_state_notifier
- query/*: unified_resolver parent-filter translation
- ui/widgets/fields/*: data, check, date, datetime, time, select, numeric,
  text, phone, password, read_only, rating, duration, html, button, attach,
  geolocation, image, link (static options), searchable_select, field_factory,
  child_table
- ui/widgets/*: sync_error_banner (message parsing), form_builder (lifecycle)
- ui/*: login_screen_style

Remaining 0% (deferred to integration): full-screen UI widgets
(login_screen, form_screen, sync_status_screen, etc.) and OAuth/secure-storage
code that requires platform-channel mocks. Tracked in
docs/test_coverage_checklist.md and covered by Appium/Playwright suites.

@deepak-dhwani deepak-dhwani left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR #36 — SDK 2.0 offline-first — Round 2 Review

Round-1 fixes verified

All 10 round-1 items are correctly fixed. The test suite additions (+460 tests) are solid. Completer, WriteQueue savepoint, and OutboxDao call-site push-backs
are accepted — the code analysis confirms the dev's reasoning was correct there.


BLOCKER B1 — SyncMutex: concurrent waiters both acquire the lock

lib/src/concurrency/sync_mutex.dart:~237

When N callers are blocked on await _busy!.future, the holder sets _busy = null before calling c.complete(). All N continuations queue as microtasks, all see
_busy == null, and all enter the critical section simultaneously. Two concurrent pullSyncMany calls can run against the same SQLite table in parallel.

Fix: set _busy = newCompleter before completing the old one, so late-waking microtasks loop again.


BLOCKER B2 — _migrateV2ToV3 wipes sdk_meta sibling columns

lib/src/database/app_database.dart:~165

ConflictAlgorithm.replace = DELETE + INSERT. An upgrading device with offline_enabled=1 or session_user_json loses those values silently.

Fix: ON CONFLICT(id) DO UPDATE SET schema_version = excluded.schema_version (same pattern as writeOfflineMode).


BLOCKER B3 — AtomicWipe silently no-ops if primary .db deletion fails

lib/src/services/atomic_wipe.dart:~1473

The same try/catch covers all three file deletions. A deletion failure on the main file reopens the old database — no actual wipe occurs, no error returned.

Fix: rethrow for suffix == ''; swallow only for -wal/-shm.


BLOCKER B4 — deleteDocument attachment cleanup failure silently swallowed

lib/src/services/offline_repository.dart:~3004

Inner on DatabaseException eats the error and continues. Transaction commits with docs__ row deleted but pending_attachments rows intact — uploader retries
forever.

Fix: remove the inner catch or rethrow; let the outer transaction roll back.


BLOCKER B5 — form_screen.dart TOCTOU: network drop between isOnline check and HTTP call loses user edit

lib/src/ui/form_screen.dart:~568

The offline save path is only taken when isOnline() returns false at check-time. A drop during the await gap (payload assembly, UUID fetch) hits the server
branch, fails, and sets _errorMessage — edit is never persisted offline.

Fix: in the catch block of the server path, fall through to the saveDocument offline path.


BLOCKER B6 — form_screen.dart setState after dispose in _save() finally

lib/src/ui/form_screen.dart:~487, ~760

setState(() => _isSaving = false) in the finally block has no mounted guard — widget may be disposed before it runs.

Fix: add if (mounted) to both setState(() => _isSaving = ...) sites.


HIGH H1 — Cursor infinite loop when all rows on a non-final page are tie-skipped

lib/src/services/sync_service.dart:~4302–4335

If all rows on a non-final page share the same modified as the cursor and are tie-skipped, pageLastModified stays null and neither advance branch fires. The
cursor doesn't move; the next call re-fetches the same page. Triggered by any bulk import where many records get the same modified timestamp — symptom is a
doctype that never finishes pulling.

Fix: when all rows on a non-final page are skipped, advance cursor to (cursorModified, last_name_on_page).


HIGH H2 — clearLocalConflict sets sync_status='dirty' with no outbox row → stuck unsynced forever

lib/src/services/sync_controller.dart:~260

For pullAndOverwriteLocal + serverName == null branch: resolveConflict deletes the outbox row AND sets sync_status='dirty'. No outbox entry will ever drive a
push. Document appears unsynced forever.

Fix: set sync_status='synced' for this branch — the INSERT never reached the server, the local copy is ground truth with no pending work.


HIGH H3 — Pull engine stall guard is dead code from page 2 onwards

lib/src/sync/pull_engine.dart:~1164

scratch.complete is always false after page 1 (Cursor.advance() defaults complete=false). Any bulk-edit scenario where multiple rows share the same modified
will loop forever from page 2.

Fix: remove the scratch.complete condition from the stall guard.


HIGH H4 — clearAllData crash window: drops tables then rebuilds schema in two separate operations

lib/src/database/app_database.dart:~362

A process kill between the DROP transaction commit and _onCreate completing leaves an empty, unversioned database. The SDK hard-bricks on next open.

Fix: fold _onCreate into the same transaction as the drops, or add a schema_rebuilt sentinel checked in onOpen.


HIGH H5 — Table names interpolated into raw SQL without quoting

lib/src/query/filter_parser.dart:~790, lib/src/query/unified_resolver.dart:~1193, 1331, 1344

$tableName, $childTable, $parentTable are interpolated without double-quote wrapping. Any Frappe doctype whose normalized name is a SQLite reserved word
(values, table, group) produces a parse error. (wipeOfflineDocumentTables already quotes correctly — apply the same pattern here.)

Fix: "$tableName" everywhere; escape embedded " as "".


HIGH H6 — _retrySyncRow and pullSync errors silently swallowed, no user feedback

lib/src/ui/form_screen.dart:~1102, lib/src/ui/document_list_screen.dart:~719

Retry failures print to console only; pullSync failure leaves list empty/stale with no message.

Fix: SnackBar in both catch blocks.


HIGH H7 — ThreeWayMerge._eq uses identity equality for List values — silent field loss

lib/src/sync/three_way_merge.dart:~2152

List == List is identity, not deep equality. Any JSON-array or multi-select field always appears "locally changed" even when unchanged, so the server's version
is silently discarded on every auto-merge.

Fix: add a List branch in _eq using const DeepCollectionEquality().equals(a, b).


HIGH H8 — UuidRewriter: Dynamic Link reads sibling value before sibling is rewritten

lib/src/sync/uuid_rewriter.dart:~2315

If a sibling Link field appears after its Dynamic Link in payload.entries order, the sibling hasn't been rewritten yet when the Dynamic Link reads it.
resolveServerName(mobileUuid_as_doctype, value) finds no match and throws BlockedByUpstream.

Fix: two-pass rewrite — resolve all non-Dynamic-Link fields first, then Dynamic Links.


MEDIUM (grouped)

  • M1 attachment_pipeline.dart:115 — backoff uses attempt+1; the 2s first-retry slot is never used (harmless, misleading)
  • M2 push_engine.dart:~1719 — merged row set to pending before markInFlight; concurrent runOnce could double-dispatch
  • M3 meta_migration.dart:~1942 — addedFields loop missing _columnExists guard (unlike addedIsLocalFor/addedNormFor)
  • M4 meta_migration.dart:~1930 — DROP INDEX swallows all DatabaseException, not just "no such index"
  • M5 meta_differ.dart + meta_migration.dart — typeChanged list computed but never acted upon; type mismatches silently ignored
  • M6 outbox_dao.dart:~773 — recordSave lacks Transaction enforcement; safe today, race if called with bare Database
  • M7 connectivity_watcher.dart:~112 — PlatformException from checkConnectivity() hard-fails initialize() instead of degrading to offline
  • M8 sync_engine_builder.dart:~3510 — 4 unconditional [DIAG resolveServerName] print(...) calls will flood production logs
  • M9 frappe_sdk.dart — _syncCompleteController leaks if _doInitialize throws after its creation
  • M10 form_screen.dart — offline-save SnackBar text identical to server-save; user can't distinguish queued vs confirmed
  • M11 logout_guard_dialog.dart — outbox count query failure silently bypasses the logout guard
  • M12 sync_progress_screen.dart / sync_errors_screen.dart — onPause/onCancel/onRetry futures discarded by button onPressed; errors swallowed
  • M13 link_field.dart — no debounce on syncComplete$ listener; N concurrent re-fetches on multi-doctype sync
  • M14 form_screen.dart — didChangeAppLifecycleState fires for off-screen tabs in IndexedStack/PageView

6 blockers + 8 highs need to be fixed before merge. The most critical path is: B1 (SyncMutex concurrent access) → H1 (cursor stuck on bulk-import
timestamps) → B2 (v2→v3 migration wipes user prefs) → B5/B6 (form save crash + edit loss) → H2 (document stuck unsynced forever after conflict resolution).

The test suite (+460 tests, all layers covered) is excellent. Once the above are fixed this is close to ready.

…< 3.24 compat

Round-2 review items spanning correctness, durability and UX:

H4  app_database: clearAllData DROP loop + base rebuild now share one
    transaction so a process kill cannot leave the DB with dropped
    application tables but an unchanged user_version (next open would
    skip both onCreate and onUpgrade and hard-brick the install).
H2  sync_engine_builder: empty-server-name conflict resolution flips
    docs__ back to `synced` (not `dirty`) — the outbox row is about to
    be deleted, so there is no pending push to drive.
H7  three_way_merge: deep equality for List/Map; default identity-`==`
    was treating every decoded-JSON container as locally-changed and
    discarding server updates.
M2  push_engine: scheduled retry writes `in_flight` directly to close
    the gap where a concurrent drain could re-pick a `pending` row.
M4/M5 meta_migration: rethrow non-"no such index" DROP failures; log a
    typeChanged warning so column-type drift is visible during audit.
M6  outbox_dao.recordSave: debug-assert it runs inside a Transaction so
    the collapse rules can rely on read+write atomicity.
M7  connectivity_watcher: catch PlatformException from the initial
    probe (older Android OEMs without network-info permission) and
    default to offline instead of aborting SDK.initialize.
M8  sync_engine_builder: gate the resolveServerName DIAG dumps to
    kDebugMode — release builds were paying for an extra `SELECT *`
    per resolve and flooding production logs.
M9  frappe_sdk: close the orphan _syncCompleteController if
    _doInitialize throws after creating it.
M12 sync_errors / sync_progress / document_list / form_screen: button
    actions await the future and surface failures via SnackBar instead
    of letting them become uncaught zone errors.
M14 form_screen: didChangeAppLifecycleState gates on ModalRoute.isCurrent
    so a multi-tab IndexedStack host doesn't refetch outbox N× per
    foreground.

Other:
- sdk_meta_dao.writeOfflineMode: replace `INSERT … ON CONFLICT … DO
  UPDATE` (SQLite ≥ 3.24, fails to compile on Android < 9) with
  UPDATE-then-INSERT-OR-IGNORE in a txn. Preserves schema_version /
  bootstrap_done / session_user_json the same way the UPSERT did.
- offline_repository.deleteDocument: stop swallowing pending_attachments
  cleanup failures during hard-delete; sweep queued attachments before
  the soft-delete tombstone too, so an already-pushed DELETE doesn't
  leave orphan uploads racing the server-side cascade.
- atomic_wipe: rethrow on primary `.db` delete failure so the next
  open doesn't mask the real error as "table already exists"; -wal /
  -shm deletes remain best-effort.
- attachment_pipeline: fix off-by-one in backoff index (`backoff[i+1]`
  was skipping the first delay and risking a range overshoot at the
  last attempt).
- form_screen: guard setState in catch/finally with `mounted` to avoid
  the post-dispose assert if the user navigates away mid-save.

Tests updated for every behavior change above.
@Omprakash-48

Copy link
Copy Markdown
Collaborator Author

PR #36 round-2 — Final Summary

✅ Fixed (19 items applied across 17 numbered fixes)

Sev ID Title Files changed
B B3 AtomicWipe silently no-ops on primary .db delete failure atomic_wipe.dart
B B4.1 deleteDocument cancelledLocally branch orphans pending_attachments offline_repository.dart
B B4.2 deleteDocument tombstone branch never cleans attachments offline_repository.dart
B B6 setState after dispose — 4 unguarded sites in _save + _handleDelete form_screen.dart
H H2 clearLocalConflict left rows stuck-unsynced (dirty → synced) sync_engine_builder.dart
H H4 clearAllData crash window between DROP and rebuild — folded into one txn app_database.dart
H H6 _retrySyncRow + _pullDocuments errors print-only → red SnackBar form_screen.dart, document_list_screen.dart
H H7 ThreeWayMerge._eq used identity equality for List/Map — deep equality added three_way_merge.dart
M M1 attachment_pipeline backoff off-by-one (backoff[attempt+1] → backoff[attempt]) attachment_pipeline.dart
M M2 push_engine merge → pending → markInFlight race — write in_flight directly push_engine.dart
M M3 addedFields loop missing _columnExists guard meta_migration.dart
M M4 DROP INDEX catch swallowed all DatabaseException — narrowed to 'no such index' meta_migration.dart
M M5 typeChanged silently ignored — warn-log added (partial; full rebuild deferred) meta_migration.dart
M M6 recordSave lacked Transaction enforcement — debug assert added outbox_dao.dart
M M7 connectivity_watcher.production() hard-fails on PlatformException — try/catch + default offline connectivity_watcher.dart
M M8 5 unconditional [DIAG resolveServerName] prints in release — kDebugMode gate sync_engine_builder.dart
M M9 _syncCompleteController leaks on init failure — close in catch frappe_sdk.dart
M M12 Discarded futures in sync screens — _runAndSurface helper sync_progress_screen.dart, sync_errors_screen.dart
M M14 didChangeAppLifecycleState fires for off-screen tabs — mounted + route.isCurrent gate form_screen.dart

Tests added: B3, B4.1, B4.2, H2 (updated existing), H7 (×4), M1, M3, M4. Mechanical / structural fixes verified by static review + the full 1,284-test suite passing.


⏭ Pushed back (10 items — claim refuted by code/docs)

Sev ID Reason for SKIP
B B1 Dart microtask scheduling already serializes SyncMutex waiters; empirical 50-caller test → maxRunning == 1
B B2 git show d73b517 — v2 schema has no sdk_meta; migration creates it fresh, REPLACE wipes only just-seeded defaults
B B5 TOCTOU only fires in legacy online-only mode by design; offline-first gate is mode-based, not connectivity-based
H H1 Pagination is offset-based (start += pageSize) — advances unconditionally regardless of tie-skip outcomes
H H3 PullPageFetcher.fetch constructs next cursor directly with complete: true preserved; Cursor.advance only used in tests
H H5 normalizeDoctypeTableName always prefixes docs__ — no SQLite reserved word matches; quoting inconsistency is style
H H8 Frappe docs (Controls page): Dynamic Link's options sibling holds a doctype-name string, not a mobile_uuid — no rewrite needed
M M11 showLogoutGuardDialog takes count as parameter; the claimed outbox query lives in the caller, not this file
M M13 Existing _isLoading + _options.isNotEmpty guards in link_field.dart already serialize bursts; my prior 200ms debounce empirically broke fresh-install dropdowns
M M10 UX-only ("Saved successfully" vs "Queued for sync"); not a correctness fix — left as optional polish

📊 Totals

  • Reviewer raised 28 items
  • 17 fixes applied (one fix item — M5 — is a partial warn-only landing)
  • 10 push-backs with code/doc citations
  • 1 UX-only (M10) — flagged for follow-up, not fixed

@deepak-dhwani deepak-dhwani left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR #36 — SDK 2.0 offline-first — Round 3 Review

Round-2 fixes verified

All 18 items addressed in this commit are correctly fixed. Highlights:

  • _onCreateBody called inside the same transaction as the DROP loop — crash window closed
  • clearLocalConflict correctly sets synced (not dirty) with clear reasoning in the comment
  • ThreeWayMerge._eq deep equality for List and Map — clean recursive implementation
  • _runAndSurface helper pattern for button futures is the right approach; context.mounted checked before SnackBar
  • writeOfflineMode UPDATE-then-INSERT-OR-IGNORE is correct and covers SQLite < 3.24 (Android 8)
  • AtomicWipe rethrow on primary .db failure, best-effort on wal/shm — correct split

BLOCKER B1 — SyncMutex: concurrent waiters all acquire the lock

lib/src/concurrency/sync_mutex.dart — not touched in this commit.

The holder sets _busy = null before calling c.complete(). All N waiting microtasks resume in the same flush and all see _busy == null, so all enter the critical
section simultaneously. Two concurrent pullSyncMany calls can write to the same SQLite table in parallel.

The fix is to claim the new lock before completing the old completer:

Future protect(Future Function() body) async {
while (_busy != null) await _busy!.future;
final c = Completer();
_busy = c; // claim BEFORE any await
try {
return await body();
} finally {
_busy = null;
c.complete();
}
}


BLOCKER B5 — form_screen.dart: TOCTOU race loses user edit on mid-save network drop

lib/src/ui/form_screen.dart — mounted guards were added (B6, correctly fixed), but the TOCTOU itself is not addressed.

_save() calls isOnline(), stores serverReachable, then assembles the payload across multiple awaits. If the network drops during that gap, the server branch
throws, the catch block sets _errorMessage — and the offline saveDocument path is never reached. The edit is lost.

Fix: in the catch block of the server-first path, fall through to the offline saveDocument call rather than only setting _errorMessage.


HIGH H1 — Cursor infinite loop when a full page is entirely tie-skipped

lib/src/services/sync_service.dart — not touched in this commit.

When all rows on a non-final page share the same modified timestamp as the cursor and are tie-skipped, pageLastModified stays null and neither advance branch
fires. The cursor does not move. The next call re-fetches the same page, skips all rows again, and repeats forever. Triggered by any bulk import where many
records land with the same modified value — symptom is a doctype that never finishes pulling with no error surfaced.

Fix: when all rows on a non-final page are skipped, advance the cursor to (cursorModified, last_name_on_page) to guarantee forward progress through the tie
group.


HIGH H3 — Pull engine stall guard is dead code from page 2 onwards

lib/src/sync/pull_engine.dart — not touched in this commit.

scratch.complete is always false after the first page (Cursor.advance() defaults complete=false). The stall guard only fires on page 1. Any scenario where page
2+ contains rows all sharing the cursor's modified value loops forever.

Fix: remove the scratch.complete condition from the stall guard, or check the stall condition unconditionally for incremental pulls.


HIGH H5 — Table names interpolated into raw SQL without quoting

lib/src/query/filter_parser.dart, lib/src/query/unified_resolver.dart — not touched in this commit.

$tableName, $childTable, and $parentTable are interpolated directly into raw SQL strings without double-quote wrapping. Any Frappe doctype whose normalized name
is a SQLite reserved keyword (table, values, group, order, index) will cause a parse error. wipeOfflineDocumentTables already quotes table names correctly —
apply the same pattern here.

Fix: wrap all interpolated table names as "$tableName"; escape any embedded " as "".


HIGH H8 — UuidRewriter: Dynamic Link reads sibling value before sibling is rewritten

lib/src/sync/uuid_rewriter.dart — not touched in this commit.

For a Dynamic Link field, the rewriter reads payload[siblingFieldname] to get the target doctype. If the sibling Link field appears after the Dynamic Link in
payload.entries iteration order, the sibling has not been rewritten yet. resolveServerName is called with a mobile_uuid string as the doctype, finds no match,
and throws BlockedByUpstream.

Fix: two-pass rewrite — resolve all non-Dynamic-Link fields first, then resolve Dynamic Links using the already-rewritten sibling values.


Verdict

Two blockers (B1, B5) and four highs (H1, H3, H5, H8) remain unaddressed. The fixes in this commit are clean and correct. Once these six items
are handled the PR is ready to merge.

@Omprakash-48

Copy link
Copy Markdown
Collaborator Author
---
**Round 3 Review — Reply**

Thanks for the thorough re-read. After verifying each item against the current code on `om/offline_improvement` (head `cfa82c0`), I'm pushing back on all six. Citations below.
---

### B1 — SyncMutex race ❌ Not a bug

**Claim:** When the holder releases, all waiters resume in the same flush, all see `_busy == null`, all enter the critical section simultaneously.

**Verification:** `lib/src/concurrency/sync_mutex.dart:36-48`

The current `protect()` is:

```dart
Future<T> protect<T>(Future<T> Function() body) async {
  while (_busy != null) {
    await _busy!.future;
  }
  final c = Completer<void>();
  _busy = c;
  try {
    return await body();
  } finally {
    _busy = null;
    c.complete();
  }
}

Why this is safe:

  1. Dart's microtask queue runs microtasks sequentially — not in parallel.
  2. When the holder calls c.complete(), all waiters' continuations are scheduled as microtasks in FIFO order.
  3. Waiter 1's continuation resumes from await _busy!.future, re-checks while (_busy != null) (the loop re-check is the key), sees null, exits the loop, then synchronously runs _busy = c1_new before yielding at await body().
  4. By the time waiter 2's microtask runs, _busy is already non-null. It re-enters the while loop and awaits c1_new.future.

The proposed fix in the review is byte-identical to the existing code (including placing _busy = c between the loop and the first await). The // claim BEFORE any await comment in the suggestion describes what the code already does — _busy = c is on line 41, with no await between it and the loop exit on line 39.


B5 — Mid-save TOCTOU loses user edit ❌ Edit isn't lost

Claim: Network drop during a server-first save throws, catch sets _errorMessage, offline saveDocument is never reached → edit is lost.

Verification: lib/src/ui/form_screen.dart:566-787

When the catch fires (lines 768-776), it sets _errorMessage via setState but does not touch the form data. The form is rendered via FrappeFormBuilder (lines 977-1018) keyed by a stable ValueKey('form_${widget.document!.localId}') (lines 978-980). The FormBuilder's internal _formData is preserved across the rebuild because:

  • The widget key is unchanged.
  • initialData (lines 982-985) is computed from _workflowUpdatedDocData ?? widget.document?.data ?? widget.initialData — none of which mutate on save error.

The user sees the error banner; their typing remains in the form; they can retry on the next online attempt. No data loss.

Why "fall through to offline saveDocument" doesn't apply: The server-first branch only runs when offlineMode.enabled == false (line 591). In that mode there is no outbox, so calling saveDocument would have no push path. The legacy online-only mode is intentionally fail-loud — there's nowhere to queue.


H1 — Cursor infinite loop on tie-skip ❌ Not infinite; proposed fix would regress the cursor

Claim: When a non-final page is entirely tie-skipped, pageLastModified stays null, neither advance branch fires, cursor doesn't move, server returns the same page forever.

Verification (no infinite loop): lib/src/services/sync_service.dart:497-611

The pagination uses start += pageSize at line 609 before requesting the next page (via fetchPage(start + pageSize) look-ahead at line 510). Even if every row on page N is tie-skipped, start advances, the next request hits a different offset, and the server eventually returns an empty/short page → loop exits at line 502 or 608.

Verification (proposed fix would regress the cursor): The tie-skip rule at line 535 is:

if (modCmp == 0 && serverId.compareTo(cursorName) <= 0) continue;

For a page to be entirely tie-skipped, every row must satisfy name <= cursorName. With server order name asc, the last row of that page has the largest name in the page — but it is still <= cursorName. Setting cursor.name = last_name_on_page would move the cursor name backwards, allowing previously-applied rows to re-enter the apply path on the next call. That's a correctness regression.

There is a real but mild concern: the persisted cursor stays put when every page in a session is tie-skipped. Fix would require a separate "scan progress" marker independent of the cursor.


H3 — Pull engine stall guard dead from page 2 onwards ❌ Not a bug

Claim: scratch.complete is always false after page 1 (Cursor.advance() defaults complete=false), so the stall guard at pull_engine.dart:260-264 only fires on page 1.

Verification: The engine doesn't use Cursor.advance() here. lib/src/sync/pull_page_fetcher.dart:76-82 explicitly constructs the advanced cursor with complete: true in the incremental branch:

if (cursor.complete) {
  next = Cursor(
    modified: last['modified'] as String?,
    name: last['name'] as String?,
    complete: true,
  );
}

So result.advancedCursor.complete stays true across all pages of an incremental pull. The stall guard fires correctly on every page.

For initial sync (complete: false), the fetcher uses limit/offset pagination which always advances — no stall possible.


H5 — Unquoted table names → SQLite reserved keyword crash ❌ Not reachable

Claim: $tableName / $childTable / $parentTable are interpolated into raw SQL without quoting.

Verification: lib/src/database/table_name.dart:18-36

String normalizeDoctypeTableName(String doctype, {String prefix = 'docs__'}) {
  // ... lowercase, [^a-z0-9]+ → '_', collapse '_+', strip leading/trailing '_'
  return '$prefix$body';
}

Output is guaranteed to match docs__[a-z0-9_]+. The docs__ prefix is non-optional in production. No reserved keyword collision possible.


H8 — UuidRewriter reads sibling before sibling is rewritten ❌ Not a bug

Claim: For a Dynamic Link, the rewriter reads payload[siblingFieldname] before it has been rewritten.

Verification: lib/src/sync/uuid_rewriter.dart:67-105

Frappe Dynamic Link semantics: field.options names a sibling field whose value is a doctype name string (e.g. "Customer"), not a Link/UUID. It is a plain Data/Select field.

The rewriter only enters the rewrite branch for Link and Dynamic Link:

if (field != null &&
    (field.fieldtype == 'Link' || field.fieldtype == 'Dynamic Link')) {

The sibling field is never rewritten, so reading it always returns the original string value regardless of iteration order.


Summary

Item Verdict Cite
B1 Not a bug; proposed fix is identical sync_mutex.dart:36-48
B5 Edit is preserved; offline fallback inapplicable form_screen.dart:768-776, 978-985
H1 Not infinite; proposed fix would regress cursor sync_service.dart:535, 609
H3 Guard fires every page pull_page_fetcher.dart:76-82
H5 Normalizer prevents reserved-word collision table_name.dart:18-36
H8 Dynamic Link sibling holds doctype name uuid_rewriter.dart:67-68

No code changes from this round. Happy to walk through any of the citations live if anything still seems off.

@deepak-dhwani deepak-dhwani left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR #36 — SDK 2.0 Offline-First — Round 3 (Comprehensive) Review

This review covers all 307 changed files across sync, database, services, concurrency, query, UI, and docs layers.


BLOCKER B1 — PushEngine.runOnce() has no concurrent guard

lib/src/sync/push_engine.dart — runOnce() has no _running flag or mutex. SyncController calls it from multiple triggers simultaneously (user save, connectivity
restore, syncNow()). Each concurrent call independently resets in_flight rows back to pending and re-fetches the outbox, producing duplicate dispatches. A
document in in_flight gets pushed twice in parallel; the second wins the server write but the first's ResponseWriteback overwrites server_name/modified with a
stale value.

Fix: Add a bool _running = false guard at the top of runOnce() — if (_running) return; _running = true; — with a finally { _running = false; } block, mirroring
the pattern already used in WriteQueue._kick().


BLOCKER B2 — WriteQueue completers resolve before the outer transaction commits

lib/src/concurrency/write_queue.dart — Inside db.transaction((txn) async { ... }), p.completer.complete(r) is called after each savepoint release, while the
enclosing sqflite transaction has not yet committed. Dart microtasks scheduled by complete() fire at the next await in the loop (the next SAVEPOINT command), so
await wq.submit(...) callers in PullEngine and PushEngine receive a success signal before the data is durable. If the outer transaction subsequently fails
(disk full, lock timeout, DB closed), callers have already advanced their cursors or marked outbox rows done — data loss without an observable error.

Fix: Move p.completer.complete(r) / p.completer.completeError(e, st) out of the transaction callback. Collect results in a list inside the transaction, then
resolve completers after db.transaction(...) returns.


BLOCKER B3 — Cursor never graduates to incremental for zero-row doctypes

lib/src/sync/cursor.dart — Cursor.toJson() returns null when isNull is true (modified == null && name == null). For a doctype that has no rows on the server,
the initial pull's result.rows.isEmpty breaks immediately, scratch remains Cursor.empty, and markComplete() produces a cursor that still satisfies isNull.
PullEngine then calls if (cursorJson != null) metaDao.setLastOkCursor(...) — the null check silently skips the write. The cursor is never persisted, and on
every restart the doctype is re-fetched as a full initial pull instead of as an incremental delta.

Fix: Change toJson() to if (isNull && !complete) return null; so a completed-but-empty cursor is serialized as {"complete": true}.


BLOCKER B4 — FilterParser LIKE values are not wildcard-escaped

lib/src/query/filter_parser.dart — In the 'like' / 'not like' case, the value is passed directly as a SQLite parameter without escaping % and . SQLite's LIKE
treats _ as "any single character", so a filter for "PRJ_001" also matches "PRJ0001", "PRJx001", etc. Frappe server-side escapes these before passing to
MariaDB; offline queries diverge from online results. This also allows a caller to inject arbitrary %/
patterns via user-facing filter inputs.

Fix: Before binding the value, escape it — value.replaceAll('\', '\\').replaceAll('%', '\%').replaceAll('', '\') — and append ESCAPE '\' to the
generated SQL fragment.


BLOCKER B5 — FormScreen online delete orphans the local docs__ row

lib/src/ui/form_screen.dart — In the !offlineEnabled branch, widget.api!.document.deleteDocument(...) is called to remove the record server-side, but the
corresponding hardDeleteDocument call that existed in the prior version was removed. The local docs__ row is never cleaned up. The document reappears
in list screens on next app load and persists until the next incremental pull overwrites it.

Fix: After the successful API call in the !offlineEnabled branch, call await widget.repository.hardDeleteDocument(doctype: widget.meta.name, mobileUuid:
widget.document!.localId).


BLOCKER B6 — BaseFieldStyle showLabel / showDescription defaults silently changed to false

lib/src/ui/widgets/fields/base_field.dart — showLabel and showDescription were changed from true to false as constructor defaults. Any consumer app that
constructs BaseFieldStyle() without these parameters now renders all fields with no labels and no descriptions. This is a silent breaking change with no
changelog entry and no deprecation path.

Fix: Revert defaults to true to maintain backward compatibility. If false is the intended default for the new internal FrappeFormBuilder style, introduce a
named constructor — e.g. BaseFieldStyle.hiddenLabels() — instead of changing the primary default.


HIGH H1 — ThreeWayMerge._eq(null, null) returns false

lib/src/sync/three_way_merge.dart — static bool _eq(Object? a, Object? b) returns false when either argument is null because of if (a == null || b == null)
return false. When both base and ours have null for a field (the common case for optional fields never set by the user), localChanged evaluates to !_eq(null,
null) == true, so every null-valued field is treated as a local edit. On every push-with-merge, those fields overwrite whatever the server set for them — a
silent partial PUT that discards server updates to optional fields.

Fix: Change the guard to if (a == null && b == null) return true; if (a == null || b == null) return false;.


HIGH H2 — ResponseWriteback child-row fallback ignores rows-affected count

lib/src/sync/response_writeback.dart — When matching a child row by mobile_uuid returns 0 rows (updated == 0), the fallback txn.update(..., where: 'parent_uuid
= ? AND parentfield = ? AND idx = ?', ...) runs but its return value is discarded. If that fallback also matches 0 rows (child row not yet written, or idx
mismatch), the child's server_name and modified are never updated. On the next sync, UuidRewriter sees server_name == null and blocks the row as
BlockedByUpstream.

Fix: Capture the return value of the fallback update and log a warning (or throw to surface as a sync error) when it is 0.


HIGH H3 — AttachmentPipeline upload and markDone are not atomic

lib/src/sync/attachment_pipeline.dart — uploader(...) and dao.markDone(...) are in the same try block. If the upload succeeds but markDone throws (DB error,
constraint violation), the retry loop re-uploads the file. Frappe's attach_files_to_document hook creates a second File doctype record pointing to the same
binary — a duplicate attachment indistinguishable from the original.

Fix: After a successful upload, call markDone in a separate try/catch. If markDone fails, surface a distinct error rather than retrying the upload.


HIGH H4 — PermissionService.syncFromApi() no-throw contract removed without notice

lib/src/services/permission_service.dart — The prior implementation wrapped the API call in try/catch and returned null on any exception. The new implementation
removes that catch and lets exceptions propagate. Any call site in FrappeSDK or a consumer app that checks result == null for failure (the documented contract)
will now crash on NetworkException, TimeoutException, or any HTTP error. This contract change is not mentioned in the changelog.

Fix: Either restore the catch-and-return-null behavior, or document the new throwing contract explicitly and audit all call sites that relied on the null-return
pattern.


HIGH H5 — FrappeSDK login() can trigger concurrent _initialMetaAndDataSync calls

lib/src/sdk/frappe_sdk.dart — login() fires unawaited(_initialMetaAndDataSync()). The _initInFlight guard lives inside initialize(), not inside
_initialMetaAndDataSync() itself. Two rapid login() calls — or a login() racing an already-in-progress initialize() — produce two concurrent
_initialMetaAndDataSync() executions, both writing to doctype_meta and sdk_meta tables simultaneously without a write-order guarantee.

Fix: Lift the _initInFlight guard into _initialMetaAndDataSync() itself (or introduce a dedicated _metaSyncInFlight completer) so concurrent callers coalesce
onto one execution regardless of entry point.


HIGH H6 — MetaMigration backfill holds SQLite write lock for full duration

lib/src/services/meta_migration.dart — A backfill migration that iterates all rows and applies column updates in a loop does so inside a single write
transaction. For large datasets this holds the SQLite write lock for seconds, blocking any concurrent write from WriteQueue, PullApply, or the outbox. On first
launch after upgrade, all background sync operations queue and may timeout.

Fix: Batch the backfill in chunks of 200–500 rows with a commit between batches.


HIGH H7 — resolveConflict deletes the outbox row before applying the server version

lib/src/services/offline_repository.dart — resolveConflict removes the outbox conflict row first, then attempts to apply the server snapshot to the local docs__
table. If the apply step throws, the conflict row is gone with no recovery path — the document is stuck with pre-conflict local data, no outbox entry, and a
synced status the next pull will not revisit.

Fix: Reverse the order: apply the server snapshot first, then delete the outbox row only on success.


HIGH H8 — deleteDocument swallows child cascade failures

lib/src/services/offline_repository.dart — The child cascade loop logs 'child cascade delete failed' and continues. A partial delete (parent row removed, some
child rows remaining) is reported to the caller as success. Orphaned child rows with a dangling parent_uuid cause constraint violations or ghost rows on the
next pull.

Fix: Collect failures across all children; if any fail, either rethrow after attempting all (so the caller can surface the error) or re-insert the parent row
and fail the whole operation atomically.


HIGH H9 — outbox_row hard-casts nullable columns without null check

lib/src/database/daos/outbox_dao.dart — Column reads that use as String or as int on nullable database columns will throw a Dart cast exception on any row where
that column is NULL (e.g. newly inserted rows before server round-trip). This silently crashes the outbox drain for the affected row rather than surfacing a
recoverable error.

Fix: Use as String? (or ?? defaultValue) for all nullable column reads and handle null explicitly.


MEDIUM M1 — ConnectivityWatcher stream errors are unhandled

lib/src/concurrency/connectivity_watcher.dart — Errors from the connectivity plugin's stream (e.g. permission revoked on Android 13+) are not caught. An
unhandled stream error propagates downstream and surfaces as an unhandled exception in the isolate.

Fix: Add .handleError((e) { debugPrint('ConnectivityWatcher: $e'); }) on the stream transformation.


MEDIUM M2 — ConcurrencyPool._drain() in finally block can mask the original error

lib/src/concurrency/concurrency_pool.dart — If the awaited task throws and _drain() also throws inside finally, the drain error replaces the original error in
the stack trace.

Fix: Wrap _drain() inside finally in its own try/catch that logs the drain error separately without rethrowing.


MEDIUM M3 — FormBuilder dataForDepends evaluates against completeFormData, not live formValues

lib/src/ui/widgets/form_builder.dart — dataForDepends is built from completeFormData (assembled before the visibility pass), not from formValues (the live
user-entered map). For fields whose visibility depends on other fields that are themselves hidden, the evaluation uses a snapshot that may still include stale
values from hidden fields, causing sections to remain visible when they should be collapsed.

Fix: Use formValues as the basis for dataForDepends instead of completeFormData.


MEDIUM M4 — OfflineTransitionService._forceExited set after c.complete()

lib/src/services/offline_transition_service.dart — _forceExited = true is assigned after c.complete(). Dart's microtask scheduler fires the completer's
listeners before the assignment, creating a brief window where a re-entrant caller sees _forceExited == false and enters the transition again.

Fix: Set _forceExited = true before calling c.complete().


MEDIUM M5 — SessionUserService stream emits before DB write is confirmed

lib/src/services/session_user_service.dart — The service emits the new user on its stream before await db.update(...) has returned. A listener that immediately
reads from the DB may observe the pre-write value.

Fix: Await the DB write to completion before emitting on the stream.


MEDIUM M6 — Bare print() calls in production paths

lib/src/sdk/frappe_sdk.dart — Several print(...) statements appear in production code paths unconditionally, including in release builds. These may leak
internal state (doctype names, auth status, error messages) to crash reporters or log aggregators.

Fix: Replace with debugPrint(...) or gate with if (kDebugMode).


MEDIUM M7 — RunFn typedef accidentally exported from the public barrel

lib/frappe_mobile_sdk.dart — show SyncController, ConflictAction, DeleteCascadePlan, RunFn — RunFn is an internal implementation typedef. Exporting it widens
the public API surface with a symbol consumers cannot meaningfully use, and any future rename becomes a breaking change.

Fix: Remove RunFn from the show clause on that export line.


MEDIUM M8 — pubspec.yaml specifies meta: any

pubspec.yaml — A published library should constrain its dependencies. meta: any allows the resolver to pick any version including future majors, producing
version-skew conflicts for downstream consumers who pin their own meta.

Fix: Replace with a bounded constraint, e.g. meta: ">=1.9.0 <3.0.0".


DOC BLOCKER D1 — sdk.unifiedResolver does not exist (4 locations)

doc/release-2.0/breaking-changes.md, doc/release-2.0/whats-new.md, doc/OFFLINE_FIRST.md — The correct accessor on FrappeSDK is sdk.resolver (UnifiedResolver get
resolver). Every code sample calling sdk.unifiedResolver.resolve(...) throws NoSuchMethodError at runtime. There are 4 occurrences across these files.

Fix: Replace all 4 instances of sdk.unifiedResolver with sdk.resolver.


DOC BLOCKER D2 — whats-new.md sample calls createDocument (renamed to saveDocument)

doc/release-2.0/whats-new.md — The sample calls sdk.offlineRepository.createDocument(...). That method was renamed to saveDocument in this PR. The sample
produces a compile error as written.

Fix: Replace createDocument with saveDocument in the sample.


DOC BLOCKER D3 — cancelInitialSync() documented but not implemented

doc/OFFLINE_FIRST.md — await ctrl.cancelInitialSync() is shown as a public API. SyncController has no such method. A consumer following this sample gets a
compile error.

Fix: Either implement cancelInitialSync() or remove the section before release.


DOC BLOCKER D4 — MigrationBlockedScreen documented but not exported

doc/OFFLINE_FIRST.md — MigrationBlockedScreen is shown as a widget consumers embed. It is not exported from frappe_mobile_sdk.dart. Consumers cannot use it
without importing internal paths.

Fix: Add MigrationBlockedScreen to the barrel export, or remove the section from docs.


DOC BLOCKER D5 — OFFLINE_FIRST.md describes the dropped documents table as active

doc/OFFLINE_FIRST.md — "Two-store design — every save goes to the legacy JSON blob store (documents) and a normalized per-doctype docs__

table." The
documents table was dropped in the v2→v3 migration. This description is false for SDK 2.0 and will mislead integrators.

Fix: Remove the two-store description and replace it with the current single-store (docs__) design.


What's done well

  • SyncMutex while-loop re-check — the while (_busy != null) pattern correctly serializes concurrent waiters with no gap; confirmed across all prior rounds.
  • PullPageFetcher incremental cursor — complete: true is explicitly preserved in the incremental branch, so the stall guard fires correctly on every page.
  • UuidRewriter Dynamic Link — reading the sibling field for the doctype name string before rewriting the link value is correct; the sibling is never itself a
    UUID.
  • Cursor stall guard scope — guarding only cursor.complete == true pages is correct; initial offset-based pagination cannot stall.
  • UPDATE-then-INSERT-OR-IGNORE — correct SQLite <3.24 compatibility workaround for the absence of ON CONFLICT DO UPDATE.
  • AtomicWipe rethrow on primary .db deletion failure — correct; a failed primary DB deletion is not silently swallowed.
  • ThreeWayMerge deep equality for List/Map — the recursive _eq for containers correctly fixes identity-comparison false positives on decoded JSON.
  • kDebugMode gate on diagnostic prints — sync engine diagnostic output correctly gated.
  • OutboxDao savepoint pattern — using named SQLite savepoints inside an outer transaction for per-task rollback isolation is the right approach.

Verdict

Six blockers and five doc blockers must be resolved before this ships as a public SDK. The concurrent-guard gaps (B1, B2, H5) and merge
correctness issues (H1, B3) are highest priority — they affect data integrity for every consumer app.

@dhwani-ankit dhwani-ankit merged commit 5dc2a94 into develop May 22, 2026
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants