Skip to content

Fix data-loss and silent-staleness bugs in download, sync filtering, and caching#150

Merged
D-VR merged 5 commits into
masterfrom
fix-download-overwrite-bugs
Jun 30, 2026
Merged

Fix data-loss and silent-staleness bugs in download, sync filtering, and caching#150
D-VR merged 5 commits into
masterfrom
fix-download-overwrite-bugs

Conversation

@D-VR

@D-VR D-VR commented Jun 30, 2026

Copy link
Copy Markdown
Collaborator
  • Failed update emptied the file - rename conflict mode moved the local file aside before downloading; a failed download (e.g. expired session -> HTML error page) left nothing. Now downloads to temp first, displaces only on success.
  • Resume corruption - blind Range could splice a new version onto an old partial, or clobber a user's .temp file. Now uses If-Range + hidden temp files.
  • ETag-compare failure overwrote local edits - now falls back to the timestamp/HEAD heuristic instead of overwriting.
  • Stale files never retried - failed downloads were cached optimistically, the cache now reflects what's actually on disk.
  • Sciebo files never re-downloaded - None == None skip fired before the etag check, now uses etag-based change detection.
  • Substring course matching - selecting/skipping course 12 also hit 1 and 2, now matches the parsed id exactly, and selected_courses overrides skip_courses.
  • Name-clash collision - same-named direct-link files hashed md5("None") and dropped one; now fall back to the URL.

Plus login/SSO/TOTP/wstoken hardening against unexpected page structure.

D-VR added 4 commits June 30, 2026 21:10
Files added with name_clash_id=None (direct external links, direct Moodle
content files, and embedded videojs nodes) all hashed md5("None") when
disambiguating same-named siblings, so two distinct same-named files
resolved to one path and one was silently lost.

Add Node._clash_suffix(), which falls back to the node's URL when
name_clash_id is None. At the rename branch the siblings differ precisely
because their URLs differ, so the URL is a present, distinct key.
…ding skip

Course filtering matched configured URLs with `str(course_id) in url`, a
substring test: selecting course 12 also pulled in courses 1 and 2, and
skipping 12 silently dropped 1 and 2. Add _course_id_in_filter(), which
compares the parsed `id` query parameter (or a bare numeric entry) exactly.

Also make selected_courses a true allowlist that overrides skip_courses
(and only_sync_semester), as documented; previously skip_courses was applied
first and could not be overridden. Harden shortname/idnumber access so a
missing or empty field no longer aborts the sync or creates an empty-named
semester folder.
The SSO/TOTP login and wstoken parsing assumed Moodle's exact HTML, so a
changed or error page raised AttributeError/TypeError instead of the
intended clean exit. Null-check the sesskey <script> and the maintenance
<body>; route every login form-field extraction (csrf_token, RelayState,
SAMLResponse) through a require_input_value helper that exits with a clear
message when a field is missing; and parse the mobile-launch token
defensively (isolate the token value, guard base64 decode and the ":::"
split). Also use sys.exit consistently.
…e caches

Rework download_file and the per-course cache so an update never destroys or
silently abandons local data:

- Conflict "rename" mode now downloads the replacement to a temp file first and
  only moves the local file aside on success, so a failed/aborted download (an
  expired session returning an HTML error page) never empties the canonical
  path.
- Resume partial downloads only with a validating If-Range using a recorded
  etag sidecar, and use hidden, namespaced temp files, so a blind range request
  can no longer splice a newer version onto an old partial or clobber a user's
  own *.temp file.
- A failed ETag comparison is treated as "no cached etag" (fall back to the
  timestamp/HEAD heuristic) instead of silently overwriting the local file.
- Check filetype/name exclusions before any conflict handling so excluded files
  are never displaced.
- Trust cached version markers only when the previous run actually downloaded
  the file, and have cache_root_node preserve the on-disk version's markers for
  files that were not fetched this run, so a failed update is retried cleanly
  next run instead of being skipped forever or moved aside as a spurious
  conflict.
- Skip on matching timemodified only when it is meaningful, so Sciebo files
  (which have no timemodified) fall through to etag-based change detection and
  are actually re-downloaded when they change.
@D-VR D-VR added the bug Something isn't working label Jun 30, 2026
@D-VR D-VR merged commit d172630 into master Jun 30, 2026
4 checks passed
@D-VR D-VR deleted the fix-download-overwrite-bugs branch June 30, 2026 20:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant