Releases: openzim/zimit

3.1.2

03 Feb 08:09
d94f487

Changed

  • Upgrade to browsertrix crawler 1.11.2 (#542)

3.1.1

24 Jan 13:55
52a992b

Changed

3.1.0

21 Jan 20:08
f4cd7dc

Added

  • Add --overwrite flag to overwrite the ZIM file if it already exists (#399)

Changed

  • Fix issues preventing interrupted crawls from being resumed (#499)
    • Ensure build directory is used explicitly instead of a randomized subdirectory when passed, and pre-create it if it does not exist.
    • Use all warc_dirs found instead of just the latest so interrupted crawls use all collected pages across runs when an explicit collections directory is not passed.
    • Don't clean up an explicitly passed build directory.
  • Upgrade to browsertrix crawler 1.11.1 and Python 3.14, plus other small Python dependency and GitHub Actions upgrades (#532)
  • Add back publishing of arm64 Docker image (#463)
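
The build-directory behaviour described for #499 can be illustrated with a minimal sketch. This is not zimit's actual code: the function names prepare_build_dir and all_warc_dirs, the "zimit-sketch-" prefix, and the "*/archive" glob pattern are hypothetical and chosen only for illustration.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def prepare_build_dir(build: Path | None) -> tuple[Path, bool]:
    """Return (directory, cleanup_when_done). Hypothetical helper."""
    if build is not None:
        # Explicit directory: use it as-is, create it if missing, never delete it.
        build.mkdir(parents=True, exist_ok=True)
        return build, False
    # No directory passed: fall back to a throwaway one that may be cleaned up.
    return Path(tempfile.mkdtemp(prefix="zimit-sketch-")), True


def all_warc_dirs(build_dir: Path, pattern: str = "*/archive") -> list[Path]:
    """Collect every matching directory, not only the most recent one.

    The default pattern is an assumption, not zimit's real layout.
    """
    return sorted(p for p in build_dir.glob(pattern) if p.is_dir())
```

Returning the cleanup flag together with the path keeps the "don't clean up an explicitly passed directory" rule in one place, so the caller cannot delete a user-supplied directory by accident.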

3.0.5

11 Apr 07:28
009b8b4

Changed

  • Upgrade to browsertrix crawler 1.6.0 (#493)

3.0.4

04 Apr 11:02
12fde3a

Changed

  • Upgrade to browsertrix crawler 1.5.10 (#491)

3.0.3

28 Feb 06:29
1e6748a

Changed

  • Upgrade to browsertrix crawler 1.5.7 (#483)

3.0.2

27 Feb 19:59
6ee053a

Changed

  • Upgrade to browsertrix crawler 1.5.6 (#482)

3.0.1

24 Feb 09:38
dd65902

Changed

  • Upgrade to browsertrix crawler 1.5.4 (#476)

3.0.0

17 Feb 10:05
e3cd12b

Changed

  • Change solution to report partial ZIM to the Zimfarm and other clients (#304)
  • Keep temporary folder when crawler or warc2zim fails, even if not asked for (#468)
  • Add many missing Browsertrix Crawler arguments; drop default overrides by zimit; drop --noMobileDevice setting (no longer needed) (#433)
  • Document all Browsertrix Crawler default argument values (#416)
  • Use preferred Browsertrix Crawler argument names (part of #471):
    • --seeds instead of --url
    • --seedFile instead of --urlFile
    • --pageLimit instead of --limit
    • --pageLoadTimeout instead of --timeout
    • --scopeIncludeRx instead of --include
    • --scopeExcludeRx instead of --exclude
    • --pageExtraDelay instead of --delay
  • Remove confusion between zimit, warc2zim and crawler stats filenames (part of #471)
    • --statsFilename is now the crawler stats file (the name matches the crawler's own argument, like the other arguments)
    • --zimit-progress-file is now the zimit stats location
    • --warc2zim-progress-file is the warc2zim stats location
    • all three are optional; if one is needed but not set, a temporary file is used
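
For scripts written against zimit 2.x, the argument renames listed above could be applied mechanically. The sketch below is only an illustration: migrate_args and RENAMED_FLAGS are hypothetical names, not part of zimit, and only the seven renames listed in this release are covered.

```python
from __future__ import annotations

# Old zimit 2.x flag -> preferred Browsertrix Crawler name (from the 3.0.0 list).
RENAMED_FLAGS = {
    "--url": "--seeds",
    "--urlFile": "--seedFile",
    "--limit": "--pageLimit",
    "--timeout": "--pageLoadTimeout",
    "--include": "--scopeIncludeRx",
    "--exclude": "--scopeExcludeRx",
    "--delay": "--pageExtraDelay",
}


def migrate_args(argv: list[str]) -> list[str]:
    """Return a copy of argv with old flag names replaced by the new ones."""
    migrated = []
    for arg in argv:
        # Split "--flag=value" so only the flag name is looked up;
        # anything that is not a known old flag is passed through unchanged.
        name, sep, value = arg.partition("=")
        migrated.append(RENAMED_FLAGS.get(name, name) + sep + value)
    return migrated
```

For example, migrate_args(["--url", "https://example.com", "--limit=10"]) returns ["--seeds", "https://example.com", "--pageLimit=10"]. The stats-file arguments are deliberately left out, because their meaning changed and a plain rename would not be safe.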

Fixed

  • Do not create the ZIM when crawl is incomplete (#444)

2.1.8

07 Feb 08:58
d228e9f

Changed

  • Upgrade to browsertrix crawler 1.5.1, Python 3.13 and others (#462 + #464)