Releases · openzim/zimit · GitHub

03 Feb 08:09

benoit74

3.1.2 Latest

Changed

Upgrade to browsertrix crawler 1.11.2 (#542)

24 Jan 13:55

benoit74

3.1.1

Changed

Fix publishing of arm64 Docker image (#534, #538)
Work around upstream issue with the https://dl.yarnpkg.com/debian public key (#536)

21 Jan 20:08

benoit74

3.1.0

Added

Add --overwrite flag to overwrite the ZIM file if it already exists (#399)

Changed

Fix issues preventing interrupted crawls from being resumed. (#499)
- When a build directory is passed, use it as-is instead of a randomized subdirectory, and pre-create it if it does not exist.
- Use all warc_dirs found instead of just the latest so interrupted crawls use all collected pages across runs when an explicit collections directory is not passed.
- Don't clean up an explicitly passed build directory.
Upgrade to browsertrix crawler 1.11.1 and Python 3.14, plus other small Python dependency and GitHub Actions upgrades (#532)
Add back publishing of arm64 Docker image (#463)

11 Apr 07:28

benoit74

3.0.5

Changed

Upgrade to browsertrix crawler 1.6.0 (#493)

04 Apr 11:02

benoit74

3.0.4

Changed

Upgrade to browsertrix crawler 1.5.10 (#491)

28 Feb 06:29

benoit74

3.0.3

Changed

Upgrade to browsertrix crawler 1.5.7 (#483)

27 Feb 19:59

benoit74

3.0.2

Changed

Upgrade to browsertrix crawler 1.5.6 (#482)

24 Feb 09:38

benoit74

3.0.1

Changed

Upgrade to browsertrix crawler 1.5.4 (#476)

17 Feb 10:05

benoit74

3.0.0

Changed

Change how a partial ZIM is reported to the Zimfarm and other clients (#304)
Keep temporary folder when crawler or warc2zim fails, even if not asked for (#468)
Add many missing Browsertrix Crawler arguments; drop default overrides by zimit; drop --noMobileDevice setting (not needed anymore) (#433)
Document all Browsertrix Crawler default argument values (#416)
Use preferred Browsertrix Crawler argument names (part of #471):
- --seeds instead of --url
- --seedFile instead of --urlFile
- --pageLimit instead of --limit
- --pageLoadTimeout instead of --timeout
- --scopeIncludeRx instead of --include
- --scopeExcludeRx instead of --exclude
- --pageExtraDelay instead of --delay
Remove confusion between zimit, warc2zim and crawler stats filenames (part of #471)
- --statsFilename is now the crawler stats file (it keeps the crawler's own argument name, like the other arguments)
- --zimit-progress-file is now the zimit stats location
- --warc2zim-progress-file is the warc2zim stats location
- all three are optional; if a value is not set but is needed, a temporary file is used
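
For scripts that still call zimit with the pre-3.0.0 argument names, the renames listed above can be applied mechanically. The following is a minimal sketch, not part of zimit itself: `RENAMED_ARGS` and `migrate_args` are hypothetical names introduced here for illustration, and the mapping only covers the seven renames from this release note. It does not handle the dropped --noMobileDevice setting or the stats filename changes.

```python
# Hypothetical helper (not part of zimit): maps pre-3.0.0 argument names
# to the preferred Browsertrix Crawler names listed in the 3.0.0 notes.
RENAMED_ARGS = {
    "--url": "--seeds",
    "--urlFile": "--seedFile",
    "--limit": "--pageLimit",
    "--timeout": "--pageLoadTimeout",
    "--include": "--scopeIncludeRx",
    "--exclude": "--scopeExcludeRx",
    "--delay": "--pageExtraDelay",
}


def migrate_args(argv):
    """Return a copy of argv with pre-3.0.0 flag names replaced."""
    migrated = []
    for arg in argv:
        # Split off an optional "=value" suffix so that both
        # "--limit 10" and "--limit=10" are handled.
        flag, sep, value = arg.partition("=")
        # Unknown flags and plain values are passed through unchanged.
        migrated.append(RENAMED_ARGS.get(flag, flag) + sep + value)
    return migrated


if __name__ == "__main__":
    old = ["zimit", "--url", "https://example.com", "--limit=10"]
    print(migrate_args(old))
    # prints ['zimit', '--seeds', 'https://example.com', '--pageLimit=10']
```

Values that happen to contain "=" (such as URLs with query strings) are rejoined unchanged, since only exact matches on the old flag names are rewritten.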

Fixed

Do not create the ZIM when crawl is incomplete (#444)

07 Feb 08:58

benoit74

2.1.8

Changed

Upgrade to browsertrix crawler 1.5.1, Python 3.13 and others (#462 + #464)
