Releases: GateNLP/ultimate-sitemap-parser

1.8.0

25 Jan 12:36
182f464

New Features

  • Added an optional normalize_homepage_url parameter to sitemap_tree_for_homepage, allowing homepage URL normalization to be disabled (#130 by @c00k1ez)
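
To make the idea concrete, the sketch below shows one plausible form of homepage normalization: reducing any URL to its site root. The helper normalize_homepage is hypothetical. The assumption that normalization drops the path, query and fragment is ours, and USP's own behaviour may differ.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_homepage(url: str) -> str:
    """Hypothetical helper: reduce a URL to its site root, e.g. 'https://host/'.

    Assumption: 'normalization' means dropping the path, query and fragment.
    """
    parts = urlsplit(url)
    # Keep scheme and host; replace the path with "/" and drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/", "", ""))


print(normalize_homepage("https://www.example.org/blog/post?id=3#top"))
# → https://www.example.org/
```

With USP itself, the new parameter is passed as sitemap_tree_for_homepage(url, normalize_homepage_url=False).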

1.7.0.post2

20 Jan 11:46
aecfd80

  • Restored the NOTICE file and trove classifiers that were missing from the v1.7.0 release package after the move to the uv build system

1.7.0.post1

17 Jan 10:38
2666ddd

  • Restored the LICENSE file that was missing from the v1.7.0 release package

1.7.0

11 Jan 11:42
660768a

Packaging

  • USP now includes a py.typed file, indicating that type checkers should use its inline type information (#115 by @aran-martin)

Dependencies

  • Dropped support for Python 3.9

1.6.0

10 Sep 08:28
79634e0

New Features

  • Added recurse_callback and recurse_list_callback parameters to usp.tree.sitemap_tree_for_homepage to filter which sub-sitemaps are recursed into (#106 by @nicolas-popsize)
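
The sketch below illustrates the general idea of a recursion filter without using USP: a traversal over a hypothetical in-memory sitemap index asks a callback, for each child sitemap, whether to descend into it. SITEMAP_INDEX, walk and skip_archives are illustrative names, and the (url, level) callback shape is an assumption; the exact signatures of recurse_callback and recurse_list_callback are given in USP's documentation.

```python
from collections.abc import Callable

# Hypothetical sitemap index: each sitemap URL maps to its child sitemap URLs.
SITEMAP_INDEX: dict[str, list[str]] = {
    "https://example.org/sitemap.xml": [
        "https://example.org/sitemap-posts.xml",
        "https://example.org/sitemap-archive-2019.xml",
    ],
    "https://example.org/sitemap-posts.xml": [],
    "https://example.org/sitemap-archive-2019.xml": [
        "https://example.org/sitemap-archive-2019-01.xml",
    ],
    "https://example.org/sitemap-archive-2019-01.xml": [],
}


def walk(url: str, should_recurse: Callable[[str, int], bool], level: int = 0) -> list[str]:
    """Return every sitemap URL visited, asking the callback before each descent."""
    visited = [url]
    for child in SITEMAP_INDEX.get(url, []):
        if should_recurse(child, level + 1):
            visited.extend(walk(child, should_recurse, level + 1))
    return visited


def skip_archives(url: str, level: int) -> bool:
    # Illustrative filter: never descend into archive sitemaps, at any depth.
    return "archive" not in url


print(walk("https://example.org/sitemap.xml", skip_archives))
# → ['https://example.org/sitemap.xml', 'https://example.org/sitemap-posts.xml']
```

A list-based callback, as the name recurse_list_callback suggests, would presumably receive all child URLs at once and return the subset to follow, which allows decisions that depend on the whole set.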

Bug Fixes

  • If a FileNotFoundError is raised while cleaning up a sitemap page's temporary file, it is now caught and logged as a warning (#108)
    • This resolves an error that we believe occurs only on Windows in complex environments (e.g. when running the full Pytest suite)
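
A minimal sketch of the cleanup pattern from #108, assuming the temporary file is deleted with os.remove. The helper remove_temp_file is hypothetical and is not USP's code.

```python
import logging
import os
import tempfile

log = logging.getLogger(__name__)


def remove_temp_file(path: str) -> None:
    """Delete a temporary file, downgrading a missing file to a warning."""
    try:
        os.remove(path)
    except FileNotFoundError:
        # The file is already gone (e.g. removed elsewhere); this is not fatal.
        log.warning("Temporary file %s was already removed", path)


# Create a temporary file, then remove it twice: the second call only logs a warning.
fd, tmp_path = tempfile.mkstemp()
os.close(fd)
remove_temp_file(tmp_path)
remove_temp_file(tmp_path)
print(os.path.exists(tmp_path))  # → False
```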

1.5.0

11 Aug 10:54
e61158e

Bug Fixes

  • Set separate connect and read timeouts for HTTP requests to reduce the maximum request duration. Instead of 60 s for each, the timeouts are now 9.05 s to connect and 60 s to read (#95)
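
With requests, which USP's requests-based web client uses, separate connect and read limits are set by passing a (connect, read) tuple as the timeout argument. The names REQUEST_TIMEOUT and fetch below are hypothetical; this is a sketch, not USP's client.

```python
# Separate (connect, read) timeouts in seconds, using the values from this release.
CONNECT_TIMEOUT = 9.05
READ_TIMEOUT = 60
REQUEST_TIMEOUT = (CONNECT_TIMEOUT, READ_TIMEOUT)


def fetch(url: str) -> bytes:
    """Fetch a URL with requests, applying the connect and read timeouts separately."""
    import requests  # imported here so the sketch loads even without requests installed

    response = requests.get(url, timeout=REQUEST_TIMEOUT)
    response.raise_for_status()
    return response.content
```

Per the requests documentation, the read timeout bounds the wait between bytes sent by the server rather than the total download time, so a slow but steady response can still take longer than 60 s overall.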

1.4.0

23 Apr 10:51
51d9479

New Features

  • Support parsing sitemaps when a proper XML namespace is not declared (#87)
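
The sketch below shows one way to accept <urlset> documents with or without the sitemaps.org namespace: compare tag local names and ignore any {namespace} prefix that xml.etree.ElementTree attaches. It illustrates the technique only. USP's own parser may work differently, and page_urls and local_name are hypothetical helpers.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip an ElementTree '{namespace}' prefix from a tag, if present."""
    return tag.rsplit("}", 1)[-1]


def page_urls(xml_text: str) -> list[str]:
    """Return <loc> values from a <urlset>, whether or not a namespace is declared."""
    root = ET.fromstring(xml_text)
    if local_name(root.tag) != "urlset":
        raise ValueError("not a URL set sitemap")
    urls = []
    for url_el in root:
        if local_name(url_el.tag) != "url":
            continue
        for child in url_el:
            if local_name(child.tag) == "loc" and child.text:
                urls.append(child.text.strip())
    return urls


WITH_NS = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://example.org/a</loc></url></urlset>"
)
WITHOUT_NS = "<urlset><url><loc>https://example.org/a</loc></url></urlset>"

print(page_urls(WITH_NS) == page_urls(WITHOUT_NS))  # → True
```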

Bug Fixes

  • Fixed incorrect gunzip logic that tried to decompress responses which requests had already decompressed (#89)
  • Changed the log output for gunzip failures to include the URL instead of the request response object (#89)
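
A minimal sketch of the two #89 fixes, assuming the decision is made on the response body: decompress only when the data still begins with the two-byte gzip magic number, and log the URL on failure. maybe_gunzip is a hypothetical helper, not USP's implementation.

```python
import gzip
import logging
import zlib

log = logging.getLogger(__name__)

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def maybe_gunzip(data: bytes, url: str) -> bytes:
    """Return data decompressed if it still looks gzipped, otherwise unchanged.

    requests already decodes bodies sent with Content-Encoding: gzip, so a
    .gz sitemap may arrive as plain XML and must not be decompressed again.
    """
    if data[:2] != GZIP_MAGIC:
        return data
    try:
        return gzip.decompress(data)
    except (OSError, EOFError, zlib.error) as exc:
        # Log the URL rather than the response object so the failure is traceable.
        log.warning("Unable to gunzip response from %s: %s", url, exc)
        return data


xml = b"<urlset></urlset>"
print(maybe_gunzip(gzip.compress(xml), "https://example.org/a.xml.gz") == xml)  # → True
print(maybe_gunzip(xml, "https://example.org/b.xml.gz") == xml)  # → True
```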

1.3.1

31 Mar 15:30
b6cee1a

Bug Fixes

  • Fixed an issue with temporary file handling, which would cause USP to always crash on Windows (#84)

1.3.0

17 Mar 10:37
3eda963

This release drops support for Python 3.8. The minimum supported version is now Python 3.9.

New Features

  • Recursive sitemaps are now detected, and an InvalidSitemap is returned in their place (#74)
  • Known sitemap paths will be skipped if they redirect to a sitemap already found (#77)
  • The reported URL of a sitemap will now be its actual URL after redirects (#74)
  • The CLI log level can now be changed with the -v or -vv flags, and log output can be written to a file with -l (#76)
  • When fetching known sitemap paths, 404 errors are now logged at a lower level (#78)
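
A simplified sketch of recursion detection combined with redirect resolution: each sitemap is identified by its final URL after redirects, and a child whose final URL is already among its ancestors is replaced by a placeholder instead of being expanded. The REDIRECTS table, the CHILDREN mapping and InvalidSitemapStub are hypothetical stand-ins; USP's InvalidSitemap class and traversal are not used here.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class InvalidSitemapStub:
    """Hypothetical placeholder for a sitemap that could not be used."""

    url: str
    reason: str


# Hypothetical data: redirects (requested URL -> final URL) and child sitemaps.
REDIRECTS = {"https://example.org/sitemap_index.xml": "https://example.org/sitemap.xml"}
CHILDREN = {
    "https://example.org/sitemap.xml": ["https://example.org/sitemap-a.xml"],
    # sitemap-a.xml links back to the root through a redirecting URL: a loop.
    "https://example.org/sitemap-a.xml": ["https://example.org/sitemap_index.xml"],
}


def fetch_tree(url: str, ancestors: frozenset[str] = frozenset()) -> dict | InvalidSitemapStub:
    # Identify the sitemap by the URL actually fetched, not the one requested.
    final_url = REDIRECTS.get(url, url)
    if final_url in ancestors:
        return InvalidSitemapStub(final_url, "recursive sitemap")
    children = [fetch_tree(child, ancestors | {final_url}) for child in CHILDREN.get(final_url, [])]
    return {"url": final_url, "children": children}


tree = fetch_tree("https://example.org/sitemap.xml")
print(tree["children"][0]["children"][0])
# → InvalidSitemapStub(url='https://example.org/sitemap.xml', reason='recursive sitemap')
```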

Bug Fixes

  • Some logging at INFO level has been changed to DEBUG (#76)
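
The -v, -vv and -l flags described above fit argparse's count action. In this sketch the mapping of 0, 1 and 2+ flags to WARNING, INFO and DEBUG is an assumption, as is the way the log file is applied; it does not reproduce USP's CLI code.

```python
import argparse
import logging


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Hypothetical CLI logging sketch")
    # Repeating -v increments the count: -v gives 1, -vv gives 2.
    parser.add_argument("-v", dest="verbose", action="count", default=0)
    # Optional path of a file to write log output to.
    parser.add_argument("-l", dest="log_file", default=None)
    return parser


def log_level(verbosity: int) -> int:
    """Assumed mapping from the -v count to a logging level (not taken from USP)."""
    if verbosity >= 2:
        return logging.DEBUG
    if verbosity == 1:
        return logging.INFO
    return logging.WARNING


def configure_logging(args: argparse.Namespace) -> None:
    # basicConfig logs to stderr when filename is None, otherwise to the given file.
    logging.basicConfig(level=log_level(args.verbose), filename=args.log_file)


args = build_parser().parse_args(["-vv", "-l", "usp.log"])
print(args.verbose, logging.getLevelName(log_level(args.verbose)), args.log_file)
# → 2 DEBUG usp.log
```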

API Changes

  • Added an AbstractWebClient.url() method that returns the actual URL fetched after redirects. Custom web clients must implement this method.
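
Because custom web clients must now implement url(), the sketch below shows what that requirement means in terms of Python's abc module, assuming the method is declared abstract: a subclass written before the method existed can no longer be instantiated. BaseFetcher and its subclasses are hypothetical stand-ins, not USP's AbstractWebClient.

```python
from abc import ABC, abstractmethod


class BaseFetcher(ABC):
    """Hypothetical stand-in for an abstract web client interface (not USP's class)."""

    @abstractmethod
    def raw_data(self) -> bytes:
        """Return the body of the fetched document."""

    @abstractmethod
    def url(self) -> str:
        """Return the URL actually fetched, after following any redirects."""


class StaticFetcher(BaseFetcher):
    """A custom implementation that provides both required methods."""

    def __init__(self, final_url: str, body: bytes) -> None:
        self._final_url = final_url
        self._body = body

    def raw_data(self) -> bytes:
        return self._body

    def url(self) -> str:
        return self._final_url


class OutdatedFetcher(BaseFetcher):
    """Written before url() was required, so it only implements raw_data()."""

    def raw_data(self) -> bytes:
        return b""


print(StaticFetcher("https://example.org/sitemap.xml", b"<urlset/>").url())
# → https://example.org/sitemap.xml
try:
    OutdatedFetcher()
except TypeError as exc:
    # abc refuses to instantiate a class with an unimplemented abstract method.
    print("Cannot instantiate:", exc)
```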

1.2.0

18 Feb 10:25
761df8a

New Features

  • Support passing additional known sitemap paths to usp.tree.sitemap_tree_for_homepage (#69)
  • The requests web client now creates a session object for better performance; users can override it with their own session (#70)
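
Conceptually, additional known sitemap paths are resolved against the homepage and probed alongside the built-in ones. The sketch below shows only that resolution step with urllib.parse.urljoin. DEFAULT_PATHS and candidate_sitemap_urls are hypothetical, and the real parameter name and default path list should be taken from USP's documentation.

```python
from __future__ import annotations

from urllib.parse import urljoin

# Hypothetical built-in list; USP has its own set of known sitemap paths.
DEFAULT_PATHS = ["sitemap.xml", "sitemap_index.xml"]


def candidate_sitemap_urls(homepage: str, extra_paths: list[str] | None = None) -> list[str]:
    """Resolve default and extra paths against the homepage, dropping duplicates."""
    urls: list[str] = []
    for path in DEFAULT_PATHS + list(extra_paths or []):
        url = urljoin(homepage, path)
        if url not in urls:
            urls.append(url)
    return urls


print(candidate_sitemap_urls("https://example.org/", ["custom/sitemap.xml", "sitemap.xml"]))
# → ['https://example.org/sitemap.xml', 'https://example.org/sitemap_index.xml', 'https://example.org/custom/sitemap.xml']
```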

Documentation

  • Improved the documentation on customising the HTTP client.