7 changes: 7 additions & 0 deletions .github/dependabot.yml
@@ -5,14 +5,21 @@ updates:
directory: /
schedule:
interval: weekly
cooldown:
default-days: 12
ignore:
# Jetty 9.x needed for JDK8 compatibility; it still receives security updates. Only used in tests.
- dependency-name: "org.eclipse.jetty:jetty-server"
update-types: ["version-update:semver-major"]
- dependency-name: "org.eclipse.jetty:jetty-servlet"
update-types: ["version-update:semver-major"]
# Et tu, junit? Keep us on 5, as 6 has min JDK17 - https://docs.junit.org/6.0.0-RC3/release-notes/#release-notes-6.0.0-M1
- dependency-name: "org.junit.jupiter:junit-jupiter"
update-types: ["version-update:semver-major"]

- package-ecosystem: github-actions
directory: /
schedule:
interval: weekly
cooldown:
default-days: 12
6 changes: 3 additions & 3 deletions .github/workflows/build.yml
@@ -10,15 +10,15 @@ jobs:
matrix:
os: [ubuntu-latest, windows-latest, macOS-latest]
# choosing to run a reduced set of LTS, current, and next, to balance coverage and execution time
java: [8, 17, 21]
java: [8, 17, 25]
fail-fast: false
name: Test JDK ${{ matrix.java }}, ${{ matrix.os }}
steps:
- name: Checkout
uses: actions/checkout@v4
uses: actions/checkout@v6

- name: Set up JDK ${{ matrix.java }}
uses: actions/setup-java@v4
uses: actions/setup-java@v5
with:
java-version: ${{ matrix.java }}
distribution: 'zulu'
2 changes: 1 addition & 1 deletion .github/workflows/cifuzz.yml
@@ -19,7 +19,7 @@ jobs:
dry-run: false
language: jvm
- name: Upload Crash
uses: actions/upload-artifact@v4
uses: actions/upload-artifact@v6
if: failure() && steps.build.outcome == 'success'
with:
name: artifacts
10 changes: 5 additions & 5 deletions .github/workflows/codeql.yml
@@ -12,19 +12,19 @@ jobs:
name: "CodeQL"
steps:
- name: Checkout
uses: actions/checkout@v4
uses: actions/checkout@v6
- name: Set up JDK
uses: actions/setup-java@v4
uses: actions/setup-java@v5
with:
java-version: 17
distribution: 'temurin'
cache: 'maven'
- name: CodeQL Initialization
uses: github/codeql-action/init@v3
uses: github/codeql-action/init@v4
with:
languages: java
queries: +security-and-quality
- name: Autobuild
uses: github/codeql-action/autobuild@v3
uses: github/codeql-action/autobuild@v4
- name: CodeQL Analysis
uses: github/codeql-action/analyze@v3
uses: github/codeql-action/analyze@v4
51 changes: 50 additions & 1 deletion CHANGES.md
@@ -1,20 +1,69 @@
# jsoup Changelog

## 1.21.2 (PENDING)
## 1.22.2 (PENDING)

### Bug Fixes
* Android (R8/ProGuard): added a rule to ignore the optional `re2j` dependency when not present. [#2459](https://github.com/jhy/jsoup/issues/2459)

## 1.22.1 (2026-Jan-01)

### Improvements
* Added support for using the `re2j` regular expression engine for regex-based CSS selectors (e.g. `[attr~=regex]`, `:matches(regex)`), which ensures linear-time performance for regex evaluation. This allows safer handling of arbitrary user-supplied query regexes. To enable, add the `com.google.re2j` dependency to your classpath, e.g.:
```xml
<dependency>
<groupId>com.google.re2j</groupId>
<artifactId>re2j</artifactId>
<version>1.8</version>
</dependency>
```
(If you already have that dependency in your classpath, but you want to keep using the Java regex engine, you can disable re2j via `System.setProperty("jsoup.useRe2j", "false")`.) You can confirm that the re2j engine has been enabled correctly by calling `org.jsoup.helper.Regex.usingRe2j()`. [#2407](https://github.com/jhy/jsoup/pull/2407)

* Added an instance method `Parser#unescape(String, boolean)` that unescapes HTML entities using the parser's configuration (e.g. to support error tracking), complementing the existing static utility `Parser.unescapeEntities(String, boolean)`. [#2396](https://github.com/jhy/jsoup/pull/2396)
* Added a configurable maximum parser depth (to limit the number of open elements on stack) to both HTML and XML parsers. The HTML parser now defaults to a depth of 512 to match browser behavior, and protect against unbounded stack growth, while the XML parser keeps unlimited depth by default, but can opt into a limit via `org.jsoup.parser.Parser#setMaxDepth`. [#2421](https://github.com/jhy/jsoup/issues/2421)
* Build: added CI coverage for JDK 25 [#2403](https://github.com/jhy/jsoup/pull/2403)
* Build: added a CI fuzzer for contextual fragment parsing (in addition to existing full body HTML and XML fuzzers). [oss-fuzz #14041](https://github.com/google/oss-fuzz/pull/14041)
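
The opt-in behavior described for `re2j` above (use it when the dependency is present, unless the `jsoup.useRe2j` system property disables it) can be sketched with JDK calls only. This is an illustrative sketch, not jsoup's implementation: the class `RegexEngineChoice` and its methods are hypothetical names, and the default of `"true"` for the property is an assumption. Only the property name and the `com.google.re2j` package come from the entry above.

```java
// Hypothetical sketch of optional-dependency detection; not jsoup's actual code.
class RegexEngineChoice {
    // Property name taken from the changelog entry above.
    static final String TOGGLE = "jsoup.useRe2j";
    // Main entry point class of the re2j library.
    static final String RE2J_CLASS = "com.google.re2j.Pattern";

    /** Returns true if the re2j Pattern class can be found, without initializing it. */
    static boolean re2jOnClasspath() {
        try {
            Class.forName(RE2J_CLASS, false, RegexEngineChoice.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** Assumed policy: enabled unless the property is set to something other than "true". */
    static boolean shouldUseRe2j() {
        boolean enabled = Boolean.parseBoolean(System.getProperty(TOGGLE, "true"));
        return enabled && re2jOnClasspath();
    }

    public static void main(String[] args) {
        System.out.println("re2j selected: " + shouldUseRe2j());
    }
}
```

In a real application, `org.jsoup.helper.Regex.usingRe2j()` (named in the entry above) is the way to confirm which engine jsoup actually picked.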

### Changes
* Set a removal schedule of jsoup 1.24.1 for previously deprecated APIs.

### Bug Fixes
* Previously cached child `Elements` of an `Element` were not correctly invalidated in `Node#replaceWith(Node)`, which could lead to incorrect results when subsequently calling `Element#children()`. [#2391](https://github.com/jhy/jsoup/issues/2391)
* Attribute selector values are now compared literally without trimming. Previously, jsoup trimmed whitespace from selector values and from element attribute values, which could cause mismatches with browser behavior (e.g. `[attr=" foo "]`). Now matches align with the CSS specification and browser engines. [#2380](https://github.com/jhy/jsoup/issues/2380)
* When using the JDK HttpClient, any system default proxy (`ProxySelector.getDefault()`) was ignored. Now, the system proxy is used if a per-request proxy is not set. [#2388](https://github.com/jhy/jsoup/issues/2388), [#2390](https://github.com/jhy/jsoup/pull/2390)
* A `ValidationException` could be thrown in the adoption agency algorithm with particularly broken input. Now logged as a parse error. [#2393](https://github.com/jhy/jsoup/issues/2393)
* Null characters in the HTML body were not consistently removed, and in foreign content they were not correctly replaced. [#2395](https://github.com/jhy/jsoup/issues/2395)
* An `IndexOutOfBoundsException` could be thrown when parsing a body fragment with crafted input. Now logged as a parse error. [#2397](https://github.com/jhy/jsoup/issues/2397), [#2406](https://github.com/jhy/jsoup/issues/2406)
* When using StructuralEvaluators (e.g., a `parent child` selector) across many retained threads, their memoized results could also be retained, increasing memory use. These results are now cleared immediately after use, reducing overall memory consumption. [#2411](https://github.com/jhy/jsoup/issues/2411)
* Cloning a `Parser` now preserves any custom `TagSet` applied to the parser. [#2422](https://github.com/jhy/jsoup/issues/2422), [#2423](https://github.com/jhy/jsoup/pull/2423)
* Custom tags marked as `Tag.Void` now parse and serialize like the built-in void elements: they no longer consume following content, and the XML serializer emits the expected self-closing form. [#2425](https://github.com/jhy/jsoup/issues/2425)
* The `<br>` element is once again classified as an inline tag (`Tag.isBlock() == false`), matching common developer expectations and its role as phrasing content in HTML, while pretty-printing and text extraction continue to treat it as a line break in the rendered output. [#2387](https://github.com/jhy/jsoup/issues/2387), [#2439](https://github.com/jhy/jsoup/issues/2439)
* Fixed an intermittent truncation issue when fetching and parsing remote documents via `Jsoup.connect(url).get()`. On responses without a charset header, the initial charset sniff could sometimes (depending on buffering / `available()` behavior) be mistaken for end-of-stream and a partial parse reused, dropping trailing content. [#2448](https://github.com/jhy/jsoup/issues/2448)
* `TagSet` copies no longer mutate their template during lazy lookups, preventing cross-thread `ConcurrentModificationException` when parsing with shared sessions. [#2453](https://github.com/jhy/jsoup/pull/2453)
* Fixed parsing of `<svg>` `foreignObject` content nested within a `<p>`, which could incorrectly move the HTML subtree outside the SVG. [#2452](https://github.com/jhy/jsoup/issues/2452)
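
The proxy fallback from #2388 (a per-request proxy wins, otherwise the system default `ProxySelector` is consulted) might look roughly like the following. This is a sketch under assumptions, not the code from the fix: `ProxyFallback` and `resolve` are hypothetical names, and taking the first candidate from the selector is a simplification.

```java
import java.net.Proxy;
import java.net.ProxySelector;
import java.net.URI;
import java.util.List;

// Hypothetical sketch of "per-request proxy, else system default"; not jsoup's actual code.
class ProxyFallback {
    /** Picks the proxy for a request: the explicit one if set, else the system selector's first choice. */
    static Proxy resolve(Proxy perRequest, URI target) {
        if (perRequest != null) return perRequest;

        ProxySelector selector = ProxySelector.getDefault();
        if (selector == null) return Proxy.NO_PROXY; // no system-wide selector installed

        List<Proxy> candidates = selector.select(target);
        return candidates.isEmpty() ? Proxy.NO_PROXY : candidates.get(0);
    }

    public static void main(String[] args) {
        System.out.println(resolve(null, URI.create("https://example.com/")));
    }
}
```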

### Internal Changes
* Deprecated internal helper `org.jsoup.internal.Functions` (for removal in v1.23.1). This was previously used to support older Android API levels without full `java.util.function` coverage; jsoup now requires core library desugaring so this indirection is no longer necessary. [#2412](https://github.com/jhy/jsoup/pull/2412)

## 1.21.2 (2025-Aug-25)

### Changes
* Deprecated internal (yet visible) methods `Normalizer#normalize(String, boolean)` and `Attribute#shouldCollapseAttribute(Document.OutputSettings)`. These will be removed in a future version.
* Deprecated `Connection#sslSocketFactory(SSLSocketFactory)` in favor of the new `Connection#sslContext(SSLContext)`. Using `sslSocketFactory` will force the use of the legacy `HttpURLConnection` implementation, which does not support HTTP/2. [#2370](https://github.com/jhy/jsoup/pull/2370)

### Improvements
* When pretty-printing, if there are consecutive text nodes (via DOM manipulation), the non-significant whitespace between them will be collapsed. [#2349](https://github.com/jhy/jsoup/pull/2349).
* Updated `Connection.Response#statusMessage()` to return a simple loggable string message (e.g. "OK") when using the `HttpClient` implementation, which doesn't otherwise return any server-set status message. [#2346](https://github.com/jhy/jsoup/issues/2346)
* `Attributes#size()` and `Attributes#isEmpty()` now exclude any internal attributes (such as user data) from their count. This aligns with the attributes' serialized output and iterator. [#2369](https://github.com/jhy/jsoup/pull/2369)
* Added `Connection#sslContext(SSLContext)` to provide a custom SSL (TLS) context to requests, supporting both the `HttpClient` and the legacy `HttpURLConnection` implementations. [#2370](https://github.com/jhy/jsoup/pull/2370)
* Performance optimizations for DOM manipulation methods, including when repeatedly removing an element's first child (`element.child(0).remove()`), and when using `Parser#parseBodyFragment()` to parse a large number of direct children. [#2373](https://github.com/jhy/jsoup/pull/2373).
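
For the `Connection#sslContext(SSLContext)` entry above, the context itself can be built with the standard JSSE classes. The helper below is a minimal sketch: `TlsContexts` and `withTrustStore` are hypothetical names, and passing a `null` trust store simply asks the JDK for its default trust anchors.

```java
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManagerFactory;

// Hypothetical helper for building an SSLContext; only JDK classes are used here.
class TlsContexts {
    /** Builds a TLS context trusting the given store; a null store means the JDK default trust store. */
    static SSLContext withTrustStore(KeyStore trustStore) {
        try {
            TrustManagerFactory tmf =
                TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
            tmf.init(trustStore);
            SSLContext context = SSLContext.getInstance("TLS");
            context.init(null, tmf.getTrustManagers(), null); // no client keys, default randomness
            return context;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Could not build TLS context", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(withTrustStore(null).getProtocol());
    }
}
```

The resulting context would then be handed to the `sslContext(SSLContext)` method named in the entry, in place of the deprecated `sslSocketFactory` call.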

### Bug Fixes
* When parsing from an InputStream and a multibyte character happened to straddle a buffer boundary, the stream would not be completely read. [#2353](https://github.com/jhy/jsoup/issues/2353).
* In `NodeTraversor`, if a last child element was removed during the `head()` call, the parent would be visited twice. [#2355](https://github.com/jhy/jsoup/issues/2355).
* Cloning an Element that has an Attributes object would add an empty internal user-data attribute to that clone, which would cause unexpected results for `Attributes#size()` and `Attributes#isEmpty()`. [#2356](https://github.com/jhy/jsoup/issues/2356)
* In a multithreaded application where multiple threads are calling `Element#children()` on the same element concurrently, a race condition could happen when the method was generating the internal child element cache (a filtered view of its child nodes). Since concurrent reads of DOM objects should be threadsafe without external synchronization, this method has been updated to execute atomically. [#2366](https://github.com/jhy/jsoup/issues/2366)
* When parsing HTML with svg:script elements in SVG elements, don't enter the Text insertion mode, but continue to parse as foreign content. Otherwise, misnested HTML could then cause an IndexOutOfBoundsException. [#2374](https://github.com/jhy/jsoup/issues/2374)
* Malformed HTML could throw an IndexOutOfBoundsException during the adoption agency algorithm. [#2377](https://github.com/jhy/jsoup/pull/2377).
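
The buffer-boundary case in #2353 is a general hazard of chunked decoding: a multibyte character can be split across two reads, so the undecoded tail of one chunk has to be carried into the next. The sketch below shows that carry-over pattern with the JDK's `CharsetDecoder`. It is not jsoup's fix, and `BoundaryDecode` and `decodeInChunks` are hypothetical names chosen for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of chunked UTF-8 decoding; not jsoup's actual reader.
class BoundaryDecode {
    /** Decodes UTF-8 bytes fed in fixed-size chunks, carrying incomplete sequences to the next chunk. */
    static String decodeInChunks(byte[] data, int chunkSize) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
        // Extra room for up to 3 leftover bytes of an incomplete UTF-8 sequence.
        ByteBuffer in = ByteBuffer.allocate(chunkSize + 4);
        CharBuffer out = CharBuffer.allocate(chunkSize + 4);
        StringBuilder result = new StringBuilder();
        int pos = 0;
        do {
            int n = Math.min(chunkSize, data.length - pos);
            in.put(data, pos, n);
            pos += n;
            in.flip();
            boolean endOfInput = pos == data.length;
            decoder.decode(in, out, endOfInput);
            in.compact(); // keep any bytes of a split character for the next round
            out.flip();
            result.append(out);
            out.clear();
        } while (pos < data.length);
        decoder.flush(out);
        out.flip();
        result.append(out);
        return result.toString();
    }

    public static void main(String[] args) {
        byte[] bytes = "h\u00e9llo \u20acuro".getBytes(StandardCharsets.UTF_8);
        System.out.println(decodeInChunks(bytes, 3));
    }
}
```

The `compact()` call is the important step: it moves the unread bytes of a split character to the front of the input buffer instead of discarding them.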

## 1.21.1 (2025-Jun-23)

6 changes: 6 additions & 0 deletions jitpack.yml
@@ -0,0 +1,6 @@
before_install:
- sdk install java 21.0.2-open
- sdk use java 21.0.2-open
- sdk install maven
install:
- mvn clean install -Djapicmp.skip=true -DskipTests