3.10.0
Download distribution zip (or tar.gz)
Full Changelog | Javadoc | Maven Central
New features
- BrowserProcessor: Loads fetched pages in a local browser (Firefox/ChromeDriver), records all browser requests,
  and runs pluggable behaviors (e.g. scrolling, link extraction). #653
  - Uses the WebDriver BiDi protocol for browser automation.
  - The recording proxy is built on Jetty's ProxyHandler and the FetchHTTP2 module.
  - Status: Working for small crawls but needs more robust error handling (browser crashes, resource limits).
- Basic web auth: You can now switch the web interface from Digest authentication to Basic authentication with the
  --web-auth basic command-line option. This is useful when running Heritrix behind a reverse proxy that adds external authentication. #654
- Robots.txt wildcards: The * and $ wildcard rules from RFC 9309 are now supported. #656
- FetchHTTP2: Added HTTP proxy support. #657
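To illustrate the robots.txt wildcard semantics mentioned above, here is a minimal sketch of RFC 9309 path matching. It is not Heritrix's implementation: the class and method names are hypothetical, and it deliberately ignores percent-encoding normalization and the longest-match precedence between Allow and Disallow rules. It assumes only that * matches any sequence of characters and that a trailing $ anchors the rule to the end of the path.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Heritrix.
final class RobotsWildcardSketch {

    /** Returns true if the robots.txt rule pattern matches the given URL path. */
    static boolean matches(String rule, String path) {
        // A trailing '$' means the rule must match up to the end of the path.
        boolean anchored = rule.endsWith("$");
        String body = anchored ? rule.substring(0, rule.length() - 1) : rule;

        // Quote the literal pieces and join them with ".*" for each '*'.
        StringBuilder regex = new StringBuilder();
        String[] parts = body.split("\\*", -1);
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                regex.append(".*");
            }
            regex.append(Pattern.quote(parts[i]));
        }

        // Without '$', a rule is a prefix match, so anything may follow.
        if (!anchored) {
            regex.append(".*");
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL)
                .matcher(path)
                .matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("/*.php$", "/index.php"));      // true
        System.out.println(matches("/*.php$", "/index.php?x=1"));  // false
        System.out.println(matches("/fish*", "/fish.html"));       // true
    }
}
```

Translating each rule to a regular expression keeps the sketch short; a production matcher would more likely scan the pattern directly to avoid compiling a regex per rule.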
Fixes
- Code editor: The configuration editor and script console were upgraded to CodeMirror 6. This resolves some browser incompatibilities, allowing CodeMirror’s own find function to be re-enabled for reliable text search of content far outside the viewport. #651
- BDB shutdown interrupt handling: The thread’s interrupted flag is now cleared before some BDB interactions to reduce the likelihood of environment invalidation when requestCrawlStop() is called repeatedly. #659
- FetchHTTP2: Fixed gzip alert log messages by configuring HttpClient not to decode gzip-encoded responses.
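The interrupt handling fix can be pictured with a small sketch. It is not the actual Heritrix change: the class and method names are hypothetical, and restoring the flag afterwards is an assumption made here for illustration (the note above only says the flag is cleared). The idea is that an operation which must not observe a pending interrupt runs with the calling thread's interrupted flag cleared.

```java
import java.util.function.Supplier;

// Hypothetical helper for illustration only; not part of Heritrix.
final class InterruptShield {

    /** Runs the operation with the calling thread's interrupted flag cleared. */
    static <T> T callUninterrupted(Supplier<T> operation) {
        // Thread.interrupted() returns the current flag and clears it.
        boolean wasInterrupted = Thread.interrupted();
        try {
            return operation.get();
        } finally {
            // Assumption: re-assert the interrupt so callers can still see it.
            if (wasInterrupted) {
                Thread.currentThread().interrupt();
            }
        }
    }

    public static void main(String[] args) {
        Thread.currentThread().interrupt(); // simulate a pending stop request
        boolean seenInside =
                callUninterrupted(() -> Thread.currentThread().isInterrupted());
        System.out.println("interrupted inside operation: " + seenInside);
        System.out.println("interrupted afterwards: " + Thread.interrupted());
    }
}
```

In this sketch the operation sees a cleared flag while it runs, and the flag is set again once it returns.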
Removals
- Removed Apache HttpClient 3: If you have custom Heritrix modules you may need to update the following
  class references in your code:

  | Removed | Replacement |
  |---|---|
  | org.apache.commons.httpclient.URIException | org.archive.url.URIException |
  | org.apache.commons.httpclient.Header | org.archive.format.http.HttpHeader |

  Note that Apache HttpClient 4 (org.apache.http) was not removed. #652
Dependency Upgrades
- codemirror: 2.23 → 6
- easymock: 5.5.0 → removed
- groovy: 4.0.26 → 4.0.27
- junit: 5.12.2 → 5.13.1
- kafka-clients: 3.9.0 → 3.9.1
- spring: 6.2.6 → 6.2.7
- webarchive-commons: 1.3.0 → 2.0.1