Releases: apache/stormcrawler
What's new in StormCrawler 2.6
Disclaimer
This is a Pre-ASF release and did not undergo a formal review by the PMC.
Highlights
- Using URLFrontier in archetype
- URLFilter becomes an abstract class
- Fixed deactivation of maxDepthFilter
- JSoupParserBolt: improved performance of link extraction
- Multiple dependency upgrades
Full Changelog: storm-crawler-2.5...2.6
What's new in StormCrawler 2.5
Disclaimer
This is a Pre-ASF release and did not undergo a formal review by the PMC.
In a nutshell
- various dependency upgrades (JSoup, CrawlerCommons, Tika, Elasticsearch)
- Java 11
- bugfix: AggregationSpout sometimes failed to release the IsInQuery boolean
- various improvements to URLFrontier module
In more detail
- FEATURE-964: custom crawl delay per page by @juli-alvarez in #967
- Issue 970 HttpProtocol doesn't consider http.content.limit in test for filesize by @wowasa in #972
- Add ChannelManager for local channel management and constants to Spout.java by @FelixEngl in #982
- Fix error when spaces in path to test-resources of StatusBoltTest in ElasticSearch-Module by @FelixEngl in #985
- Add unit test basics for URLFrontier. by @FelixEngl in #984
- Fix starvation and busy waiting of StatusUpdaterBolt.java, add Constants. by @FelixEngl in #983
- Fix starvation and busy waiting of ES StatusUpdaterBolt (Fixes #986) by @FelixEngl in #988
- Fix starvation and busy waiting of ES IndexerBolt by @FelixEngl in #989
- HttpProtocol use the md protocol.set-headers to add custom header by url by @Mikwiss in #993
Full Changelog: 2.4...storm-crawler-2.5
StormCrawler 2.4
Disclaimer
This is a Pre-ASF release and did not undergo a formal review by the PMC.
- Upgrade to Apache Storm 2.4
- Upgrade to Elasticsearch 7.17.2
- bugfix: Setting "maxDepth": 0 in urlfilter.json prevents ES seed injection #959
- Allow compatibility.mode for rest client to connect to ES8+ #962
Full Changelog: 2.3...2.4
StormCrawler 2.3
Disclaimer
This is a Pre-ASF release and did not undergo a formal review by the PMC.
https://digitalpebble.blogspot.com/2022/03/whats-new-in-stormcrawler-23.html
What's Changed
- Bump xercesImpl from 2.12.1 to 2.12.2 in /core by @dependabot in #942
- General Code Refactoring and Good Practices by @FelixEngl in #937
- Add unified way of initializing classes via string and configuring them. by @FelixEngl in #943
- Rewrote LinkParseFilter + added XPathFilter + tests for JSOUPFilters by @jnioche in #953
- ISSUE-954: Issue with the order of emit and emitOutlink for redirections in FetcherBolt by @juli-alvarez in #955
New Contributors
- @FelixEngl made their first contribution in #937
Full Changelog: 2.2...2.3
2.2
Disclaimer
This is a Pre-ASF release and did not undergo a formal review by the PMC.
https://digitalpebble.blogspot.com/2022/01/whats-new-in-stormcrawler-22.html
2.1
StormCrawler 1.18
StormCrawler 2.0
[maven-release-plugin] copy for tag 2.0
1.17
storm-crawler-1.17 [maven-release-plugin] copy for tag storm-crawler-1.17