Releases: apache/stormcrawler

What's new in StormCrawler 2.6

28 Nov 10:19

Disclaimer

This is a Pre-ASF release and did not undergo a formal review by the PMC.

Full Changelog: storm-crawler-2.5...2.6

What's new in StormCrawler 2.5

31 Aug 13:28

Disclaimer

This is a Pre-ASF release and did not undergo a formal review by the PMC.

In a nutshell

  • various dependency upgrades (JSoup, CrawlerCommons, Tika, Elasticsearch)
  • Java 11
  • bugfix: AggregationSpout sometimes does not release the IsInQuery boolean
  • various improvements to URLFrontier module

In more detail

  • FEATURE-964: custom crawl delay per page by @juli-alvarez in #967
  • Issue 970 HttpProtocol doesn't consider http.content.limit in test for filesize by @wowasa in #972
  • Add ChannelManager for local channel management and constants to Spout.java by @FelixEngl in #982
  • Fix error when spaces in path to test-resources of StatusBoltTest in ElasticSearch-Module by @FelixEngl in #985
  • Add unit test basics for URLFrontier. by @FelixEngl in #984
  • Fix starvation and busy waiting of StatusUpdaterBolt.java, add Constants. by @FelixEngl in #983
  • Fix starvation and busy waiting of ES StatusUpdaterBolt (Fixes #986) by @FelixEngl in #988
  • Fix starvation and busy waiting of ES IndexerBolt by @FelixEngl in #989
  • HttpProtocol: use the metadata key protocol.set-headers to add custom headers per URL by @Mikwiss in #993

Full Changelog: 2.4...storm-crawler-2.5

StormCrawler 2.4

13 Apr 10:46

Disclaimer

This is a Pre-ASF release and did not undergo a formal review by the PMC.

  • Upgrade to Apache Storm 2.4
  • Upgrade to Elasticsearch 7.17.2
  • bugfix: setting "maxDepth": 0 in urlfilter.json prevents ES seed injection #959
  • Allow compatibility.mode for rest client to connect to ES8+ #962

Full Changelog: 2.3...2.4

StormCrawler 2.3

21 Mar 15:17

Disclaimer

This is a Pre-ASF release and did not undergo a formal review by the PMC.

https://digitalpebble.blogspot.com/2022/03/whats-new-in-stormcrawler-23.html

What's Changed

  • Bump xercesImpl from 2.12.1 to 2.12.2 in /core by @dependabot in #942
  • General Code Refactoring and Good Practices by @FelixEngl in #937
  • Add unified way of initializing classes via string and configuring them. by @FelixEngl in #943
  • Rewrote LinkParseFilter + added XPathFilter + tests for JSOUPFilters by @jnioche in #953
  • ISSUE-954: Issue with the order of emit and emitOutlink for redirections in FetcherBolt by @juli-alvarez in #955

Full Changelog: 2.2...2.3

2.2

11 Jan 17:30

Disclaimer

This is a Pre-ASF release and did not undergo a formal review by the PMC.

https://digitalpebble.blogspot.com/2022/01/whats-new-in-stormcrawler-22.html

2.1

05 May 14:04
[maven-release-plugin] copy for tag 2.1

StormCrawler 1.18

05 May 10:36

StormCrawler 2.0

20 Jul 15:33
[maven-release-plugin] copy for tag 2.0

1.17

20 Jul 14:24
storm-crawler-1.17

[maven-release-plugin] copy for tag storm-crawler-1.17

1.16

14 Jan 21:09