Throttling when parsing multiple sitemaps #77

Open
bsq-panagiotis opened this issue Feb 4, 2021 · 0 comments

@bsq-panagiotis (Contributor)
Is there an option available to add an artificial delay (throttling) between requests to avoid getting blocked by firewalls?
I couldn't find any mention of this feature in the documentation.

bsq-panagiotis added a commit to bsq-panagiotis/sitemapper that referenced this issue Feb 10, 2021
# New features added

* Ability to report sitemap crawl errors in the returned results. Added a new "errors" property to the `SitesData` object.

* Added an option to set a concurrency limit to rate-limit sitemap crawling. Useful when crawling sitemaps with multiple children to avoid getting blocked by firewalls. seantomburke#77

* Added an option to retry requests upon failure and to set the maximum number of retries per crawl.

# Documentation changes

* Updated documentation to include all the new features described above.

Co-Authored-By: Panagiotis Tzamtzis <panagiotis@baresquare.com>
Co-Authored-By: PanagiotisTzamtzis <panagiotis@tzamtzis.gr>
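
For readers wondering how a concurrency limit and a retry count can work together, here is a minimal sketch in plain JavaScript. It is not the code from the commit above and it does not use sitemapper's API: `mapWithConcurrency`, `withRetries`, `crawlAll`, the `fetchOne` callback, and the `{ url, message }` error shape are all hypothetical names chosen for illustration. The idea is simply that a fixed number of runners pull child sitemap URLs from a shared list, each request is retried a bounded number of times, and failures are collected in an `errors` array instead of aborting the crawl.

```javascript
// Hypothetical helper (not part of sitemapper): run `worker` over `items`
// with at most `limit` calls in flight at any time.
async function mapWithConcurrency(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;

  // Each runner claims the next unprocessed index until the list is exhausted.
  async function runner() {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  const runnerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: runnerCount }, runner));
  return results;
}

// Hypothetical helper: call `task` again after a failure,
// allowing up to `retries` extra attempts before giving up.
async function withRetries(task, retries) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await task(attempt);
    } catch (error) {
      lastError = error;
    }
  }
  throw lastError;
}

// Hypothetical crawl loop: `fetchOne(url)` is assumed to resolve to an array
// of page URLs for one child sitemap, or to reject on failure.
async function crawlAll(urls, { concurrency, retries, fetchOne }) {
  const sites = [];
  const errors = [];
  await mapWithConcurrency(urls, concurrency, async (url) => {
    try {
      sites.push(...(await withRetries(() => fetchOne(url), retries)));
    } catch (error) {
      // Record the failure and keep crawling the remaining sitemaps.
      errors.push({ url, message: error.message });
    }
  });
  return { sites, errors };
}
```

Note that this caps the number of simultaneous requests rather than inserting a fixed delay between them. If a firewall counts requests per second, a pause inside `fetchOne` (for example a `setTimeout`-based sleep) may still be needed on top of a low concurrency value.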
seantomburke pushed a commit to bsq-panagiotis/sitemapper that referenced this issue Nov 6, 2021
seantomburke pushed a commit that referenced this issue Nov 11, 2021
seantomburke added a commit that referenced this issue Nov 11, 2021
* New features & updated documentation

# New features added

* Ability to report sitemap crawl errors in the returned results. Added a new "errors" property to the `SitesData` object.

* Added an option to set a concurrency limit to rate-limit sitemap crawling. Useful when crawling sitemaps with multiple children to avoid getting blocked by firewalls. #77

* Added an option to retry requests upon failure and to set the maximum number of retries per crawl.

# Documentation changes

* Updated documentation to include all the new features described above.

Co-Authored-By: Panagiotis Tzamtzis <panagiotis@baresquare.com>
Co-Authored-By: PanagiotisTzamtzis <panagiotis@tzamtzis.gr>

* Fix for error on the main sitemap

In this case the errors object in the results was not an ErrorsDataArray but a single ErrorsData

* Bug fixes

* Error logging improvements with more details for `UnknownStateErrors` & errors when parsing the parent sitemap

* Retries option was not working when `debug` was set to false

* Bug fix

* Console.log statement was getting triggered when `debug` option was set to false

* Update src/examples/index.js

* 3.2.0

* Cleaning up, changing error to errors, updating TypeScript, removing returnErrors option

* Removing returnErrors option

* quotes fix

* Updates

* Fixing errors array

* updating tests

Co-authored-by: PanagiotisTzamtzis <panagiotis@tzamtzis.gr>
Co-authored-by: Sean Thomas Burke <965298+seantomburke@users.noreply.github.com>
Co-authored-by: Sean Thomas Burke <seantomburke@users.noreply.github.com>
seantomburke added a commit that referenced this issue Dec 24, 2021