Scrape action should not be performed during startup when scan interval set to 0 #359

Open
Paul-Vdp opened this issue Apr 18, 2024 · 6 comments


@Paul-Vdp

Title says it all.
My scrape sensor relies on 'params' for parts of the resource. Because these params have not (yet) been initialized at startup, the unwanted and unneeded scrape results in errors.

@danieldotnl
Owner

What should the value of the sensors be on startup in this case?

@Paul-Vdp
Author

To be honest: I don't care, because I neither need nor use them at that moment. That's why their scan interval is set to 0 in the first place.
Or perhaps a more reasonable and acceptable answer: restore their previous values, as with most other sensors.
Running the scrape at startup has nothing to do with any need for these sensors to be refreshed or updated.

@SeanPM5

SeanPM5 commented Apr 29, 2024

I would also like this. I use the resource_template: option in Multiscrape to build some URLs from an attribute of another integration. But because Multiscrape loads (and attempts to scrape) before that other integration has loaded and has a sensor value, the template renders a broken URL, which results in a bunch of 404 and 500 errors on every startup.

IMO, keep the default behavior as-is but introduce a new optional boolean such as scrape_on_startup: false, so that it works regardless of the user's scan interval. The sensor state could be unknown, so the user knows the Multiscrape integration is loaded but simply hasn't performed a scrape yet.
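A minimal Python sketch of the opt-in flag suggested above, assuming a simplified model of the integration. ScraperConfig, initial_state and UNKNOWN are hypothetical names, and scrape_on_startup is only a proposal in this thread, not an existing Multiscrape option. It shows the flag being checked independently of scan_interval, with the sensor reporting "unknown" until the first real scrape.

```python
from dataclasses import dataclass
from typing import Callable

UNKNOWN = "unknown"


@dataclass
class ScraperConfig:
    # Hypothetical flag suggested in this thread; not an existing Multiscrape option.
    scrape_on_startup: bool = True
    # Seconds between automatic scrapes (0 = only when triggered manually).
    scan_interval: int = 60


def initial_state(config: ScraperConfig, scrape: Callable[[], str]) -> str:
    """Return the state a sensor publishes when the integration loads."""
    # The flag is checked on its own, so it behaves the same for any scan_interval.
    if config.scrape_on_startup:
        return scrape()
    # Skip the startup scrape: "unknown" shows the integration is loaded
    # but has not scraped yet.
    return UNKNOWN


if __name__ == "__main__":
    cfg = ScraperConfig(scrape_on_startup=False, scan_interval=0)
    print(initial_state(cfg, lambda: "scraped value"))  # unknown
```

Keeping the default at True preserves today's behavior, which is what the suggestion asks for.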

@Paul-Vdp
Author

Glad somebody agrees with my point.
However, I beg to differ with these suggestions and stand by my own, because:

  1. setting the scan interval to 0 is clearly meant to indicate that one wants to scrape at one's own pace, if and when needed, under the sole control of the user and their automations. Scraping should therefore NOT be forced 'externally' at startup.
    Any other interpretation makes no sense, so I see no need for an additional setting.
  2. the same reasoning applies to the sensor values at startup. Restarting Hass is in no way an objective reason to change these sensors from their previous state, which should therefore simply be retained. Why should they be treated differently from, e.g., the state of a light or a temperature sensor?
    I fail to see what influence a Hass restart could have on the content of the site we're scraping, and therefore on the values we scrape from it.
    And in the rather unlikely case of an extremely volatile site, one can always trigger a scrape oneself at startup ...
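The semantics argued for above can be expressed as a small Python sketch, assuming a simplified model where last_state stands in for whatever value Home Assistant saved before the restart. startup_state is a hypothetical helper, not Multiscrape's actual code, and Home Assistant's own state-restore machinery is not modeled. A scan interval of 0 means "never scrape automatically", so at startup the previous value is kept and no scrape runs.

```python
from typing import Callable, Optional


def startup_state(
    scan_interval: int,
    last_state: Optional[str],
    scrape: Callable[[], str],
) -> Optional[str]:
    """Choose a sensor's state at Home Assistant startup.

    scan_interval == 0 is read as "manual scraping only": no scrape is
    performed and the state saved before the restart is kept as-is.
    """
    if scan_interval == 0:
        # Manual-only sensor: reuse the previous value (None if none was saved).
        return last_state
    # Periodically scraped sensor: refresh immediately, as today.
    return scrape()


if __name__ == "__main__":
    def fail() -> str:
        raise RuntimeError("template params not initialized yet")

    print(startup_state(0, "42.5", fail))  # 42.5
    print(startup_state(300, "42.5", lambda: "43.0"))  # 43.0
```

Because the scrape callable is never invoked when scan_interval is 0, uninitialized params or templates cannot cause errors at startup.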

@danieldotnl
Owner

I agree with @Paul-Vdp and I will work on implementing this. It's not a small feature request though, so it will take some time.

@Paul-Vdp
Author

Paul-Vdp commented May 2, 2024

Much obliged @danieldotnl
I realize it is not a simple change, but I am confident you will manage ;-)
