HDF5ExtError in daily update #273
This turns out to be #150 (reading raw data while the raw datastore file is being simultaneously written into by the writer on frome) after all. I turned off HDF5 1.10 file locking and the update crashed with:
This is most certainly caused by the writer on frome writing while we are trying to open the file for reading. The daily update starts at 4am local time (the servers use local time), which is 2am UTC during summer/DST. I have changed this to 5am local time (3am UTC) to reduce the chance of frome writing to the datastore while the update runs. I will investigate if
Clever idea!
It might be as easy as two cronjobs. But I'd like to add some kind of mail/alert when supervisord fails to (re)start a service. Perhaps
Or, the update process could consist of stopping the writer, running the update, and finally starting the writer again. But then we definitely need alerts.
Yes, I thought about that... The communication between pique and frome (to start/stop the writer) puts me off. I don't fancy doing that with XML-RPC (but we could). We do have alerts (sentry.io) on pique/the daily update, so we do get an email if the daily update fails.
This has been fixed (temporarily) for a couple of months now by turning the writer on frome on/off during the daily update using a cronjob. TODO: add this to the Ansible provisioning.
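A minimal sketch of what that cronjob pair could look like on frome. The supervisord program name (`writer`), the cron file path, and the exact times are assumptions, not taken from the repo; the only given is that the daily update starts at 5am local time.

```shell
# /etc/cron.d/datastore-writer (hypothetical) -- stop the datastore writer
# shortly before the daily update on pique, and restart it well after the
# update has had time to finish.
50 4 * * * root supervisorctl stop writer
30 5 * * * root supervisorctl start writer
```

If supervisorctl fails (e.g. supervisord is down), cron mails the output to the local root mailbox, which is one reason some extra alerting would still be welcome here.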
We have been experiencing frequent (>1/week) HDF5 errors which break the daily update:
This happens while opening an ESD file: https://github.com/HiSPARC/publicdb/blob/master/publicdb/histograms/esd.py#L121
(it happens at singles/weather/events)
My analysis is that this is an HDF5 1.10 issue (we upgraded some time ago, now that HDF5 1.10 is the default in Anaconda). There is an issue with file locking: HDF5 1.10 supports SWMR (single writer, multiple readers) with a special file-locking mechanism.
Fix:
export HDF5_USE_FILE_LOCKING="FALSE"
I have to figure out how to do that in the Django jobs.
https://stackoverflow.com/a/51735764/4965175
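Per that Stack Overflow answer, the variable can also be set from Python instead of the shell, as long as it happens before the HDF5 library is initialized. A minimal sketch (where to place it, e.g. the Django settings module or the job's entry point, is an assumption, not the repo's actual code):

```python
import os

# HDF5 reads HDF5_USE_FILE_LOCKING when the library initializes, so this
# must run before the first PyTables/h5py call that touches an HDF5 file.
os.environ["HDF5_USE_FILE_LOCKING"] = "FALSE"
```

Setting it in the environment of the process that runs the Django jobs (rather than in every job) would cover all readers at once.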
This is not #150 (update fails when a station writes to a raw datastore file while we are reading it)