NBM data stopped integrating #224

Closed
3 tasks done
cloneofghosts opened this issue May 14, 2024 · 14 comments

@cloneofghosts
Collaborator

Describe the bug

I noticed yesterday that the NBM model has stopped integrating; the last update was the 11Z run on 2024-05-13. I know the data sometimes stops integrating for a few hours and then fixes itself, but when I checked this morning I was still seeing the same time in the sourceTimes section, so it appears something is broken.

I checked the status page that was linked in #191 and see nothing for the NBM model, so I suspect the issue is on PW's end? The NBM Fire model is integrating without any issues, though I know that it's separate from the main NBM model.
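
For anyone who wants to watch for this kind of stall automatically, here is a minimal sketch that flags models whose latest run is older than a threshold. It assumes the sourceTimes values are strings shaped like "2024-05-13 11Z" (as quoted above); the function names, the example keys, and the 6-hour threshold are made up for illustration and are not part of the API.

```python
from datetime import datetime, timedelta, timezone


def parse_run_time(value):
    """Parse a run time such as '2024-05-13 11Z' (assumed format) as UTC."""
    return datetime.strptime(value, "%Y-%m-%d %HZ").replace(tzinfo=timezone.utc)


def stale_models(source_times, now, max_age=timedelta(hours=6)):
    """Return the names of models whose last run is older than max_age.

    source_times is a dict of model name -> run time string, mirroring
    what the sourceTimes section appears to contain (assumption).
    """
    return sorted(
        name
        for name, run in source_times.items()
        if now - parse_run_time(run) > max_age
    )


# Hypothetical snapshot resembling the situation described in this issue.
snapshot = {"nbm": "2024-05-13 11Z", "nbm_fire": "2024-05-14 08Z"}
check_time = datetime(2024, 5, 14, 12, tzinfo=timezone.utc)
print(stale_models(snapshot, check_time))
```

Passing `now` in explicitly keeps the check easy to test; in real use it would be `datetime.now(timezone.utc)`.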

Expected behavior

NBM data should be integrating

Actual behavior

NBM data stopped integrating with the last update being 2024-05-13 11Z

API Endpoint

Production

Location

Ottawa, Ontario

Other details

No response

Troubleshooting steps

  • I have searched this repository and Home Assistant Repository to see if the issue has already been reported.
  • I have read through the API documentation before opening this issue.
  • I have written an informative title.
@cloneofghosts cloneofghosts added the bug Something isn't working label May 14, 2024
@alexander0042
Collaborator

alexander0042 commented May 14, 2024

!!! Seeing these errors on my end now as well !!!
Investigating now

@cloneofghosts
Collaborator Author

I'm guessing the investigation is causing the API to return an Internal Service Error?

@alexander0042
Collaborator

Yea, what's happening is that, for some strange reason, I'm getting connection reset errors when getting data from S3. Usually this isn't too much of an issue, since they just repeat. However, the number of errors increased to the point where it's not recovering on its own, so I'm writing some error-catching code.
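
As an illustration of the kind of error catching described here (not the actual ingest code), a minimal sketch of a bounded retry with exponential backoff might look like the following. `fetch_with_retry` and its parameters are hypothetical, and the real ingest may see a different exception type from its S3 client; Python's built-in `ConnectionResetError` is used as a stand-in.

```python
import time


def fetch_with_retry(fetch, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it succeeds or the attempts run out.

    Returns the fetched value, or None if every attempt hit a connection
    reset, so the caller can skip this run instead of crashing.
    """
    for attempt in range(attempts):
        try:
            return fetch()
        except ConnectionResetError:
            if attempt == attempts - 1:
                break  # out of attempts: fail gracefully below
            # Wait 1x, 2x, 4x, ... base_delay before trying again.
            sleep(base_delay * 2 ** attempt)
    return None


# Hypothetical download that fails twice, then succeeds.
state = {"calls": 0}


def flaky_download():
    state["calls"] += 1
    if state["calls"] < 3:
        raise ConnectionResetError("connection reset by peer")
    return b"grib-bytes"


print(fetch_with_retry(flaky_download, sleep=lambda seconds: None))
```

Returning `None` rather than raising is one way to let the caller keep serving the previous run when a download never recovers.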

@alexander0042
Collaborator

Ok, prod is back up (with new checks on ingest to fail gracefully now)

@cloneofghosts
Collaborator Author

Was just about to make a comment that prod is back up but you beat me to it. So is this issue something that will sort itself out on its own?

Also seeing the same issue that you fixed this morning where I'm getting a mix of 2.0.5 and 2.0.6.

@alexander0042
Collaborator

Yea, I restarted one container to address that -86400 issue, but I'm waiting on NBM being ingested again before touching anything else :p

@cloneofghosts
Collaborator Author

cloneofghosts commented May 14, 2024

@alexander0042 I don't know if it's related to this issue or something else, but I started seeing "precipType": -999, in the currently and minutely sections. The minutely summary and icon are also broken:

"minutely": {
  "summary": -999,
  "icon": -999,

EDIT: I see NBM is fixed but HRRR disappeared.
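
In case it helps with spotting these, here is a small client-side sketch that walks a response and lists every field holding -999. It assumes -999 acts as a missing-data placeholder (which is how it appears in the snippet above, though that is my reading rather than documented behaviour); the function name and the sample response are made up.

```python
MISSING = -999  # assumed placeholder value, based on the snippet above


def missing_fields(data, path=""):
    """Return dotted paths of every value equal to MISSING in a nested response."""
    found = []
    if isinstance(data, dict):
        for key, value in data.items():
            child = f"{path}.{key}" if path else key
            found.extend(missing_fields(value, child))
    elif isinstance(data, list):
        for index, value in enumerate(data):
            found.extend(missing_fields(value, f"{path}[{index}]"))
    elif data == MISSING:
        found.append(path)
    return found


# Hypothetical, trimmed-down response resembling what was observed.
response = {
    "currently": {"precipType": -999, "temperature": 12.3},
    "minutely": {"summary": -999, "icon": -999},
}
print(missing_fields(response))
```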

@alexander0042
Collaborator

Ok, I isolated the NBM issue to an ingest file that failed in a way I hadn't thought of. It's corrected now and everything seems to be moving correctly.

@cloneofghosts
Collaborator Author

Yup, NBM seems to be working again, though it looks like HRRR_subh has disappeared.

@alexander0042
Collaborator

Win some lose some... fixing it now

@cloneofghosts
Collaborator Author

cloneofghosts commented May 15, 2024

@alexander0042 NBM seems to have gotten stuck again as the last update was the 15Z run from yesterday.

Also, NBM Fire seems to have gotten stuck as well.

@alexander0042
Collaborator

Yea, either NOAA or AWS is having issues moving the NBM files over from one side to the other today. This created a ton of issues with updating, since the data was only partially there and I'd assumed it would either all be there or none of it would be. Regardless, it gave me a good reason to improve the code by adding some additional error checking and a NOMADS fallback!

Good news is that the AWS bucket seems to be re-populating now, just as I finished the fallback plan, so it should be updating again shortly and be more resilient in the future.
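
To illustrate the idea of checking for a partially populated run and falling back (a rough sketch, not the real ingest code), the selection step could look something like this. The function name, the source labels, and the file names are all hypothetical; how the listings are actually obtained from the AWS bucket or NOMADS is left out.

```python
def pick_complete_source(expected_files, listings):
    """Return the first source that has every expected file, else None.

    listings is an ordered sequence of (source_name, available_files)
    pairs, with the preferred source first. A source holding only part
    of the run is skipped rather than ingested half-finished.
    """
    expected = set(expected_files)
    for source_name, available in listings:
        if expected.issubset(available):
            return source_name
    return None


# Hypothetical file names for one model run.
run_files = ["f001.grib2", "f002.grib2", "f003.grib2"]
listings = [
    ("aws", ["f001.grib2", "f002.grib2"]),  # partially populated
    ("nomads", ["f001.grib2", "f002.grib2", "f003.grib2"]),
]
print(pick_complete_source(run_files, listings))
```

If no source is complete, the function returns `None`, which the caller could treat as "keep the previous run and try again later".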

@cloneofghosts
Collaborator Author

Can confirm that it's updating again. I'll leave this open for a day or two to make sure that things are working before closing.

@cloneofghosts cloneofghosts added this to the 2.0.x milestone May 15, 2024
@cloneofghosts
Collaborator Author

Things seem to be working so I'll close this.
