
Failed to upload to api/service/queue. Statuscode: InternalServerError #76

Open
gregzip opened this issue Apr 4, 2023 · 25 comments
Labels
bug Something isn't working

Comments

@gregzip

gregzip commented Apr 4, 2023

Describe the bug
We're seeing multiple errors with almost no description, pretty much just Failed to upload to api/service/queue. Statuscode: InternalServerError.

To Reproduce
Let pods process requests for some time.

Expected behavior
Requests are processed correctly.

Screenshots
N/A

STDOUT/STDERR

fail: fiskaltrust.Middleware.Helper.Helipad.Services.HelipadService[0]
Failed to upload to api/service/queue. Statuscode: InternalServerError; Response message:

(yes, the response message is empty)

POSSystem (please complete the following information):

  • b2460c21-f12b-4b2c-9bf4-a9487f8ede57 (Memento 8.225.0)

Cashbox Information (please complete the following information):
Hard to say; we send hundreds of requests, some of which fail because of #74 and #75, and the error message doesn't identify the cashbox.

Additional context
It's less important than #74 but still annoying as it triggers alerts on our side.

@gregzip gregzip added the bug Something isn't working label Apr 4, 2023
@gregzip
Author

gregzip commented May 15, 2023

I was able to isolate this problem on a Sunday, when there's almost no other load, and it seemed to be caused by one of the queues. I changed its configuration to use the newest package version and sent a null echo, and the issue is now gone.

As far as the issue is concerned, I think we're happy to close this, but I'd be interested to know why this worked. Are those configuration updates something we should do for other queues as well? Or is it about updating the config in general, not necessarily about the configured package version? 🤔

I got the fix idea from this follow-up line in the logs:

fail: fiskaltrust.Middleware.Helper.Helipad.Services.HelipadUploader[0]
Error while pushing configuration data for queue bab0b7ef-9595-436a-bc94-f1e580da6a09
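(For completeness, the null echo itself was just an HTTP POST against the queue's local REST endpoint, roughly like the sketch below. The /echo route, the empty Message payload, and mapping the rest:// URL to plain HTTP are assumptions from our setup, so please check the middleware docs for the exact path of your hosting type.)

// Minimal sketch of the "null echo" sent against the locally hosted REST queue.
// The queue URL matches the queue id from the log line above; the "/echo" route and
// the empty Message payload are assumptions for illustration, not a verified API.
using System;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

class NullEchoSketch
{
    static async Task Main()
    {
        using var client = new HttpClient();
        var queueUrl = "http://localhost:1500/bab0b7ef-9595-436a-bc94-f1e580da6a09";

        var payload = new StringContent("{\"Message\": null}", Encoding.UTF8, "application/json");
        var response = await client.PostAsync($"{queueUrl}/echo", payload);

        Console.WriteLine($"{(int)response.StatusCode}: {await response.Content.ReadAsStringAsync()}");
    }
}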

@gregzip
Author

gregzip commented May 16, 2023

Unfortunately this wasn't a permanent fix - after some time both errors started showing up again for the same queue / cashbox.

@gregzip
Copy link
Author

gregzip commented May 16, 2023

One more little update: these two kinds of errors, "Failed to upload to api/service/queue." and "Error while pushing configuration data for queue", seem to be related (you probably know that better than me 😬), but they don't come in pairs. We sometimes get quite a few "Failed to upload to api/service/queue." errors for zero or one instance of the latter. This seems to suggest that other queues / cashboxes are affected as well, but the log message doesn't say which ones.

@volllly
Contributor

volllly commented May 17, 2023

Hey, thanks for the update. With the added information we were able to find the problem 🎉

We'll release a fix this week or at the beginning of next week. I'll notify you here once it's out.

@volllly
Contributor

volllly commented May 26, 2023

Hey, my estimation was off by a bit 😅

We've just released 1.3.46-bullseye and 1.3.46-jammy including a bugfix of this issue 🚀

@gregzip
Author

gregzip commented May 31, 2023

Hey! Just rolled out 1.3.46-bullseye, and so far it seems there are fewer of these errors, but they're not 100% gone:

fail: fiskaltrust.Middleware.Helper.Helipad.Services.HelipadUploader[0]
      Error while pushing configuration data for queue bab0b7ef-9595-436a-bc94-f1e580da6a09 (UrlType: rest; Url: rest://localhost:1500/bab0b7ef-9595-436a-bc94-f1e580da6a09)
      fiskaltrust.Middleware.Helper.Helipad.Exceptions.HelipadException
         at fiskaltrust.Middleware.Helper.Helipad.Services.HelipadService.UploadDataRequestAsync(String url, Object item)
         at fiskaltrust.Middleware.Helper.Helipad.Services.HelipadService.UploadQueueListAsync(List`1 data)
         at fiskaltrust.Middleware.Helper.Helipad.Services.HelipadUploader.UploadFromStreamAsync(Stream stream)
         at fiskaltrust.Middleware.Helper.Helipad.Services.HelipadUploader.UploadConfigurationEntities(IPOS worker)
         at fiskaltrust.Middleware.Helper.Helipad.Services.HelipadUploader.ProcessUploadWith(ClientConfiguration clientConfiguration, Guid queueId)
--



@gregzip
Author

gregzip commented Jun 15, 2023

Hey! Any updates on this? I was doing some cleanup work and ran a couple of queries to see how common the issue is.

This is the graph with messages matching /Error while pushing configuration data/ and /Failed to upload/ excluded:
[image]

This is the graph with those 2 included:
[image]

Not only are these errors still there, they also make up most of the errors we're seeing in the logs at the moment.

@volllly
Contributor

volllly commented Jun 20, 2023

Hey, we haven't forgotten about this one either 😁 we just need to find the time to tackle it.

@gregzip
Author

gregzip commented Sep 12, 2023

Hey! Any news on this?

@mertcanyalhi

mertcanyalhi commented Sep 13, 2023

We've started to observe the same issue after upgrading from 1.3.44 (1.3-buster) to 1.3.46 (1.3-bullseye). Apart from the errors mentioned above, we're observing the following error:

fail: fiskaltrust.Middleware.Helper.Helipad.Services.HelipadUploader[0]
      Error while pushing entity data (341, journalde)
      System.Net.Http.HttpRequestException: Response status code does not indicate success: 503 (Service Unavailable).
         at System.Net.Http.HttpResponseMessage.EnsureSuccessStatusCode()
         at fiskaltrust.Middleware.Helper.Helipad.Services.HelipadService.GetPointerAsync(Guid queueId, String entityValue)
         at fiskaltrust.Middleware.Helper.Helipad.Services.HelipadUploader.UploadEntities(IPOS worker, Guid queueId, KeyValuePair`2 entity, Int32 uploadReadCount)
         at fiskaltrust.Middleware.Helper.Helipad.Services.HelipadUploader.UploadEntityAsync(ClientConfiguration clientConfiguration, Guid queueId, KeyValuePair`2 entity, Int32 startRecordCount, Action`1 setRecordCount)

After digging into this issue further, we noticed that the requests against Helipad (https://helipad.fiskaltrust.cloud/) are failing (e.g. while performing HelipadService.GetPointerAsync or HelipadService.UploadDataRequestAsync, which use the HttpClient for communicating with Helipad).

When this issue occurs, our requests against the ByoDC middleware time out (our HTTP client timeout is set to 10 seconds for operational reasons); can we please ask for your input/help on this?
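(For context, the 10-second limit is nothing more exotic than the standard HttpClient timeout on our side, along these lines; the base address below is a placeholder:)

// Sketch of the client-side timeout applied to requests against the ByoDC middleware.
// The base address is a placeholder; the 10-second Timeout is the relevant part.
using System;
using System.Net.Http;
using System.Threading.Tasks;

class ByoDcClientSketch
{
    static async Task Main()
    {
        using var client = new HttpClient
        {
            BaseAddress = new Uri("http://byodc.internal.example:1500/"),
            Timeout = TimeSpan.FromSeconds(10) // calls taking longer than this surface as timeouts on our side
        };

        // Any request exceeding the timeout throws a TaskCanceledException.
        var response = await client.GetAsync("api/version");
        Console.WriteLine((int)response.StatusCode);
    }
}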

@volllly
Contributor

volllly commented Sep 29, 2023

Hey, can you check if these problems are fixed in the 1.3.49-bullseye/1.3.49-jammy releases?

@mertcanyalhi

Hey @volllly,

Timeouts

  • The majority of the timeouts (coupled with the 5xx errors) we observed in 1.3.46 were fixed, but our requests are still timing out randomly, though not as frequently as before.
  • Before we can provide any further information, we'll need to go through the load balancer logs and trace the Redis locks.

5xx responses in logs

  • We're still observing 5xx responses in the logs, but their time window does not match any of the timeouts we're observing.
  • Not 100% sure, but based on the logs and traces, it looks like Helipad is responding with 500/503:
fail: fiskaltrust.Middleware.Helper.Helipad.Services.HelipadService[0]
      Failed to upload to api/service/queuede. Statuscode: InternalServerError; Response message: 
fail: fiskaltrust.Middleware.Helper.Helipad.Services.HelipadUploader[0]
      Error while pushing configuration data for queue 73257a4a-70a6-4f8d-89cd-581e82eae30d (UrlType: rest; Url: rest://localhost:1500/73257a4a-70a6-4f8d-89cd-581e82eae30d)
      fiskaltrust.Middleware.Helper.Helipad.Exceptions.HelipadException
         at fiskaltrust.Middleware.Helper.Helipad.Services.HelipadService.UploadDataRequestAsync(String url, Object item)
         at fiskaltrust.Middleware.Helper.Helipad.Services.HelipadService.UploadeQueueDEListAsync(List`1 data)
         at fiskaltrust.Middleware.Helper.Helipad.Services.HelipadUploader.UploadFromStreamAsync(Stream stream)
         at fiskaltrust.Middleware.Helper.Helipad.Services.HelipadUploader.UploadConfigurationEntities(IPOS worker)
         at fiskaltrust.Middleware.Helper.Helipad.Services.HelipadUploader.ProcessUploadWith(ClientConfiguration clientConfiguration, Guid queueId)
fail: fiskaltrust.Middleware.Helper.Helipad.Services.HelipadService[0]
      Failed to upload to api/service/signaturcreationunitde. Statuscode: ServiceUnavailable; Response message: 
fail: fiskaltrust.Middleware.Helper.Helipad.Services.HelipadUploader[0]
      Error while pushing configuration data for queue 83b65314-d793-49e4-a56f-634f7c650c8b (UrlType: rest; Url: rest://localhost:1500/83b65314-d793-49e4-a56f-634f7c650c8b)
      fiskaltrust.Middleware.Helper.Helipad.Exceptions.HelipadException
         at fiskaltrust.Middleware.Helper.Helipad.Services.HelipadService.UploadDataRequestAsync(String url, Object item)
         at fiskaltrust.Middleware.Helper.Helipad.Services.HelipadService.UploadSignatureCreationUnitDEListAsync(List`1 data)
         at fiskaltrust.Middleware.Helper.Helipad.Services.HelipadUploader.UploadFromStreamAsync(Stream stream)
         at fiskaltrust.Middleware.Helper.Helipad.Services.HelipadUploader.UploadConfigurationEntities(IPOS worker)
         at fiskaltrust.Middleware.Helper.Helipad.Services.HelipadUploader.ProcessUploadWith(ClientConfiguration clientConfiguration, Guid queueId)
fail: fiskaltrust.Middleware.Helper.Helipad.Services.HelipadUploader[0]
      Error while pushing entity data (3, queueitem)
      System.Net.Http.HttpRequestException: Response status code does not indicate success: 503 (Service Unavailable).
         at System.Net.Http.HttpResponseMessage.EnsureSuccessStatusCode()
         at fiskaltrust.Middleware.Helper.Helipad.Services.HelipadService.GetPointerAsync(Guid queueId, String entityValue)
         at fiskaltrust.Middleware.Helper.Helipad.Services.HelipadUploader.UploadEntities(IPOS worker, Guid queueId, KeyValuePair`2 entity, Int32 uploadReadCount)
         at fiskaltrust.Middleware.Helper.Helipad.Services.HelipadUploader.UploadEntityAsync(ClientConfiguration clientConfiguration, Guid queueId, KeyValuePair`2 entity, Int32 startRecordCount, Action`1 setRecordCount)

@mertcanyalhi

Hey @volllly, we tried to find a correlation between the ByoDC request timeouts and the errors, and noticed the following:

  • In 50% of the cases, we noticed the following error in the exact same time window: Failed to upload to api/service/queue. Statuscode: InternalServerError
  • In the other 50% of the cases, we noticed no error/warning logs in the same time window. However, for these cases, we had observed the same error within the preceding 5 to 33 minutes.

At this point, we're not sure if the timeouts are caused by the Helipad uploads, but just wanted to provide input. We'll change the log verbosity to see if we can catch any further logs.

@mertcanyalhi

Hey @volllly, we observed a similar behavior; once we restarted the pods, we did not have any request timeouts against ByoDC for ~3 days. After 3 days, we started to observe timeouts, and one of the pods restarted after it became unhealthy.

Before the pod restarted, there was a small spike in the memory usage (the chart given below shows the memory usage percentage where 100% corresponds to 1Gi):

[image]

After the pod restarted, the spikes in CPU usage dropped, even though the pod was receiving the exact same traffic for 13 out of 15 cashboxes:

[image]

The CPU usage shows an increasing number of spikes over time for the affected pod:

[image]

And there is an increase in CPU usage spikes for all pods, but it depends on the # of transactions processed by the cashboxes they currently hold:

[image]

It looks like the increase in average CPU usage can also be grouped into the following clusters:

  • 00:00-07:59 - After TSE-device TAR file downloads are triggered
  • 08:00-23:59 - After daily operations are initiated

[image]

We also observed an increase in memory usage over time, which may be an indicator of a leak 🤔:

[image]

Some of the spikes in these increases are caused by zero receipt requests (0x4445000000000002) with the TSE-device TAR file download request flag (0x0000000002000000):

[image]
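(For reference, and assuming the flag is simply OR-ed into the base case, as is usual for ftReceiptCase flags, these requests carry a combined receipt case like this:)

// How the TSE-device TAR file download flag combines with the DE zero receipt case.
// Assumption: the flag is bitwise OR-ed into ftReceiptCase; the values are taken from the text above.
using System;

class ReceiptCaseSketch
{
    const long ZeroReceiptDE = 0x4445000000000002;           // zero receipt request (DE)
    const long TseTarFileDownloadFlag = 0x0000000002000000;  // TSE-device TAR file download request flag

    static void Main()
    {
        long ftReceiptCase = ZeroReceiptDE | TseTarFileDownloadFlag;
        Console.WriteLine($"0x{ftReceiptCase:X16}"); // prints 0x4445000002000002
    }
}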

Attached below are the logs for the pod that was restarted:

byodc-7f84d56544-6k8fl.log

@volllly
Contributor

volllly commented Oct 11, 2023

Hey, sorry for not getting back to you earlier. Could you possibly provide us with a memory dump of a pod while it is experiencing the problems? (You can upload it here.)

FYI, here is a KBA on how to get a memory dump.

@volllly
Contributor

volllly commented Oct 13, 2023

@gregzip @mertcanyalhi we've released 1.3.50-rc1-jammy/1.3.50-rc1-bullseye which should fix some of the issues with timeouts and memory usage.

@volllly
Contributor

volllly commented Oct 13, 2023

Also we just deployed an update to the sandbox helipad that should fix the 500/503 upload errors.

@volllly volllly self-assigned this Oct 24, 2023
@mertcanyalhi

mertcanyalhi commented Oct 26, 2023

@volllly Unfortunately, 1.3.50-rc1-bullseye did not solve the issue for us. We're still observing timeouts, increasing memory usage, and InternalServerError/ServiceUnavailable responses.

[image]

We tried to get a memory dump multiple times, but unfortunately we hit the memory limit when taking a dump after the memory usage has increased. We were able to get a dump from the pod with the lowest memory usage and uploaded it, but I'm not sure if it'll help. We'll increase the memory limit once again and try to get a memory dump once we observe increasing memory usage.

@mertcanyalhi

@volllly Uploaded more memory dumps, captured from multiple pods over the past 3 days.
We're observing InternalServerError responses from Helipad in the logs, and also increased latency for requests against ByoDC when Helipad returns InternalServerError responses.

@mertcanyalhi

@volllly The following graph shows the memory usage of a ByoDC pod; once the memory usage started to increase, we started to observe the following more frequently:

  • Increased latencies on all requests (e.g. GET /api/version)
  • MySQL connection errors
  • Helipad upload errors (not sure if this is due to an actual error as we're also frequently observing InternalServerError responses from Helipad)

[image]

@mertcanyalhi

mertcanyalhi commented Nov 13, 2023

@volllly We noticed that the # of SELECT queries grows with signature requests over time, and there is even a huge number of SELECT queries while the system is idle. When the pods are restarted, the # of SELECT queries drops to 0 and then starts to increase again once there is traffic.

The growing # of SELECT queries and the drops after pod restarts can be seen in the chart below:

[image]

Yesterday, while there was low traffic (85 signature requests in total from 3 cashboxes), we captured the SELECT queries (orange line on the chart above).

The following breakdown shows the # of SELECT queries in a 2-minute window when there was only a single signature request:

SELECT Queries: 3903
 |_ `select * from ftCashBox`: 279
 |_ `select * from ftQueue`: 280
 |_ `select * from ftQueueAT`: 279
 |_ `select * from ftQueueDE`: 280
 |_ `select * from ftQueueFR`: 279
 |_ `select * from ftQueueME`: 279
 |_ `select * from ftQueueItem WHERE TimeStamp >= {0} ORDER BY TimeStamp LIMIT {1}`: 278
 |_ `select * from ftActionJournal WHERE TimeStamp >= {0} ORDER BY TimeStamp LIMIT {1}`: 278
 |_ `select * from ftReceiptJournal WHERE TimeStamp >= {0} ORDER BY TimeStamp LIMIT {1}`: 279
 |_ `select * from ftJournalDE WHERE TimeStamp >= {0} ORDER BY TimeStamp LIMIT {1}`: 278
 |_ `select * from ftSignaturCreationUnitAT`: 278
 |_ `select * from ftSignaturCreationUnitDE`: 278
 |_ `select * from ftSignaturCreationUnitFR`: 278
 |_ `select * from ftSignaturCreationUnitME`: 278
 |_ `Select * from OpenTransaction where cbReceiptReference = {0};`: 2

The following breakdown shows the # of SELECT queries in a 2-minute window after the pods were restarted (6 workers were initialized after the restart):

SELECT Queries:
|_ `select * from ftCashBox`: 4
|_ `select * from ftQueue`: 4
|_ `select * from ftQueueAT`: 4
|_ `select * from ftQueueDE`: 4
|_ `select * from ftQueueFR`: 4
|_ `select * from ftQueueME`: 4
|_ `select * from ftQueueItem WHERE TimeStamp >= {0} ORDER BY TimeStamp LIMIT {1}`: 4
|_ `select * from ftActionJournal WHERE TimeStamp >= {0} ORDER BY TimeStamp LIMIT {1}`: 4
|_ `select * from ftReceiptJournal WHERE TimeStamp >= {0} ORDER BY TimeStamp LIMIT {1}`: 4
|_ `select * from ftJournalDE WHERE TimeStamp >= {0} ORDER BY TimeStamp LIMIT {1}`: 4
|_ `select * from ftSignaturCreationUnitAT`: 4
|_ `select * from ftSignaturCreationUnitDE`: 4
|_ `select * from ftSignaturCreationUnitFR`: 4
|_ `select * from ftSignaturCreationUnitME`: 4
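(As a side note, one minimal way to watch this counter grow over time, though not necessarily how the breakdowns above were captured, is to poll MySQL's server-wide Com_select status variable, e.g.:)

// Sketch: poll MySQL's global SELECT counter to watch it grow while the middleware is idle.
// Uses the MySqlConnector package; the connection string and polling interval are placeholders.
// Note that Com_select is server-wide, so it includes queries from all clients, not only the middleware.
using System;
using System.Threading.Tasks;
using MySqlConnector;

class SelectCounterSketch
{
    static async Task Main()
    {
        await using var connection = new MySqlConnection("Server=localhost;Uid=monitor;Pwd=changeme;");
        await connection.OpenAsync();

        while (true)
        {
            await using var command = new MySqlCommand("SHOW GLOBAL STATUS LIKE 'Com_select'", connection);
            await using var reader = await command.ExecuteReaderAsync();
            if (await reader.ReadAsync())
                Console.WriteLine($"{DateTime.UtcNow:O} Com_select = {reader.GetString(1)}");

            await Task.Delay(TimeSpan.FromMinutes(2)); // same 2-minute window as the breakdowns above
        }
    }
}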

@volllly
Contributor

volllly commented Nov 22, 2023

Hey, we've pushed a fix to Helipad; you should not see 500 and 503 errors anymore.

@TSchmiedlechner
Member

Hey @mertcanyalhi, are you still experiencing this issue, or did the fix deployed in November solve the problem?

@mertcanyalhi

mertcanyalhi commented Jan 22, 2024

Hey @TSchmiedlechner

  • 500/503 errors - we observed such responses after November 22, but have not observed them in the past 7 days
  • Growing # of SELECT queries / growing memory usage - we are now restarting the ByoDC pods daily, but we can still see a daily increase pattern in the # of SELECT queries & memory usage

We're currently using 1.3.54-bullseye.

@gregzip
Author

gregzip commented Jan 23, 2024

We're still seeing heaps of errors like this every day:

fail: fiskaltrust.SignatureCloud.DE.Services.HelipadUploader[0]
      Error while pushing entity data (2, receiptjournal)
      System.Net.WebException: The remote server returned an error: (500) Internal Server Error.
         at System.Net.HttpWebRequest.EndGetResponse(IAsyncResult asyncResult)
         at System.Threading.Tasks.TaskFactory`1.FromAsyncCoreLogic(IAsyncResult iar, Func`2 endFunction, Action`1 endAction, Task`1 promise, Boolean requiresSynchronization)

At some point I just stopped looking at them and decided to only act when clients are seriously affected, but it's something that'd be good to resolve - the very reason we monitor this is to avoid situations where a client has to call us 🙃

(we're at 1.3.55-rc1, also restarting pods daily so that they don't die one by one during the day)
