Some job counters are prone to overflow/wraparound #143

Open
abought opened this issue Mar 13, 2024 · 0 comments
Labels
bug Something isn't working

abought commented Mar 13, 2024

Summary

Certain job metrics are not being stored correctly in the database. This makes it harder to investigate system performance questions such as "how many people would be affected if we put a limit on the number of SNPs submitted?"

The data may eventually be recoverable from job logs, but that is less convenient. It's not an urgent fix, but it is definitely a "gotcha".

Tracking in case this surprises anyone else!

Description / root cause

  • A counter value such as genotypes is calculated by multiplying two large ints (e.g. SNPs * samples). The result can exceed the maximum Java value for that type (2147483647 for a signed 32-bit int).
  • Java does not raise an error on int overflow; the result silently wraps around (two's complement), so it is represented as a much smaller, possibly negative, number.
  • The correct value is shown in the UI / job logs (which are stored separately as a pre-constructed text string), but the wrong value is stored in the DB table.

This affects both the initial calculation and the incCounters method (which accepts an int).
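A minimal Java sketch of the failure mode and two possible fixes. The class name, the `genotypesAsInt`/`genotypesAsIntChecked`/`genotypesAsLong` helpers and the long-accepting `incCounters(String, long)` are hypothetical, not the project's actual code. The point is only that widening to `long` before multiplying avoids the wrap, and that the standard `Math.multiplyExact`/`Math.addExact` methods turn a silent wrap into an `ArithmeticException`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; not the project's real counter API.
final class CounterOverflowDemo {

    private final Map<String, Long> counters = new HashMap<>();

    // Buggy pattern: int * int is evaluated in 32-bit arithmetic and
    // silently wraps when the true product exceeds Integer.MAX_VALUE.
    static int genotypesAsInt(int snps, int samples) {
        return snps * samples;
    }

    // Fail-loudly pattern: if the value must stay an int, Math.multiplyExact
    // throws ArithmeticException instead of returning a wrapped number.
    static int genotypesAsIntChecked(int snps, int samples) {
        return Math.multiplyExact(snps, samples);
    }

    // Fixed pattern: widen one operand to long *before* multiplying.
    // The product of two ints always fits in a long, so this cannot wrap.
    static long genotypesAsLong(int snps, int samples) {
        return (long) snps * samples;
    }

    // Hypothetical long-accepting replacement for an int-based incCounters.
    // Math.addExact throws on long overflow rather than wrapping.
    void incCounters(String name, long delta) {
        counters.merge(name, delta, Math::addExact);
    }

    long get(String name) {
        return counters.getOrDefault(name, 0L);
    }

    public static void main(String[] args) {
        System.out.println(genotypesAsInt(2_500_000, 15_000));  // wrapped
        System.out.println(genotypesAsLong(2_500_000, 15_000)); // exact
    }
}
```

Widening to `long` matches the BIGINT column the value ends up in; `multiplyExact` is only a stopgap if an `int` has to be kept for compatibility, since it reports the problem rather than storing the correct value.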

Example

A recent job submitted 2.5M SNPs with 15k samples (3.75e10 genotypes). The Java max value for an int is ~2.1B, so the resulting value wrapped around to ~3e6. The correct numbers of SNPs and samples are shown in the job report (where they are represented separately), but they do not match the value stored in the database (which is their product).
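The wraparound arithmetic in this example can be checked with a short Java sketch (the class and method names are hypothetical). Casting a `long` to `int` keeps only the low 32 bits and reinterprets them as a signed two's-complement value, which is the same as reducing modulo 2^32 and shifting into the range [-2^31, 2^31 - 1]. The stored value is negative whenever those low 32 bits, read as unsigned, are at least 2^31.

```java
// Hypothetical illustration of how a large true count collapses into an int.
final class WraparoundExample {
    static final long TWO_POW_32 = 1L << 32;

    // What a 32-bit int keeps of a larger true value: the low 32 bits,
    // interpreted as a signed two's-complement number.
    static int wrapToInt(long trueValue) {
        return (int) trueValue;
    }

    // The same result computed explicitly: reduce modulo 2^32,
    // then shift into the signed range [-2^31, 2^31 - 1].
    static long wrapExplicit(long trueValue) {
        long r = Math.floorMod(trueValue, TWO_POW_32);
        return r > Integer.MAX_VALUE ? r - TWO_POW_32 : r;
    }

    public static void main(String[] args) {
        long snps = 2_500_000L;   // rounded figure from the example
        long samples = 15_000L;   // rounded figure from the example
        long genotypes = snps * samples;
        System.out.println("true value:    " + genotypes);
        System.out.println("stored as int: " + wrapToInt(genotypes));
    }
}
```

With the rounded inputs 2,500,000 * 15,000 the stored int comes out as -1,154,705,664. The wrapped value is very sensitive to the exact inputs (one extra SNP shifts it by 15,000), so these rounded figures are not expected to reproduce the ~3e6 seen for the real job. Because many different true values map to the same int, the database value cannot in general be un-wrapped; the true count has to be recomputed from the separately reported SNP and sample counts.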

In practice, this is usually not obvious until one needs to query for big jobs. A subtler sign of the issue is that in TIS, ~10% of "genotypes" counters are < 0:

```sql
select count(*) from counters where name='genotypes' and value < 0;
```

Note: the MySQL table definition already supports larger numbers (counters.value is a BIGINT column). The issue appears to be in the Java code.

abought added the bug label Apr 22, 2024