We have investigated the issue with cgmlst-dists when handling large input files (the error was reported with 80k Lm samples). The tool crashes with a segmentation fault.
The bug is caused by an incorrect memory allocation for the distance vector. The size is calculated as nrow * nrow, which overflows a 32-bit integer when nrow is large (line 219 in the original version).
The maximum value that can be stored in an int variable is 2,147,483,647; in our case, the dist vector needs 80000 * 80000 = 6,400,000,000 elements, which exceeds that limit. The overflow arises because the tool stores the distance matrix as a single flat vector, which is a nice optimization but means the vector length grows as nrow * nrow.
We included the inttypes.h header and used 64-bit integers to avoid the overflow. We have successfully tested the fix on 80,000 samples and 1,748 loci.
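For illustration, here is a minimal sketch of the arithmetic involved. It is not the code from the tool or from this patch: the function names `dist_cells`, `overflows_int32`, `dist_index` and `report` are hypothetical, and we assume nrow is a non-negative int. The operands are widened to 64 bits before multiplying, so the product is never computed in 32 bits.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: number of cells in an nrow x nrow matrix stored
   as one flat vector. Each operand is cast to 64 bits BEFORE the
   multiplication, so the product cannot wrap at 32 bits. */
uint64_t dist_cells(int nrow) {
    assert(nrow >= 0); /* assumption: row counts are never negative */
    return (uint64_t)nrow * (uint64_t)nrow;
}

/* Hypothetical helper: non-zero when the cell count does not fit in a
   32-bit signed int, i.e. when an int-based size would overflow. */
int overflows_int32(int nrow) {
    return dist_cells(nrow) > (uint64_t)INT32_MAX;
}

/* Hypothetical helper: flat index of cell (i, j), also computed in
   64 bits, since i * nrow can exceed INT32_MAX on its own. */
uint64_t dist_index(int nrow, int i, int j) {
    assert(nrow >= 0 && i >= 0 && j >= 0);
    return (uint64_t)i * (uint64_t)nrow + (uint64_t)j;
}

/* Print the cell count using the PRIu64 format macro from inttypes.h. */
void report(int nrow) {
    printf("nrow=%d cells=%" PRIu64 " overflows_int32=%d\n",
           nrow, dist_cells(nrow), overflows_int32(nrow));
}
```

Calling `report(80000)` from a `main` function shows a cell count of 6,400,000,000, which is above the 32-bit limit. The largest nrow whose square still fits in a signed 32-bit int is 46,340.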
We look forward to your feedback on this.
Best
Adriano