Extend sample ID length limits #224

Open
MareikeJaniak opened this issue Apr 29, 2024 · 3 comments


@MareikeJaniak

Good afternoon!

We are generating PCGR reports as part of our pipeline and have occasionally run into an issue when sample IDs are longer than 35 characters:

2024-04-16 02:00:02 - pcgr-validate-arguments-input - INFO - PCGR - STEP 0: Validate input data and options
2024-04-16 02:00:02 - pcgr-validate-arguments-input - ERROR - 
2024-04-16 02:00:02 - pcgr-validate-arguments-input - ERROR - Sample name identifier ('--sample_id' = ASDF-VB-23-34-000000452384B2-15255XY) must be between 2 and 35 characters long
2024-04-16 02:00:02 - pcgr-validate-arguments-input - ERROR - 

The sample IDs are outside of our control and can't be changed due to the sample tracking system that we have in place. We have come up with a work-around that shortens sample IDs longer than 35 characters just for the purposes of running PCGR and then renames the output files back to the actual sample ID, so they can be tracked within our system.
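The workaround described above could look roughly like the following. This is only a minimal sketch, not the pipeline's actual code: the helper names `short_id` and `restore_names` are hypothetical, it assumes PCGR output files are prefixed with the value passed as `--sample_id`, and it truncates by keeping the first 35 characters.

```python
from pathlib import Path

MAX_LEN = 35  # upper limit quoted in the PCGR error message


def short_id(sample_id: str, max_len: int = MAX_LEN) -> str:
    """Return sample_id unchanged if it fits, else its first max_len characters."""
    return sample_id if len(sample_id) <= max_len else sample_id[:max_len]


def restore_names(out_dir: Path, short: str, full: str) -> list[Path]:
    """Rename files in out_dir that start with `short` so they start with `full`.

    Assumption: output file names begin with the sample ID given to PCGR.
    """
    renamed: list[Path] = []
    if short == full:
        return renamed  # nothing was truncated, nothing to rename
    for path in sorted(out_dir.iterdir()):
        name = path.name
        # Skip files that already carry the full ID (e.g. on a re-run).
        if path.is_file() and name.startswith(short) and not name.startswith(full):
            target = path.with_name(full + name[len(short):])
            path.rename(target)
            renamed.append(target)
    return renamed
```

One would call `short_id(full_id)` to get the value for `--sample_id`, run PCGR, and then call `restore_names(output_dir, short, full_id)`. Note that plain truncation can map two IDs that differ only in their final characters to the same short ID, so each sample should be written to its own output directory.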

For future releases, we were just wondering if it would be possible to increase the length limit, or perhaps set the sample ID for the report title separately from the output file prefix?

Thanks!

Best,
Mareike

@sigven
Owner

sigven commented May 2, 2024

Hi Mareike,

Thanks for reaching out. I truly understand your need. We will experiment a bit to see how such long sample names could fit in the new version we are working on. The length limit was set primarily for visual reasons. I'll get back to you shortly with some examples of how it may look in the new version, ok? Out of curiosity, do you happen to know the maximum character length of your sample identifiers? I noticed the one above is 36 characters long.

kind regards,
Sigve

@MareikeJaniak
Author

Hi Sigve,

Thanks for your quick response!

I totally understand that there are visual concerns with having very long sample names. The sample name displayed in the output itself isn't a big concern for us and we are okay with truncating it for those purposes, but we have to keep the full sample name in the output file name, for tracking purposes. Maybe a solution could be an option that allows setting the sample ID displayed in the output and the output file prefix separately?

So far, all of the problematic sample names have been just 1-2 characters over the limit of 35. We have also communicated to the project manager that such long names aren't ideal, but because of the size of the project, some of it is outside of our control.

Like I said, we have found a work-around in our pipeline for now, by truncating the sample ID for PCGR and then renaming the output files, so this isn't an urgent issue, by any means! But I appreciate that you're considering it!

Best,
Mareike

@sigven
Owner

sigven commented May 7, 2024

Here is a glimpse of how it will look in the upcoming version (for a dummy sample). It seems to work OK.

https://www.dropbox.com/scl/fi/p22jglyyzwnbaqjdj5762/ASDF-VB-23-34-000000452384B2-15255XY.pcgr.grch37.html?rlkey=d3w9erg7mqm3fkxyf1i7bd50g&dl=0

best,
Sigve
