Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

suggestion for numeric fields containing commas #4

Open
jayoung opened this issue Sep 13, 2017 · 4 comments
Open

suggestion for numeric fields containing commas #4

jayoung opened this issue Sep 13, 2017 · 4 comments
Assignees

Comments

@jayoung
Copy link

jayoung commented Sep 13, 2017

Dear Eric and colleagues,

Thanks again for your really handy program - the gene name thing used to drive me crazy and now it doesn't: so nice! Here's a suggestion: this would be useful for me if it's easy to do and makes sense for others too.

I'm using R to make a table of CNV regions with one row per region. For each region I have various columns that describe the genes each CNV overlaps. If a CNV overlaps >1 gene, I concatenate the multiple gene information in each column together using commas (rather than having duplicate rows one per gene). I then use write.table to export tab-delimited data and escape_excel.pl on the output, usually including quotes.

Escape_excel does very nicely on the gene names (from yeast in this case) but I'm not seeing the numeric columns as I'd like to in the ideal world.

Here's an example - some of my fields (start coords for two genes)
look like this in the R output: "37303,38690"
look like this in the text file after escape_excel.pl: 37303,38690 (it removed the quotes)
but after Excel import it shows 3730338690 (it removed the comma).

I should probably just use another character for the concatenation, but would it be reasonable to try to handle situations like this in escape_excel?

thanks for thinking about it,

Janet Young

@welshea
Copy link
Collaborator

welshea commented Sep 14, 2017

Try the most recent escape_excel.pl in the development folder.

I added the --unstrip flag to restore fields that have had characters stripped from them, so long as they aren't escaped. If they need escaping, the characters will still remain stripped. UTF8 byte order marks are still stripped regardless, although that is an easy change if anyone feels it is necessary.

I haven't rebuilt any of the executables, plugins, updated the version in any other tool implementations, etc. just yet, and may not have time to for a few weeks.

@welshea
Copy link
Collaborator

welshea commented Mar 21, 2024

I just pushed a major update involving smarter handling of quotes in Excel. As I revisited this open issue for the first time in 7 years, I realized that none of the updates actually fix the issue you reported. Back in 2017, I was in a hurry and, for whatever reason, thought that keeping the double quotes using the --unstrip flag would be sufficient to fix it. I see now that, even keeping the double quotes, Excel still interprets 37303,38690 as one big number. I apologize for not understanding (or sufficiently testing) the issue 7 years ago. 7 years is along time to wait for a proper fix....

I will look into devising a rule to handle this weird behavior. Thanks. And again, sorry for the super long delay.

@welshea
Copy link
Collaborator

welshea commented Mar 21, 2024

OK, I think I fixed it properly this time. I now escape anything that looks like a number and contains ,#### or ####,###. This appears to cover all the cases where Excel interprets the fields as numbers when we don't want it to. 123,456 is still treated as a number and not escaped, though. If you want to escape ALL numeric-looking fields with commas in them, you can use the new --all-commas flag, which will then also escape 123,456. The new --no-commas flag will disable all of the new numeric comma escaping rules.

@jayoung
Copy link
Author

jayoung commented Mar 21, 2024

thank you! Don't apologize - I also dropped the ball on fully reading your reply 7 years ago!

I'm really not using the tool any more - now using the R package openxlsx to export directly instead of via a csv/tsv file. But I can still see it being useful in some situations.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants