You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Thanks again for your really handy program - the gene name thing used to drive me crazy and now it doesn't: so nice! Here's a suggestion: this would be useful for me if it's easy to do and makes sense for others too.
I'm using R to make a table of CNV regions with one row per region. For each region I have various columns that describe the genes each CNV overlaps. If a CNV overlaps >1 gene, I concatenate the multiple gene information in each column together using commas (rather than having duplicate rows one per gene). I then use write.table to export tab-delimited data and escape_excel.pl on the output, usually including quotes.
Escape_excel does very nicely on the gene names (from yeast in this case) but I'm not seeing the numeric columns as I'd like to in the ideal world.
Here's an example - some of my fields (start coords for two genes)
look like this in the R output: "37303,38690"
look like this in the text file after escape_excel.pl: 37303,38690 (it removed the quotes)
but after Excel import it shows 3730338690 (it removed the comma).
I should probably just use another character for the concatenation, but would it be reasonable to try to handle situations like this in escape_excel?
thanks for thinking about it,
Janet Young
The text was updated successfully, but these errors were encountered:
Try the most recent escape_excel.pl in the development folder.
I added the --unstrip flag to restore fields that have had characters stripped from them, so long as they aren't escaped. If they need escaping, the characters will still remain stripped. UTF8 byte order marks are still stripped regardless, although that is an easy change if anyone feels it is necessary.
I haven't rebuilt any of the executables, plugins, updated the version in any other tool implementations, etc. just yet, and may not have time to for a few weeks.
I just pushed a major update involving smarter handling of quotes in Excel. As I revisited this open issue for the first time in 7 years, I realized that none of the updates actually fix the issue you reported. Back in 2017, I was in a hurry and, for whatever reason, thought that keeping the double quotes using the --unstrip flag would be sufficient to fix it. I see now that, even keeping the double quotes, Excel still interprets 37303,38690 as one big number. I apologize for not understanding (or sufficiently testing) the issue 7 years ago. 7 years is along time to wait for a proper fix....
I will look into devising a rule to handle this weird behavior. Thanks. And again, sorry for the super long delay.
OK, I think I fixed it properly this time. I now escape anything that looks like a number and contains ,#### or ####,###. This appears to cover all the cases where Excel interprets the fields as numbers when we don't want it to. 123,456 is still treated as a number and not escaped, though. If you want to escape ALL numeric-looking fields with commas in them, you can use the new --all-commas flag, which will then also escape 123,456. The new --no-commas flag will disable all of the new numeric comma escaping rules.
thank you! Don't apologize - I also dropped the ball on fully reading your reply 7 years ago!
I'm really not using the tool any more - now using the R package openxlsx to export directly instead of via a csv/tsv file. But I can still see it being useful in some situations.
Dear Eric and colleagues,
Thanks again for your really handy program - the gene name thing used to drive me crazy and now it doesn't: so nice! Here's a suggestion: this would be useful for me if it's easy to do and makes sense for others too.
I'm using R to make a table of CNV regions with one row per region. For each region I have various columns that describe the genes each CNV overlaps. If a CNV overlaps >1 gene, I concatenate the multiple gene information in each column together using commas (rather than having duplicate rows one per gene). I then use write.table to export tab-delimited data and escape_excel.pl on the output, usually including quotes.
Escape_excel does very nicely on the gene names (from yeast in this case) but I'm not seeing the numeric columns as I'd like to in the ideal world.
Here's an example - some of my fields (start coords for two genes)
look like this in the R output: "37303,38690"
look like this in the text file after escape_excel.pl: 37303,38690 (it removed the quotes)
but after Excel import it shows 3730338690 (it removed the comma).
I should probably just use another character for the concatenation, but would it be reasonable to try to handle situations like this in escape_excel?
thanks for thinking about it,
Janet Young
The text was updated successfully, but these errors were encountered: