Gene name hyphens get filtered out when using cob.from_table() #66

MeeshCompBio · 2017-10-06T19:35:29Z

I have a data matrix like the one below that serves as input to COB.from_table:

R1M-C2	R2M-C2	R3M-C2	R1T-C2
BDIBD21-3.1G0000700	0.440474	0.255481	0.312441
BDIBD21-3.1G0000800	1.41546	2.19172	2.00877
BDIBD21-3.1G0000900	0.0210054	0	0.0714931
BDIBD21-3.1G0001000	0	0	0

COB.from_table removes the hyphen from the gene names before testing membership, converting them to something like BDIBD213.1G000290. This ends up filtering out all of the genes, since the stripped names no longer match the RefGen gene names, which include the hyphen.
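A minimal sketch of the failure mode described above, not the actual Camoco code. The pattern `SANITIZE` is an assumption chosen only because it reproduces the observed behavior (dots kept, hyphens dropped), and `refgen_ids` is a hypothetical stand-in for the RefGen gene names:

```python
import re

# Hypothetical pattern: keeps letters, digits, and dots, drops everything else.
# The real regex in Expr.py may differ.
SANITIZE = re.compile(r"[^A-Za-z0-9.]")

table_ids = ["BDIBD21-3.1G0000700", "BDIBD21-3.1G0000800"]
# Stand-in for the RefGen, whose gene names keep the hyphen.
refgen_ids = set(table_ids)

# Sanitizing silently rewrites every ID.
cleaned = [SANITIZE.sub("", gene) for gene in table_ids]
# The membership test then runs against the rewritten IDs.
kept = [gene for gene in cleaned if gene in refgen_ids]

print(cleaned)  # ['BDIBD213.1G0000700', 'BDIBD213.1G0000800']
print(kept)     # []
```

Because the names are rewritten before the membership test, nothing raises an error; the genes are simply dropped.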

monprin · 2017-10-06T20:16:42Z

So the place it is getting filtered out is right here at line 256:

https://github.com/schae234/Camoco/blob/master/camoco/Expr.py

I don't know whether I added this or someone else did. Changing the regexp should fix it, but I can't speak to the downstream effects of that. I looked, and I remember having issues with column labels in bcolz (see the solution to that problem a couple of lines above), but not specifically in the columns themselves.
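As a rough illustration of what changing the regexp could look like, here is a sketch with two hypothetical patterns. Neither is copied from Expr.py, and whether hyphens are safe for the storage backend is exactly the open question:

```python
import re

# Both patterns are assumptions for illustration only.
STRICT = re.compile(r"[^A-Za-z0-9.]")          # drops hyphens
ALLOW_HYPHEN = re.compile(r"[^A-Za-z0-9.\-]")  # keeps hyphens as well

gene = "BDIBD21-3.1G0000700"

strict_result = STRICT.sub("", gene)
hyphen_result = ALLOW_HYPHEN.sub("", gene)

print(strict_result)  # BDIBD213.1G0000700
print(hyphen_result)  # BDIBD21-3.1G0000700
```

Widening the character class keeps the names identical to the RefGen ones, at the cost of passing hyphens through to whatever originally needed the sanitizing.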


schae234 · 2017-10-10T15:42:00Z

It looks like this code was introduced because of HDF5, not bcolz. I think we can just remove this regex and be fine.

But we should add a regression test, since I am not sure what bcolz can store in terms of strings.
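One possible shape for such a regression test, sketched with the standard library only. `filter_to_refgen` is a hypothetical stand-in for the membership filter applied during import; a real test would go through COB.from_table and a RefGen fixture instead, and would also need to round-trip the names through bcolz:

```python
import unittest


def filter_to_refgen(gene_ids, refgen_ids):
    """Hypothetical stand-in: keep only the IDs present in the RefGen."""
    refgen = set(refgen_ids)
    return [gene for gene in gene_ids if gene in refgen]


class TestHyphenatedGeneNames(unittest.TestCase):
    def test_hyphenated_ids_survive_filtering(self):
        ids = ["BDIBD21-3.1G0000700", "BDIBD21-3.1G0000800"]
        # Every ID is in the RefGen, so none should be dropped or rewritten.
        self.assertEqual(filter_to_refgen(ids, ids), ids)

    def test_unknown_ids_are_dropped(self):
        ids = ["BDIBD21-3.1G0000700", "NOT-IN-REFGEN"]
        self.assertEqual(
            filter_to_refgen(ids, ["BDIBD21-3.1G0000700"]),
            ["BDIBD21-3.1G0000700"],
        )
```

If saved as a module, this can be run with `python -m unittest`.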

monprin · 2017-10-10T16:32:26Z

Okay, cool.

I don't recall noticing any weirdness, but I may have been protected by that regex.

schae234 referenced this issue Oct 10, 2017

Fixed that fucking bug where hdf5 stores empty string columns that co…

ac2eb29

…ntain unicode.

schae234 added this to the v0.6.0 milestone Feb 6, 2018
