toGRanges doesn't (seem to) recognize strand information #4

FatihSarigol · 2020-10-21T10:04:15Z

Hi, thank you for this useful package first of all.
I wanted to apply strand information to my gene lists to reduce the effect of different strand overlaps to my permutation test.

-see:
https://bmcgenomics.biomedcentral.com/articles/10.1186/1471-2164-9-169

"About 10% of the genes under study are overlapping genes, the majority of which are different-strand overlaps."

Let's assume MyGeneFile is in 3 lines, tab separated as follows:

chr start end strand
1 4773206 4785739 -
1 6206197 6276648 +

Doing this using the function from regioneR:

toGRanges(read.table(file = 'MyGeneFile', header=T, check.names=FALSE))

results with this:

GRanges object with 2 ranges and 1 metadata column:
seqnames ranges strand | strand
<Rle> <IRanges> <Rle> | <character>
1 1 4773206-4785739 * | -
2 1 6206197-6276648 * | +

where you can see the GRanges object doesn't recognize the strand info but it takes it as an additional info column which I saw that doesn't have an effect on the results.

After trying different things, such as adding stranded=TRUE to toGRanges function, or using different header names, etc, I wanted to look into GenomicRanges package, and even though they dont explain how to read strand info from a file, either, I finally figured out it actually works when I try the same thing as above simply with a strand header, so doing this using the function from that package does the job:

GRanges(read.table(file = 'MyGeneFile', header=T, check.names=FALSE))

GRanges object with 2 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] 1 4773206-4785739 -
[2] 1 6206197-6276648 +

And when I continued my analysis with this version of my data I did see a difference for number of overlaps I received which also affected the permutation tests. Apart from perhaps solving what seems to be bug (or if not maybe could use an explanation in the manual about how to do this), I believe strand information in general should be taken into consideration more when doing an overlap test for gene regions, which the paper or the manual don't seem to be mentioning, could be useful as a reminder for the users I think.

Thanks a lot
Fatih

The text was updated successfully, but these errors were encountered:

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

toGRanges doesn't (seem to) recognize strand information #4

toGRanges doesn't (seem to) recognize strand information #4

FatihSarigol commented Oct 21, 2020

toGRanges doesn't (seem to) recognize strand information #4

toGRanges doesn't (seem to) recognize strand information #4

Comments

FatihSarigol commented Oct 21, 2020