Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

toGRanges doesn't (seem to) recognize strand information #4

Open
FatihSarigol opened this issue Oct 21, 2020 · 0 comments
Open

toGRanges doesn't (seem to) recognize strand information #4

FatihSarigol opened this issue Oct 21, 2020 · 0 comments

Comments

@FatihSarigol
Copy link

Hi, thank you for this useful package first of all.
I wanted to apply strand information to my gene lists to reduce the effect of different strand overlaps to my permutation test.

-see:
https://bmcgenomics.biomedcentral.com/articles/10.1186/1471-2164-9-169

"About 10% of the genes under study are overlapping genes, the majority of which are different-strand overlaps."

Let's assume MyGeneFile is in 3 lines, tab separated as follows:

chr start end strand
1 4773206 4785739 -
1 6206197 6276648 +

Doing this using the function from regioneR:

toGRanges(read.table(file = 'MyGeneFile', header=T, check.names=FALSE))

results with this:

GRanges object with 2 ranges and 1 metadata column:
seqnames ranges strand | strand
<Rle> <IRanges> <Rle> | <character>
1 1 4773206-4785739 * | -
2 1 6206197-6276648 * | +

where you can see the GRanges object doesn't recognize the strand info but it takes it as an additional info column which I saw that doesn't have an effect on the results.

After trying different things, such as adding stranded=TRUE to toGRanges function, or using different header names, etc, I wanted to look into GenomicRanges package, and even though they dont explain how to read strand info from a file, either, I finally figured out it actually works when I try the same thing as above simply with a strand header, so doing this using the function from that package does the job:

GRanges(read.table(file = 'MyGeneFile', header=T, check.names=FALSE))

GRanges object with 2 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] 1 4773206-4785739 -
[2] 1 6206197-6276648 +

And when I continued my analysis with this version of my data I did see a difference for number of overlaps I received which also affected the permutation tests. Apart from perhaps solving what seems to be bug (or if not maybe could use an explanation in the manual about how to do this), I believe strand information in general should be taken into consideration more when doing an overlap test for gene regions, which the paper or the manual don't seem to be mentioning, could be useful as a reminder for the users I think.

Thanks a lot
Fatih

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant