Questions: Why is half of the pairwise plot not an exact mirror of the other half? And about negative values. #35

Open
Rseq opened this issue Jul 4, 2020 · 7 comments

Comments

@Rseq

Rseq commented Jul 4, 2020

Good morning,

Many thanks for developing this amazing tool.
I have a question about the pairwise mode that may seem naive, but I could not figure it out.
I noticed this in some of my datasets and also in this example:

intervene pairwise -i ~/dbSUPER/mm9/*.bed --filenames --compute frac --htype color
https://intervene.readthedocs.io/en/latest/_images/pairwise_color.png

In that example, one half of the matrix is not a mirror image of the other half. Since each pair is the same combination of files, why are the values not mirrored?
For example, shouldn't the "Bone_Marrow" row and "Spleen" column have the same value as the "Spleen" row and "Bone_Marrow" column?

I assume that this is the reason why "tribar" mode is not recommended for "count" or "frac", but why does this happen?
Would you mind clarifying? And how should I interpret this correctly?

If I plot a correlation, let's say "pearson", this doesn't happen and I get mirrored values as I would expect. Would this be a solution?
If so, what do negative values tell me in this case? Instead of indicating a negative correlation (variable A increases while B decreases, or vice versa), would they mean something close to zero overlap (-1 = 0 overlaps)?
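
A small sketch may help frame this question. The thread does not state how Intervene computes its correlation values, so this assumes (hypothetically) that Pearson correlation is taken between the columns of the pairwise matrix; the matrix values and names below are made up for illustration. Under that assumption, the result is symmetric by construction, and a negative value means two files have opposite overlap profiles across the other files, not that they share zero regions.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sd_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x))
    sd_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y))
    return cov / (sd_x * sd_y)


# Hypothetical, asymmetric matrix of overlap fractions
# (rows = query file, columns = target file).
frac = [
    [1.0, 0.8, 0.1],
    [0.6, 1.0, 0.2],
    [0.1, 0.3, 1.0],
]

# Assumption: correlate the columns of the matrix with each other.
cols = list(zip(*frac))
corr = [[pearson(ci, cj) for cj in cols] for ci in cols]

# pearson(x, y) == pearson(y, x), so corr is mirrored across the diagonal.
print(corr[0][2], corr[2][0])
```

Here files 0 and 2 still overlap (fraction 0.1), yet their correlation is negative because their columns move in opposite directions.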

Thank you for your time

@Rseq Rseq changed the title Questions: Why half of pairwise is not a exactly mirror of the other half? And about negative values.. Questions: Why half of pairwise is not a exact mirror of the other half? And about negative values.. Jul 4, 2020
@amizeranschi

Hi @Rseq

I'm not a developer of Intervene, just a regular user. However, I'm just as confused as you about the results that it produces. Have a look: #34

I am guessing that both of our problems are related to what is stated in an older comment here: #27 (comment).

It doesn't make much sense to me that the order of the files should give different results. Set intersection should be commutative, even when looking at overlaps across multiple (sets of) genomic regions. This also makes Intervene's results end up very different from those of other tools, as I've shown in #34.

@Rseq
Author

Rseq commented Jul 6, 2020

Thanks for sharing your doubts as well.
I'm also following #34, as I could not understand exactly how it works.
Let's hope that the developers can clarify the points we made.

@amizeranschi

Yes, I hope we'll get a reply from the developers.

@asntech
Owner

asntech commented Jul 15, 2020

Dear @Rseq @amizeranschi,

I apologize for the late response. For some reason, this slipped off my radar.

This is quite tricky when we plot Venn diagrams for genomic regions, as you will not always have a one-to-one overlap.
For example:

(a + b + c) != (b + c + a)

This is explained well by the pybedtools developer in the comment that @amizeranschi linked from #27: daler/pybedtools#45 (comment)

@Rseq you are right in your first comment. This is the reason why tribar mode is not recommended for count or frac, as A intersect B is not always equal to B intersect A for genomic sets.
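
To illustrate why the two directions can differ, here is a minimal pure-Python sketch. It is not Intervene's or bedtools' code: it only mimics the idea of counting each query interval at most once when it hits anything in the other set, ignores chromosomes, and assumes that the fraction is taken relative to the query file. The interval sets and function names are hypothetical.

```python
def overlaps(a, b):
    """True if half-open intervals (start, end) a and b share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def count_with_hit(query, target):
    """Count intervals in `query` that overlap any interval in `target`.

    Each query interval is counted at most once, no matter how many
    target intervals it touches.
    """
    return sum(any(overlaps(q, t) for t in target) for q in query)


# Hypothetical region sets on a single chromosome.
a = [(0, 100), (200, 300)]
b = [(10, 20), (30, 40), (50, 60), (500, 600)]

a_in_b = count_with_hit(a, b)  # intervals of A that hit B
b_in_a = count_with_hit(b, a)  # intervals of B that hit A

# Assumption: the fraction is relative to the size of the query set.
frac_a = a_in_b / len(a)
frac_b = b_in_a / len(b)

print(a_in_b, b_in_a, frac_a, frac_b)  # 1 3 0.5 0.75
```

One wide region in A covers three narrow regions in B, so "A intersect B" reports 1 region while "B intersect A" reports 3, and the two cells of the matrix differ.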

I will push a new version soon with options to set u=True or False and v=True or False. But it makes sense to keep these set to True by default.

I hope this helps and thanks again for your interest!

Best,
Aziz

@Rseq
Author

Rseq commented Jul 16, 2020

Many thanks for your reply, @asntech !
I believe my expectations would be closer to multiinter. But these differences really do make sense.

However, this part is still not clear to me:

If I plot a correlation, let's say "pearson", this doesn't happen and I get mirrored values as I would expect. Would this be a solution?
If so, what do negative values tell me in this case? Instead of indicating a negative correlation (variable A increases while B decreases, or vice versa), would they mean something close to zero overlap (-1 = 0 overlaps)?

Would you mind explaining it or pointing me towards an explanation?

Thank you for your time

@amizeranschi

+1 to implementing bedtools multiinter (or bedops --intersect, which is equivalent) in Intervene, as an alternative option to the current approach.

This way, the intersection operation would become commutative, so the order of the input files won't matter and the pairwise plot would be symmetric.
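
As a rough illustration of that point, the sketch below measures overlap in shared base pairs instead of counting query intervals. It is not how bedtools multiinter or bedops are implemented; the helper names and interval sets are hypothetical, chromosomes are ignored, and the set-of-positions approach is only practical for toy data. The point is simply that a base-level intersection gives the same answer in both directions.

```python
def covered_positions(intervals):
    """Set of base positions covered by half-open (start, end) intervals."""
    return {pos for start, end in intervals for pos in range(start, end)}


def shared_bases(x, y):
    """Number of base positions covered by both interval sets."""
    return len(covered_positions(x) & covered_positions(y))


# Hypothetical region sets on a single chromosome.
a = [(0, 100), (200, 300)]
b = [(10, 20), (30, 40), (50, 60), (500, 600)]

ab = shared_bases(a, b)
ba = shared_bases(b, a)

print(ab, ba)  # 30 30
```

Because set intersection over base positions is commutative, a pairwise matrix built from this quantity would be mirrored across the diagonal regardless of file order.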

@Rohit-Satyam

Rohit-Satyam commented Jul 16, 2020

Hi @asntech

I hope these warnings will also go away with the update you are planning:

/home/rohit/miniconda3/envs/intervene/lib/python3.7/site-packages/intervene/modules/pairwise/pairwise.py:214: FutureWarning:
.ix is deprecated. Please use
.loc for label based indexing or
.iloc for positional indexing

See the documentation here:
http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#ix-indexer-is-deprecated
  D = D.ix[cluster_order, cluster_order]
/home/rohit/miniconda3/envs/intervene/lib/python3.7/site-packages/intervene/modules/pairwise/pairwise.py:150: FutureWarning:
.ix is deprecated. Please use
.loc for label based indexing or
.iloc for positional indexing

See the documentation here:
http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#ix-indexer-is-deprecated
  series = series.ix[order]
/home/rohit/miniconda3/envs/intervene/lib/python3.7/site-packages/pandas/plotting/_matplotlib/tools.py:307: MatplotlibDeprecationWarning:
The rowNum attribute was deprecated in Matplotlib 3.2 and will be removed two minor releases later. Use ax.get_subplotspec().rowspan.start instead.
  layout[ax.rowNum, ax.colNum] = ax.get_visible()
/home/rohit/miniconda3/envs/intervene/lib/python3.7/site-packages/pandas/plotting/_matplotlib/tools.py:307: MatplotlibDeprecationWarning:
The colNum attribute was deprecated in Matplotlib 3.2 and will be removed two minor releases later. Use ax.get_subplotspec().colspan.start instead.
  layout[ax.rowNum, ax.colNum] = ax.get_visible()
/home/rohit/miniconda3/envs/intervene/lib/python3.7/site-packages/pandas/plotting/_matplotlib/tools.py:313: MatplotlibDeprecationWarning:
The rowNum attribute was deprecated in Matplotlib 3.2 and will be removed two minor releases later. Use ax.get_subplotspec().rowspan.start instead.
  if not layout[ax.rowNum + 1, ax.colNum]:
/home/rohit/miniconda3/envs/intervene/lib/python3.7/site-packages/pandas/plotting/_matplotlib/tools.py:313: MatplotlibDeprecationWarning:
The colNum attribute was deprecated in Matplotlib 3.2 and will be removed two minor releases later. Use ax.get_subplotspec().colspan.start instead.
  if not layout[ax.rowNum + 1, ax.colNum]:
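
For the two `.ix` warnings, the message itself names the replacement. The following is a minimal sketch of that change on a made-up matrix, assuming `cluster_order` and `order` hold labels; if they hold integer positions in pairwise.py (not shown in this thread), `.iloc` would be the equivalent. The data and label names are hypothetical.

```python
import pandas as pd

labels = ["Bone_Marrow", "Spleen", "Thymus"]

# Hypothetical pairwise matrix, indexed by file label on both axes.
D = pd.DataFrame(
    [[1.0, 0.2, 0.3],
     [0.2, 1.0, 0.4],
     [0.3, 0.4, 1.0]],
    index=labels,
    columns=labels,
)

# Assumption: the clustering step yields an ordering of labels.
cluster_order = ["Spleen", "Bone_Marrow", "Thymus"]

# Label-based replacement for: D = D.ix[cluster_order, cluster_order]
D = D.loc[cluster_order, cluster_order]

series = pd.Series([3, 1, 2], index=labels)
order = ["Thymus", "Bone_Marrow", "Spleen"]

# Label-based replacement for: series = series.ix[order]
series = series.loc[order]

print(D)
print(series)
```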

Also, when I try plotting the dendrogram, it throws the following error:

 intervene pairwise --bedtools-options f=0.50 -i *.csv --htype dendrogram
Traceback (most recent call last):
  File "/home/rohit/miniconda3/envs/intervene/bin/intervene", line 606, in <module>
    main()
  File "/home/rohit/miniconda3/envs/intervene/bin/intervene", line 426, in main
    pairwise.pairwise_intersection(label_names, options)
  File "/home/rohit/miniconda3/envs/intervene/lib/python3.7/site-packages/intervene/modules/pairwise/pairwise.py", line 478, in pairwise_intersection
    heatmap_dendrogram(matrix,outfile, options)
  File "/home/rohit/miniconda3/envs/intervene/lib/python3.7/site-packages/intervene/modules/pairwise/pairwise.py", line 304, in heatmap_dendrogram
    sns.plt.setp(sns_plot.ax_heatmap.yaxis.get_majorticklabels(), rotation=0)
AttributeError: module 'seaborn' has no attribute 'plt'
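
The traceback shows that the installed seaborn no longer exposes pyplot as `sns.plt`. A likely workaround, sketched here without seaborn and on a plain Matplotlib axis rather than the real `sns_plot.ax_heatmap`, is to import `matplotlib.pyplot` directly and call `plt.setp` on the tick labels. The figure and labels are made up for illustration.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend, no display needed
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.set_yticks([0, 1, 2])
# Start with rotated labels so the effect of setp is visible.
ax.set_yticklabels(["Bone_Marrow", "Spleen", "Thymus"], rotation=45)

# Equivalent of the failing call, with plt imported directly
# instead of being reached through the seaborn module.
plt.setp(ax.yaxis.get_majorticklabels(), rotation=0)

rotations = [label.get_rotation() for label in ax.yaxis.get_majorticklabels()]
print(rotations)
plt.close(fig)
```

In pairwise.py the same idea would amount to replacing `sns.plt.setp(...)` with `plt.setp(...)` after importing `matplotlib.pyplot as plt`.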
