Combining MuData's – concat function #20

Open
ivirshup opened this issue Feb 23, 2022 · 14 comments · May be fixed by #58
Labels
enhancement New feature or request
Milestone

Comments

@ivirshup
Member

It's possible I'm not seeing them, but should there be concat (like anndata.concat) functionality here?

Maybe also merge (like scverse/anndata#658)? But that could be a separate issue.

@ivirshup ivirshup added the enhancement New feature or request label Feb 23, 2022
@cc36

cc36 commented Apr 21, 2022

Hello,

I wanted to ask what is the best way to combine several samples in a MuData object and it seems like this existing issue points in that direction.

The approach I usually take for combining multiple AnnData objects does not seem to work here:

holder = []

for n in folders:
    holder.append(mu.read_10x_h5("/home/jovyan/data/Multiome/DNAP/"+n+"/filtered_feature_bc_matrix.h5"))

adata = holder[0].concatenate(holder[1:], join='outer', index_unique=None)

Any help would be highly appreciated.

Thanks!

@ivirshup
Member Author

I think the general approach would be to deconstruct each MuData into its constituent AnnData objects, concatenate those per modality with anndata.concat, and then put the results into a new MuData.

@bio-la, did you have a function working here that you could share?

@cc36

cc36 commented Apr 22, 2022

Thanks. I have tried the approach suggested, i.e. deconstructing into the constituent AnnData objects and concatenating those and it works well except that the AnnData.uns['files'] and AnnData.uns['atac'] information is lost in the concatenation.

I have tried using the uns_merge argument from the AnnData.concatenate function (https://anndata.readthedocs.io/en/latest/generated/anndata.AnnData.concatenate.html#anndata.AnnData.concatenate) but it does not seem to help in this case.

Do you have any suggestions for this?

Thank you in advance!

@ivirshup
Member Author

I think this gets a bit more complicated. I'm unsure if there's going to be a good way to do this that plays well with muon.atac, though @mffrank or @gtca would be able to comment better.

I'm assuming you want to use the data in those fields downstream. How would you want those fields to be merged?

@bio-la

bio-la commented Apr 24, 2022

@cc36 why are you trying to concatenate multiple ATAC AnnData/MuData objects? I'm assuming you are talking about the atac.uns.xxx slots that mu.read_10x_h5 fills with the fragments and peaks files when reading a single 10x multiome run, but unless you have called peaks jointly on the original samples, it doesn't make sense to concatenate peaks and files from separate folders.
I am not sure which analytical tool would let you call peaks from multiple samples using the same background fragment distribution and still output separate 10x folders (samples). Normally, at the end of the aggregation step (joint peak calling), you would have one count matrix, one fragment matrix, one peak file, and so on.

So the behaviour you describe (losing those peaks and fragment files) is actually preventing you from doing something that would give you a false peak distribution per sample.
It may be that I'm missing something here. Could you please expand on what exactly you are trying to do by concatenating multiple ATAC (multiome) AnnData/MuData objects?
Thanks!

@cc36

cc36 commented Apr 25, 2022

@bio-la Thanks for your reply. You are right, I need to use the joint peak calling output, which I have not done and will now do. You can resolve this issue. Thanks a lot for your help!

@Zethson
Member

Zethson commented May 25, 2022

(Fat fingers, sorry)

@sruthi-hub

I am new to working with scATAC-seq. I would appreciate it if one of you (@cc36, @bio-la, @ivirshup) could share a few lines of code that ensure there is no false peak distribution. Thanks!

@gtca
Collaborator

gtca commented Dec 8, 2022

@sruthi-hub Hey, if this question is still relevant, could you elaborate on what the false peak distribution actually means?
If this is about peak properties, they can be quantified and visualised, as shown for instance in this tutorial.

@ChaseTaylor939

ChaseTaylor939 commented Mar 10, 2023

I'm having a similar issue when I try to concatenate two different multiome datasets. The RNA concatenates just fine, but the ATAC loses lots of metadata when I concatenate and the n_vars goes down to 13. I'm sorry, but I do not understand what @bio-la meant in their earlier explanation. Could someone provide some code on how they combine two or more multiome datasets?

Thank you!

[Screenshot: 9164-CT-1_Integration_01]

[Screenshot: Inked Multiome_ATAC_Concat_02]

@gtca
Collaborator

gtca commented Jun 1, 2023

Hey @ChaseTaylor939,

Concatenation is performed as described with inner join (for features) by default:

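import numpy as np
import anndata
from anndata import AnnData

mod1 = AnnData(np.random.normal(size=(10,5)))
mod2 = AnnData(np.random.normal(size=(10,3)))
mod2.var_names
# Index(['0', '1', '2'], dtype='object')
anndata.concat([mod1, mod2]).shape  # default join="inner" keeps only shared var_names
# => (20, 3)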

I assume peaks were called individually for each dataset (m9164_atac and m9412_atac), in which case 13 is the number of peaks that happen to have exactly the same definitions (chrN:XXX-YYY) across the samples.
For peak-based analysis, peaks have to be either called jointly or merged across samples with special procedures.
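
To make the point about peak definitions concrete, here is a plain-Python sketch with made-up peak names. It shows that an inner join on names only keeps peaks with identical coordinates, and it illustrates one very simple way a shared peak set could be built: taking the union of overlapping intervals. This is an assumption for illustration only, not the procedure used by any particular tool, and `parse_peak` and `merge_peaks` are hypothetical helpers. Counts would still have to be recomputed against the merged peaks, e.g. from the fragments files.

```python
from collections import defaultdict


def parse_peak(name):
    """Split 'chrN:start-end' into (chrom, start, end)."""
    chrom, span = name.split(":")
    start, end = span.split("-")
    return chrom, int(start), int(end)


def merge_peaks(*peak_sets):
    """Union overlapping intervals from several peak lists into one peak set."""
    by_chrom = defaultdict(list)
    for peaks in peak_sets:
        for name in peaks:
            chrom, start, end = parse_peak(name)
            by_chrom[chrom].append((start, end))

    merged = []
    for chrom in sorted(by_chrom):
        intervals = sorted(by_chrom[chrom])
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end:
                # Overlapping (or touching) interval: extend the current peak.
                cur_end = max(cur_end, end)
            else:
                merged.append(f"{chrom}:{cur_start}-{cur_end}")
                cur_start, cur_end = start, end
        merged.append(f"{chrom}:{cur_start}-{cur_end}")
    return merged


# Made-up peaks, as if called separately on two samples.
sample_a = ["chr1:100-600", "chr1:1000-1500", "chr2:50-400"]
sample_b = ["chr1:120-640", "chr1:1000-1500", "chr2:900-1300"]

# What an inner join on var_names effectively keeps: exact name matches only.
exact = sorted(set(sample_a) & set(sample_b))
# A shared peak set built from overlapping intervals instead.
union = merge_peaks(sample_a, sample_b)

print(exact)  # ['chr1:1000-1500']
print(union)  # ['chr1:100-640', 'chr1:1000-1500', 'chr2:50-400', 'chr2:900-1300']
```

Here chr1:100-600 and chr1:120-640 almost certainly describe the same region, yet a name-based join treats them as unrelated features, which is how n_vars can collapse to a handful of peaks.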

@aichander

+1 to having some inbuilt functionality that lets us concatenate 2 mudata objects with shared indices.

@lijxug

lijxug commented Sep 13, 2023

Any progress on this issue? Or should we do what ivirshup suggested?

@gtca
Collaborator

gtca commented Sep 13, 2023

Scheduled for mudata v0.3, which is in progress (#56), @lijxug!

Just to make it clear, this is about concatenation as in anndata.concat, which is not aware of genomic intervals, etc.

@gtca gtca added this to the v0.3.0 milestone Sep 13, 2023
@gtca gtca linked a pull request Sep 21, 2023 that will close this issue

9 participants