Describe the bug
Getting the following error when trying to read a zarr file
/hpc/projects/hca_integration/workspace/hca_pipelines/workflow/metrics/.snakemake/conda/152006085908115996d5d904bffd05fa_/lib/python3.9/site-packages/mudata/_core/mudata.py:491: UserWarning: Cannot join columns with the same name because var_names are intersecting.
  warnings.warn(
Traceback (most recent call last):
  File "/hpc/projects/hca_integration/workspace/hca_pipelines/workflow/metrics/.snakemake/conda/152006085908115996d5d904bffd05fa_/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 3802, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 203, in pandas._libs.index.IndexEngine._get_loc_duplicates
  File "pandas/_libs/index.pyx", line 211, in pandas._libs.index.IndexEngine._maybe_get_bool_indexer
  File "pandas/_libs/index.pyx", line 107, in pandas._libs.index._unpack_bool_indexer
KeyError: 'ABHD17A'
The key 'ABHD17A' exists in the dataset and, judging by the code path in the traceback, appears to be duplicated (although I couldn't confirm this in the anndata objects that I used to create the mudata zarr file).
The main issue here is actually that read_zarr ignores the mod-order attribute.
The proper fix will be coming with v0.3.
More clarifications:
"although I couldn't confirm this in the anndata objects"
The ABHD17A gene name is present in all three modalities and hence is repeated three times in the global annotation.
Because many gene names are duplicated across modalities, the global .var annotation (if there were any) would have ended up in the wrong order, as there would be no way to know which modality each duplicated feature really came from.
With the correct mod-order, the concatenation of the var_names of the individual modalities matches the global var_names.
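To make the ordering point concrete, here is a minimal pure-Python sketch. It does not use MuData's API. The modality names (rna_a, rna_b, rna_c), the genes other than ABHD17A, and the helper concat_var_names are hypothetical and only for illustration. It shows that a name shared by all three modalities appears three times in the concatenated list, and that concatenating the modalities in a different order than the stored one no longer reproduces the global list.

```python
from collections import Counter

# Hypothetical per-modality feature names; only ABHD17A comes from the report.
mod_var_names = {
    "rna_b": ["ABHD17A", "GENE2"],
    "rna_a": ["GENE1", "ABHD17A"],
    "rna_c": ["ABHD17A", "GENE3"],
}


def concat_var_names(mods, order):
    """Concatenate per-modality feature names following the given modality order."""
    return [name for mod in order for name in mods[mod]]


# Order in which the modalities were originally stored (the role of mod-order).
stored_order = ["rna_b", "rna_a", "rna_c"]
global_var_names = concat_var_names(mod_var_names, stored_order)

# A name present in all three modalities is repeated three times globally.
counts = Counter(global_var_names)
duplicated = sorted(name for name, n in counts.items() if n > 1)

# Using the stored order reproduces the global names; another order
# (here: modalities sorted by name) does not.
matches_stored = concat_var_names(mod_var_names, stored_order) == global_var_names
matches_sorted = concat_var_names(mod_var_names, sorted(mod_var_names)) == global_var_names

print(duplicated, counts["ABHD17A"], matches_stored, matches_sorted)
```

With unique feature names a reader could realign the global annotation by name, but with duplicates the stored order is the only information that ties each global entry to its modality.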
By the way, duplicated feature names are something that MuData's design advises against. Is this multimodal data at all? Could the axes interface be useful here?
To Reproduce
example.h5mu.zarr.zip
Expected behaviour
No error when reading the file
System
Additional context
After some exploration, I found that making copies of the assigned values solved the problem for me
mudata/mudata/_core/mudata.py, lines 789 to 790 in 29b5a11
I made those changes in PR #49