NaN value when importing Visium dataset #797

Open
Lem-P opened this issue Feb 6, 2024 · 9 comments

Lem-P commented Feb 6, 2024

Hi,
I am trying to import a 10x Visium H&E dataset.

I am importing the dataset with:

import scanpy as sc
import squidpy as sq
adata = sq.read.visium('path')
adata.var_names_make_unique()

Then pre-processing:

sc.pp.filter_cells(adata, min_counts = 1000)
sc.pp.filter_genes(adata, min_cells=5)
sc.pp.normalize_total(adata, inplace = True)
sc.pp.log1p(adata)
sc.pp.highly_variable_genes(adata, flavor='seurat', n_top_genes=4000, inplace=True)
sc.pp.pca(adata, n_comps=50, use_highly_variable=True, svd_solver='arpack')
sc.pp.neighbors(adata)
sc.tl.umap(adata)
sc.tl.louvain(adata, key_added='clusters')

Then, to calculate the image features, I create my ImageContainer (it is not clear to me whether I can do this before filtering).

library_id = "Mouse_2"
img = sq.im.ImageContainer(
    adata.uns["spatial"][library_id]["images"]["hires"],
    scale=adata.uns["spatial"][library_id]["scalefactors"]["tissue_hires_scalef"],
)

No problems up to that point (apart from a lot of warnings about deprecated parameters in pandas).

I then do:

for scale in [1.0, 2.0]:
    feature_name = f"features_summary_scale{scale}"
    sq.im.calculate_image_features(
        adata,
        img.compute(),
        features="summary",
        key_added=feature_name,
        n_jobs=4,
        scale=scale,
    )

but get this error:

Traceback

in ImageContainer.generate_spot_crops(self, adata, spatial_key, library_id, spot_diameter_key, spot_scale, obs_names, as_array, squeeze, return_obs, **kwargs)
    820 radius = int(round(diameter // 2 * spot_scale))
    822 # get coords in image pixel space from original space
--> 823 y = int(spatial[i][1] * scale)
    824 x = int(spatial[i][0] * scale)
    826 # if CropCoords exist, need to offset y and x

ValueError: cannot convert float NaN to integer

If I try to do neighborhood enrichment with
sq.gr.spatial_neighbors(adata)

I get:
ValueError: Input X contains NaN.

Version

squidpy==1.4.1

Lem-P changed the title from "NaN value when importing Vision dataset" to "NaN value when importing Visium dataset" on Feb 6, 2024
Member

giovp commented Feb 6, 2024

Can you check whether you have NaN values in adata.obsm["spatial"]?

Author

Lem-P commented Feb 6, 2024

Maybe a noob question, but how?
I have tried with

df = pd.DataFrame(adata.obsm["spatial"])
nan_count = df.isna().sum()

print(nan_count)

and got
0 1
1 1
dtype: int64

But I am not sure this is the right method.

Member

giovp commented Feb 6, 2024

If np.isnan(adata.obsm["spatial"]).sum() returns a value greater than 0, then you have NaNs; that would be something in your data and possibly not related to Squidpy.
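
A minimal sketch of that check, using a small made-up coordinate array in place of adata.obsm["spatial"] (the values and the variable name `spatial` are illustrative assumptions, not data from this issue). It counts the NaN entries and reports which rows contain them:

```python
import numpy as np

# Stand-in for adata.obsm["spatial"]; the values are made up for illustration.
spatial = np.array([
    [7335.0, 12140.0],
    [np.nan, np.nan],
    [8120.0, 9950.0],
])

nan_mask = np.isnan(spatial)                     # True where an entry is NaN
n_nan = int(nan_mask.sum())                      # total number of NaN entries
bad_rows = np.flatnonzero(nan_mask.any(axis=1))  # indices of rows with any NaN

print(n_nan, bad_rows.tolist())
```

With a real object, indexing adata.obs_names with `bad_rows` should give the barcodes of the affected spots.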

Author

Lem-P commented Feb 6, 2024

np.isnan(adata.obsm["spatial"]).sum() gives me 2 as output.
How can I find out where it is coming from? (The data comes from spaceranger-2.0.1.)
How can I correct the dataset?

Member

giovp commented Feb 6, 2024

Unfortunately I don't know. One option is to simply filter out the cells that have NaN coordinates; you could also check the original raw data to see where the issue arises.
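
The filtering option could look roughly like the sketch below. The barcodes and coordinates are hypothetical stand-ins for adata.obs_names and adata.obsm["spatial"]; this only shows the masking step and makes no claim about where the NaNs originate.

```python
import numpy as np

# Hypothetical barcodes and coordinates standing in for adata.obs_names
# and adata.obsm["spatial"].
barcodes = np.array(["AAAC-1", "AAAG-1", "AAAT-1"])
spatial = np.array([
    [7335.0, 12140.0],
    [np.nan, np.nan],
    [8120.0, 9950.0],
])

keep = ~np.isnan(spatial).any(axis=1)  # True for rows with complete coordinates
dropped = barcodes[~keep].tolist()     # record what is removed, for later inspection
spatial_clean = spatial[keep]

print(f"dropping {len(dropped)} spot(s): {dropped}")
# With a real AnnData object, the same boolean mask could be applied as:
#   adata = adata[keep].copy()
```

Keeping the list of dropped barcodes makes it easier to look the same spots up in the raw Space Ranger output afterwards.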

Author

Lem-P commented Feb 7, 2024

After some testing, it is the sq.read.visium() function that creates the issue.
If I create my AnnData with the sc.read_visium() function, there are no NaN values in adata.obsm["spatial"] and I can go on with the rest of the analysis.
So there is indeed a bug in Squidpy; the workaround is to use Scanpy to import the Visium data.

Collaborator

michalk8 commented Feb 7, 2024

This could be because of this line: https://github.com/scverse/squidpy/blob/main/src/squidpy/read/_read.py#L94
@Lem-P do both adata objects (from sq.read.visium() and sc.read_visium()) have the same number of cells? The Squidpy function keeps all the cells in the adata and puts NaNs for the coordinates if they are missing.
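
To illustrate the behaviour described here (every cell is kept, and cells without a matching position get NaN coordinates), here is a generic pandas sketch of a left join. It is not Squidpy's actual reader code; the barcodes, column names, and values are made up.

```python
import pandas as pd

# Hypothetical per-cell table (one row per barcode in the count matrix).
counts = pd.DataFrame(
    {"n_counts": [1500, 2300, 1800]},
    index=["AAAC-1", "AAAG-1", "AAAT-1"],
)
# Hypothetical positions table that lacks an entry for "AAAG-1".
positions = pd.DataFrame(
    {"x": [7335, 8120], "y": [12140, 9950]},
    index=["AAAC-1", "AAAT-1"],
)

# A left join keeps every row of `counts`; unmatched barcodes get NaN in x and y.
joined = counts.join(positions, how="left")
missing = joined.index[joined[["x", "y"]].isna().any(axis=1)].tolist()

print(joined)
print("barcodes without coordinates:", missing)
```

An inner join would instead drop the unmatched barcode, which is why comparing the number of cells between the two objects is a useful first check.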

Author

Lem-P commented Feb 7, 2024

No, I have the same number of observations/cells and variables/genes in both objects.
But I found the problematic row.
In the object made with Scanpy: array([ 7335, 12140])
In the object made with Squidpy: array([nan, nan])
Could the space before the first value be causing the issue? Where does it come from? Why would spaceranger suddenly add a space before a value?
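
On the leading space: NumPy pads the elements of an array to a common width when it prints them, so a space before the shorter number is display formatting and is not part of the stored values. A small check, with the two values copied from the Scanpy row above:

```python
import numpy as np

coords = np.array([7335, 12140])

text = repr(coords)       # printed form: elements are right-aligned to the widest one
stored = coords.tolist()  # the actual stored values, with no whitespace involved

print(text)
print(stored)
```

The space alone is therefore unlikely to explain the NaN row.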

Member

giovp commented Feb 13, 2024

OK, so the issue seems to be in the Visium reader, as also reported in #746; I'm afraid it has to do with Space Ranger versions. I won't have time to look at it soon, but @Lem-P, I would take a look at @scverse/spatialdata-io for a Visium reader that should support all Space Ranger versions.


3 participants