KeyError: '72' when reading in Nanostring CosMx data #815

Open
josiejenyne opened this issue Apr 10, 2024 · 4 comments

Comments

@josiejenyne

josiejenyne commented Apr 10, 2024

I get the error below when loading Nanostring data with sq.read.nanostring(). I am following the same format I used for the Nanostring FFPE Lung dataset from the tutorial, and I have restructured the path so that it contains exactly the same folders and files as the Lung data. I am not sure what is causing the issue. When I read in data from slide 2, I get KeyError: '110' instead.

KeyError                                  Traceback (most recent call last)
<ipython-input-14-ea9a562318f3> in <module>
      5     counts_file="FF_exprMat_file.csv",
      6     meta_file="FF_metadata_file.csv",
----> 7     fov_file='FF_fov_positions_file.csv'
      8 )

~/anaconda3/envs/py3_7/lib/python3.7/site-packages/squidpy/read/_read.py in nanostring(path, counts_file, meta_file, fov_file)
    238             if fname.endswith(file_extensions):
    239                 fov = str(int(pat.findall(fname)[0]))
--> 240                 adata.uns[Key.uns.spatial][fov]["images"][kind] = _load_image(path / subdir / fname)
    241 
    242     if fov_file is not None:

KeyError: '72'
@acjordan333

Have you examined the structure of your fov positions file and the file from the FFPE lung dataset? If you compare them you might see that there is a difference between your file and the example file.
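A quick way to do that comparison is to diff the header rows of the two files. This is only a sketch: the helper names are made up for illustration, and it assumes both files are plain comma-separated CSVs with the column names in the first row.

```python
import csv


def read_header(csv_path):
    """Return the column names from the first row of a CSV file."""
    with open(csv_path, newline="") as handle:
        return next(csv.reader(handle))


def compare_headers(reference_csv, own_csv):
    """Report columns that appear in only one of the two files."""
    reference = set(read_header(reference_csv))
    own = set(read_header(own_csv))
    return {
        "only_in_reference": sorted(reference - own),
        "only_in_own": sorted(own - reference),
    }
```

Calling `compare_headers(...)` with the tutorial's fov positions file and your own should show whether a column such as `fov` is missing or spelled differently.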

The lab I am in has been generating CosMx data from our own machine and we have had to alter the structure of our fov position file in order to get it to match the structure required for the sq.read.nanostring() function. Specifically, we had to take the 'FOV' column in the file, duplicate it to create a column named 'fov', and make that column the index column. It appears that Nanostring has been changing the structure of the flat files as they have been updating their software. We did not have this problem when we analyzed pilot data that was generated by Nanostring in late 2023.
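For reference, the change we made can be sketched roughly as follows. The function name and its arguments are hypothetical, and it assumes your file has a column literally named `FOV`; it writes a new file rather than overwriting the original.

```python
import pandas as pd


def add_fov_index_column(src_csv, dst_csv, source_col="FOV", target_col="fov"):
    """Copy `source_col` into `target_col` and write the result to `dst_csv`.

    The original file is left untouched.
    """
    positions = pd.read_csv(src_csv, header=0)
    if target_col not in positions.columns:
        if source_col not in positions.columns:
            raise KeyError(
                f"Neither {target_col!r} nor {source_col!r} found in {src_csv}"
            )
        # Duplicate the column under the name the reader looks for.
        positions[target_col] = positions[source_col]
    positions.to_csv(dst_csv, index=False)
    return positions
```

The new file can then be passed as `fov_file`; a reader that calls `pd.read_csv(..., index_col="fov")` should be able to pick up the `fov` column as the index.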

We did not have the same error you are describing but it would not have been possible for us to upload our data without altering the fov file. I believe the scverse team will have to update the LoadNanostring function soon as more changes are coming to the structure of the files as Nanostring continues to make their updates. Hopefully this helps in some way.
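As for the KeyError itself, the traceback shows the FOV id being parsed from each image filename with `str(int(...))` and then used as a key into `adata.uns`. One possible cause is that an image exists for an FOV that has no entry there. The sketch below checks for that, under two assumptions that are not confirmed in this thread: that image filenames carry the FOV number after `_F`, and that the keys come from the FOV values in the metadata file. The function name is made up for illustration.

```python
import re

# Assumption: image filenames contain the FOV number after "_F", e.g. "..._F072.jpg".
FOV_PATTERN = re.compile(r".*_F(\d+)")


def fovs_missing_from_metadata(image_filenames, metadata_fovs):
    """Return FOV ids found in image filenames but absent from the metadata."""
    known = {str(int(f)) for f in metadata_fovs}
    missing = set()
    for fname in image_filenames:
        match = FOV_PATTERN.findall(fname)
        if not match:
            continue  # filename does not follow the assumed pattern
        fov = str(int(match[0]))  # "072" -> "72", mirroring the traceback
        if fov not in known:
            missing.add(fov)
    return sorted(missing, key=int)
```

If this returns a non-empty list for the files in `CellComposite` or `CellLabels`, the metadata file may simply not contain rows for those FOVs.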

@giovp
Member

giovp commented Apr 22, 2024

Hi both, thank you for raising this. It is indeed quite hard to keep track of all the changes that the various companies make to their pipelines' output formats. The most up-to-date readers for these technologies can be found at https://spatialdata.scverse.org/projects/io/en/latest/ . Could you check whether you can read the data with those? If so, it would probably be easier to then use the spatialdata format in squidpy.

@josiejenyne
Author

Hi all, I switched to a newer version of Python (3.11; previously 3.7) and got a similar error. I do have FOV 72 in both folders, so I am not sure why this happens. The same thing happens with the data from the other slides, but for multiple FOVs.

Here are the package versions:
scanpy==1.9.5 anndata==0.10.2 umap==0.5.4 numpy==1.25.2 scipy==1.11.3 pandas==2.1.1 scikit-learn==1.3.1 statsmodels==0.14.0 igraph==0.10.8 pynndescent==0.5.10 squidpy==1.4.1

WARNING: FOV `72` does not exist in CellComposite folder, skipping it.
WARNING: FOV `72` does not exist in CellLabels folder, skipping it.
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[7], line 2
      1 #loading in FF
----> 2 adata = sq.read.nanostring(
      3     path = '/mnt/hpc/data/Internal_Tests/240210_CosMx_CoreTrainingData/CoreTrainingData/FF/20240127_011618_S2/CellStatsDir/test',
      4    # path="/home/genomics/genomics/data/Internal_Tests/240210_CosMx_CoreTrainingData/CoreTrainingData/FF/20240127_011618_S2/CellStatsDir/test",
      5     counts_file="FF_exprMat_new_file.csv",
      6     meta_file="FF_metadata_short_file.csv",
      7     fov_file='FF_fov_positions_file_alt.csv'
      8 )

File ~/anaconda3/envs/singlecell/lib/python3.11/site-packages/squidpy/read/_read.py:267, in nanostring(path, counts_file, meta_file, fov_file)
    264                     continue
    266 if fov_file is not None:
--> 267     fov_positions = pd.read_csv(path / fov_file, header=0, index_col=fov_key)
    268     for fov, row in fov_positions.iterrows():
    269         try:

File ~/anaconda3/envs/singlecell/lib/python3.11/site-packages/pandas/io/parsers/readers.py:948, in read_csv(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, date_format, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, encoding_errors, dialect, on_bad_lines, delim_whitespace, low_memory, memory_map, float_precision, storage_options, dtype_backend)
    935 kwds_defaults = _refine_defaults_read(
    936     dialect,
    937     delimiter,
   (...)
    944     dtype_backend=dtype_backend,
    945 )
    946 kwds.update(kwds_defaults)
--> 948 return _read(filepath_or_buffer, kwds)

File ~/anaconda3/envs/singlecell/lib/python3.11/site-packages/pandas/io/parsers/readers.py:617, in _read(filepath_or_buffer, kwds)
    614     return parser
    616 with parser:
--> 617     return parser.read(nrows)

File ~/anaconda3/envs/singlecell/lib/python3.11/site-packages/pandas/io/parsers/readers.py:1748, in TextFileReader.read(self, nrows)
   1741 nrows = validate_integer("nrows", nrows)
   1742 try:
   1743     # error: "ParserBase" has no attribute "read"
   1744     (
   1745         index,
   1746         columns,
   1747         col_dict,
-> 1748     ) = self._engine.read(  # type: ignore[attr-defined]
   1749         nrows
   1750     )
   1751 except Exception:
   1752     self.close()

File ~/anaconda3/envs/singlecell/lib/python3.11/site-packages/pandas/io/parsers/c_parser_wrapper.py:333, in CParserWrapper.read(self, nrows)
    330     data = {k: v for k, (i, v) in zip(names, data_tups)}
    332     names, date_data = self._do_date_conversions(names, data)
--> 333     index, column_names = self._make_index(date_data, alldata, names)
    335 return index, column_names, date_data

File ~/anaconda3/envs/singlecell/lib/python3.11/site-packages/pandas/io/parsers/base_parser.py:370, in ParserBase._make_index(self, data, alldata, columns, indexnamerow)
    367     index = None
    369 elif not self._has_complex_date_col:
--> 370     simple_index = self._get_simple_index(alldata, columns)
    371     index = self._agg_index(simple_index)
    372 elif self._has_complex_date_col:

File ~/anaconda3/envs/singlecell/lib/python3.11/site-packages/pandas/io/parsers/base_parser.py:402, in ParserBase._get_simple_index(self, data, columns)
    400 index = []
    401 for idx in self.index_col:
--> 402     i = ix(idx)
    403     to_remove.append(i)
    404     index.append(data[i])

File ~/anaconda3/envs/singlecell/lib/python3.11/site-packages/pandas/io/parsers/base_parser.py:397, in ParserBase._get_simple_index.<locals>.ix(col)
    395 if not isinstance(col, str):
    396     return col
--> 397 raise ValueError(f"Index {col} invalid")

ValueError: Index fov invalid

@giovp
Member

giovp commented Apr 23, 2024

Hi @josiejenyne, it looks like the fov_key, which is hardcoded as "fov", does not match your file. This may be because the company changed the spec or because your file has been modified. Either way, I would suggest submitting this issue to spatialdata-io, or opening a PR with a possible fix here in squidpy. For the PR, one option would be to expose fov_key as an argument; alternatively, modify the file so that the column holding the FOV id is named fov.
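To make the first option concrete, here is a rough sketch of what reading the positions file with a configurable key could look like. `read_fov_positions` is a hypothetical helper, not existing squidpy code, and the case-insensitive fallback is just one possible design.

```python
import pandas as pd


def read_fov_positions(csv_path, fov_key="fov"):
    """Read an FOV positions CSV, indexed by `fov_key`.

    Falls back to a case-insensitive match (e.g. "FOV") when the exact
    column name is absent. Hypothetical helper, not part of squidpy.
    """
    # Read only the header row to inspect the available column names.
    columns = pd.read_csv(csv_path, header=0, nrows=0).columns
    if fov_key in columns:
        index_col = fov_key
    else:
        candidates = [c for c in columns if c.lower() == fov_key.lower()]
        if len(candidates) != 1:
            raise ValueError(
                f"Column {fov_key!r} not found in {csv_path}; "
                f"available: {list(columns)}"
            )
        index_col = candidates[0]
    positions = pd.read_csv(csv_path, header=0, index_col=index_col)
    positions.index.name = fov_key
    return positions
```

With something like this, a file whose column is named `FOV` would load without manual edits, and a file with no matching column would fail with a message listing the columns that were found instead of `Index fov invalid`.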
