ValueError: zero-size array to reduction operation maximum which has no identity when running embed_data #200

jmrussell opened this issue May 13, 2024 · 0 comments


Hello,

I am trying to run scg.tasks.embed_data on this dataset: https://storage.googleapis.com/linnarsson-lab-human/human_dev_GRCh38-3.0.0.h5ad from https://github.com/linnarsson-lab/developing-human-brain/
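
For reference, the call looks roughly like this (the model directory and gene_col value are placeholders for my local setup):

  import scanpy as sc
  import scgpt as scg

  adata = sc.read_h5ad("human_dev_GRCh38-3.0.0.h5ad")
  embed_adata = scg.tasks.embed_data(
      adata,
      model_dir="path/to/scGPT_human",  # pretrained checkpoint directory (placeholder)
      gene_col="Gene",                  # adata.var column with gene symbols (placeholder)
      batch_size=64,
  )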

The run gets to the point where the tqdm progress bar for embedding cells appears, but after 90 or so iterations it fails with this error message:

  scGPT - INFO - match 23336/33538 genes in vocabulary of size 60697.
  /home/jr2396/miniconda3/envs/scgpt-0.2.1/lib/python3.8/site-packages/torch/nn/modules/transformer.py:282: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer was not TransformerEncoderLayer
    warnings.warn(f"enable_nested_tensor is True, but self.use_nested_tensor is False because {why_not_sparsity_fast_path}")
  Embedding cells:   0%|▏                                                          | 94/26031 [00:14<1:07:04,  6.44it/s]
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    File "/home/jr2396/miniconda3/envs/scgpt-0.2.1/lib/python3.8/site-packages/scgpt/tasks/cell_emb.py", line 263, in embed_data
      cell_embeddings = get_batch_cell_embeddings(
    File "/home/jr2396/miniconda3/envs/scgpt-0.2.1/lib/python3.8/site-packages/scgpt/tasks/cell_emb.py", line 122, in get_batch_cell_embeddings
      for data_dict in tqdm(data_loader, desc="Embedding cells"):
    File "/home/jr2396/miniconda3/envs/scgpt-0.2.1/lib/python3.8/site-packages/tqdm/std.py", line 1181, in __iter__
      for obj in iterable:
    File "/home/jr2396/miniconda3/envs/scgpt-0.2.1/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 630, in __next__
      data = self._next_data()
    File "/home/jr2396/miniconda3/envs/scgpt-0.2.1/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1345, in _next_data
      return self._process_data(data)
    File "/home/jr2396/miniconda3/envs/scgpt-0.2.1/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1371, in _process_data
      data.reraise()
    File "/home/jr2396/miniconda3/envs/scgpt-0.2.1/lib/python3.8/site-packages/torch/_utils.py", line 694, in reraise
      raise exception
  ValueError: Caught ValueError in DataLoader worker process 6.
  Original Traceback (most recent call last):
    File "/home/jr2396/miniconda3/envs/scgpt-0.2.1/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
      data = fetcher.fetch(index)
    File "/home/jr2396/miniconda3/envs/scgpt-0.2.1/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 54, in fetch
      return self.collate_fn(data)
    File "/home/jr2396/miniconda3/envs/scgpt-0.2.1/lib/python3.8/site-packages/scgpt/data_collator.py", line 88, in __call__
      expressions[self.keep_first_n_tokens :] = binning(
    File "/home/jr2396/miniconda3/envs/scgpt-0.2.1/lib/python3.8/site-packages/scgpt/preprocess.py", line 283, in binning
      if row.max() == 0:
    File "/home/jr2396/miniconda3/envs/scgpt-0.2.1/lib/python3.8/site-packages/numpy/core/_methods.py", line 41, in _amax
      return umr_maximum(a, axis, None, out, keepdims, initial, where)
  ValueError: zero-size array to reduction operation maximum which has no identity
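
The final ValueError is easy to reproduce in isolation, which makes me suspect the collator is passing binning() an empty expression vector, i.e. a cell with zero nonzero counts among the 23336 matched genes (my guess, not confirmed):

  import numpy as np

  row = np.array([])  # a cell with no expressed genes left after vocabulary matching
  row.max()           # ValueError: zero-size array to reduction operation maximum which has no identity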

I don't think I'm hitting the RAM limit (I have 700GB of RAM) or the GPU memory limit (I'm on a 40GB A100, and I only see about 8GB of GPU RAM allocated before it crashes).

I have tried subsetting the object down to 10% of its size, and it still fails. I am able to successfully run this on data that we've generated in house. Any guidance would be appreciated.
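
For what it's worth, the workaround I plan to try next (an untested sketch; vocab_genes would have to be loaded from the checkpoint's vocab file) is to drop cells that end up with no counts among the vocabulary-matched genes before calling embed_data:

  import scanpy as sc

  # keep only genes the model vocabulary knows about (vocab_genes is hypothetical here)
  adata = adata[:, adata.var_names.isin(vocab_genes)].copy()

  # drop cells with no remaining counts, which would otherwise reach binning() as empty rows
  sc.pp.filter_cells(adata, min_counts=1)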
