ValueError: zero-size array to reduction operation maximum which has no identity when running embed_data #200

jmrussell opened this issue May 13, 2024 · 0 comments


Hello,

I am trying to run scg.tasks.embed_data on this dataset: https://storage.googleapis.com/linnarsson-lab-human/human_dev_GRCh38-3.0.0.h5ad from https://github.com/linnarsson-lab/developing-human-brain/
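
For reference, the call looks roughly like this (the model directory and gene_col value are placeholders for my local setup):

  import scanpy as sc
  import scgpt as scg

  adata = sc.read_h5ad("human_dev_GRCh38-3.0.0.h5ad")
  embed_adata = scg.tasks.embed_data(
      adata,
      model_dir="path/to/scGPT_human",  # pretrained checkpoint directory (placeholder)
      gene_col="Gene",                  # adata.var column with gene symbols (placeholder)
      batch_size=64,
  )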

The run gets to the point where the tqdm progress bar for embedding cells appears, but after 90 or so iterations it fails with this error message:

  scGPT - INFO - match 23336/33538 genes in vocabulary of size 60697.
  /home/jr2396/miniconda3/envs/scgpt-0.2.1/lib/python3.8/site-packages/torch/nn/modules/transformer.py:282: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer was not TransformerEncoderLayer
    warnings.warn(f"enable_nested_tensor is True, but self.use_nested_tensor is False because {why_not_sparsity_fast_path}")
  Embedding cells:   0%|▏                                                          | 94/26031 [00:14<1:07:04,  6.44it/s]
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    File "/home/jr2396/miniconda3/envs/scgpt-0.2.1/lib/python3.8/site-packages/scgpt/tasks/cell_emb.py", line 263, in embed_data
      cell_embeddings = get_batch_cell_embeddings(
    File "/home/jr2396/miniconda3/envs/scgpt-0.2.1/lib/python3.8/site-packages/scgpt/tasks/cell_emb.py", line 122, in get_batch_cell_embeddings
      for data_dict in tqdm(data_loader, desc="Embedding cells"):
    File "/home/jr2396/miniconda3/envs/scgpt-0.2.1/lib/python3.8/site-packages/tqdm/std.py", line 1181, in __iter__
      for obj in iterable:
    File "/home/jr2396/miniconda3/envs/scgpt-0.2.1/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 630, in __next__
      data = self._next_data()
    File "/home/jr2396/miniconda3/envs/scgpt-0.2.1/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1345, in _next_data
      return self._process_data(data)
    File "/home/jr2396/miniconda3/envs/scgpt-0.2.1/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1371, in _process_data
      data.reraise()
    File "/home/jr2396/miniconda3/envs/scgpt-0.2.1/lib/python3.8/site-packages/torch/_utils.py", line 694, in reraise
      raise exception
  ValueError: Caught ValueError in DataLoader worker process 6.
  Original Traceback (most recent call last):
    File "/home/jr2396/miniconda3/envs/scgpt-0.2.1/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
      data = fetcher.fetch(index)
    File "/home/jr2396/miniconda3/envs/scgpt-0.2.1/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 54, in fetch
      return self.collate_fn(data)
    File "/home/jr2396/miniconda3/envs/scgpt-0.2.1/lib/python3.8/site-packages/scgpt/data_collator.py", line 88, in __call__
      expressions[self.keep_first_n_tokens :] = binning(
    File "/home/jr2396/miniconda3/envs/scgpt-0.2.1/lib/python3.8/site-packages/scgpt/preprocess.py", line 283, in binning
      if row.max() == 0:
    File "/home/jr2396/miniconda3/envs/scgpt-0.2.1/lib/python3.8/site-packages/numpy/core/_methods.py", line 41, in _amax
      return umr_maximum(a, axis, None, out, keepdims, initial, where)
  ValueError: zero-size array to reduction operation maximum which has no identity
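
The final ValueError is easy to reproduce in isolation, which makes me suspect the collator is passing binning() an empty expression vector, i.e. a cell with zero nonzero counts among the 23336 matched genes (my guess, not confirmed):

  import numpy as np

  row = np.array([])  # a cell with no expressed genes left after vocabulary matching
  row.max()           # ValueError: zero-size array to reduction operation maximum which has no identity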

I don't think I'm hitting the RAM limit (I have 700GB of RAM) or the GPU memory limit (I'm on a 40GB A100, and I only see about 8GB of GPU RAM allocated before it crashes).

I have tried subsetting the object down to 10% of its size, and it still fails. I am able to successfully run this on data that we've generated in house. Any guidance would be appreciated.
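
For what it's worth, the workaround I plan to try next (an untested sketch; vocab_genes would have to be loaded from the checkpoint's vocab file) is to drop cells that end up with no counts among the vocabulary-matched genes before calling embed_data:

  import scanpy as sc

  # keep only genes the model vocabulary knows about (vocab_genes is hypothetical here)
  adata = adata[:, adata.var_names.isin(vocab_genes)].copy()

  # drop cells with no remaining counts, which would otherwise reach binning() as empty rows
  sc.pp.filter_cells(adata, min_counts=1)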
