Not able to submit multiple jobs running scglue #119

piyushjo15 · 2024-05-09T21:30:20Z

Hi,

I am trying to submit multiple jobs that each run an scglue model. The first job finishes on time, but the rest get stuck. Is this because they are all using the same pretrain directory? The message about that directory is the last one I get from the stuck runs.

Thanks for your feedback.

Sorry for the incomplete message. I have now added the output from the run, as written to the ".out" file:

Performing analysis using predicated lables, v1 approach :
[INFO] fit_SCGLUE: Pretraining SCGLUE model...
[INFO] autodevice: Using CPU as computation device.
[INFO] check_graph: Checking variable coverage...
[INFO] check_graph: Checking edge attributes...
[INFO] check_graph: Checking self-loops...
[INFO] check_graph: Checking graph symmetry...
[INFO] SCGLUEModel: Setting `graph_batch_size` = 39871
[INFO] SCGLUEModel: Setting `max_epochs` = 100
[INFO] SCGLUEModel: Setting `patience` = 9
[INFO] SCGLUEModel: Setting `reduce_lr_patience` = 5
[INFO] SCGLUETrainer: Using training directory: "glue/pretrain"

The error message from the ".err" file is below. However, it could be the result of my deliberately killing the job:

/software/miniconda/4.9.2/lib/python3.9/abc.py:98: FutureWarning: SparseDataset is deprecated and will be removed in late 2024. It has been replaced by the public classes CSRDataset and CSCDataset.

For instance checks, use `isinstance(X, (anndata.experimental.CSRDataset, anndata.experimental.CSCDataset))` instead.

For creation, use `anndata.experimental.sparse_dataset(X)` instead.

  return _abc_instancecheck(cls, instance)
/home/p541i/DATA/packages/scglue/lib/python3.9/site-packages/torch/optim/lr_scheduler.py:28: UserWarning: The verbose parameter is deprecated. Please use get_last_lr() to access the learning rate.
  warnings.warn("The verbose parameter is deprecated. Please use get_last_lr() "
/software/miniconda/4.9.2/lib/python3.9/abc.py:98: FutureWarning: SparseDataset is deprecated and will be removed in late 2024. It has been replaced by the public classes CSRDataset and CSCDataset.

For instance checks, use `isinstance(X, (anndata.experimental.CSRDataset, anndata.experimental.CSCDataset))` instead.

For creation, use `anndata.experimental.sparse_dataset(X)` instead.

  return _abc_instancecheck(cls, instance)

Exception in thread Thread-1:
Traceback (most recent call last):
  File "/software/miniconda/4.9.2/lib/python3.9/threading.py", line 954, in _bootstrap_inner
    self.run()
  File "/home/p541i/DATA/packages/scglue/lib/python3.9/site-packages/tensorboardX/event_file_writer.py", line 219, in run
    self._record_writer.flush()
  File "/home/p541i/DATA/packages/scglue/lib/python3.9/site-packages/tensorboardX/event_file_writer.py", line 69, in flush
    self._py_recordio_writer.flush()
  File "/home/p541i/DATA/packages/scglue/lib/python3.9/site-packages/tensorboardX/record_writer.py", line 193, in flush
    self._writer.flush()
OSError: [Errno 116] Stale file handle
Terminated


Jeff1995 · 2024-05-10T03:27:16Z

Hi @piyushjo15. Thank you for your interest in GLUE! Did you forget to append the messages you get?

piyushjo15 · 2024-05-31T20:13:30Z

Hi @Jeff1995 I just want to follow up with this.
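
One thing that may be worth trying, though the thread does not confirm it as the cause: give each job its own training directory so that concurrent runs do not all write to the same `glue/pretrain` path. `Errno 116` (stale file handle) typically shows up on network filesystems when a file that one process holds open is removed or replaced by another. The sketch below is a minimal illustration under stated assumptions: the helper name `job_training_dir` is hypothetical, `SLURM_JOB_ID` assumes a Slurm scheduler (the thread does not say which one is in use), and the commented `fit_kws={"directory": ...}` call follows the usage shown in the scglue tutorials with placeholder variable names.

```python
import os
import uuid
from pathlib import Path


def job_training_dir(base="glue", env_var="SLURM_JOB_ID"):
    """Return a per-job training directory under `base`.

    Hypothetical helper: uses the scheduler's job ID when the environment
    variable is set, otherwise falls back to a random suffix.
    """
    job_id = os.environ.get(env_var) or uuid.uuid4().hex[:8]
    path = Path(base) / f"job_{job_id}"
    path.mkdir(parents=True, exist_ok=True)  # safe if it already exists
    return str(path)


# Usage inside each job script (not executed here; `adatas` and `guidance`
# are placeholders for your own modality dict and guidance graph):
#
#   directory = job_training_dir()
#   glue = scglue.models.fit_SCGLUE(
#       adatas, guidance,
#       fit_kws={"directory": directory},
#   )
```

With this, the training log of each job should report a distinct directory (for example `glue/job_<id>/pretrain`) instead of the shared `glue/pretrain`.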
