Not able to submit multiple jobs running scglue #119

Open
piyushjo15 opened this issue May 9, 2024 · 2 comments

piyushjo15 commented May 9, 2024

Hi,

I am trying to submit multiple jobs that each run an scglue model. The first job finishes on time, but the rest of the jobs get stuck. Is this because they are all using the same pretrain directory? The training-directory line is the last message I get from the runs that are stuck.

Thanks for your feedback.

Sorry for the incomplete message. I have now added the output from the run, taken from the ".out" file:

Performing analysis using predicated lables, v1 approach :
[INFO] fit_SCGLUE: Pretraining SCGLUE model...
[INFO] autodevice: Using CPU as computation device.
[INFO] check_graph: Checking variable coverage...
[INFO] check_graph: Checking edge attributes...
[INFO] check_graph: Checking self-loops...
[INFO] check_graph: Checking graph symmetry...
[INFO] SCGLUEModel: Setting `graph_batch_size` = 39871
[INFO] SCGLUEModel: Setting `max_epochs` = 100
[INFO] SCGLUEModel: Setting `patience` = 9
[INFO] SCGLUEModel: Setting `reduce_lr_patience` = 5
[INFO] SCGLUETrainer: Using training directory: "glue/pretrain"

The error message in the ".err" file is below. However, this could be the result of my purposely killing the job:

/software/miniconda/4.9.2/lib/python3.9/abc.py:98: FutureWarning: SparseDataset is deprecated and will be removed in late 2024. It has been replaced by the public classes CSRDataset and CSCDataset.

For instance checks, use `isinstance(X, (anndata.experimental.CSRDataset, anndata.experimental.CSCDataset))` instead.

For creation, use `anndata.experimental.sparse_dataset(X)` instead.

  return _abc_instancecheck(cls, instance)
/home/p541i/DATA/packages/scglue/lib/python3.9/site-packages/torch/optim/lr_scheduler.py:28: UserWarning: The verbose parameter is deprecated. Please use get_last_lr() to access the learning rate.
  warnings.warn("The verbose parameter is deprecated. Please use get_last_lr() "
/software/miniconda/4.9.2/lib/python3.9/abc.py:98: FutureWarning: SparseDataset is deprecated and will be removed in late 2024. It has been replaced by the public classes CSRDataset and CSCDataset.

For instance checks, use `isinstance(X, (anndata.experimental.CSRDataset, anndata.experimental.CSCDataset))` instead.

For creation, use `anndata.experimental.sparse_dataset(X)` instead.

  return _abc_instancecheck(cls, instance)

Exception in thread Thread-1:
Traceback (most recent call last):
  File "/software/miniconda/4.9.2/lib/python3.9/threading.py", line 954, in _bootstrap_inner
    self.run()
  File "/home/p541i/DATA/packages/scglue/lib/python3.9/site-packages/tensorboardX/event_file_writer.py", line 219, in run
    self._record_writer.flush()
  File "/home/p541i/DATA/packages/scglue/lib/python3.9/site-packages/tensorboardX/event_file_writer.py", line 69, in flush
    self._py_recordio_writer.flush()
  File "/home/p541i/DATA/packages/scglue/lib/python3.9/site-packages/tensorboardX/record_writer.py", line 193, in flush
    self._writer.flush()
OSError: [Errno 116] Stale file handle
Terminated

Jeff1995 (Collaborator) commented

Hi @piyushjo15. Thank you for your interest in GLUE! Did you forget to append the messages you get?

piyushjo15 (Author) commented

Hi @Jeff1995, I just wanted to follow up on this.
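
One way to test the shared-directory hypothesis raised above is to give every job its own training directory, so that no two runs write to the same "glue/pretrain" folder. The `Stale file handle` error (errno 116) is typically seen on network file systems when a file is removed or replaced by another process, which would be consistent with concurrent jobs writing to one directory, though the thread does not confirm this as the cause. The sketch below is only an illustration: `job_training_dir` is a hypothetical helper, and the environment variable names are assumptions, since the thread does not say which scheduler is in use.

```python
import os
import uuid
from pathlib import Path


def job_training_dir(base="glue", env_vars=("SLURM_JOB_ID", "LSB_JOBID", "JOB_ID")):
    """Return a training directory unique to this job (hypothetical helper)."""
    # Use the first scheduler job ID that is set; which variable exists
    # depends on the cluster, so the names above are assumptions.
    job_id = next((os.environ[v] for v in env_vars if os.environ.get(v)), None)
    if job_id is None:
        # No scheduler ID available: fall back to a random suffix.
        job_id = uuid.uuid4().hex[:8]
    path = Path(f"{base}_{job_id}")
    path.mkdir(parents=True, exist_ok=True)
    return str(path)


# Usage sketch, assuming the model is fitted with the tutorial-style call
# in which the training directory is passed through `fit_kws`:
#
# glue = scglue.models.fit_SCGLUE(
#     {"rna": rna, "atac": atac}, guidance,
#     fit_kws={"directory": job_training_dir()},
# )
```

If the stuck jobs proceed once each has a separate directory, the shared "glue/pretrain" path is the likely culprit; if they still hang, the cause lies elsewhere.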
