
[BUG] early termination of pyscenic ctx with no error message #528

Open
Mira0507 opened this issue Feb 9, 2024 · 3 comments
Labels
bug Something isn't working

Comments

Mira0507 commented Feb 9, 2024

I've been seeing my pyscenic ctx run terminate in the middle of a run. The mystery is that it doesn't throw any error message, as shown below:

  $ singularity shell -B path/to/working/directory aertslab-pyscenic-0.12.1.sif
  Singularity> bash pyscenic_23.sh
  # running..
  # running..
  # running..
  2024-02-09 12:16:56,692 - pyscenic.prune - INFO - Worker mm10_500bp_up_100bp_down_full_tx_v10_clust.genes_vs_motifs.rankings(1): All regulons derived.

  2024-02-09 12:16:56,692 - pyscenic.prune - INFO - Worker mm10_500bp_up_100bp_down_full_tx_v10_clust.genes_vs_motifs.rankings(1): All regulons derived.

  2024-02-09 12:16:56,696 - pyscenic.prune - INFO - Worker mm10_500bp_up_100bp_down_full_tx_v10_clust.genes_vs_motifs.rankings(1): Done.

  2024-02-09 12:16:56,696 - pyscenic.prune - INFO - Worker mm10_500bp_up_100bp_down_full_tx_v10_clust.genes_vs_motifs.rankings(1): Done.
  Singularity>  

It returns to my terminal without the Writing results to file message, which I would expect to see if the run had completed successfully.

My script is as follows:

$ cat pyscenic_23.sh
#!/bin/bash

pyscenic ctx \
    scrnaseq-pyscenic-tac1-chat/WT_all_adj.csv \
    cistarget-db/mm10_500bp_up_100bp_down_full_tx_v10_clust.genes_vs_motifs.rankings.feather \
    cistarget-db/mm10_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.rankings.feather \
    --annotations_fname cistarget-db/motifs-v10nr_clust-nr.mgi-m0.001-o0.0.tbl \
    --expression_mtx_fname scrnaseq-pyscenic-tac1-chat/WT_all.loom \
    -o scrnaseq-pyscenic-tac1-chat/WT_all_reg.csv \
    --mask_dropouts \
    --num_workers 32

This run ended up creating an empty (zero-byte) output file, shown here:

$ la | grep WT_all_reg.csv
-rw-rw----. 1 username xxxx   0 Feb  9 15:47 WT_all_reg.csv

I'm assuming my issue is unrelated to mismatched gene symbols across the input files, since I found that over 1,000 gene symbols are shared across the ranking (.feather), adjacency (adj.csv), TF (allTFs_mm.txt), and motif (.tbl) files:

import pandas as pd

# Load one ranking database (f_db_names['raw'][0] holds the .feather path)
feather = pd.read_feather(f_db_names['raw'][0])

# Unique gene symbols from each input file
feather_g = list(set(feather.columns))  # from ranking .feather file
tfs_g = list(pd.read_csv(tfs, header=None).iloc[:, 0])  # from TF list (allTFs_mm.txt)
adj_g = list(set(pd.read_csv(adj_csv).loc[:, 'target']))  # from adjacency (adj.csv)
anno_g = list(set(pd.read_csv(f_motif_path, sep="\t").loc[:, 'gene_name']))  # from motif .tbl

# All gene symbols (with duplicates)
all_g = tfs_g + adj_g + feather_g + anno_g

# Retrieve gene symbols intersecting across the feather, tfs, adj, and tbl files
count = pd.Series(all_g).value_counts().to_dict()

# Save gene symbols found in every one of the four files to a list
g4 = [gene for gene, n in count.items() if n == 4]

# >>> len(tfs_g)
# 1860

# >>> len(adj_g)
# 20746

# >>> len(feather_g)
# 24069

# >>> len(anno_g)
# 1412

# >>> g4[:6]
# ['Cpeb1', 'Pml', 'Rcor1', 'Rad21', 'Ascc1', 'Prox1']

# >>> len(g4)
# 1054

assert len(g4) > 0, "Ensure gene symbols are shared across all input files."

Some tests done so far:

  • skipping the --no_pruning parameter
  • skipping the --mask_dropouts parameter
  • using v9 ranking files for mm10 (mm10__refseq-r80__10kb_up_and_down_tss.mc9nr.genes_vs_motifs.rankings.feather, mm10__refseq-r80__500bp_up_and_100bp_down_tss.mc9nr.genes_vs_motifs.rankings.feather) and/or the v9 motif file (motifs-v9-nr.mgi-m0.001-o0.0.tbl), in these combinations:
    • v9 ranking file + v10 motif file
    • v10 ranking file + v9 motif file
    • v9 ranking file + v9 motif file

I saw a few issue reports about empty output files, but in those cases the Writing results to file message was at least printed.

My environment is an HPC system, summarized here:

  • OS
LSB Version:    :core-4.1-amd64:core-4.1-noarch
Distributor ID: RedHatEnterprise
Description:    Red Hat Enterprise Linux release 8.9 (Ootpa)
Release:        8.9
Codename:       Ootpa
  • pySCENIC version:
Singularity> pyscenic -h
usage: pyscenic [-h] {grn,add_cor,ctx,aucell} ...

Single-Cell rEgulatory Network Inference and Clustering (0.12.1+0.gce41b61.dirty)
  • Installation method: singularity build aertslab-pyscenic-0.12.1.sif docker://aertslab/pyscenic:0.12.1
  • Run environment: CLI in singularity
  • Package versions: singularity-ce version 4.0.1

I think my CLI command is relatively straightforward. I've also been trying with R and Python, but neither has been successful. I would appreciate any hints or suggestions. Thank you very much for your time!

Mira0507 added the bug label Feb 9, 2024
ghuls (Member) commented Feb 14, 2024

How much memory did you assign to the job? Try lowering the number of workers.
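
For reference, here is a hedged sketch of how one might check whether the silent termination was an out-of-memory kill. Both commands are assumptions about the HPC setup: reading the kernel log may need elevated permissions, sacct only applies if the job ran under SLURM, and <jobid> is a placeholder.

# Look for OOM-killer entries in the kernel log around the time the run stopped
dmesg -T | grep -i -E "out of memory|oom" | tail

# If the job ran under SLURM, accounting reports the job state and peak memory use
sacct -j <jobid> --format=JobID,State,MaxRSS,ReqMem,Elapsed

An OUT_OF_MEMORY state or a MaxRSS close to the requested memory would point to the per-worker memory use discussed below.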


Mira0507 commented Feb 14, 2024

Interesting. I got it to work after changing the number of CPUs from 32 to 8 (the memory allocation was 100 GB).

Singularity> cat pyscenic_ctx_test.sh
#!/bin/bash

pyscenic ctx \
    scrnaseq-pyscenic-tac1-chat/WT_all_adj.csv \
    cistarget-db/mm10_500bp_up_100bp_down_full_tx_v10_clust.genes_vs_motifs.rankings.feather \
    cistarget-db/mm10_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.rankings.feather \
    --annotations_fname cistarget-db/motifs-v10nr_clust-nr.mgi-m0.001-o0.0.tbl \
    --expression_mtx_fname scrnaseq-pyscenic-tac1-chat/WT_all.loom \
    -o scrnaseq-pyscenic-tac1-chat/WT_all_reg.csv \
    --mask_dropouts \
    --num_workers 8 

Singularity> bash pyscenic_ctx_test.sh
(running...)
2024-02-14 15:44:04,062 - pyscenic.prune - INFO - Worker mm10_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.rankings(3): All regulons derived.

2024-02-14 15:44:04,062 - pyscenic.prune - INFO - Worker mm10_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.rankings(3): All regulons derived.

2024-02-14 15:44:04,081 - pyscenic.prune - INFO - Worker mm10_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.rankings(3): Done.

2024-02-14 15:44:04,081 - pyscenic.prune - INFO - Worker mm10_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.rankings(3): Done.

2024-02-14 15:44:04,147 - pyscenic.cli.pyscenic - INFO - Writing results to file.

$ la | grep WT_all_reg.csv
-rw-rw----. 1 username xxxx 2.5M Feb 14 15:44 WT_all_reg.csv

I'd never thought about more CPUs causing problems. Do you have any guess about what's going on here?

Thank you so much for the discussion, @ghuls!

ghuls (Member) commented Feb 16, 2024

Each worker loads the databases again, so you need memory for this.
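
To make that concrete, a rough back-of-envelope sketch; the per-database figure below is a hypothetical placeholder (the in-memory footprint of a loaded .feather database is typically larger than the file on disk):

# On-disk sizes of the ranking databases used above (an illustrative starting point only)
ls -lh cistarget-db/*genes_vs_motifs.rankings.feather

# If each worker loads its own copy of a database, memory scales roughly as
#   num_workers x (in-memory size of one database) + overhead
# Assuming, say, ~3 GB per loaded database:
#   32 workers x ~3 GB is roughly 96 GB -> close to the 100 GB assigned, so the job can be killed silently
#    8 workers x ~3 GB is roughly 24 GB -> ample headroom, matching the successful run above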
