
[BUG] early termination of pyscenic ctx with no error message #528

Open
Mira0507 opened this issue Feb 9, 2024 · 3 comments
Labels
bug Something isn't working

Comments

Mira0507 commented Feb 9, 2024

I've been seeing my pyscenic ctx run terminate in the middle of a run. The mystery is that it doesn't throw any error message, as shown below:

  $ singularity shell -B path/to/working/directory aertslab-pyscenic-0.12.1.sif
  Singularity> bash pyscenic_23.sh
  # running..
  # running..
  # running..
  2024-02-09 12:16:56,692 - pyscenic.prune - INFO - Worker mm10_500bp_up_100bp_down_full_tx_v10_clust.genes_vs_motifs.rankings(1): All regulons derived.

  2024-02-09 12:16:56,692 - pyscenic.prune - INFO - Worker mm10_500bp_up_100bp_down_full_tx_v10_clust.genes_vs_motifs.rankings(1): All regulons derived.

  2024-02-09 12:16:56,696 - pyscenic.prune - INFO - Worker mm10_500bp_up_100bp_down_full_tx_v10_clust.genes_vs_motifs.rankings(1): Done.

  2024-02-09 12:16:56,696 - pyscenic.prune - INFO - Worker mm10_500bp_up_100bp_down_full_tx_v10_clust.genes_vs_motifs.rankings(1): Done.
  Singularity>  

It returns to my terminal without the Writing results to file message, which I would expect to see if the run had completed successfully.

My script is as follows:

$ cat pyscenic_23.sh
#!/bin/bash

pyscenic ctx \
    scrnaseq-pyscenic-tac1-chat/WT_all_adj.csv \
    cistarget-db/mm10_500bp_up_100bp_down_full_tx_v10_clust.genes_vs_motifs.rankings.feather \
    cistarget-db/mm10_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.rankings.feather \
    --annotations_fname cistarget-db/motifs-v10nr_clust-nr.mgi-m0.001-o0.0.tbl \
    --expression_mtx_fname scrnaseq-pyscenic-tac1-chat/WT_all.loom \
    -o scrnaseq-pyscenic-tac1-chat/WT_all_reg.csv \
    --mask_dropouts \
    --num_workers 32

This run ended up creating an empty (zero-byte) output file, shown here:

$ la | grep WT_all_reg.csv
-rw-rw----. 1 username xxxx   0 Feb  9 15:47 WT_all_reg.csv

I'm assuming my issue is unrelated to mismatched gene symbols across the input files, since I found that over 1,000 gene symbols are shared across the ranking (.feather), adjacency (adj.csv), TF (allTFs_mm.txt), and motif (.tbl) files:

import pandas as pd

# Load one ranking database (f_db_names['raw'][0] holds the .feather path)
feather = pd.read_feather(f_db_names['raw'][0])

# Unique gene symbols from each input file
feather_g = list(set(feather.columns))  # from ranking .feather file
tfs_g = list(pd.read_csv(tfs, header=None).iloc[:, 0])  # from TF list (allTFs_mm.txt)
adj_g = list(set(pd.read_csv(adj_csv).loc[:, 'target']))  # from adjacency (adj.csv)
anno_g = list(set(pd.read_csv(f_motif_path, sep="\t").loc[:, 'gene_name']))  # from motif .tbl

# All gene symbols (with duplicates)
all_g = tfs_g + adj_g + feather_g + anno_g

# Retrieve gene symbols intersecting across the feather, tfs, adj, and tbl files
count = pd.Series(all_g).value_counts().to_dict()

# Save gene symbols found in every one of the four files to a list
g4 = [gene for gene, n in count.items() if n == 4]

# >>> len(tfs_g)
# 1860

# >>> len(adj_g)
# 20746

# >>> len(feather_g)
# 24069

# >>> len(anno_g)
# 1412

# >>> g4[:6]
# ['Cpeb1', 'Pml', 'Rcor1', 'Rad21', 'Ascc1', 'Prox1']

# >>> len(g4)
# 1054

assert len(g4) > 0, "Ensure gene symbols are shared across all input files."

Some tests done so far:

  • skipping the --no_pruning parameter
  • skipping the --mask_dropouts parameter
  • using v9 ranking files for mm10 (mm10__refseq-r80__10kb_up_and_down_tss.mc9nr.genes_vs_motifs.rankings.feather, mm10__refseq-r80__500bp_up_and_100bp_down_tss.mc9nr.genes_vs_motifs.rankings.feather) and/or the v9 motif file (motifs-v9-nr.mgi-m0.001-o0.0.tbl), in these combinations:
    • v9 ranking file + v10 motif file
    • v10 ranking file + v9 motif file
    • v9 ranking file + v9 motif file

I saw a few issue reports about empty output files, but in those cases the Writing results to file message was at least printed.

My environment is an HPC system, summarized here:

  • OS
LSB Version:    :core-4.1-amd64:core-4.1-noarch
Distributor ID: RedHatEnterprise
Description:    Red Hat Enterprise Linux release 8.9 (Ootpa)
Release:        8.9
Codename:       Ootpa
  • pySCENIC version:
Singularity> pyscenic -h
usage: pyscenic [-h] {grn,add_cor,ctx,aucell} ...

Single-Cell rEgulatory Network Inference and Clustering (0.12.1+0.gce41b61.dirty)
  • Installation method: singularity build aertslab-pyscenic-0.12.1.sif docker://aertslab/pyscenic:0.12.1
  • Run environment: CLI in singularity
  • Package versions: singularity-ce version 4.0.1

I think my CLI command is relatively straightforward. I've also been trying with R and Python, but neither has been successful. I would appreciate any hints or suggestions. Thank you very much for your time!

Mira0507 added the bug label Feb 9, 2024
ghuls (Member) commented Feb 14, 2024

How much memory did you assign to the job? Try lowering the number of workers.
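
For reference, here is a hedged sketch of how one might check whether the silent termination was an out-of-memory kill. Both commands are assumptions about the HPC setup: reading the kernel log may need elevated permissions, sacct only applies if the job ran under SLURM, and <jobid> is a placeholder.

# Look for OOM-killer entries in the kernel log around the time the run stopped
dmesg -T | grep -i -E "out of memory|oom" | tail

# If the job ran under SLURM, accounting reports the job state and peak memory use
sacct -j <jobid> --format=JobID,State,MaxRSS,ReqMem,Elapsed

An OUT_OF_MEMORY state or a MaxRSS close to the requested memory would point to the per-worker memory use discussed below.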


Mira0507 commented Feb 14, 2024

Interesting. I got it to work after changing the number of CPUs from 32 to 8 (the memory allocation was 100 GB).

Singularity> cat pyscenic_ctx_test.sh
#!/bin/bash

pyscenic ctx \
    scrnaseq-pyscenic-tac1-chat/WT_all_adj.csv \
    cistarget-db/mm10_500bp_up_100bp_down_full_tx_v10_clust.genes_vs_motifs.rankings.feather \
    cistarget-db/mm10_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.rankings.feather \
    --annotations_fname cistarget-db/motifs-v10nr_clust-nr.mgi-m0.001-o0.0.tbl \
    --expression_mtx_fname scrnaseq-pyscenic-tac1-chat/WT_all.loom \
    -o scrnaseq-pyscenic-tac1-chat/WT_all_reg.csv \
    --mask_dropouts \
    --num_workers 8 

Singularity> bash pyscenic_ctx_test.sh
(running...)
2024-02-14 15:44:04,062 - pyscenic.prune - INFO - Worker mm10_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.rankings(3): All regulons derived.

2024-02-14 15:44:04,062 - pyscenic.prune - INFO - Worker mm10_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.rankings(3): All regulons derived.

2024-02-14 15:44:04,081 - pyscenic.prune - INFO - Worker mm10_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.rankings(3): Done.

2024-02-14 15:44:04,081 - pyscenic.prune - INFO - Worker mm10_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.rankings(3): Done.

2024-02-14 15:44:04,147 - pyscenic.cli.pyscenic - INFO - Writing results to file.

$ la | grep WT_all_reg.csv
-rw-rw----. 1 username xxxx 2.5M Feb 14 15:44 WT_all_reg.csv

I'd never thought about more CPUs causing problems. Do you have any guess about what's going on here?

Thank you so much for the discussion, @ghuls!

ghuls (Member) commented Feb 16, 2024

Each worker loads the databases again, so you need memory for this.
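
To make that concrete, a rough back-of-envelope sketch; the per-database figure below is a hypothetical placeholder (the in-memory footprint of a loaded .feather database is typically larger than the file on disk):

# On-disk sizes of the ranking databases used above (an illustrative starting point only)
ls -lh cistarget-db/*genes_vs_motifs.rankings.feather

# If each worker loads its own copy of a database, memory scales roughly as
#   num_workers x (in-memory size of one database) + overhead
# Assuming, say, ~3 GB per loaded database:
#   32 workers x ~3 GB is roughly 96 GB -> close to the 100 GB assigned, so the job can be killed silently
#    8 workers x ~3 GB is roughly 24 GB -> ample headroom, matching the successful run above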
