Possible solutions for GRNBoost2/GENIE3 Dask issues #163

Open
cflerin opened this issue May 1, 2020 · 2 comments

@cflerin
Contributor

cflerin commented May 1, 2020

A recurring problem is that the GRN inference step of pySCENIC (which uses Arboreto's GRNBoost2/GENIE3 implementation) fails to complete. The cause appears to be that newer Dask releases are incompatible with the existing GRNBoost2/GENIE3 implementation.

Possible errors

  • ValueError: Metadata mismatch found in from_delayed
  • Expected partition of type DataFrame but got NoneType
  • ValueError: tuple is not allowed for map key
  • ...

Possible solutions

  1. In many cases, installing an older version of the dask and distributed packages fixes this. The easiest way is to use the Docker images, which already contain the stable versions of these packages (see here for usage details). Alternatively, install them via pip:
    pip install 'dask==1.0.0' 'distributed>=1.21.6,<2.0.0'
    
  2. Another option is to use a helper script (arboreto_with_multiprocessing.py) that runs the Arboreto GRN algorithms (GRNBoost2, GENIE3) without Dask, sidestepping these compatibility problems.
    See here for details; basic usage is:

    arboreto_with_multiprocessing.py \
        expr_mat.loom \
        allTFs_hg38.txt \
        --output adj.tsv \
        --num_workers 20
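
Before choosing between the two options, it can help to check whether the current environment already satisfies the version pins from the pip command in option 1. The following is a minimal sketch, not part of pySCENIC or Arboreto: the function names are hypothetical, the version parsing is deliberately simplified (leading integers only), and it assumes Python 3.8+ for importlib.metadata.

```python
import re
from importlib import metadata


def parse_version(text):
    """Reduce a version string such as '1.21.6' to a tuple like (1, 21, 6).

    Simplified on purpose: only the leading integer of each dot-separated
    piece is kept, parsing stops at the first non-numeric piece, and the
    result is padded to three components.
    """
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def dask_is_pinned(version):
    """True if the version matches the pin dask==1.0.0."""
    return parse_version(version) == (1, 0, 0)


def distributed_is_pinned(version):
    """True if the version satisfies distributed>=1.21.6,<2.0.0."""
    return (1, 21, 6) <= parse_version(version) < (2, 0, 0)


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    checks = (("dask", dask_is_pinned), ("distributed", distributed_is_pinned))
    for package, check in checks:
        version = installed_version(package)
        if version is None:
            print(f"{package}: not installed")
        elif check(version):
            print(f"{package} {version}: within the pinned range")
        else:
            print(f"{package} {version}: outside the pinned range")
```

If either package falls outside the pinned range, that is a hint (not proof) that the errors listed above come from the Dask incompatibility, and either option may be worth trying.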
    
@hyjforesight

hyjforesight commented May 9, 2022

Hello @cflerin
Bug report, possibly caused by Dask.
pyscenic grn {EXP_MTX_QC_FNAME} {HUMAN_TFS_FNAME} -o {ADJACENCIES_FNAME} --num_workers 16 only works with --num_workers 16. If num_workers is greater than 16, then regardless of the number of cells or genes, GRN inference hangs forever or fails with the error "Worker exceeded 95% memory budget. Restarting". We reproduced this bug with cell numbers from 2000 to 40000, CPU cores from 16 to 40, and memory from 64GB to 128GB, on both Mac and Windows.
A similar issue is reported in #314.
Thanks!
Best,
YJ

@cflerin
Contributor Author

cflerin commented May 10, 2022

Hi @hyjforesight, you should create a new bug report and include all of the requested info on package versions. Having this info will make it much easier to address your issue.
