Enet/lasso fails with "ValueError: negative dimensions are not allowed" #165

Open
mgalardini opened this issue Jul 21, 2021 · 2 comments

Comments

@mgalardini
Owner

$ pyseer --version
pyseer 1.3.9

When running a lasso classification (--wg enet --alpha 1) on some E. coli genomes, I get the following error. It occurs regardless of the value of --cor-filter:

$ pyseer --phenotypes phenotypes.tsv --phenotype-column phenotype --kmers unitigs.txt --uncompressed --distances distances.tsv --wg enet --alpha 1 --cpu 1 --cor-filter 0.99 --load-vars vars
Read 910 phenotypes
Detected binary phenotype
Structure matrix has dimension (910, 910)
Analysing 910 samples found in both phenotype and structure matrix
Reading all variants
Analysing 910 samples found in both phenotype and loaded npy
Applying correlation filtering
100%|██████████| 2230352/2230352 [14:53<00:00, 2495.12variants/s]
Fitting elastic net to top 22301 variants
Warning: Non-fatal error in glmnet library call: error code =  -1
Check results for accuracy. Partial or no results returned.
Traceback (most recent call last):
  File "/fast-storage/miniconda3/envs/pyseer/bin/pyseer", line 10, in <module>
    sys.exit(main())
  File "/fast-storage/miniconda3/envs/pyseer/lib/python3.7/site-packages/pyseer/__main__.py", line 655, in main
    options.cpu)
  File "/fast-storage/miniconda3/envs/pyseer/lib/python3.7/site-packages/pyseer/enet.py", line 174, in fit_enet
    nfolds = n_folds, alpha = alpha, parallel = n_cpus, weights = weights)
  File "/fast-storage/miniconda3/envs/pyseer/lib/python3.7/site-packages/glmnet_python/cvglmnet.py", line 286, in cvglmnet
    newFit = doCV(i, x, y, family, foldid, nfolds, is_offset, **options)
  File "/fast-storage/miniconda3/envs/pyseer/lib/python3.7/site-packages/glmnet_python/cvglmnet.py", line 353, in doCV
    newFit = glmnet(x = xr, y = yr, family = family, **opts)    
  File "/fast-storage/miniconda3/envs/pyseer/lib/python3.7/site-packages/glmnet_python/glmnet.py", line 456, in glmnet
    thresh, isd, intr, maxit, kopt, family)
  File "/fast-storage/miniconda3/envs/pyseer/lib/python3.7/site-packages/glmnet_python/lognet.py", line 311, in lognet
    beta = numpy.zeros([nvars,lmu], dtype = numpy.float64)
ValueError: negative dimensions are not allowed

Note the "Warning: Non-fatal error in glmnet library call: error code = -1 Check results for accuracy. Partial or no results returned." message.

There is no error with ridge classification (using the default value for --alpha). I am trying to come up with a minimal example to reproduce this, but I think at any rate we could handle this by catching the exception whenever cvglmnet is called from pyseer.
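A minimal sketch of what catching the exception could look like. The helper name `fit_or_exit`, the message text, and the stub `failing_fit` are hypothetical. `fit_fn` stands in for the cvglmnet call made from pyseer's fit_enet, so the sketch runs without glmnet_python installed.

```python
import sys


def fit_or_exit(fit_fn, *args, **kwargs):
    """Call fit_fn and turn the ValueError seen in this issue into a clean exit.

    fit_fn stands in for the cvglmnet call in pyseer's fit_enet; this
    helper and its message are hypothetical, not existing pyseer code.
    """
    try:
        return fit_fn(*args, **kwargs)
    except ValueError as e:
        # Report the problem without a full traceback, then exit non-zero
        sys.stderr.write("Elastic net fit failed: " + str(e) + "\n"
                         "glmnet may not have converged; "
                         "check the warnings above\n")
        sys.exit(1)


def failing_fit(**kwargs):
    # Stub that raises the same exception text as the traceback above
    raise ValueError("negative dimensions are not allowed")
```

Calling `fit_or_exit(failing_fit, alpha=1)` would write the message to stderr and exit with status 1, while a fit that succeeds is returned unchanged. Catching only ValueError is a deliberate choice here, so that unrelated failures still surface with their traceback.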

@johnlees
Collaborator

Hmm, interesting, I haven't seen that one before. Looking at some of the glmnet code, I would think this might be non-convergence, in which case maxit could be increased.
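To make the non-convergence guess easier to check, here is a small sketch that decodes the error code printed in the warning. The mapping follows my recollection of the comments in the glmnet Fortran source (a negative code -k meaning no convergence at the k-th lambda value, codes below -10000 meaning too many non-zero coefficients), so treat it as an assumption to verify rather than a reference. `describe_glmnet_jerr` is a hypothetical helper, not part of glmnet_python.

```python
def describe_glmnet_jerr(jerr):
    """Describe a glmnet error flag.

    Hypothetical helper; the meanings below are assumed from the glmnet
    Fortran source comments and should be checked against that source.
    """
    if jerr == 0:
        return "no error"
    if jerr > 0:
        return "fatal error, no output returned"
    if jerr > -10000:
        # -k: the k-th lambda value did not converge within maxit iterations
        k = -jerr
        return ("convergence not reached at lambda value %d after maxit "
                "iterations; solutions returned for %d earlier lambda values"
                % (k, k - 1))
    # -10000-k: too many non-zero coefficients at the k-th lambda value
    k = -jerr - 10000
    return ("number of non-zero coefficients exceeded the limit "
            "at lambda value %d" % k)


print(describe_glmnet_jerr(-1))
```

If this reading is right, error code -1 would mean that even the first lambda value failed to converge, leaving no usable solutions on the path. That would be consistent with lognet then failing to allocate the coefficient matrix.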

@mgalardini
Owner Author

I see! I'll try setting maxit to a larger number when calling cvglmnet and see what happens. If that is really the culprit, we could set a larger default, either upstream or in pyseer's calls to cvglmnet. Either way, we should also catch the exception so that pyseer fails more gracefully.
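One way the maxit experiment could be structured is to retry the fit with progressively larger values until it stops failing. This is only a sketch: `fit_with_larger_maxit` and `toy_fit` are hypothetical, the maxit values are arbitrary, and it assumes cvglmnet forwards a `maxit` keyword to glmnet (the traceback shows `maxit` being passed down to lognet, but the pass-through from cvglmnet is not confirmed here).

```python
def fit_with_larger_maxit(fit_fn, maxit_values=(100000, 1000000, 10000000),
                          **kwargs):
    """Retry fit_fn with increasing maxit until it stops raising ValueError.

    fit_fn stands in for cvglmnet; passing maxit through it is an
    assumption, and the values tried are arbitrary.
    Returns the fit together with the maxit value that worked.
    """
    last_error = None
    for maxit in maxit_values:
        try:
            return fit_fn(maxit=maxit, **kwargs), maxit
        except ValueError as e:
            # Remember the failure and try again with more iterations
            last_error = e
    raise RuntimeError("fit failed for every maxit tried: %s" % last_error)


def toy_fit(maxit, **kwargs):
    # Stand-in for cvglmnet that only "converges" with enough iterations
    if maxit < 1000000:
        raise ValueError("negative dimensions are not allowed")
    return {"maxit": maxit}
```

With the stub, `fit_with_larger_maxit(toy_fit, alpha=1)` fails on the first value and succeeds on the second. Returning the maxit that worked would also tell us what a safer default might be.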
