Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Correctly handling the case λmax = 0. #53

Open
wants to merge 4 commits into
base: master
Choose a base branch
from
Open

Correctly handling the case λmax = 0. #53

wants to merge 4 commits into from

Conversation

barankarakus
Copy link
Contributor

@barankarakus barankarakus commented Sep 6, 2020

Fixes #51.

Two changes:

  1. Change to computeλ to ensure λmax = 0 leads to an output of [0] and
    not [NaN, ..., NaN].
  2. Change to fit! to ensure the case where autoλ = true and λmax = 0 is
    handled correctly (rather than throwing an error).

Two changes:
1) Change to computeλ to ensure λmax = 0 leads to an output of [0] and 
not [NaN, ..., NaN].
2) Change to fit! to ensure the case where autoλ = true and λmax = 0 is 
handled correctly (rather than throwing an error).
@barankarakus barankarakus changed the title Correctly handling the case λmax = 0. Correctly handling the case λmax = 0; fixes #51 Sep 6, 2020
@barankarakus barankarakus changed the title Correctly handling the case λmax = 0; fixes #51 Correctly handling the case λmax = 0. Sep 6, 2020
@coveralls
Copy link

coveralls commented Sep 6, 2020

Pull Request Test Coverage Report for Build 202

  • 19 of 19 (100.0%) changed or added relevant lines in 2 files are covered.
  • 7 unchanged lines in 3 files lost coverage.
  • Overall coverage increased (+6.6%) to 91.049%

Files with Coverage Reduction New Missed Lines %
src/segselect.jl 1 89.23%
src/coordinate_descent.jl 3 93.63%
src/Lasso.jl 3 88.05%
Totals Coverage Status
Change from base Build 198: 6.6%
Covered Lines: 885
Relevant Lines: 972

💛 - Coveralls

Copy link
Collaborator

@AsafManela AsafManela left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this change.
It seems like the test case is basically one where there is no variation in y.
Do you think you could add a test for this case?

src/Lasso.jl Outdated Show resolved Hide resolved
@barankarakus
Copy link
Contributor Author

Added tests (and some more minor changes). Let me know if anything else needs done!

@@ -209,6 +209,10 @@ const MAX_DEV_FRAC = 0.999
# Compute automatic λ values based on λmax and λminratio
function computeλ(λmax, λminratio, α, nλ)
λmax /= α
if isapprox(λmax, 0; atol=1e-10) # then assuming λmax = 0
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is tricky because I think lambda is not unitless, so if it is small or not depends on the data given.
How does glmnet in R (or GLMNet.jl) handle this case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I've changed the equality check to an isapprox() check is due to floating point arithmetic leading to a lambdamax that should actually be zero being very close to zero but non-zero instead. Simple example when this happens is a design matrix X with entries sampled from U[0, 1] and y a non-zero vector with identical entries.

I agree, the data could be such that lambdamax is genuinely very small but non-zero.

That said, I think it would be very rare to encounter such data in practice... especially since lambdamax (for the linear model) scales linearly with X and y, and we tend to standardise these.

I see two approaches going forward:

  1. Keep this check as is - the case where it would fail to produce correct output basically never occurs anyway.

  2. Revert back to the equality check. The real case in which the package failed was the case where lambdamax was exactly zero, anyway. Moreover, even if lambdamax should be zero but instead is a very small number, there is no major problem: the solver works very fast and it is clear from the output that every value of lambda yields zero active coefficients.

I'll leave it to you to decide 😃.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additionally: I'm not sure how glmnet in R or Julia handles this.

return true
end

@test zero_variation_test() == true
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe use @test_log instead?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, any idea why the tests stopped working in julia v1.0?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe use @test_log instead?

I agree. Will implement tomorrow.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, any idea why the tests stopped working in julia v1.0?

Unfortunately nope!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Package fails when response is zero vector.
3 participants