Inconsistent results with HODLRSolver #128

Open
astrozot opened this issue Jun 7, 2020 · 2 comments

@astrozot

astrozot commented Jun 7, 2020

I am experimenting with the HODLR solver, but I have seen some strange behaviour that currently prevents me from using it. In various situations, the computed log likelihood appears to be wrong in ways that I cannot explain.

Consider the following code (very similar to the big-data tutorial):

import numpy as np
import george

np.random.seed(123)
n = 200
x = np.random.uniform(0, 10, n)
yerr = 0.1 * np.random.rand(n)
y = np.sin(x) + yerr * np.random.randn(n)

kernel = 1.0 * george.kernels.ExpSquaredKernel(1.0)

gp_basic = george.GP(kernel)
gp_basic.compute(x, yerr)
print(gp_basic.log_likelihood(y))

The printed result is 319.7841482562233. However, when using the HODLRSolver:

gp_hodlr = george.GP(kernel, solver=george.HODLRSolver, seed=42)
gp_hodlr.compute(x, yerr)
print(gp_hodlr.log_likelihood(y))

the result is 296.8650016184164, which is quite far off. Strangely enough, if I set n = 199 in the code above, everything is back to normal and the HODLR solver gives results very close to those of the basic solver.

Is this expected? In more complicated cases, it seems to me that this erratic behaviour could affect convergence when optimizing the hyperparameters.
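A solver-independent cross-check can help decide which of the two numbers to trust. The sketch below computes the zero-mean GP log likelihood with a dense Cholesky factorization in plain NumPy. It assumes that `1.0 * ExpSquaredKernel(1.0)` corresponds to k(r) = exp(-r^2 / 2) and it ignores any small default white-noise term that george may add to the diagonal; the function name `dense_log_likelihood` and its `amp` and `metric` parameters are hypothetical, not part of george.

```python
import numpy as np


def dense_log_likelihood(x, y, yerr, amp=1.0, metric=1.0):
    """Zero-mean GP log likelihood via a dense Cholesky factorization."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Assumed kernel form: amp * exp(-r^2 / (2 * metric))
    r2 = (x[:, None] - x[None, :]) ** 2
    K = amp * np.exp(-0.5 * r2 / metric)
    # Add the observational variance to the diagonal
    K[np.diag_indices_from(K)] += np.asarray(yerr, dtype=float) ** 2
    L = np.linalg.cholesky(K)
    # Solve K alpha = y using the two triangular factors
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * (y @ alpha + log_det + len(x) * np.log(2.0 * np.pi))


# Same data as in the code above
np.random.seed(123)
n = 200
x = np.random.uniform(0, 10, n)
yerr = 0.1 * np.random.rand(n)
y = np.sin(x) + yerr * np.random.randn(n)

print(dense_log_likelihood(x, y, yerr))
```

If the assumptions about the kernel hold, this value should be comparable to the basic solver's output, and repeating it for n = 199 and n = 200 gives a reference against which to judge the HODLR results.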

@dfm
Owner

dfm commented Jun 8, 2020

The HODLR solver is stochastic so it can be quite inaccurate especially when the system is not well conditioned. Generally this doesn't have a huge effect on hyperparameter inference, but these days I'd probably recommend using a different library if you need scalability because there has been a lot of work done on developing better tools. For example:

  1. If your problem is 1D (like that sample code), I'd recommend celerite. The interface is about the same, but it'll be orders of magnitude faster and numerically stable.
  2. If you need to work in higher dimensions, the best library that I know about is GPyTorch. It has several algorithms that should be fast and scale well, although I haven't used it much myself.

Hope this helps!
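Whether the system in the sample code is well conditioned can be checked directly. Because the uncertainties are drawn as `0.1 * np.random.rand(n)`, some entries of `yerr` can be close to zero, so the noise term adds very little to the diagonal at those points. The following is a minimal sketch, again assuming the kernel form k(r) = exp(-r^2 / 2); the helper `covariance` is hypothetical and not part of george.

```python
import numpy as np


def covariance(x, yerr, amp=1.0, metric=1.0):
    """Dense covariance matrix: assumed squared-exponential kernel plus noise."""
    r2 = (x[:, None] - x[None, :]) ** 2
    return amp * np.exp(-0.5 * r2 / metric) + np.diag(yerr ** 2)


# Same inputs as in the sample code from the issue
np.random.seed(123)
n = 200
x = np.random.uniform(0, 10, n)
yerr = 0.1 * np.random.rand(n)

K = covariance(x, yerr)
print("smallest yerr**2:", np.min(yerr ** 2))
print("condition number:", np.linalg.cond(K))
```

A large condition number would be consistent with the explanation above, although on its own it would not explain why the result changes so abruptly between n = 199 and n = 200.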

@astrozot
Author

Thank you for your quick reply! I am still surprised by the behaviour of the HODLR solver: I understand that it is stochastic, but I find it hard to comprehend why it fails completely when the number of points goes from 199 to 200 (in a case that, to me, looks quite well conditioned).

Anyway, I will follow your suggestions and give GPyTorch a try (my problem is 2D). Thank you again!
