
ntk_mean outputs NaN when the number of training samples is increased #198

Open · zhangbububu opened this issue Jan 28, 2024 · 3 comments
Labels: question (Further information is requested)


zhangbububu commented Jan 28, 2024

Hi, I've run into a confusing problem:

import jax
import jax.numpy as jnp
import neural_tangents as nt
from neural_tangents import stax

# Without this, JAX arrays stay float32 and .astype(jnp.float64) is a no-op.
jax.config.update('jax_enable_x64', True)

init_fn, apply_fn, kernel_fn = stax.serial(
    stax.Dense(512, W_std=1.5, b_std=0.05), stax.Relu(do_stabilize=True),
    stax.Dense(512, W_std=1.5, b_std=0.05), stax.Relu(do_stabilize=True),
    stax.Dense(1, W_std=1.5, b_std=0.05)
)

s = 10
l = jnp.pi * -s
r = jnp.pi * s
N_tr = 100
N_te = 5
train_xs = jnp.linspace(l, r, N_tr).reshape(-1, 1).astype(jnp.float64)
# Cast the full sum (the original cast applied only to the second term).
train_ys = (jnp.sin(train_xs) + jnp.sin(2 * train_xs)).astype(jnp.float64)
test_xs = jnp.linspace(l, r, N_te).reshape(-1, 1).astype(jnp.float64)

predict_fn = nt.predict.gradient_descent_mse_ensemble(kernel_fn, train_xs,
                                                      train_ys, diag_reg=1e-4)
ntk_mean, ntk_covariance = predict_fn(x_test=test_xs, get='ntk',
                                      compute_cov=True)
print(f'{N_tr=}, {ntk_mean=}')



If I increase the number of training samples (N_tr), I get an all-NaN ntk_mean.


romanngg (Contributor) commented:

I think the reason is that this 1D function is hard to fit with a Relu kernel; sampling only 15 points makes for a simpler training objective, so it can be fit with a lower diagonal regularizer. You can avoid the NaNs by increasing diag_reg, which I did below, but as you can see it's a poor fit in any case. (The NTK prediction is orange, with 1000 test points sampled.)

1000 training points, diag_reg=1e-2: [plot]
100 training points, diag_reg=1e-3: [plot]
15 training points, diag_reg=1e-4: [plot]
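
For reference, the only change relative to the snippet above is the diag_reg argument passed to nt.predict.gradient_descent_mse_ensemble (a minimal sketch; 1e-2 corresponds to the 1000-training-point plot):

# A larger diag_reg adds more jitter to the diagonal of the train-train
# kernel, which keeps the linear solve well-conditioned as N_tr grows.
predict_fn = nt.predict.gradient_descent_mse_ensemble(
    kernel_fn, train_xs, train_ys, diag_reg=1e-2)
ntk_mean, ntk_covariance = predict_fn(x_test=test_xs, get='ntk',
                                      compute_cov=True)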

@romanngg added the question (Further information is requested) label on Jan 28, 2024
zhangbububu (Author) commented Jan 29, 2024

@romanngg

Thank you very much for your careful answer.

I am currently running similar experiments. Can you suggest some ways to make the NTK fit complex time series better?

romanngg (Contributor) commented:

I guess for this particular example, knowing your training targets, a periodic nonlinearity would fit better (stax.Sin(), diag_reg=1e-4):

[plot of the Sin-nonlinearity fit]
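
Concretely, the swap amounts to replacing the Relu layers in the original snippet with Sin (a sketch, keeping the original widths and weight/bias standard deviations):

init_fn, apply_fn, kernel_fn = stax.serial(
    stax.Dense(512, W_std=1.5, b_std=0.05), stax.Sin(),  # periodic nonlinearity
    stax.Dense(512, W_std=1.5, b_std=0.05), stax.Sin(),
    stax.Dense(1, W_std=1.5, b_std=0.05)
)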

Otherwise, trying different architectures and plotting predictions or draws from the prior would be a good way to build intuition for what works best. Note that for time series data of shape [batch_size, time_duration, n_features], I imagine you may want to use a 1D convolution (stax.Conv / stax.ConvLocal) over the time_duration axis, to incorporate time locality into your model; a sketch follows below.
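
As a rough sketch of that suggestion (the widths, filter shape, padding, and the Flatten-plus-Dense readout here are illustrative assumptions, not tested settings):

# Inputs assumed to have shape [batch_size, time_duration, n_features];
# a length-1 filter_shape tuple makes the convolution 1D over time.
init_fn, apply_fn, kernel_fn = stax.serial(
    stax.Conv(128, filter_shape=(5,), padding='SAME', W_std=1.5, b_std=0.05),
    stax.Relu(),  # conv slides over the time_duration axis
    stax.Conv(128, filter_shape=(5,), padding='SAME', W_std=1.5, b_std=0.05),
    stax.Relu(),
    stax.Flatten(),  # collapse the time axis before the readout
    stax.Dense(1, W_std=1.5, b_std=0.05)
)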
