How to use batch with gradient_descent_mse_ensemble? #196

Closed · yanglebupt opened this issue Dec 11, 2023 · 1 comment
Labels: question (Further information is requested)

@yanglebupt commented:
Here is my code. I use a simple CNN for classification. The sizes of my data splits are:

train (1772, 45)
test (596, 45)
val (595, 45)

But I get an OOM error. I want to know how to use batching with gradient_descent_mse_ensemble for predict_fn:

import numpy as np
import neural_tangents as nt
from neural_tangents import stax, predict

maxvalues = np.max(np.abs(train_features), axis=0)

def process_features(features, norm=True, conv1d=False):
    # Normalize features and optionally reshape to NHWC (n, 1, 45, 1) for the conv net.
    x = ((features / maxvalues) - 0.5) / 0.5 if norm else features
    n = x.shape[0]
    if conv1d:
        x = x.reshape((n, 1, -1, 1))
    return x

def ConvolutionalNetwork(dropout=None, isHalf=True, W_std=1.0, b_std=0.0):
    layers = [
        stax.Conv(4, filter_shape=(3, 1), strides=(2, 1), padding='SAME', W_std=W_std, b_std=b_std),
        stax.LayerNorm(),
        stax.Relu(),
        stax.Conv(16, filter_shape=(3, 1), strides=(2, 1), padding='SAME', W_std=W_std, b_std=b_std),
        stax.LayerNorm(),
        stax.Relu(),
    ]

    # Two extra conv blocks, used only by the full (not half) network.
    if not isHalf:
        for _ in range(2):
            layers += [
                stax.Conv(64, filter_shape=(3, 1), strides=(2, 1), padding='SAME', W_std=W_std, b_std=b_std),
                stax.LayerNorm(),
                stax.Relu(),
            ]

    layers += [stax.GlobalAvgPool()]

    # Fully-connected head; the full network gets a deeper head.
    if not isHalf:
        fcList = [
            stax.Dense(16, W_std, b_std),
            stax.LeakyRelu(0.01),
        ]
        if dropout is not None:
            fcList.append(stax.Dropout(dropout))
        fcList += [
            stax.Dense(4, W_std, b_std),
            stax.LeakyRelu(0.01),
        ]
        if dropout is not None:
            fcList.append(stax.Dropout(dropout))
        fcList.append(stax.Dense(2, W_std, b_std))
    else:
        fcList = [
            stax.Dense(4, W_std, b_std),
            stax.LeakyRelu(0.01),
        ]
        if dropout is not None:
            fcList.append(stax.Dropout(dropout))
        fcList.append(stax.Dense(2, W_std, b_std))

    layers += fcList
    return stax.serial(*layers)


init_fn, apply_fn, kernel_fn = ConvolutionalNetwork()
kernel_fn = nt.batch(kernel_fn, batch_size=10)

learning_rate = 1
predict_fn = predict.gradient_descent_mse_ensemble(
    kernel_fn,
    process_features(train_features, norm=True, conv1d=True),
    train_labels.reshape((-1, 1)),
    learning_rate=learning_rate)

y_train_nngp, y_train_ntk = predict_fn(x_test=process_features(train_features[:10, :], norm=True, conv1d=True), get=('nngp', 'ntk'))
y_test_nngp, y_test_ntk = predict_fn(x_test=process_features(test_features[:10, :], norm=True, conv1d=True), get=('nngp', 'ntk'))
y_val_nngp, y_val_ntk = predict_fn(x_test=process_features(val_features[:10, :], norm=True, conv1d=True), get=('nngp', 'ntk'))

Here is the error:

ValueError: Number of rows of kernel must divide batch size. Found n1 = 1772 and batch size = 10.

If I do not use batching, I get an OOM error, so I want to know how to handle the OOM error for a CNN. If you can show me some code for a solution, I would be extremely grateful!

@romanngg (Contributor) commented:

Sorry for the late reply!

Per the error message, our current implementation only supports batching when the batch size divides the number of training (and test/val) points. I assume the error is triggered here, since train_features has 1772 rows:

predict_fn = predict.gradient_descent_mse_ensemble(kernel_fn, process_features(train_features, norm=True, conv1d=True), 
                                                   train_labels.reshape((-1,1)), learning_rate=learning_rate)

(Note that 1772 = 4 × 443 with 443 prime, and 1772, 596, and 595 share no common divisor greater than 1, so no single batch size larger than 1 divides all of your splits exactly; hence the padding workaround below.)

I think the easiest workaround would be to:

  • Use the nt.predict.gp_inference API, which performs the equivalent computation but accepts precomputed k_train_train (and test/val) covariance matrices as inputs.
  • Compute these k_train_train (k_test_train, k_val_train) matrices by calling nt.batch(kernel_fn, batch_size=10) on the pairs (train, train), (test, train), (val, train), where each of train/test/val is first padded with dummy rows so that its size is divisible by 10; the resulting matrices are then truncated to remove the dummy covariance entries. See the sketch after this list.
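
A minimal sketch of this approach, assuming the nt.predict.gp_inference signature (predict_fn(get, k_test_train, ...)) and that kernel_fn(x1, x2, get) with a tuple get returns a namedtuple with nngp/ntk fields; pad_to_multiple is an illustrative helper, not a library function, so check both against your installed version:

import numpy as np
import neural_tangents as nt

batch_size = 10
batched_kernel_fn = nt.batch(kernel_fn, batch_size=batch_size)

def pad_to_multiple(x, multiple):
    # Illustrative helper: repeat the last row so that x.shape[0] is
    # divisible by `multiple`. Each kernel entry K[i, j] depends only on
    # (x1[i], x2[j]), so the padding rows just add extra rows/columns
    # that are sliced away below. (Repeating a real row, rather than
    # padding with zeros, keeps LayerNorm well-defined on the padding.)
    n = x.shape[0]
    pad = (-n) % multiple
    if pad == 0:
        return x, n
    return np.concatenate([x, np.repeat(x[-1:], pad, axis=0)], axis=0), n

x_train = process_features(train_features, norm=True, conv1d=True)
x_test = process_features(test_features, norm=True, conv1d=True)

x_train_pad, n_train = pad_to_multiple(x_train, batch_size)  # 1772 -> 1780
x_test_pad, n_test = pad_to_multiple(x_test, batch_size)     # 596  -> 600

# Covariances on the padded inputs (both NNGP and NTK at once).
k_train_train = batched_kernel_fn(x_train_pad, None, ('nngp', 'ntk'))
k_test_train = batched_kernel_fn(x_test_pad, x_train_pad, ('nngp', 'ntk'))

# Truncate away the dummy entries introduced by the padding.
k_train_train = k_train_train._replace(
    nngp=k_train_train.nngp[:n_train, :n_train],
    ntk=k_train_train.ntk[:n_train, :n_train])
k_test_train = k_test_train._replace(
    nngp=k_test_train.nngp[:n_test, :n_train],
    ntk=k_test_train.ntk[:n_test, :n_train])

# Exact GP inference from the precomputed covariances.
predict_fn = nt.predict.gp_inference(k_train_train,
                                     train_labels.reshape((-1, 1)))
y_test_nngp, y_test_ntk = predict_fn(get=('nngp', 'ntk'),
                                     k_test_train=k_test_train)

The val split would be handled the same way with a (val, train) kernel, and gp_inference's diag_reg argument can be used if the solve turns out to be ill-conditioned.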

Sorry for the inconvenience, hope this helps!

@romanngg added the question label on Jan 28, 2024