How to use batch with gradient_descent_mse_ensemble? #196

Closed · yanglebupt opened this issue Dec 11, 2023 · 1 comment
Labels: question (Further information is requested)

@yanglebupt commented:
Here is my code. I use a simple CNN for classification. The sizes of my data splits are:

train (1772, 45)
test (596, 45)
val (595, 45)

But I get an OOM error. I want to know how to use batching with gradient_descent_mse_ensemble for predict_fn:

import numpy as np
import neural_tangents as nt
from neural_tangents import stax, predict

maxvalues = np.max(np.abs(train_features), axis=0)

def process_features(features, norm=True, conv1d=False):
    # Normalize features and optionally reshape to NHWC (n, 1, 45, 1) for the conv net.
    x = ((features / maxvalues) - 0.5) / 0.5 if norm else features
    n = x.shape[0]
    if conv1d:
        x = x.reshape((n, 1, -1, 1))
    return x

def ConvolutionalNetwork(dropout=None, isHalf=True, W_std=1.0, b_std=0.0):
    layers = [
        stax.Conv(4, filter_shape=(3, 1), strides=(2, 1), padding='SAME', W_std=W_std, b_std=b_std),
        stax.LayerNorm(),
        stax.Relu(),
        stax.Conv(16, filter_shape=(3, 1), strides=(2, 1), padding='SAME', W_std=W_std, b_std=b_std),
        stax.LayerNorm(),
        stax.Relu(),
    ]

    # Two extra conv blocks, used only by the full (not half) network.
    if not isHalf:
        for _ in range(2):
            layers += [
                stax.Conv(64, filter_shape=(3, 1), strides=(2, 1), padding='SAME', W_std=W_std, b_std=b_std),
                stax.LayerNorm(),
                stax.Relu(),
            ]

    layers += [stax.GlobalAvgPool()]

    # Fully-connected head; the full network gets a deeper head.
    if not isHalf:
        fcList = [
            stax.Dense(16, W_std, b_std),
            stax.LeakyRelu(0.01),
        ]
        if dropout is not None:
            fcList.append(stax.Dropout(dropout))
        fcList += [
            stax.Dense(4, W_std, b_std),
            stax.LeakyRelu(0.01),
        ]
        if dropout is not None:
            fcList.append(stax.Dropout(dropout))
        fcList.append(stax.Dense(2, W_std, b_std))
    else:
        fcList = [
            stax.Dense(4, W_std, b_std),
            stax.LeakyRelu(0.01),
        ]
        if dropout is not None:
            fcList.append(stax.Dropout(dropout))
        fcList.append(stax.Dense(2, W_std, b_std))

    layers += fcList
    return stax.serial(*layers)


init_fn, apply_fn, kernel_fn = ConvolutionalNetwork()
kernel_fn = nt.batch(kernel_fn, batch_size=10)

learning_rate = 1
predict_fn = predict.gradient_descent_mse_ensemble(
    kernel_fn,
    process_features(train_features, norm=True, conv1d=True),
    train_labels.reshape((-1, 1)),
    learning_rate=learning_rate)

y_train_nngp, y_train_ntk = predict_fn(x_test=process_features(train_features[:10, :], norm=True, conv1d=True), get=('nngp', 'ntk'))
y_test_nngp, y_test_ntk = predict_fn(x_test=process_features(test_features[:10, :], norm=True, conv1d=True), get=('nngp', 'ntk'))
y_val_nngp, y_val_ntk = predict_fn(x_test=process_features(val_features[:10, :], norm=True, conv1d=True), get=('nngp', 'ntk'))

Here is the error:

ValueError: Number of rows of kernel must divide batch size. Found n1 = 1772 and batch size = 10.

If I do not use batching, I get an OOM error, so I want to know how to handle the OOM error for a CNN. If you can show me some code for a solution, I would be extremely grateful!

@romanngg (Contributor) commented:

Sorry for the late reply!

Per the error message, our current implementation only supports batching when the batch size divides the number of training (and test/val) points. I assume the error is triggered here, since train_features has 1772 rows:

predict_fn = predict.gradient_descent_mse_ensemble(kernel_fn, process_features(train_features, norm=True, conv1d=True), 
                                                   train_labels.reshape((-1,1)), learning_rate=learning_rate)

(Note that 1772 = 4 × 443 with 443 prime, and 1772, 596, and 595 share no common divisor greater than 1, so no single batch size larger than 1 divides all of your splits exactly; hence the padding workaround below.)

I think the easiest workaround would be to:

  • Use the nt.predict.gp_inference API, which performs the equivalent computation but accepts precomputed k_train_train (and test/val) covariance matrices as inputs.
  • Compute these k_train_train (k_test_train, k_val_train) matrices by calling nt.batch(kernel_fn, batch_size=10) on the pairs (train, train), (test, train), (val, train), where each of train/test/val is first padded with dummy rows so that its size is divisible by 10; the resulting matrices are then truncated to remove the dummy covariance entries. See the sketch after this list.
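
A minimal sketch of this approach, assuming the nt.predict.gp_inference signature (predict_fn(get, k_test_train, ...)) and that kernel_fn(x1, x2, get) with a tuple get returns a namedtuple with nngp/ntk fields; pad_to_multiple is an illustrative helper, not a library function, so check both against your installed version:

import numpy as np
import neural_tangents as nt

batch_size = 10
batched_kernel_fn = nt.batch(kernel_fn, batch_size=batch_size)

def pad_to_multiple(x, multiple):
    # Illustrative helper: repeat the last row so that x.shape[0] is
    # divisible by `multiple`. Each kernel entry K[i, j] depends only on
    # (x1[i], x2[j]), so the padding rows just add extra rows/columns
    # that are sliced away below. (Repeating a real row, rather than
    # padding with zeros, keeps LayerNorm well-defined on the padding.)
    n = x.shape[0]
    pad = (-n) % multiple
    if pad == 0:
        return x, n
    return np.concatenate([x, np.repeat(x[-1:], pad, axis=0)], axis=0), n

x_train = process_features(train_features, norm=True, conv1d=True)
x_test = process_features(test_features, norm=True, conv1d=True)

x_train_pad, n_train = pad_to_multiple(x_train, batch_size)  # 1772 -> 1780
x_test_pad, n_test = pad_to_multiple(x_test, batch_size)     # 596  -> 600

# Covariances on the padded inputs (both NNGP and NTK at once).
k_train_train = batched_kernel_fn(x_train_pad, None, ('nngp', 'ntk'))
k_test_train = batched_kernel_fn(x_test_pad, x_train_pad, ('nngp', 'ntk'))

# Truncate away the dummy entries introduced by the padding.
k_train_train = k_train_train._replace(
    nngp=k_train_train.nngp[:n_train, :n_train],
    ntk=k_train_train.ntk[:n_train, :n_train])
k_test_train = k_test_train._replace(
    nngp=k_test_train.nngp[:n_test, :n_train],
    ntk=k_test_train.ntk[:n_test, :n_train])

# Exact GP inference from the precomputed covariances.
predict_fn = nt.predict.gp_inference(k_train_train,
                                     train_labels.reshape((-1, 1)))
y_test_nngp, y_test_ntk = predict_fn(get=('nngp', 'ntk'),
                                     k_test_train=k_test_train)

The val split would be handled the same way with a (val, train) kernel, and gp_inference's diag_reg argument can be used if the solve turns out to be ill-conditioned.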

Sorry for the inconvenience, hope this helps!

@romanngg added the question label on Jan 28, 2024