I have followed the solutions in "RuntimeError: UNKNOWN: Failed to determine best cudnn convolution algorithm" #8506
and "High GPU memory during empirical NTK calculation" #100,
but they don't work for me.
This code reproduces the problem:
```python
import numpy as np
import cupy as cp
import torch
import torchvision.datasets as datasets
import torch.nn.functional as F
import jax
from jax import random
import jax.numpy as jnp
from jax.example_libraries import optimizers
from jax import jit, grad, vmap, pmap
import functools
import neural_tangents as nt
from neural_tangents import stax
from tqdm import tqdm
import gc

# CuPy memory pools, used to free cached GPU memory later
mempool = cp.get_default_memory_pool()
pinned_mempool = cp.get_default_pinned_memory_pool()

# IPython magic: cap XLA's GPU preallocation at 80%
%env XLA_PYTHON_CLIENT_MEM_FRACTION=0.8
```
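As a side note, the `%env` magic only works inside IPython/Jupyter. In a plain Python script, a minimal sketch of the same setting would use `os.environ`, and it must run before JAX initializes its backend, since XLA reads the variable at initialization time:

```python
import os

# Cap XLA's GPU preallocation at 80% of device memory.
# This must be set BEFORE importing jax; once the backend is
# initialized, changing the variable has no effect.
os.environ["XLA_PYTHON_CLIENT_MEM_FRACTION"] = "0.8"

# import jax  # import jax only after the variable is set
print(os.environ["XLA_PYTHON_CLIENT_MEM_FRACTION"])
```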
IIUC, for every (j, k) your batch size is (j + 2, k + 2), so it grows every step. There may not be a memory leak; rather, each step performs a larger and larger computation. Could you double-check that inputs[j:(j+1)*2], inputs[k:(k+1)*2] are the indices you want to compute?
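To see why the slice grows, here is a small standalone illustration (plain NumPy, with a hypothetical fixed-size alternative that may be what was intended):

```python
import numpy as np

inputs = np.arange(10)

# The slice from the question: the window length is j + 2, so it grows with j.
grown = [inputs[j:(j + 1) * 2] for j in range(3)]
print([len(s) for s in grown])  # [2, 3, 4]

# Non-overlapping fixed-size batches of 2 (presumably the intent):
fixed = [inputs[j * 2:(j + 1) * 2] for j in range(3)]
print([len(s) for s in fixed])  # [2, 2, 2]
```

With the growing slices, each call to the empirical NTK function computes a strictly larger kernel block, so GPU usage rising over iterations is expected rather than a leak.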
kkeevin123456 changed the title from "Question: I can release the memory in gpu during execute nt.empirical_ntk_fn" to "Question: I can't release the memory in gpu during execute nt.empirical_ntk_fn" on Mar 25, 2022.
After the code above runs for several iterations, the error occurs.