Shape mismatch error when training with DLCDataGenerator #35

Open
monajalal opened this issue Dec 10, 2019 · 1 comment
Labels
bug Something isn't working

Comments


monajalal commented Dec 10, 2019

Hi Jake, your DLC train notebook crashes when I use my own moth dataset. Here is my saved Jupyter notebook which also has the complete error log: https://colab.research.google.com/drive/1yr5YybbAtnSCdkOC4Gbw9GKhrIGATEGS?authuser=1

Could you please advise on how to fix this error?

(/scratch3/3d_pose/DeepPoseKitEnv) [jalal@goku examples]$ pwd
/scratch3/3d_pose/animalpose/dpk/DeepPoseKit/examples


(/scratch3/3d_pose/DeepPoseKitEnv) [jalal@goku examples]$ python dlc_train.py 
1.15.0
{'Task': 'moth-filtered', 'scorer': 'Mona', 'date': 'Dec6', 'project_path': '/projectnb/ivcgroup/jalal/moth-filtered-Mona-2019-12-06', 'video_sets': {'/projectnb/ivcgroup/jalal/moth-filtered-Mona-2019-12-06/videos/moth.avi': {'crop': '0, 800, 0, 600'}}, 'bodyparts': ['head', 'rightWingTip', 'leftWingTip', 'abdomenTip'], 'start': 0, 'stop': 1, 'numframes2pick': 100, 'skeleton': [['bodypart1', 'bodypart2'], ['objectA', 'bodypart3']], 'skeleton_color': 'black', 'pcutoff': 0.1, 'dotsize': 12, 'alphavalue': 0.7, 'colormap': 'jet', 'TrainingFraction': [0.95], 'iteration': 0, 'resnet': None, 'snapshotindex': -1, 'batch_size': 8, 'cropping': False, 'x1': 0, 'x2': 640, 'y1': 277, 'y2': 624, 'corner2move2': [50, 50], 'move2corner': True, 'default_net_type': 'resnet_50', 'default_augmenter': 'default'}
WARNING:tensorflow:From /scratch3/3d_pose/DeepPoseKitEnv/lib/python3.6/site-packages/tensorflow_core/python/ops/resource_variable_ops.py:1630: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
2019-12-10 15:04:24.550212: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2019-12-10 15:04:24.595099: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties: 
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.6705
pciBusID: 0000:05:00.0
2019-12-10 15:04:24.596212: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 1 with properties: 
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.6705
pciBusID: 0000:06:00.0
2019-12-10 15:04:24.596536: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2019-12-10 15:04:24.597751: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2019-12-10 15:04:24.598866: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2019-12-10 15:04:24.599135: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2019-12-10 15:04:24.600569: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2019-12-10 15:04:24.601678: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2019-12-10 15:04:24.604910: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2019-12-10 15:04:24.610145: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0, 1
2019-12-10 15:04:24.610503: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-12-10 15:04:24.616205: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3597855000 Hz
2019-12-10 15:04:24.616695: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x559cf7c5d7b0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2019-12-10 15:04:24.616720: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2019-12-10 15:04:24.855663: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x559cf7cf0e30 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2019-12-10 15:04:24.855736: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): GeForce GTX 1080 Ti, Compute Capability 6.1
2019-12-10 15:04:24.855764: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (1): GeForce GTX 1080 Ti, Compute Capability 6.1
2019-12-10 15:04:24.864118: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties: 
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.6705
pciBusID: 0000:05:00.0
2019-12-10 15:04:24.866477: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 1 with properties: 
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.6705
pciBusID: 0000:06:00.0
2019-12-10 15:04:24.866581: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2019-12-10 15:04:24.866635: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2019-12-10 15:04:24.866683: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2019-12-10 15:04:24.866730: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2019-12-10 15:04:24.866777: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2019-12-10 15:04:24.866825: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2019-12-10 15:04:24.866872: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2019-12-10 15:04:24.875663: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0, 1
2019-12-10 15:04:24.875759: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2019-12-10 15:04:24.883670: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-12-10 15:04:24.883718: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      0 1 
2019-12-10 15:04:24.883749: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0:   N Y 
2019-12-10 15:04:24.883773: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 1:   Y N 
2019-12-10 15:04:24.889842: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9972 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:05:00.0, compute capability: 6.1)
2019-12-10 15:04:24.892445: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 10479 MB memory) -> physical GPU (device: 1, name: GeForce GTX 1080 Ti, pci bus id: 0000:06:00.0, compute capability: 6.1)
WARNING:tensorflow:From /scratch3/3d_pose/DeepPoseKitEnv/lib/python3.6/site-packages/deepposekit/models/backend/utils.py:35: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
2019-12-10 15:04:44.034373: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2019-12-10 15:04:45.322586: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2019-12-10 15:04:48.539242: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
500/500 [==============================] - 13s 26ms/sample
37.74724634267033
/scratch3/3d_pose/DeepPoseKitEnv/lib/python3.6/site-packages/deepposekit/models/engine.py:145: UserWarning: 
Automatically compiling with default settings: model.compile('adam', 'mse')
Call model.compile() manually to use non-default settings.

  """\nAutomatically compiling with default settings: model.compile('adam', 'mse')\n"""
Epoch 1/100
Traceback (most recent call last):
  File "dlc_train.py", line 204, in <module>
    steps_per_epoch=200,
  File "/scratch3/3d_pose/DeepPoseKitEnv/lib/python3.6/site-packages/deepposekit/models/engine.py", line 174, in fit
    **kwargs
  File "/scratch3/3d_pose/DeepPoseKitEnv/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/training.py", line 1296, in fit_generator
    steps_name='steps_per_epoch')
  File "/scratch3/3d_pose/DeepPoseKitEnv/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/training_generator.py", line 265, in model_iteration
    batch_outs = batch_function(*batch_data)
  File "/scratch3/3d_pose/DeepPoseKitEnv/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/training.py", line 991, in train_on_batch
    extract_tensors_from_dataset=True)
  File "/scratch3/3d_pose/DeepPoseKitEnv/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/training.py", line 2537, in _standardize_user_data
    y, self._feed_loss_fns, feed_output_shapes)
  File "/scratch3/3d_pose/DeepPoseKitEnv/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/training_utils.py", line 741, in check_loss_and_target_compatibility
    ' while using as loss `' + loss_name + '`. '
ValueError: A target array with shape (5, 75, 100, 10) was passed for an output of shape (None, 74, 100, 10) while using as loss `mean_squared_error`. This loss expects targets to have the same shape as the output.
terminate called without an active exception
Aborted
(/scratch3/3d_pose/DeepPoseKitEnv) [jalal@goku examples]$ 

Here is a list of all packages I have installed:

(/scratch3/3d_pose/DeepPoseKitEnv) [jalal@goku examples]$ pip list
Package                Version            
---------------------- -------------------
absl-py                0.8.1              
astor                  0.8.0              
attrs                  19.3.0             
backcall               0.1.0              
bleach                 3.1.0              
certifi                2019.11.28         
chardet                3.0.4              
Click                  7.0                
cycler                 0.10.0             
decorator              4.4.1              
deeplabcut             2.1.4              
deepposekit            0.3.4              
defusedxml             0.6.0              
easydict               1.9                
entrypoints            0.3                
gast                   0.2.2              
google-pasta           0.1.8              
grpcio                 1.25.0             
h5py                   2.10.0             
idna                   2.8                
imageio                2.6.1              
imageio-ffmpeg         0.3.0              
imgaug                 0.3.0              
importlib-metadata     1.2.0              
intel-openmp           2020.0.133         
ipykernel              5.1.3              
ipython                7.10.1             
ipython-genutils       0.2.0              
ipywidgets             7.5.1              
jedi                   0.15.1             
Jinja2                 2.10.3             
joblib                 0.14.0             
jsonschema             3.2.0              
jupyter                1.0.0              
jupyter-client         5.3.4              
jupyter-console        6.0.0              
jupyter-core           4.6.1              
Keras-Applications     1.0.8              
Keras-Preprocessing    1.1.0              
kiwisolver             1.1.0              
Markdown               3.1.1              
MarkupSafe             1.1.1              
matplotlib             3.0.3              
mistune                0.8.4              
mock                   3.0.5              
more-itertools         8.0.2              
moviepy                1.0.1              
msgpack                0.6.2              
msgpack-numpy          0.4.4.3            
nbconvert              5.6.1              
nbformat               4.4.0              
networkx               2.4                
notebook               6.0.2              
numexpr                2.7.0              
numpy                  1.17.4             
opencv-python          3.4.5.20           
opencv-python-headless 4.1.2.30           
opt-einsum             3.1.0              
pandas                 0.25.3             
pandocfilters          1.4.2              
parso                  0.5.1              
patsy                  0.5.1              
pexpect                4.7.0              
pickleshare            0.7.5              
Pillow                 6.2.1              
pip                    19.3.1             
proglog                0.1.9              
prometheus-client      0.7.1              
prompt-toolkit         3.0.2              
protobuf               3.11.1             
psutil                 5.6.7              
ptyprocess             0.6.0              
Pygments               2.5.2              
pyparsing              2.4.5              
Pypubsub               4.0.3              
pyrsistent             0.15.6             
python-dateutil        2.8.1              
pytz                   2019.3             
PyWavelets             1.1.1              
PyYAML                 5.2                
pyzmq                  18.1.1             
qtconsole              4.6.0              
requests               2.22.0             
ruamel.yaml            0.16.5             
ruamel.yaml.clib       0.2.0              
scikit-image           0.16.2             
scikit-learn           0.22               
scipy                  1.3.3              
Send2Trash             1.5.0              
setuptools             42.0.2.post20191203
Shapely                1.6.4.post2        
six                    1.13.0             
statsmodels            0.10.1             
tables                 3.4.3              
tabulate               0.8.6              
tensorboard            1.15.0             
tensorflow-estimator   1.15.1             
tensorflow-gpu         1.15.0             
tensorpack             0.9.8              
termcolor              1.1.0              
terminado              0.8.3              
testpath               0.4.4              
tornado                6.0.3              
tqdm                   4.40.1             
traitlets              4.3.3              
urllib3                1.25.7             
wcwidth                0.1.7              
webencodings           0.5.1              
Werkzeug               0.16.0             
wheel                  0.33.6             
widgetsnbextension     3.5.1              
wrapt                  1.11.2             
wxPython               4.0.3              
zipp                   0.6.0              

Here's the Python code:

(/scratch3/3d_pose/DeepPoseKitEnv) [jalal@goku examples]$ cat dlc_train.py 
import sys
import tensorflow as tf
print(tf.__version__)
import numpy as np
import matplotlib.pyplot as plt
import glob

from deepposekit.io import TrainingGenerator, DLCDataGenerator
from deepposekit.augment import FlipAxis
import imgaug.augmenters as iaa
import imgaug as ia

from deepposekit.models import (StackedDenseNet,
                                DeepLabCut,
                                StackedHourglass,
                                LEAP)
from tensorflow.keras.callbacks import ReduceLROnPlateau, EarlyStopping

from deepposekit.callbacks import Logger, ModelCheckpoint
from deepposekit.models import load_model

import time
from os.path import expanduser

try:
    import google.colab
    IN_COLAB = True
except ImportError:
    IN_COLAB = False

data_generator = DLCDataGenerator(
    project_path='/scratch3/3d_pose/animalpose/experiments/moth-filtered-Mona-2019-12-06_95p_DONE/'
)

print(data_generator.dlcconfig)

data_generator.graph = np.array([-1, 0, 0, 0])

data_generator.swap_index = np.array([-1, 2, 1, -1])


image, keypoints = data_generator[0]

plt.figure(figsize=(5,5))
image = image[0] if image.shape[-1] == 3 else image[0, ..., 0]
cmap = None if image.shape[-1] == 3 else 'gray'
plt.imshow(image, cmap=cmap, interpolation='none')
for idx, jdx in enumerate(data_generator.graph):
    if jdx > -1:
        plt.plot(
            [keypoints[0, idx, 0], keypoints[0, jdx, 0]],
            [keypoints[0, idx, 1], keypoints[0, jdx, 1]],
            'r-'
        )

plt.scatter(keypoints[0, :, 0], keypoints[0, :, 1], c=np.arange(data_generator.keypoints_shape[0]), s=50, cmap=plt.cm.hsv, zorder=3)
plt.xlim(0, data_generator.image_shape[1])
plt.ylim(0, data_generator.image_shape[0])

plt.show()


augmenter = []

augmenter.append(FlipAxis(data_generator, axis=0))  # flip image up-down
augmenter.append(FlipAxis(data_generator, axis=1))  # flip image left-right 

sometimes = []
sometimes.append(iaa.Affine(scale={"x": (0.9, 1.1), "y": (0.9, 1.1)},
                            translate_percent={'x': (-0.5, 0.5), 'y': (-0.5, 0.5)},
                            shear=(-8, 8),
                            order=ia.ALL,
                            cval=ia.ALL)
                 )
sometimes.append(iaa.Affine(scale=(0.5, 1.5),
                            order=ia.ALL,
                            cval=ia.ALL)
                 )
augmenter.append(iaa.Sometimes(0.5, sometimes))
augmenter.append(iaa.Sometimes(0.5, iaa.Affine(rotate=(-180, 180),
                            order=ia.ALL,
                            cval=ia.ALL))
                 )
augmenter = iaa.Sequential(augmenter)







image, keypoints = data_generator[0]
image, keypoints = augmenter(images=image, keypoints=keypoints)
plt.figure(figsize=(5,5))
image = image[0] if image.shape[-1] == 3 else image[0, ..., 0]
cmap = None if image.shape[-1] == 3 else 'gray'
plt.imshow(image, cmap=cmap, interpolation='none')
for idx, jdx in enumerate(data_generator.graph):
    if jdx > -1:
        plt.plot(
            [keypoints[0, idx, 0], keypoints[0, jdx, 0]],
            [keypoints[0, idx, 1], keypoints[0, jdx, 1]],
            'r-'
        )

plt.scatter(keypoints[0, :, 0], keypoints[0, :, 1], c=np.arange(data_generator.keypoints_shape[0]), s=50, cmap=plt.cm.hsv, zorder=3)
plt.xlim(0, data_generator.image_shape[1])
plt.ylim(0, data_generator.image_shape[0])

plt.show()


train_generator = TrainingGenerator(generator=data_generator,
                                    downsample_factor=3,
                                    augmenter=augmenter,
                                    sigma=5,
                                    validation_split=0.1,
                                    use_graph=True,
                                    random_seed=1,
                                    graph_scale=1)
train_generator.get_config()


n_keypoints = data_generator.keypoints_shape[0]
batch = train_generator(batch_size=1, validation=False)[0]
inputs = batch[0]
outputs = batch[1]

fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(10,10))
ax1.set_title('image')
ax1.imshow(inputs[0,...,0], cmap='gray', vmin=0, vmax=255)

ax2.set_title('posture graph')
ax2.imshow(outputs[0,...,n_keypoints:-1].max(-1))

ax3.set_title('keypoints confidence')
ax3.imshow(outputs[0,...,:n_keypoints].max(-1))

ax4.set_title('posture graph and keypoints confidence')
ax4.imshow(outputs[0,...,-1], vmin=0)
plt.show()

train_generator.on_epoch_end()


from deepposekit.models import DeepLabCut, StackedDenseNet, LEAP

#model = StackedDenseNet(train_generator, n_stacks=1, growth_rate=32, pretrained=True)
#model = DeepLabCut(train_generator, backbone="resnet50")
#model = DeepLabCut(train_generator, backbone="mobilenetv2", alpha=1.0) # Increase alpha to improve accuracy
model = DeepLabCut(train_generator, backbone="densenet121")
#model = LEAP(train_generator)
model.get_config()


data_size = (500,) + data_generator.image_shape
x = np.random.randint(0, 255, data_size, dtype="uint8")
y = model.predict(x[:100], batch_size=50) # make sure the model is in GPU memory
t0 = time.time()
y = model.predict(x, batch_size=50, verbose=1)
t1 = time.time()
print(x.shape[0] / (t1 - t0))


logger = Logger(validation_batch_size=10
    # filepath saves the logger data to a .h5 file
    # filepath=HOME + "/deeplabcut_log_dlcdensenet.h5", validation_batch_size=10
)



reduce_lr = ReduceLROnPlateau(monitor="val_loss", factor=0.2, verbose=1, patience=20)


model_checkpoint = ModelCheckpoint(
    "../../deeplabcut_best_model_dlcdensenet_moth.h5",
    monitor="val_loss",
    # monitor="loss" # use if validation_split=0
    verbose=1,
    save_best_only=True,
    optimizer=True, # Set this to True if you wish to resume training from a saved model
)

early_stop = EarlyStopping(
    monitor="val_loss",
    # monitor="loss" # use if validation_split=0
    min_delta=0.001,
    patience=100,
    verbose=1
)



callbacks = [early_stop, reduce_lr, model_checkpoint, logger]


model.fit(
    batch_size=5,
    validation_batch_size=10,
    callbacks=callbacks,
    #epochs=1000, # Increase the number of epochs to train the model longer
    epochs=100,
    n_workers=8,
    steps_per_epoch=200,
)


model = load_model(
    "../../deeplabcut_best_model_dlcdensenet_moth.h5",
    augmenter=augmenter,
    generator=data_generator,
)


model.fit(
    batch_size=5,
    validation_batch_size=10,
    callbacks=callbacks,
    #epochs=1000, # Increase the number of epochs to train the model longer
    epochs=100,
    n_workers=8,
    steps_per_epoch=200,
)

Owner

jgraving commented Dec 11, 2019

What image resolution are you using? Both the height and the width should be divisible by 2 multiple times.
Also, I notice you seem to be using a 3D dataset from DeepLabCut, so I'm not sure if this will be compatible with the DLCDataGenerator.
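
A rough sketch of the divisibility check mentioned above, in Python. The 600 x 800 image size is an assumption, inferred from the `'crop': '0, 800, 0, 600'` entry in the printed config and from the (75, 100) target shape at `downsample_factor=3`. The helper names are hypothetical and are not part of DeepPoseKit. The floor-halving arithmetic is only one reading that happens to reproduce the 74-versus-75 mismatch in the traceback; it is not a confirmed description of the model's layers.

```python
def times_divisible_by_two(n):
    """Count how many times n can be halved exactly (hypothetical helper)."""
    count = 0
    while n > 0 and n % 2 == 0:
        n //= 2
        count += 1
    return count


def next_multiple(n, factor):
    """Smallest multiple of `factor` that is >= n (hypothetical helper)."""
    return -(-n // factor) * factor


def floor_halve(n, times):
    """Halve n `times` times, rounding down at each step (hypothetical helper)."""
    for _ in range(times):
        n //= 2
    return n


# Assumed image size (height, width); not stated explicitly in the issue.
height, width = 600, 800
downsample_factor = 3  # value passed to TrainingGenerator in dlc_train.py

# Target heatmap size if the generator divides by 2 ** downsample_factor.
target_height = height // 2 ** downsample_factor
target_width = width // 2 ** downsample_factor

# Assumption: the network halves the input 4 times (rounding down),
# then upsamples once by 2 to reach the same nominal stride of 8.
output_height = floor_halve(height, 4) * 2
output_width = floor_halve(width, 4) * 2

print("height divisible by 2:", times_divisible_by_two(height), "times")
print("width divisible by 2:", times_divisible_by_two(width), "times")
print("target (h, w):", (target_height, target_width))
print("output (h, w):", (output_height, output_width))
print("next height that is a multiple of 16:", next_multiple(height, 16))
```

Under these assumptions the height can only be halved exactly 3 times, so a fourth halving rounds down and the output ends up one row short of the target. Cropping or padding the frames so that the height is a multiple of 16 (for example 592 or 608) would be one way to make the shapes agree; whether DLCDataGenerator supports this directly is not confirmed here.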

@jgraving changed the title from "ValueError: A target array with shape (5, 75, 100, 10) was passed for an output of shape (None, 74, 100, 10) while using as loss mean_squared_error. This loss expects targets to have the same shape as the output. terminate called without an active exception Aborted" to "Shape mismatch error when training with DLCDataGenerator" Dec 11, 2019
@jgraving added the bug label Dec 11, 2019