
Feature request: Downsample inputs for faster analysis #26

Open
richard-warren opened this issue Nov 11, 2019 · 11 comments
Labels
enhancement New feature or request

Comments

@richard-warren

Hi,

Many people collect videos at much higher spatial resolution than is necessary to perform accurate tracking (myself included). It would be great to have optional MaxPooling2D layer(s) at the input of DPK, which would downsample the input and cause the inference to be (way) faster. The output coordinates would need to be scaled up, etc. I think many would really benefit from the increased speed. What do you think?

Thanks,
Rick

@DenisPolygalov
Contributor

What type of downsampling are you talking about? If it's about raw video and spatial or temporal downsampling, then it might be better to use OpenCV for that...

@richard-warren
Author

I'm suggesting spatial downsampling. Yes, it would definitely work with OpenCV or ffmpeg. A max-pooling layer in the network itself may be faster (is this accurate?) and more convenient, but it's definitely not the only way to make it happen. Thanks!

@jgraving
Owner

This is definitely possible but we would want to avoid adding too much complexity to the code. The easiest approach is probably to add an option for the TrainingGenerator that tells the model to downsample the input images to some specified resolution or by some factor (with a corresponding adjustment to the confidence maps).

A lot of the processing time during inference actually goes to transferring the images into GPU memory, so I'm not sure how much faster this would be compared to preprocessing the frames with OpenCV. However, even if it isn't faster, it would make using the code much simpler, as everything would be self-contained within the model.

That being said, MaxPooling2D is probably not the best option to accomplish this as it would add local artefactual distortions. Ideally you would want a custom layer DownSampling2D or ResizeImage similar to UpSampling2D that uses tf.image.resize, which includes proper image interpolation algorithms like bilinear interpolation.

This would also be useful for adjusting image resolution to a power of 2 (for downsampling and upsampling within the model), and could allow for variably sized images. I originally thought zero padding was the best way, but this seems like the better option.
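As a rough sketch (not DPK code; the layer name and scale argument here are just illustrative), such a layer could wrap tf.image.resize like this:

import tensorflow as tf

class DownSampling2D(tf.keras.layers.Layer):
    # Downsample images by an integer factor using tf.image.resize with
    # proper interpolation, rather than discarding local detail like MaxPooling2D.
    def __init__(self, scale=2, method='bilinear', **kwargs):
        super().__init__(**kwargs)
        self.scale = scale
        self.method = method

    def call(self, inputs):
        new_size = tf.shape(inputs)[1:3] // self.scale  # (height, width)
        return tf.image.resize(inputs, new_size, method=self.method)

Used at the front of the model (e.g. x = DownSampling2D(scale=2)(inputs)), the output keypoint coordinates would then just need to be multiplied back by the scale factor.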

@jgraving added the enhancement label on Nov 12, 2019
@richard-warren
Author

Thanks Jake. So you prefer incorporating the downsampling into the model? If transferring to the GPU is a major bottleneck, would downsampling (with OpenCV) in the generator, before transferring to the GPU, increase the speed?

One point on MaxPooling2D vs. more sophisticated layers: those tracking mouse whiskers (or anything approaching 1 pixel in thickness) might prefer max pooling, as it is more likely to preserve very thin features. Probably not super important, but perhaps worth considering.

Would the pooling layer automatically result in power-of-2 dimensions? I implemented zero padding in my branch. It would be nice to get rid of this, as it slows things down a bit.
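(For context, the padding amounts to something like this; a rough sketch, not the actual code in my branch:)

import numpy as np

def pad_to_power_of_two(image):
    # Zero-pad an (H, W, C) image so height and width become powers of two.
    height, width = image.shape[:2]
    target_h = 2 ** int(np.ceil(np.log2(height)))
    target_w = 2 ** int(np.ceil(np.log2(width)))
    return np.pad(image, ((0, target_h - height), (0, target_w - width), (0, 0)))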

Thanks again!

@DenisPolygalov
Contributor

(Below is a shameless self-PR.)
If someone just wants to reduce the resolution of one or more video files of arbitrary length, or a stack of pictures, and save the result to a lossless-compressed AVI or multi-page TIFF file, you may want to try my CaFFlow framework: https://github.com/DenisPolygalov/CaFFlow
In addition to, or instead of, spatial downsampling, one can perform any frame-wise operation available in OpenCV, such as cropping, color conversion, flipping, filtering, as well as PCA removal, etc.

@jgraving
Owner

jgraving commented Nov 13, 2019

I ran some tests and it looks like this is probably not worth implementing. The OpenCV resize function appears to be significantly faster on all counts; there's just a lot of overhead in moving the images into GPU memory. Zero padding is cheap, so it's probably best to implement padding as the solution for odd-sized images.

import cv2
cv2.setNumThreads(1)  # test without parallelism
import tensorflow as tf
import numpy as np

tfl = tf.keras.layers
ORIGINAL = (1024, 1024)
RESIZED = (512, 512)

# Option 1: resize on the CPU with OpenCV before passing frames to a minimal model
class CVResize:
    def __init__(self):
        inputs = tfl.Input((None, None, 3), dtype=tf.uint8)
        outputs = inputs[:, :32, :3, 0]  # simulate keypoint outputs
        self.tf_model = tf.keras.Model(inputs, outputs)

    def __call__(self, images, size=RESIZED, batch_size=1):
        images = np.stack([cv2.resize(image, size, interpolation=cv2.INTER_NEAREST)
                           for image in images])
        return self.tf_model.predict(images, batch_size=batch_size)

cv_resize = CVResize()

# Option 2: resize inside the graph with tf.image.resize
inputs = tfl.Input((None, None, 3), dtype=tf.uint8)
resized = tf.image.resize(inputs, RESIZED, method='nearest')
outputs = resized[:, :32, :3, 0]  # simulate keypoint outputs
tf_resize = tf.keras.Model(inputs, outputs)

# Option 3: downsample inside the graph with MaxPooling2D
inputs = tfl.Input((None, None, 3), dtype=tf.uint8)
resized = tfl.MaxPooling2D(ORIGINAL[0] // RESIZED[0])(inputs)
outputs = resized[:, :32, :3, 0]  # simulate keypoint outputs
tf_maxpool = tf.keras.Model(inputs, outputs)

images = np.random.randint(0, 255, (256, ORIGINAL[0], ORIGINAL[1], 3), dtype=np.uint8)
%timeit cv_resize(images, batch_size=1)
437 ms ± 15.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit cv_resize(images, batch_size=128)
455 ms ± 709 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit tf_resize.predict(images, batch_size=1)
1.36 s ± 27.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit tf_resize.predict(images, batch_size=128)
1.27 s ± 13.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit tf_maxpool.predict(images, batch_size=1)
5.46 s ± 164 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit tf_maxpool.predict(images, batch_size=128)
1.27 s ± 7.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

@richard-warren
Author

This is great. Thanks for running these tests. Do you have any plans to implement an OpenCV resizing option, e.g. in the DataGenerator, along with automatic rescaling of the network outputs? If not, I'll hack something together on my end.

Relatedly, I'm finding that deepposekit underperforms deeplabcut when there are long-range spatial contingencies. See the image below, where the left and right paws in the top view get swapped. The bottom view is useful here for resolving ambiguities in the top view; I think the deeper networks may have an easier time with these long-range contingencies due to greater receptive field size at the outputs. I'm thinking spatial downsampling of the inputs may actually increase accuracy for deepposekit by effectively increasing the receptive field size... Let me know if there are any other parameters I can play with that may help deepposekit perform better under conditions like these. Thanks again!

[image: example frames where the left and right paw labels are swapped in the top view]

@jgraving
Owner

Shouldn't be too difficult to add, but it's not high priority at the moment. I'll need to think about how best to accomplish this. If you want to submit a PR I'm happy to work on it with you.

Do you mean performance between networks within DPK, or between the two software packages? Swapping issues might be due to erroneous or overly aggressive augmentation, especially if the FlipAxis augmenter is being used. If you could open another issue and provide more details, such as the augmentation pipeline you're using and the network hyperparameters (i.e. model.get_config()), I can help troubleshoot.

@richard-warren
Author

Thanks! I'll open a new issue and let you know if I end up implementing the resizing.

@richard-warren
Author

richard-warren commented Nov 15, 2019

I may try to implement a re-scaling option. If you have time (this isn't super high priority for me either), can you let me know if the following strategy seems alright?

@jgraving
Owner

Using OpenCV to resize images doesn't require any interaction with the BaseModel or the maxima layers. I would just modify BaseGenerator with a resize kwarg (with resize=None as the default value), then resize the images (and rescale the keypoints to match) within the generator methods, and adjust compute_image_shape, whenever the resize kwarg is passed. This allows the resize code to be used by any generator that inherits from BaseGenerator, as long as kwargs are passed using super(), as they are for DataGenerator and DLCGenerator with **kwargs. It would also be useful to add the same code to the VideoReader so it's easy to downscale video frames to the same size when running inference.
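As a rough illustration of that strategy (the function name and keypoint layout below are assumptions, not the actual BaseGenerator API), resizing the images with OpenCV and rescaling the keypoints to match could look something like this:

import cv2
import numpy as np

def resize_batch(images, keypoints, resize=None):
    # images: (batch, height, width, channels) uint8 array
    # keypoints: (batch, n_keypoints, 2) array of (x, y) pixel coordinates
    # resize: (width, height) tuple, or None to leave everything unchanged
    if resize is None:
        return images, keypoints
    in_height, in_width = images.shape[1:3]
    out_width, out_height = resize
    resized = np.stack([
        cv2.resize(image, resize, interpolation=cv2.INTER_LINEAR)
        for image in images
    ])
    scale = np.array([out_width / in_width, out_height / in_height])
    return resized, keypoints * scale

The VideoReader could then apply the same cv2.resize call per frame so that inference frames match the training resolution.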
