DeepNewton is a convolution-RNN which can effectively predict the future frames of a given simple video. The collusions, momentum and speed is taken care by the conv-rnn inspired by Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting.
Below are the results for one and multiple objects: