
Initialization of LSTM layers #3

Open · xlliu7 opened this issue Mar 11, 2016 · 21 comments

xlliu7 commented Mar 11, 2016

How did you initialize the cell state and the hidden state of the LSTM layers?
You gave an equation but didn't explain it much. I wonder what the f_init function is; from reading the code, I guess it is a tanh function. How did you do that separately for the 3 layers? Also, I don't know what X means. Is it the feature of a single sample or of a batch?

xlliu7 changed the title from "The initialization of LSTM layers" to "Initialization of LSTM layers" on Mar 12, 2016
frajem commented Mar 14, 2016

Hi @limingqishi
Can you replicate the results in the paper?

xlliu7 (Author) commented Apr 12, 2016

Hi @frajem, I have finished the experiment. I got an accuracy of 94.24% on the UCF11 test set and 39.5% on HMDB51 split 1. What are your results?

GerardoHH commented

Hi @limingqishi

Have you replicated the experiments of the paper? I'm wondering if you improved on the accuracy reported in the paper. The UCF-11 accuracies reported in the paper are:

| Model | Accuracy (%) |
| --- | --- |
| Softmax regression (full CNN feature cube) | 82.37 |
| Avg pooled LSTM (@ 30 fps) | 82.56 |
| Max pooled LSTM (@ 30 fps) | 81.60 |
| Soft attention model (@ 30 fps, λ = 0) | 84.96 |
| Soft attention model (@ 30 fps, λ = 1) | 83.52 |
| Soft attention model (@ 30 fps, λ = 10) | 81.44 |

I'm working on replicating the results, but I'm having lots of trouble.

xlliu7 (Author) commented Apr 13, 2016

Hi @GerardoHH @kracwarlock
I randomly selected samples for training and testing on the UCF11 dataset. I noticed that many samples in UCF11 actually come from the same long video; in that case the training set and test set can be very similar, so overfitting can lead to high accuracy on the test set.
I think it would be better to experiment on HMDB51, where the train-test split files are included.
I have only tested the soft attention model (@ 30 fps, λ = 0) so far, and I can't replicate the reported HMDB51 accuracy; my best result so far is 39.5%. I wonder if I preprocessed the data improperly.

rishabh135 commented

@limingqishi @GerardoHH I have a question: how can we test this code on the UCF-11 dataset? When I downloaded the dataset from http://crcv.ucf.edu/data/UCF_YouTube_Action.php there were no .h5 files, so can you please help me run this code? Also, what computer specs (RAM, OS) are required to run it? Please do respond.

xlliu7 (Author) commented Apr 13, 2016

Hi @rishabh135
.h5 is the filename extension for HDF5, a database format designed to store and organize large amounts of data. You need to create the .h5 files yourself. This link may help you.
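For illustration, here is a minimal h5py sketch of creating such a file (the dataset name `features` and the array shape are my assumptions; match whatever the data loader in this repo actually expects):

```python
import h5py
import numpy as np

# Placeholder array standing in for precomputed CNN feature cubes:
# (n_clips, n_timesteps, 7x7 locations, 1024 channels) -- assumed layout.
features = np.zeros((10, 30, 49, 1024), dtype='float32')

with h5py.File('train_features.h5', 'w') as f:
    # 'features' is an assumed dataset name, not necessarily the repo's
    f.create_dataset('features', data=features)
```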
Good luck!

rishabh135 commented

@limingqishi can you please also help with what kind of computer will be sufficient to run this code? I have 4 GB of RAM and a 2 GB Nvidia GeForce 820M graphics card on Windows 7; will that suffice? Reading other answers, I saw that it has previously been run with 48 GB of RAM, so will I be unable to run this on my PC?

xlliu7 (Author) commented Apr 13, 2016

@rishabh135
If you don't have enough RAM, you should use h5create and h5write to build the .h5 files incrementally. In that case, I guess 4 GB of RAM is enough.
I ran the code on a server with 3 GB of GPU memory. I think it's okay with 2 GB, because memory usage is below 2 GB most of the time. You can try reducing the batch size if it doesn't work.
But the 820M card might be slow. If possible, use a server.
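For a Python/h5py equivalent of that incremental-write idea (shapes and dataset name are assumed; the point is that only one clip's features are ever held in RAM):

```python
import h5py
import numpy as np

n_clips, n_steps, n_locs, n_dim = 100, 30, 49, 1024  # assumed shapes

with h5py.File('train_features.h5', 'w') as f:
    # Pre-allocate the dataset on disk, analogous to h5create in MATLAB
    dset = f.create_dataset('features',
                            shape=(n_clips, n_steps, n_locs, n_dim),
                            dtype='float32')
    for i in range(n_clips):
        # Placeholder standing in for the real per-clip CNN features
        clip_feats = np.zeros((n_steps, n_locs, n_dim), dtype='float32')
        dset[i] = clip_feats  # written straight to disk, one clip at a time
```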

GerardoHH commented

@rishabh135 @limingqishi
I have a laptop with 16 GB of RAM and a 980M GPU with 4 GB of memory, and I had to reduce the test batch size from 256 to 128 because of an out-of-memory error. The script scripts.evaluate_ucf11 takes about 2-3 hours to complete. And I'm sure that my preprocessing (the .h5 file) is wrong.

rishabh135 commented

@GerardoHH @kracwarlock @limingqishi I am facing an issue while running the script: I get a "No GPU board available" error. Any idea what is causing this? Also, I am not entirely clear on which features (SIFT, HOG, SURF) I have to extract from the videos and then stack to get the .h5 file. Any help with this would be tremendously useful.

kyuusaku commented

Hi @limingqishi
I am training and testing on the UCF11 dataset. It takes over a day to run one training epoch, and I do not know why it is so slow. Can you share your running time and settings with me?

My running environment is:
System: Ubuntu 12.04.5 LTS; GPU: Tesla K10.G2.8GB

My Theano config file (.theanorc) is:

```ini
[global]
floatX = float32
device = gpu
optimizer = fast_run

[lib]
cnmem = 0

[dnn]
enabled = True

[nvcc]
fastmath = True

[blas]
ldflags = -L/home/anaconda/lib -L/usr/lib -lf77blas -lcblas -latlas
```

kracwarlock self-assigned this Apr 17, 2016
kracwarlock (Owner) commented
Hi everyone. I am sorry for all the delay; I was very busy with my thesis and graduation. I am no longer at the University of Toronto, but I will try to reply here regularly.

kracwarlock (Owner) commented
@limingqishi The cell state and hidden state initialization happens in these lines: https://github.com/kracwarlock/action-recognition-visual-attention/blob/master/src/actrec.py#L368-L369
It is basically done as follows (a code sketch is included below):

- your features for a batch have shape (no. of timesteps, batch size, 7x7, 1024)
- you take the mean along dimensions 0 and 2 to get a mean context of shape (batch size, 1024)
- this mean context is fed to a dense layer, with the same output size as the states and a tanh activation, to produce the initial states

I see that I did not release the multi-layer LSTM code. I will try to do that as soon as I have time. Until then, this is how it is done: https://github.com/kelvinxu/arctic-captions/blob/master/capgen.py#L542-L548. In the paper, X means the features of a single sample; in the code everything is done on a batch.
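A minimal Theano sketch of that initialization (parameter names such as W_init_h are illustrative, not the actual names used in src/actrec.py):

```python
import theano.tensor as T

def init_lstm_states(features, W_init_h, b_init_h, W_init_c, b_init_c):
    """features: (n_timesteps, batch_size, 49, 1024) CNN feature cube."""
    # Mean over time (axis 0), then over the 7x7 locations (now axis 1),
    # giving a mean context of shape (batch_size, 1024).
    mean_context = features.mean(axis=0).mean(axis=1)
    # Dense layers with tanh activation map the mean context to the
    # initial hidden and cell states, each of shape (batch_size, lstm_dim).
    h0 = T.tanh(T.dot(mean_context, W_init_h) + b_init_h)
    c0 = T.tanh(T.dot(mean_context, W_init_c) + b_init_c)
    return h0, c0
```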

kracwarlock (Owner) commented
@frajem @limingqishi @GerardoHH Yes, UCF-11 has no standard train/test split, and the accuracy will depend on the split; that's why we didn't report any further results on it. You can overfit and perform very well.

kracwarlock (Owner) commented
@rishabh135 Also take a look at my comments two posts above this one. I will try to make this easier as soon as possible.

kracwarlock (Owner) commented
@rishabh135 See https://github.com/kracwarlock/action-recognition-visual-attention/blob/5e3d0ab792195594cd422252cbac3f01333eb7ee/util/README.md#gpu-locking
You should remove those lines. The GPU locking code was intended only for University of Toronto ML server users.

kracwarlock (Owner) commented
@limingqishi Did you use a 3-layer LSTM for the experiments on HMDB-51? If not, that would do the trick. If yes, let me know all your hyperparams.

rishabh135 commented

@kracwarlock How do we get the .h5 file from the YouTube action dataset videos? Do we need to first extract features ("hog") and then stack them in a matrix? Can anyone please share a simple program to convert the dataset to an .h5 file? Also, what does train_labels.txt contain?

kracwarlock (Owner) commented
@rishabh135 If you can ask this on the relevant issue (#6), that would be great. If that issue does not cover your questions, please open a separate issue.

jacopocavazza commented

Hi @kracwarlock! I am also trying to reproduce your results on HMDB-51 and Hollywood2 (after reading this thread I think I will skip UCF-11). Can you please share the files valid_labels.txt, train_labels.txt, test_labels.txt, train_filenames.txt, test_filenames.txt and valid_filenames.txt for those two datasets? I would appreciate it a lot :) :) :)

kracwarlock (Owner) commented
@jacopocavazza Hey, can you open a new issue for that, since this is not related?
