
Initialization of LSTM layers #3

Open · xlliu7 opened this issue Mar 11, 2016 · 21 comments

xlliu7 commented Mar 11, 2016

How did you initialize the cell state and the hidden state of the LSTM layers?
You gave an equation but didn't explain it much. I wonder what the f_init function is; from reading the code, I guess it is a tanh function. How did you do that separately for the 3 layers? Also, I don't know what X means. Is it the feature of a single sample or of a batch?

xlliu7 changed the title from "The initialization of LSTM layers" to "Initialization of LSTM layers" on Mar 12, 2016
frajem commented Mar 14, 2016

Hi @limingqishi
Can you replicate the results in the paper?

xlliu7 (Author) commented Apr 12, 2016

Hi @frajem, I have finished the experiment. I got an accuracy of 94.24% on the UCF11 test set and 39.5% on HMDB51 split 1. What are your results?

GerardoHH commented

Hi @limingqishi

Have you replicated the experiments of the paper? I'm wondering if you improved on the accuracy reported in the paper. The UCF-11 accuracies reported in the paper are:

| Model | Accuracy (%) |
| --- | --- |
| Softmax regression (full CNN feature cube) | 82.37 |
| Avg pooled LSTM (@ 30 fps) | 82.56 |
| Max pooled LSTM (@ 30 fps) | 81.60 |
| Soft attention model (@ 30 fps, λ = 0) | 84.96 |
| Soft attention model (@ 30 fps, λ = 1) | 83.52 |
| Soft attention model (@ 30 fps, λ = 10) | 81.44 |

I'm working on replicating the results, but I'm having lots of trouble.

xlliu7 (Author) commented Apr 13, 2016

Hi @GerardoHH @kracwarlock
I randomly selected samples for training and testing on the UCF11 dataset. I noticed that many samples in UCF11 actually come from the same long video; in that case the training set and test set can be very similar, so overfitting can lead to high accuracy on the test set.
I think it would be better to experiment on HMDB51, where the train-test split files are included.
I have only tested the soft attention model (@ 30 fps, λ = 0) so far, and I can't replicate the reported HMDB51 accuracy; my best result so far is 39.5%. I wonder if I preprocessed the data improperly.

rishabh135 commented

@limingqishi @GerardoHH I have a question: how can we test this code on the UCF-11 dataset? When I downloaded the dataset from http://crcv.ucf.edu/data/UCF_YouTube_Action.php there were no .h5 files, so can you please help me run this code? Also, what computer specs (RAM, OS) are required to run it? Please do respond.

xlliu7 (Author) commented Apr 13, 2016

Hi @rishabh135
.h5 is the filename extension for HDF5, a database format designed to store and organize large amounts of data. You need to create the .h5 files yourself. This link may help you.
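For illustration, here is a minimal h5py sketch of creating such a file (the dataset name `features` and the array shape are my assumptions; match whatever the data loader in this repo actually expects):

```python
import h5py
import numpy as np

# Placeholder array standing in for precomputed CNN feature cubes:
# (n_clips, n_timesteps, 7x7 locations, 1024 channels) -- assumed layout.
features = np.zeros((10, 30, 49, 1024), dtype='float32')

with h5py.File('train_features.h5', 'w') as f:
    # 'features' is an assumed dataset name, not necessarily the repo's
    f.create_dataset('features', data=features)
```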
Good luck!

rishabh135 commented

@limingqishi can you please also help with what kind of computer will be sufficient to run this code? I have 4 GB of RAM and a 2 GB Nvidia GeForce 820M graphics card on Windows 7; will that suffice? Reading other answers, I saw that it has previously been run with 48 GB of RAM, so will I be unable to run this on my PC?

xlliu7 (Author) commented Apr 13, 2016

@rishabh135
If you don't have enough RAM, you should use h5create and h5write to build the .h5 files incrementally. In that case, I guess 4 GB of RAM is enough.
I ran the code on a server with 3 GB of GPU memory. I think it's okay with 2 GB, because memory usage is below 2 GB most of the time. You can try reducing the batch size if it doesn't work.
But the 820M card might be slow. If possible, use a server.
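For a Python/h5py equivalent of that incremental-write idea (shapes and dataset name are assumed; the point is that only one clip's features are ever held in RAM):

```python
import h5py
import numpy as np

n_clips, n_steps, n_locs, n_dim = 100, 30, 49, 1024  # assumed shapes

with h5py.File('train_features.h5', 'w') as f:
    # Pre-allocate the dataset on disk, analogous to h5create in MATLAB
    dset = f.create_dataset('features',
                            shape=(n_clips, n_steps, n_locs, n_dim),
                            dtype='float32')
    for i in range(n_clips):
        # Placeholder standing in for the real per-clip CNN features
        clip_feats = np.zeros((n_steps, n_locs, n_dim), dtype='float32')
        dset[i] = clip_feats  # written straight to disk, one clip at a time
```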

GerardoHH commented

@rishabh135 @limingqishi
I have a laptop with 16 GB of RAM and a 980M GPU with 4 GB of memory, and I had to reduce the test batch size from 256 to 128 because of an out-of-memory error. The script scripts.evaluate_ucf11 takes about 2-3 hours to complete. And I'm sure that my preprocessing (the .h5 file) is wrong.

rishabh135 commented

@GerardoHH @kracwarlock @limingqishi I am facing an issue while running the script: I get a "No GPU board available" error. Any idea what is causing this? Also, I am not entirely clear on which features (SIFT, HOG, SURF) I have to extract from the videos and then stack to get the .h5 file. Any help with this would be tremendously useful.

kyuusaku commented

Hi @limingqishi
I am training and testing on the UCF11 dataset. It takes over a day to run one training epoch, and I do not know why it is so slow. Can you share your running time and settings with me?

My running environment is:
System: Ubuntu 12.04.5 LTS; GPU: Tesla K10.G2.8GB

My Theano config file (.theanorc) is:

```ini
[global]
floatX = float32
device = gpu
optimizer = fast_run

[lib]
cnmem = 0

[dnn]
enabled = True

[nvcc]
fastmath = True

[blas]
ldflags = -L/home/anaconda/lib -L/usr/lib -lf77blas -lcblas -latlas
```

kracwarlock self-assigned this Apr 17, 2016
kracwarlock (Owner) commented
Hi everyone. I am sorry for all the delay; I was very busy with my thesis and graduation. I am no longer at the University of Toronto, but I will try to reply here regularly.

kracwarlock (Owner) commented
@limingqishi The cell state and hidden state initialization happens in these lines: https://github.com/kracwarlock/action-recognition-visual-attention/blob/master/src/actrec.py#L368-L369
It is basically done as follows (a code sketch is included below):

- your features for a batch have shape (no. of timesteps, batch size, 7x7, 1024)
- you take the mean along dimensions 0 and 2 to get a mean context of shape (batch size, 1024)
- this mean context is fed to a dense layer, with the same output size as the states and a tanh activation, to produce the initial states

I see that I did not release the multi-layer LSTM code. I will try to do that as soon as I have time. Until then, this is how it is done: https://github.com/kelvinxu/arctic-captions/blob/master/capgen.py#L542-L548. In the paper, X means the features of a single sample; in the code everything is done on a batch.
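A minimal Theano sketch of that initialization (parameter names such as W_init_h are illustrative, not the actual names used in src/actrec.py):

```python
import theano.tensor as T

def init_lstm_states(features, W_init_h, b_init_h, W_init_c, b_init_c):
    """features: (n_timesteps, batch_size, 49, 1024) CNN feature cube."""
    # Mean over time (axis 0), then over the 7x7 locations (now axis 1),
    # giving a mean context of shape (batch_size, 1024).
    mean_context = features.mean(axis=0).mean(axis=1)
    # Dense layers with tanh activation map the mean context to the
    # initial hidden and cell states, each of shape (batch_size, lstm_dim).
    h0 = T.tanh(T.dot(mean_context, W_init_h) + b_init_h)
    c0 = T.tanh(T.dot(mean_context, W_init_c) + b_init_c)
    return h0, c0
```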

kracwarlock (Owner) commented
@frajem @limingqishi @GerardoHH Yes, UCF-11 has no standard train/test split, and the accuracy will depend on the split; that's why we didn't report any further results on it. You can overfit and perform very well.

kracwarlock (Owner) commented
@rishabh135 Also take a look at my comments two posts above this one. I will try to make this easier as soon as possible.

kracwarlock (Owner) commented
@rishabh135 See https://github.com/kracwarlock/action-recognition-visual-attention/blob/5e3d0ab792195594cd422252cbac3f01333eb7ee/util/README.md#gpu-locking
You should remove those lines. The GPU locking code was intended only for University of Toronto ML server users.

kracwarlock (Owner) commented
@limingqishi Did you use a 3-layer LSTM for the experiments on HMDB-51? If not, that would do the trick. If yes, let me know all your hyperparams.

rishabh135 commented

@kracwarlock How do we get the .h5 file from the YouTube action dataset videos? Do we need to first extract features ("hog") and then stack them in a matrix? Can anyone please share a simple program to convert the dataset to an .h5 file? Also, what does train_labels.txt contain?

kracwarlock (Owner) commented
@rishabh135 If you can ask this on the relevant issue (#6), that would be great. If that issue does not cover your questions, please open a separate issue.

jacopocavazza commented

Hi @kracwarlock! I am also trying to reproduce your results on HMDB-51 and Hollywood2 (after reading this thread I think I will skip UCF-11). Can you please share the files valid_labels.txt, train_labels.txt, test_labels.txt, train_filenames.txt, test_filenames.txt and valid_filenames.txt for those two datasets? I would appreciate it a lot :) :) :)

kracwarlock (Owner) commented
@jacopocavazza Hey, can you open a new issue for that, since this is not related?
