
Working on custom datasets.

yjxiong edited this page May 18, 2017 · 4 revisions

If you need to work with your own dataset using the TSN codebase, you will need to extend the code. This is easy; the steps are summarized as follows.

1. Write a dataset parser

Dataset representation

Datasets and their annotations come in all kinds of formats. To keep the following stages uniform, we ask the dataset to provide a unified data structure of annotations, similar to the one used to represent UCF101 and HMDB51 in TSN.

The basic information for a video is represented as a Python tuple of (filename, class_label). Here class_label should be an integer. Usually, the dataset is separated into two sets of videos, "train" and "test". Each set is represented by a list of video tuples. One combination of "train" and "test" set forms a "split", which is again a tuple of the two corresponding lists.

A dataset can provide multiple splits. For example, the UCF101 and HMDB51 datasets both have 3 standard splits to conduct cross-validation in reporting system performance. Thus the parsers for these datasets return a list of 3 splits.

Here is an abstract illustration of the data structure:

[ # dataset XX
    (
         [(filename_1, label_1), (filename_2, label_2),...], # train subset
         [(filename_10, label_10), (filename_20, label_20),...] # test subset
    ), #split 1
    (
         [(filename_3, label_3), (filename_7, label_7),...], # train subset
         [(filename_10, label_10), (filename_15, label_15),...] # test subset
    ) #split 2
]
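To make the structure concrete, here is a minimal sketch in plain Python. The filenames and labels are made up for illustration; only the nesting (list of splits, each split a (train, test) pair of (filename, label) tuples) matches what the framework expects.

```python
# Hypothetical one-split dataset, following the structure described above.
train_list = [("video_001", 0), ("video_002", 1)]
test_list = [("video_010", 0), ("video_015", 1)]

dataset_splits = [
    (train_list, test_list),  # split 1
]

# Sanity-check the shape: each split is a (train, test) pair, and every
# entry is a (filename, integer_label) tuple.
for train, test in dataset_splits:
    for filename, label in train + test:
        assert isinstance(filename, str) and isinstance(label, int)
```

A dataset with multiple splits simply appends more (train, test) pairs to the outer list.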

Write a parser

Given the above data structure, the task is to write a parser. Examples can be seen here and here.

In general, any function that translates your dataset annotation files to the above data structure will work in the TSN codebase.
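As an illustration, here is a hedged sketch of such a parser. It assumes a hypothetical annotation layout of CSV files named `train_split_<i>.csv` and `test_split_<i>.csv`, each row being `filename,label`; your own dataset's annotation format will differ, but the returned structure is what matters.

```python
import csv
import os

def parse_my_dataset(annotation_dir, num_splits=1):
    """Hypothetical parser: translate CSV annotation files into the
    list-of-splits structure described above."""
    splits = []
    for i in range(1, num_splits + 1):
        subsets = []
        for subset in ("train", "test"):
            path = os.path.join(annotation_dir, f"{subset}_split_{i}.csv")
            with open(path, newline="") as f:
                # Each row is (filename, label); labels must be integers.
                subsets.append([(row[0], int(row[1])) for row in csv.reader(f)])
        splits.append(tuple(subsets))
    return splits
```

Any function with this return shape, whatever its input format, will work in the TSN codebase.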

2. Register the parser

With a well-written parser that conforms to the above requirements, it is time to make the framework recognize it. For this, we add the parser function to a dict of parsers; its key will be the "name" of the dataset used by the framework, such as ucf101 and hmdb51. See here for how to do it.
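The registration boils down to a name-to-function mapping. The sketch below mirrors that idea; the dict name, keys, and parser signatures here are assumptions for illustration, not the codebase's exact identifiers.

```python
# Stand-in for the parser written in step 1.
def parse_my_dataset(num_splits=1):
    return [([("clip_a", 0)], [("clip_b", 1)])]

# Hypothetical registry: dataset name -> parser function. The real
# framework keeps a similar dict for ucf101 and hmdb51.
dataset_parsers = {
    "mydataset": parse_my_dataset,  # our new parser
}

def get_splits(name, **kwargs):
    """Look up a dataset by its registered name and run its parser."""
    return dataset_parsers[name](**kwargs)
```

Once registered, the rest of the framework refers to the dataset only by this name.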

3. Generate file lists

The training of TSN models relies on a set of file lists. Once we have added the parser to the framework, we can use it to generate the file lists. The command for generating the list files is as simple as

bash scripts/build_file_list.sh DATASET_NAME FRAME_PATH

where DATASET_NAME is the name we used to register the parser in step 2.

Or you can use the more detailed command

python tools/build_file_list.py ${DATASET} ${FRAME_PATH} --shuffle --num_split 1

where num_split specifies how many splits this dataset has, and shuffle indicates whether the file list for training should be shuffled.

Of course one has to extract the frames and optical flow images before this (see here).
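Conceptually, the list-generation script walks the parsed structure and writes one line per video. The sketch below shows that idea, assuming a `frame_folder num_frames label` line format; check the output of tools/build_file_list.py for the exact format your version produces.

```python
def write_file_list(video_tuples, frame_counts, out_path):
    """Hypothetical sketch of file-list generation.

    video_tuples: [(frame_folder, label), ...] from the parser for one subset.
    frame_counts: dict mapping frame_folder to its number of extracted frames.
    """
    with open(out_path, "w") as f:
        for folder, label in video_tuples:
            # One line per video: path, frame count, integer label.
            f.write(f"{folder} {frame_counts[folder]} {label}\n")
```

This is why the frames must be extracted first: the frame counts go into the lists.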

4. Training and testing

With all the steps above completed, we can use the custom dataset just like the provided ones (UCF101 and HMDB51). Training and testing are all the same.

For example, the ActivityNet v1.2 and v1.3 datasets both have 2 "splits". The first is to train on the training subset and test on the validation subset. The second is to train on the training+validation subset and test on the testing subset. One can submit the test results of the second split to the official test server to get the performance metrics. We have implemented an example parser for the ActivityNet dataset at

ActivityNet Parser Sample.