A Description of the Proposal Files

yjxiong edited this page Oct 23, 2017 · 1 revision

A proposal file in the SSN codebase is similar to those used in RCNN.

Here is a snippet from a real proposal file:

# 1
3HHAEmr0Q34
1
1
1
86 0.0718 0.9654
11
86 0.9045 0.9625 0.1277 1.0000
86 0.8943 0.8943 0.0000 1.0000
86 0.8349 0.9595 0.1915 1.0000
86 0.3650 0.9121 0.6277 1.0000
86 0.3302 0.9037 0.6596 1.0000
86 0.2954 0.8936 0.6915 1.0000
86 0.2606 0.8810 0.7234 1.0000
86 0.2886 1.0000 0.1277 0.3856
86 0.2345 1.0000 0.1436 0.3537
86 0.1804 1.0000 0.1596 0.3218
86 0.1263 1.0000 0.1915 0.3059

This file can be described using the following template:

# INDEX
VIDEO_ID
NUM_UNITS
FPS
NUM_GT
(CLASS START END) x NUM_GT
NUM_PROP
(CLASS MAX_IOU MAX_OVERLAP START END) x NUM_PROP
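
As a rough illustration, the template above could be read with a small parser like the sketch below. The function name `parse_proposal_file`, the dictionary keys, and the `SAMPLE` constant are made up for this example and are not taken from the SSN codebase. The sketch assumes that every entry starts with a `#` line and that fields are separated by whitespace, as in the snippet.

```python
from typing import Dict, List

# Shortened copy of the snippet above: only the first two proposals are kept,
# so NUM_PROP is 2 here instead of 11.
SAMPLE = """\
# 1
3HHAEmr0Q34
1
1
1
86 0.0718 0.9654
2
86 0.9045 0.9625 0.1277 1.0000
86 0.8943 0.8943 0.0000 1.0000
"""


def parse_proposal_file(text: str) -> List[Dict]:
    """Parse proposal-file text into one dict per video entry (illustrative only)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    videos = []
    pos = 0
    while pos < len(lines):
        if not lines[pos].startswith("#"):
            raise ValueError(f"expected '# INDEX', got {lines[pos]!r}")
        entry = {
            "index": int(lines[pos][1:]),        # INDEX
            "video_id": lines[pos + 1],          # VIDEO_ID
            "num_units": float(lines[pos + 2]),  # NUM_UNITS
            "fps": float(lines[pos + 3]),        # FPS
            "gt": [],
            "proposals": [],
        }
        num_gt = int(lines[pos + 4])             # NUM_GT
        pos += 5
        for _ in range(num_gt):
            # CLASS START END
            cls, start, end = lines[pos].split()
            entry["gt"].append((int(cls), float(start), float(end)))
            pos += 1
        num_prop = int(lines[pos])               # NUM_PROP
        pos += 1
        for _ in range(num_prop):
            # CLASS MAX_IOU MAX_OVERLAP START END
            cls, max_iou, max_overlap, start, end = lines[pos].split()
            entry["proposals"].append(
                (int(cls), float(max_iou), float(max_overlap), float(start), float(end))
            )
            pos += 1
        videos.append(entry)
    return videos


if __name__ == "__main__":
    for video in parse_proposal_file(SAMPLE):
        print(video["video_id"], len(video["gt"]), len(video["proposals"]))
```

The parser walks the lines with an explicit cursor because the number of lines per entry depends on NUM_GT and NUM_PROP, which are only known once they have been read.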

In plain language, this file has a list of videos. Each video entry contains:

  • INDEX: the index of this video, starting from 1, written after a # on the first line of the entry.
  • VIDEO_ID: the ID of the video, on the second line.
  • NUM_UNITS: the total number of time units in this video, on the next line. The unit can be a frame or a second; in the normalized proposal files this value is 1.
  • FPS: the frames per second (FPS) of this video, on the next line. If the unit is a frame, this line will be 1. If the unit is
  • NUM_GT: the number of ground-truth action instances in this video. This number can be set to 0 for testing videos that have no annotations.
  • (CLASS START END) x NUM_GT: next come NUM_GT lines of ground-truth action instances. Each instance has a CLASS id, followed by its START and END in the unit used by this proposal file. For example, SSN uses the frame unit, so START and END denote the starting and ending frames of the instance. In the provided normalized proposal files the unit is 1, so START and END are decimal numbers from 0 to 1. The [gen_proposal_list.py](https://github.com/yjxiong/action-detection/blob/master/gen_proposal_list.py) script converts between these two units based on the number of frames actually extracted for each video on your machine.
  • NUM_PROP: after the ground-truth instances come the proposals, led by the total number of proposals for this video.
  • (CLASS MAX_IOU MAX_OVERLAP START END) x NUM_PROP: similarly, proposals are recorded one per line for NUM_PROP lines. Compared with a ground-truth instance, a proposal has two more fields. MAX_IOU is the maximal intersection over union (IoU) of this proposal with respect to all ground-truth instances. MAX_OVERLAP is the maximal overlap with a ground-truth instance, as a proportion of the length of this proposal. The CLASS of a proposal is taken from the ground-truth instance with the maximal IoU.
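
The proposal fields described above can be sketched in a few lines of Python. This is only an illustration of the definitions on this page, not the code used to produce the provided files: `label_proposal` and `to_frames` are hypothetical names, returning `None` as the class when nothing overlaps is an assumption, and `to_frames` assumes that a normalized position is turned into a frame index by multiplying by the frame count and truncating, which may differ from what gen_proposal_list.py actually does.

```python
from typing import List, Optional, Tuple

GroundTruth = Tuple[int, float, float]  # (CLASS, START, END)


def label_proposal(
    start: float, end: float, gts: List[GroundTruth]
) -> Tuple[Optional[int], float, float]:
    """Return (CLASS, MAX_IOU, MAX_OVERLAP) for one proposal.

    CLASS is taken from the ground-truth instance with the largest IoU.
    Assumption: CLASS is None when no ground-truth instance overlaps.
    """
    length = end - start
    best_cls: Optional[int] = None
    max_iou = 0.0
    max_overlap = 0.0
    for cls, gt_start, gt_end in gts:
        # Length of the temporal intersection, clamped at zero.
        inter = max(0.0, min(end, gt_end) - max(start, gt_start))
        union = length + (gt_end - gt_start) - inter
        iou = inter / union if union > 0 else 0.0
        # Overlap is measured relative to the proposal's own length.
        overlap = inter / length if length > 0 else 0.0
        if iou > max_iou:
            best_cls, max_iou = cls, iou
        max_overlap = max(max_overlap, overlap)
    return best_cls, max_iou, max_overlap


def to_frames(position: float, frame_count: int) -> int:
    """Hypothetical conversion of a normalized position (0 to 1) to a frame index."""
    return int(position * frame_count)


if __name__ == "__main__":
    gts = [(86, 0.0718, 0.9654)]               # ground truth from the snippet
    print(label_proposal(0.1277, 1.0, gts))    # first proposal in the snippet
    print(to_frames(0.0718, 1000))             # 1000 frames is a made-up count
```

For the first proposal in the snippet, this gives an IoU of about 0.90 and an overlap of about 0.96, which is close to the stored values 0.9045 and 0.9625 but not identical to them.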