
Prediction logs #358

Open · fmigneault wants to merge 5 commits into main

Conversation

fmigneault

I would like to contribute a piece of code I added to log video predictions to a file.
This is useful for extracting raw actions (class labels and confidences) over video segments.

The idea is simple. I override the draw_clip_range method of the demo visualizer to report predicted actions and bounding boxes to a predictions.log file instead of drawing them onto the output video frames.

To preserve the original behavior of the demo visualizer, I add an option DEMO.OUTPUT_DISPLAY that by default runs the original demo code (e.g., display the video in a window or write it to a file). When set to False, the logging override is used instead: whatever would otherwise be drawn on the frames is written in text form to a log file under OUTPUT_DIR.
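
For reference, here is a rough sketch of what the override looks like. The draw_clip_range signature, the class_names attribute, and the constructor plumbing are assumptions based on the SlowFast demo visualizer; the actual patch may differ in the details.

```python
# Sketch only: assumes SlowFast's VideoVisualizer exposes `class_names` and a
# `draw_clip_range(frames, preds, bboxes=None, ..., draw_range=None)` method.
import os

import torch

from slowfast.visualization.video_visualizer import VideoVisualizer


class LoggingVideoVisualizer(VideoVisualizer):
    """Write predicted actions and boxes to a log file instead of drawing them."""

    def __init__(self, *args, output_dir=".", log_top_k=5, **kwargs):
        super().__init__(*args, **kwargs)
        self._log_path = os.path.join(output_dir, "predictions.log")
        self._log_top_k = log_top_k
        self._clip_idx = 0

    def draw_clip_range(self, frames, preds, bboxes=None, draw_range=None, **kwargs):
        start, end = draw_range if draw_range is not None else (0, len(frames) - 1)
        with open(self._log_path, "a") as log:
            log.write("{:04d} [{:08d}, {:08d}]:\n".format(self._clip_idx, start, end))
            if bboxes is not None:
                # preds: (num_boxes, num_classes) -> top-k scores/classes per box
                scores, classes = torch.topk(preds, k=self._log_top_k)
                for box, s, c in zip(bboxes.tolist(), scores.tolist(), classes.tolist()):
                    names = [self.class_names[i] for i in c]
                    log.write(
                        "    bbox: {}, is predicted to class [{:.2f}] {}, "
                        "top-k={}: {}, {}\n".format(
                            [round(v, 2) for v in box], s[0], names[0],
                            self._log_top_k, names, [round(v, 4) for v in s],
                        )
                    )
        self._clip_idx += 1
        return frames  # frames pass through untouched; nothing is drawn
```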

The resulting log will be similar to the following:

0000 [00000000, 00000063]:
    bbox: [490.73, 83.06, 664.0, 415.62], is predicted to class [0.96] stand, top-k=5: ['stand', 'listen to (a person)', 'watch (a person)', 'talk to (e.g., self, a person, a group)', 'carry/hold (an object)'], [0.9636, 0.5954, 0.5248, 0.3115, 0.2907]
    bbox: [152.0, 125.62, 306.25, 307.59], is predicted to class [0.32] talk to (e.g., self, a person, a group), top-k=5: ['talk to (e.g., self, a person, a group)', 'carry/hold (an object)', 'stand', 'watch (a person)', 'listen to (a person)'], [0.3207, 0.2872, 0.2723, 0.2469, 0.2244]
    bbox: [384.1, 90.81, 522.25, 281.81], is predicted to class [0.85] stand, top-k=5: ['stand', 'talk to (e.g., self, a person, a group)', 'listen to (a person)', 'carry/hold (an object)', 'watch (a person)'], [0.8543, 0.503, 0.503, 0.433, 0.3979]
0001 [00000000, 00000063]:
    bbox: [56.78, 62.19, 359.23, 418.54], is predicted to class [0.42] stand, top-k=5: ['stand', 'watch (a person)', 'sit', 'listen to (a person)', 'talk to (e.g., self, a person, a group)'], [0.4152, 0.3692, 0.3509, 0.2042, 0.1815]
    bbox: [300.56, 73.32, 658.27, 400.85], is predicted to class [0.72] stand, top-k=5: ['stand', 'watch (a person)', 'listen to (a person)', 'carry/hold (an object)', 'talk to (e.g., self, a person, a group)'], [0.7185, 0.4697, 0.4342, 0.4063, 0.2994]
0002 [00000064, 00000127]:
    bbox: [73.31, 63.27, 367.26, 430.84], is predicted to class [0.67] stand, top-k=5: ['stand', 'listen to (a person)', 'talk to (e.g., self, a person, a group)', 'sit', 'watch (a person)'], [0.6749, 0.3374, 0.2068, 0.2064, 0.1532]
    bbox: [325.06, 80.53, 645.83, 420.41], is predicted to class [0.77] sit, top-k=5: ['sit', 'talk to (e.g., self, a person, a group)', 'listen to (a person)', 'watch (a person)', 'stand'], [0.7728, 0.4179, 0.2485, 0.2396, 0.1557]
0003 [00000064, 00000127]:
    bbox: [299.5, 107.28, 712.75, 428.47], is predicted to class [0.55] stand, top-k=5: ['stand', 'watch (a person)', 'sit', 'walk', 'carry/hold (an object)'], [0.5498, 0.3916, 0.18, 0.154, 0.1356]
[...]

Each newly sampled "clip section" is marked with <clip/task-id> [<start-frame>, <end-frame>], followed by the predicted actions for each detected bounding box.

The above results were obtained using an AVA checkpoint and classes, with the Detectron2 predictor for bounding boxes.
The top-k mode with k=5 was used to generate these results, but the output adjusts accordingly for the thres mode or other values of k, in the same manner as the original visualizer.
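
In case someone wants to post-process these logs, a quick parser matching the sample output above could look like this (a sketch based only on the lines shown, not a formal format spec):

```python
# Sketch: parse predictions.log lines like the sample above into dict records.
import ast
import re

CLIP_RE = re.compile(r"^(?P<clip>\d+) \[(?P<start>\d+), (?P<end>\d+)\]:")
PRED_RE = re.compile(
    r"bbox: (?P<bbox>\[[^\]]*\]), is predicted to class \[(?P<score>[\d.]+)\] "
    r"(?P<label>.+?), top-k=(?P<k>\d+): (?P<labels>\[.*?\]), (?P<scores>\[.*?\])$"
)


def parse_predictions(path):
    records, clip = [], None
    with open(path) as log:
        for line in log:
            line = line.rstrip()
            header = CLIP_RE.match(line)
            if header:
                # new clip section: remember its id and frame range
                clip = {key: int(val) for key, val in header.groupdict().items()}
                continue
            pred = PRED_RE.search(line)
            if pred and clip is not None:
                records.append({
                    **clip,
                    "bbox": ast.literal_eval(pred["bbox"]),
                    "label": pred["label"],
                    "score": float(pred["score"]),
                    "topk_labels": ast.literal_eval(pred["labels"]),
                    "topk_scores": ast.literal_eval(pred["scores"]),
                })
    return records
```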

@facebook-github-bot added the CLA Signed label Jan 18, 2021
@doursand commented Mar 1, 2021

@fmigneault thanks a lot, this is precisely what I was doing on my side as well, as I also need this functionality :-)
Just one suggestion though: it might be a good idea to organize the output in a more csv-ish format so that it could easily be fed into a pandas DataFrame. Also, I think it could be interesting to have an option to produce both the demo output file AND the predictions stored in a CSV at the same time. But regardless, this is already good stuff, so thanks!

@fmigneault (Author)

@doursand

> it might be a good idea to organize the output in a more csv-ish format

I agree. The format could be adjusted to facilitate parsing. The format I proposed was good enough for my needs, but it should be straightforward to add an option to select the output format and have it log items line by line with the corresponding values of each prediction.
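
For instance, a line-per-prediction CSV variant could be as simple as the following sketch (the column set here is hypothetical, purely for illustration):

```python
# Hypothetical csv-ish logger: one row per (box, ranked class) prediction.
import csv


def log_predictions_csv(path, clip_idx, start, end, bboxes, names, scores):
    """Append rows that load directly into a pandas DataFrame."""
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        for box, box_names, box_scores in zip(bboxes, names, scores):
            for rank, (name, score) in enumerate(zip(box_names, box_scores)):
                writer.writerow([clip_idx, start, end, *box, rank, name, round(score, 4)])


# e.g.: pd.read_csv("predictions.csv", names=[
#     "clip", "start", "end", "x1", "y1", "x2", "y2", "rank", "label", "score"])
```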

fmigneault added a commit to crim-ca/FrVD that referenced this pull request Aug 18, 2021