LTR

A general PyTorch based framework for learning tracking representations.

Quick Start

The installation script will automatically generate a local configuration file "admin/local.py". In case the file was not generated, run admin.environment.create_default_local_file() to generate it. Next, set the path to the training workspace, i.e. the directory where the checkpoints will be saved, as well as the paths to the datasets you want to use (a sketch of this file is shown at the end of this section). If all the dependencies have been correctly installed, you can train a network using the run_training.py script in the correct conda environment.

conda activate pytracking
python run_training.py train_module train_name

Here, train_module is the sub-module inside train_settings and train_name is the name of the train setting file to be used.

For example, you can train using the included default ATOM settings by running:

python run_training.py bbreg atom_default
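
For reference, admin/local.py is a small Python settings file that collects the workspace and dataset paths mentioned above. The sketch below is only an illustration: the class and attribute names (EnvironmentSettings, workspace_dir, the dataset entries) may differ from the file generated by your installation, so check the generated file for the exact names.

class EnvironmentSettings:
    def __init__(self):
        # Directory where training checkpoints are saved.
        self.workspace_dir = '/path/to/workspace'
        # Paths to the training datasets you want to use (leave unused ones empty).
        self.lasot_dir = '/path/to/lasot'
        self.got10k_dir = '/path/to/got10k'
        self.trackingnet_dir = '/path/to/trackingnet'
        self.coco_dir = '/path/to/coco'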

Overview

The framework consists of the following submodules.

  • actors: Contains the actor classes for different trainings. The actor class is responsible for passing the input data through the network and calculating the losses.
  • admin: Includes functions for loading networks, tensorboard etc. and also contains environment settings.
  • dataset: Contains integration of a number of training datasets, namely TrackingNet, GOT-10k, LaSOT, ImageNet-VID, DAVIS, YouTube-VOS, MS-COCO, SBD, LVIS, ECSSD, MSRA10k, and HKU-IS. Additionally, it includes modules to generate synthetic videos from image datasets (see the toy sketch after this list).
  • data_specs: Information about train/val splits of different datasets.
  • data: Contains functions for processing data, e.g. loading images, data augmentations, sampling frames from videos.
  • external: External libraries needed for training. Added as submodules.
  • models: Contains different layers and network definitions.
  • trainers: The main class which runs the training.
  • train_settings: Contains settings files, specifying the training of a network.
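
As a toy illustration of the synthetic-video idea mentioned in the dataset entry above (this is not the framework's implementation, only the concept), a short sequence can be built from a single still image by applying small random shifts:

import torch

def synthetic_video_from_image(image, num_frames=5, max_shift=8):
    # image: (3, H, W) tensor; returns a (num_frames, 3, H, W) tensor of randomly shifted copies.
    frames = []
    for _ in range(num_frames):
        dy, dx = torch.randint(-max_shift, max_shift + 1, (2,)).tolist()
        frames.append(torch.roll(image, shifts=(dy, dx), dims=(1, 2)))
    return torch.stack(frames)

video = synthetic_video_from_image(torch.randn(3, 128, 128))
print(video.shape)  # torch.Size([5, 3, 128, 128])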

Trackers

The framework currently contains the training code for the following trackers.

TaMOs

The following setting files can be used to train the TaMOs tracker. In addition to the typical tracking datasets used for single object trackers, we further include TAO, YouTube-VOS and ImageNet-VID training data. When training with TAO, we use the BURST annotations since they provide a higher annotation frame rate. We converted those annotations to our own format, TaoBurst.json.

  • tamos.tamos_resnet50: The default setting used for training with the ResNet50 backbone.
  • tamos.tamos_swin_base: The default setting used for training with the SwinBase backbone. If needed, the weights of the SwinBase backbone can be downloaded here.

RTS

Three steps are required to train RTS:

  • Download lasot_got10k_pregenerated_masks.zip.

    Unzip the archive in the directory specified by the pregenerated_masks setting in ltr/admin/local.py (see the sketch after these steps).

  • Download the pretrained LWL weights lwl_stage2.pth.

    Save the weights in the directory specified by the pretrained_networks setting in ltr/admin/local.py.

  • Use this setting for training with ResNet50 backbone: rts.rts50
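
In terms of configuration, the first two steps above correspond to two path entries in ltr/admin/local.py. The sketch below is illustrative: only the pregenerated_masks and pretrained_networks attribute names come from the steps above; the class name and paths are placeholders.

class EnvironmentSettings:
    def __init__(self):
        # Directory where lasot_got10k_pregenerated_masks.zip was unzipped.
        self.pregenerated_masks = '/path/to/pregenerated_masks'
        # Directory where the pretrained LWL weights (lwl_stage2.pth) were saved.
        self.pretrained_networks = '/path/to/pretrained_networks'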

ToMP

The following setting files can be used to train the ToMP tracker. We omit training with a separate test encoding, since training without it is more stable and leads to comparable performance. Set the corresponding flag in the setting file to false to use the same setup as in the paper.

  • tomp.tomp50: The default setting used for training with the ResNet50 backbone.
  • tomp.tomp101: The default setting used for training with the ResNet101 backbone.

KeepTrack

In order to train KeepTrack, the following three steps are required.

LWL

The following setting files can be used to train the LWL networks, or to know the exact training details.

  • lwl.lwl_stage1: The default settings used for initial network training with fixed backbone weights. We initialize the backbone ResNet with pre-trained Mask-RCNN weights. These weights can be obtained from here. Download and save these weights in the env_settings().pretrained_networks directory (see the snippet after this list).
  • lwl.lwl_stage2: The default settings used for training the final LWL model. This setting fine-tunes all layers in the model trained using lwl_stage1.
  • lwl.lwl_boxinit: The default settings used for training the bounding box encoder network in order to enable VOS with box initialization.
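
To check where the toolkit expects these pre-trained weights, the environment settings can be queried from Python. The import below assumes env_settings is provided by ltr.admin.environment (the module that also provides create_default_local_file); adjust the import if your version differs.

from ltr.admin.environment import env_settings

# Prints the directory in which the pre-trained Mask-RCNN weights should be placed.
print(env_settings().pretrained_networks)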

KYS

The following setting file can be used to train the KYS network, or to know the exact training details.

  • kys.kys: The default settings used for training the KYS model with ResNet-50 backbone.

PrDiMP

The following setting files can be used to train the PrDiMP networks, or to know the exact training details.

  • dimp.prdimp18: The default settings used for training the PrDiMP model with ResNet-18 backbone.
  • dimp.prdimp50: The default settings used for training the PrDiMP model with ResNet-50 backbone.
  • dimp.super_dimp: Combines the bounding-box regressor of PrDiMP with the standard DiMP classifier and better training and inference settings.

DiMP

The following setting files can be used to train the DiMP networks, or to know the exact training details.

  • dimp.dimp18: The default settings used for training the DiMP model with ResNet-18 backbone.
  • dimp.dimp50: The default settings used for training the DiMP model with ResNet-50 backbone.

ATOM

The following setting files can be used to train the ATOM network, or to know the exact training details.

  • bbreg.atom_paper: The settings used in the paper for training the network in ATOM.
  • bbreg.atom: Newer settings used for training the network in ATOM, also utilizing the GOT10k dataset.
  • bbreg.atom_prob_ml: Settings for training ATOM with the probabilistic bounding box regression proposed in this paper.
  • bbreg.atom_gmm_sampl: The baseline ATOM* setting evaluated in this paper.

Training your own networks

To train a custom network using the toolkit, the following components need to be specified in the train settings. For reference, see atom.py, as well as the toy sketch at the end of this section.

  • Datasets: The datasets to be used for training. A number of standard tracking datasets are already available in the dataset module.
  • Processing: This function should perform the necessary post-processing of the data, e.g. cropping of target region, data augmentations etc.
  • Sampler: Determines how the frames are sampled from a video sequence to form the batches.
  • Network: The network module to be trained.
  • Objective: The training objective.
  • Actor: The trainer passes the training batch to the actor, which is responsible for passing the data through the network correctly and calculating the training loss.
  • Optimizer: Optimizer to be used, e.g. Adam.
  • Trainer: The main class which runs the epochs and saves checkpoints.
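
To make the roles of these components concrete, the following self-contained toy script mimics the same structure in plain PyTorch. It is not the LTR API (the real classes live in the submodules listed in the Overview, and atom.py remains the reference); every class below is an illustrative stand-in.

import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader


class ToyTrackingDataset(Dataset):
    # Stand-in for dataset + processing: returns a (template frame, search frame, target box) triplet.
    def __len__(self):
        return 64

    def __getitem__(self, idx):
        template = torch.randn(3, 64, 64)   # "train" frame showing the target
        search = torch.randn(3, 64, 64)     # "test" frame in which the target must be located
        box = torch.rand(4)                 # normalized target box (real processing would crop/augment)
        return template, search, box


class ToyNetwork(nn.Module):
    # Stand-in for a tracking network: predicts a box from the concatenated frames.
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(nn.Conv2d(6, 16, 3, stride=2), nn.ReLU(),
                                      nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.head = nn.Linear(16, 4)

    def forward(self, template, search):
        feat = self.backbone(torch.cat([template, search], dim=1))
        return self.head(feat)


class ToyActor:
    # Actor: passes a batch through the network and computes the training loss.
    def __init__(self, net, objective):
        self.net, self.objective = net, objective

    def __call__(self, batch):
        template, search, box = batch
        pred = self.net(template, search)
        return self.objective(pred, box)


def run():
    # Dataset and sampler (here simply the DataLoader's random shuffling).
    loader = DataLoader(ToyTrackingDataset(), batch_size=8, shuffle=True)
    net = ToyNetwork()
    actor = ToyActor(net, nn.MSELoss())                 # objective: plain L2 loss on the box
    optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)

    # Trainer: runs the epochs (checkpoint saving omitted).
    for epoch in range(2):
        for batch in loader:
            loss = actor(batch)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()


if __name__ == '__main__':
    run()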