Code for paper "Particle Video Revisited: Tracking Through Occlusions Using Point Trajectories"

Abstract: Tracking pixels in videos is typically studied as an optical flow estimation problem, where every pixel is described with a displacement vector that locates it in the next frame. Even though wider temporal context is freely available, prior efforts to take this into account have yielded only small gains over 2-frame methods. In this paper, we revisit Sand and Teller's "particle video" approach, and study pixel tracking as a long-range motion estimation problem, where every pixel is described with a trajectory that locates it in multiple future frames. We re-build this classic approach using components that drive the current state-of-the-art in flow and object tracking, such as dense cost maps, iterative optimization, and learned appearance updates. We train our models using long-range amodal point trajectories mined from existing optical flow data that we synthetically augment with multi-frame occlusions. We test our approach in trajectory estimation benchmarks and in keypoint label propagation tasks, and compare favorably against state-of-the-art optical flow and feature tracking methods.

Particle Video Revisited: Tracking Through Occlusions Using Point Trajectories

This is the official code release for our ECCV2022 paper on tracking particles through occlusions, presenting our new "particle video" method, Persistent Independent Particles (PIPs).

[Paper] [Project Page]


The lines below should set up a fresh environment with everything you need:

conda create --name pips
source activate pips
conda install pytorch=1.12.0 torchvision=0.13.0 cudatoolkit=11.3 -c pytorch
conda install pip
pip install -r requirements.txt
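
As an optional sanity check (not part of the repo), you can confirm the install from Python; the expected values below just mirror the conda command above:

# Optional install check; expected versions mirror the conda command above.
import torch
import torchvision

print(torch.__version__)          # expect 1.12.0
print(torchvision.__version__)    # expect 0.13.0
print(torch.cuda.is_available())  # should be True if cudatoolkit 11.3 matches your driver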


To download our reference model, run this:


To run this model on a sample video, run this:


This will run the model on a sequence included in demo_images/.

For each 8-frame subsequence, the model will return trajs_e. This is the estimated trajectory data for the particles, shaped B,S,N,2, where B is the batch size, S is the sequence length, N is the number of particles, and 2 corresponds to the x and y coordinates. The script will also produce tensorboard logs with visualizations, which go into logs_demo/, as well as a few gifs in ./*.gif.
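
To make the tensor layout concrete, here is a minimal indexing sketch, with a dummy tensor standing in for the model output:

import torch

# Dummy stand-in for trajs_e: batch of 1, 8 frames, 256 particles, (x, y).
B, S, N = 1, 8, 256
trajs_e = torch.zeros(B, S, N, 2)

# Pixel coordinates of particle n at frame s:
n, s = 0, 3
x, y = trajs_e[0, s, n]

# Full trajectory of particle n across the 8 frames, shaped (S, 2):
traj_n = trajs_e[0, :, n]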

In the tensorboard for logs_demo/ you should be able to find visualizations like this:

To track points across arbitrarily-long videos, run this:


In the tensorboard for logs_chain_demo/ you should be able to find visualizations like this:

This type of tracking is much more challenging, so you can expect to see more failures here. In particular, since we use our visibility-aware chaining method here, mistakes tend to propagate into the future.
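
To make the idea concrete, here is a simplified sketch of visibility-aware chaining; this is not the repo's implementation, and run_pips_8frames is a hypothetical wrapper that returns per-frame positions and visibility scores for one 8-frame window:

import torch

def chain_tracks(frames, xy0, run_pips_8frames, vis_thresh=0.5):
    # frames: (T, C, H, W) video; xy0: (N, 2) query points in frame 0.
    # run_pips_8frames(frames_8, xy) -> trajs (8, N, 2), vis (8, N) in [0, 1].
    T, N = frames.shape[0], xy0.shape[0]
    anchors = xy0.clone()                   # current best position of each point
    full_trajs = [xy0.clone()]
    t = 0
    while t + 8 <= T:
        trajs, vis = run_pips_8frames(frames[t:t + 8], anchors)
        full_trajs.extend(list(trajs[1:]))  # frame t repeats the anchors
        # Re-anchor each point at its most recent confidently-visible estimate,
        # so occluded frames do not corrupt the next window. This is also why
        # a bad estimate, once trusted, propagates into all later windows.
        last_vis = (vis > vis_thresh).float().cumsum(dim=0).argmax(dim=0)  # (N,)
        anchors = trajs[last_vis, torch.arange(N)]
        t += 7                              # consecutive windows share one frame
    return torch.stack(full_trajs)          # (T', N, 2)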

A file in demo_images/ shows the ffmpeg command we used to export frames from the original mp4.

Model implementation

To inspect our PIPs model implementation, the main file to look at is in nets/.

FlyingThings++ dataset

To create our FlyingThings++ dataset, first download FlyingThings. The data should look like:


Once you have the flows and masks, you can run the trajectory-making script. This will put 80G of trajectory data into:


In parallel, you can run the occlusion-making script. This will put 537M of occlusion data into:


This data will be loaded and joined with the corresponding RGB by the FlyingThingsDataset class when training and testing.

(The suffixes "ad" and "al" are version counters.)

Once loaded by the dataloader, the RGB will look like this:

The corresponding trajectories will look like this:
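
If you want to render a visualization like this yourself, here is a minimal matplotlib sketch with stand-in arrays (not the repo's plotting code):

import matplotlib.pyplot as plt
import numpy as np

rgb = np.zeros((360, 640, 3), dtype=np.uint8)  # stand-in for a loaded frame
trajs = np.random.rand(8, 16, 2) * [640, 360]  # stand-in for (S, N, 2) trajectories

plt.imshow(rgb)
for n in range(trajs.shape[1]):
    plt.plot(trajs[:, n, 0], trajs[:, n, 1], linewidth=1)
plt.axis('off')
plt.show()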


Training

To train a model on the FlyingThings++ dataset:


First it should print some diagnostic information about the model and data. Then, it should print a message for each training step, indicating the model name, progress, read time, iteration time, and loss.

model_name 1_8_128_I6_3e-4_A_tb89_21:34:46
loading FlyingThingsDataset [...] found 13085 samples in ../flyingthings (dset=TRAIN, subset=all, version=ad)
loading occluders [...] found 7856 occluders in ../flyingthings (dset=TRAIN, subset=all, version=al)
not using augs in val
loading FlyingThingsDataset [...] found 2542 samples in ../flyingthings (dset=TEST, subset=all, version=ad)
loading occluders...found 1631 occluders in ../flyingthings (dset=TEST, subset=all, version=al)
warning: updated load_fails (on this worker): 1/13085...
1_8_128_I6_3e-4_A_tb89_21:34:46; step 000001/100000; rtime 9.79; itime 20.24; loss = 40.30593
1_8_128_I6_3e-4_A_tb89_21:34:46; step 000002/100000; rtime 0.01; itime 0.37; loss = 43.12448
1_8_128_I6_3e-4_A_tb89_21:34:46; step 000003/100000; rtime 0.01; itime 0.36; loss = 36.60324
1_8_128_I6_3e-4_A_tb89_21:34:46; step 000004/100000; rtime 0.01; itime 0.38; loss = 40.91223
1_8_128_I6_3e-4_A_tb89_21:34:46; step 000005/100000; rtime 0.01; itime 0.35; loss = 79.32227
1_8_128_I6_3e-4_A_tb89_21:34:46; step 000006/100000; rtime 0.01; itime 0.53; loss = 22.14469
1_8_128_I6_3e-4_A_tb89_21:34:46; step 000007/100000; rtime 0.04; itime 0.46; loss = 24.75386

Occasional load_fails warnings are typically harmless. They indicate that the dataloader failed to get N trajectories for a given video, which simply causes a retry. If you greatly increase N (the number of trajectories) or reduce the crop size, you can expect this warning to occur more frequently, since these constraints make it more difficult to find viable samples. As long as your rtime (read time) stays small, things are basically OK.
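
For illustration, the retry behavior amounts to something like this sketch, with hypothetical helper names rather than the repo's exact code:

def getitem_with_retry(mine_trajectories, max_retries=100):
    # Keep re-sampling until a video yields enough valid trajectories;
    # each failure increments the load_fails counter seen in the logs.
    load_fails = 0
    for _ in range(max_retries):
        sample = mine_trajectories()  # hypothetical; returns None on failure
        if sample is not None:
            return sample
        load_fails += 1
        print('warning: updated load_fails (on this worker): %d' % load_fails)
    raise RuntimeError('no viable sample found')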

To reproduce the results in the paper, train with 4 GPUs, with horizontal and vertical flips, using a command like this:

python --horz_flip=True --vert_flip=True --device_ids=[0,1,2,3]


Evaluation

We provide evaluation scripts for all of the datasets reported in the paper. By default, these scripts will evaluate a PIPs model, with a checkpoint folder specified by the --init_dir argument.

You can also try a baseline, with --modeltype='raft' or --modeltype='dino'. To do this, you will also want to download a RAFT model. The DINO model should download itself, since torch makes this easy.


The CroHD head-tracking data comes from the "Get all data" link on the Head Tracking 21 MOT Challenge page. Downloading and unzipping that should give you the folders HT21 and HT21Labels, which our dataloader relies on. After downloading the data (and potentially editing the dataset_location in the CroHD dataloader), you can evaluate the model on CroHD with the corresponding test script.


To evaluate the model on FlyingThings++, first set up the data as described in the earlier section of this readme, then run the corresponding test script.


The DAVIS dataset comes from the "TrainVal - Images and Annotations - Full-Resolution" link on the DAVIS Challenge page. After downloading the data (and potentially editing the data_path in the DAVIS dataloader), you can visualize the model's outputs on DAVIS with the corresponding script.


To evaluate the model on BADJA, first follow the instructions at the BADJA repo. This will involve downloading the DAVIS trainval full-resolution data. After downloading the data (and potentially editing BADJA_PATH in the BADJA dataloader), you can evaluate the model on BADJA with the corresponding test script.


Citation

If you use this code for your research, please cite:

Particle Video Revisited: Tracking Through Occlusions Using Point Trajectories. Adam W. Harley, Zhaoyuan Fang, Katerina Fragkiadaki. In ECCV 2022.


@inproceedings{harley2022particle,
  title={Particle Video Revisited: Tracking Through Occlusions Using Point Trajectories},
  author={Adam W Harley and Zhaoyuan Fang and Katerina Fragkiadaki},
  booktitle={ECCV},
  year={2022}
}
