Abstract: Estimating accurate 3D locations of objects from monocular images is a challenging problem because depth information is missing. Previous work shows that using an object's keypoint projection constraints to estimate multiple depth candidates boosts detection performance. However, existing methods can only use vertical edges as projection constraints for depth estimation, so they exploit only a small number of constraints and produce too few depth candidates, leading to inaccurate depth estimation. In this paper, we propose a method that utilizes dense projection constraints from edges of any direction. In this way, we employ far more projection constraints and produce considerably more depth candidates. In addition, we present a graph matching weighting module to merge the depth candidates. The proposed method, DCD (Densely Constrained Detector), achieves state-of-the-art performance on the KITTI and WOD benchmarks. Code is released at this https URL.
DCD
Released code for Densely Constrained Depth Estimator for Monocular 3D Object Detection (ECCV 2022). arXiv. Yingyan Li, Yuntao Chen, Jiawei He, Zhaoxiang Zhang
The code still needs to be cleaned up; please wait a few days.
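For reference, the core geometric idea from the abstract is that every edge between two object keypoints gives one linear constraint on the object depth. Below is a minimal sketch of how a single edge of arbitrary direction yields a depth candidate under a pinhole camera model; the function and variable names are illustrative only and are not the repo's API.

def depth_candidate_from_edge(uv_i, uv_j, delta_i, delta_j, f, cy):
    """Solve the object-center depth Z from one projected keypoint edge (i, j).

    uv_i, uv_j       : observed 2D keypoints (u, v) in pixels
    delta_i, delta_j : 3D offsets (dx, dy, dz) of the keypoints w.r.t. the object
                       center in camera coordinates (from predicted dimensions
                       and orientation)
    f, cy            : focal length and vertical principal point of the camera
    """
    # normalized vertical image coordinate: v' = (v - cy) / f = (Y + dy) / (Z + dz)
    vi = (uv_i[1] - cy) / f
    vj = (uv_j[1] - cy) / f
    # subtracting the two projection equations eliminates Y and leaves a linear
    # equation in Z:  Z * (vi - vj) = (dy_i - dy_j) - vi * dz_i + vj * dz_j
    num = (delta_i[1] - delta_j[1]) - vi * delta_i[2] + vj * delta_j[2]
    den = vi - vj
    return num / den  # one depth candidate; near-horizontal edges (den ~ 0) are unreliable

The vertical-edge constraint used by prior work is the special case where dz_i == dz_j; using edges of any direction yields many such candidates per object, which are then merged by the graph matching weighting (GMW) module.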
Environment
This repo is tested with Ubuntu 16.04, python==3.8, pytorch==1.7.0 and cuda==10.1.
conda create -n dcd python=3.8
conda activate dcd
conda install pytorch==1.7.0 torchvision==0.8.0 cudatoolkit=10.1 -c pytorch
pip install -r requirements.txt
You also need to build DCNv2 and this project as:
cd DGDE/models/backbone/DCNv2
python setup.py develop
cd ../../..
python setup.py develop
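Before building, you may want to confirm that the installed versions match the tested environment; a quick check using only standard torch/torchvision attributes:

import torch
import torchvision

print(torch.__version__)          # expected: 1.7.0
print(torchvision.__version__)    # expected: 0.8.0
print(torch.version.cuda)         # expected: 10.1
print(torch.cuda.is_available())  # should be True on a GPU machine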
Directory Structure
We need the KITTI dataset and the keypoint annotations (Google Drive).
After downloading them, please organize them as:
|DGDE
  |dataset
    |kitti
      |training/
        |calib/
        |image_2/
        |label/
        |ImageSets/
      |testing/
        |calib/
        |image_2/
        |ImageSets/
    |kpts_ann
      |kpts_ann_train.json
      |kpts_ann_val.json
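A small sanity check for this layout, assuming kpts_ann sits next to kitti under DGDE/dataset as shown above (adjust the paths if your checkout differs):

import os

required = [
    "DGDE/dataset/kitti/training/calib",
    "DGDE/dataset/kitti/training/image_2",
    "DGDE/dataset/kitti/training/label",
    "DGDE/dataset/kitti/training/ImageSets",
    "DGDE/dataset/kitti/testing/calib",
    "DGDE/dataset/kitti/testing/image_2",
    "DGDE/dataset/kitti/testing/ImageSets",
    "DGDE/dataset/kpts_ann/kpts_ann_train.json",
    "DGDE/dataset/kpts_ann/kpts_ann_val.json",
]
missing = [p for p in required if not os.path.exists(p)]
print("missing:", missing if missing else "none")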
Training and evaluation pipeline
The whole pipeline consists of three parts: a) training DGDE; b) using DGDE to generate the data needed by GMW; c) training GMW and evaluating.
a) Training DGDE. Train with 2 GPUs:
cd DGDE
CUDA_VISIBLE_DEVICES=0,1 \
python tools/plain_train_net.py --batch_size 8 --config runs/DGDE.yaml \
--output output/DGDE --num_gpus 2
b) Using DGDE to generate the data needed by GMW. After DGDE training finishes, please generate the data on 1 GPU as:
cd DGDE
CUDA_VISIBLE_DEVICES=0 \
python tools/plain_train_net.py --batch_size 8 --config runs/DGDE.yaml \
--output output/DGDE --num_gpus 1 \
--generate_for_GMW \
--ckpt output/DGDE/model_final.pth
After this step, you should see gen_data_train.json and gen_data_infer.json in DGDE/gen_data/.
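Before moving on, you can quickly confirm that the generated files parse. The internal schema is produced by DGDE and is not documented here, so this only checks that the JSON loads and reports its size:

import json

with open("DGDE/gen_data/gen_data_train.json") as f:
    data = json.load(f)
print(type(data), len(data))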
c) Training GMW and evaluating.
cd GMW
python -m torch.distributed.launch --master_port 33521 --nproc_per_node=4 \
main.py --log-dir ./logs/GMW \
-b 8 --lr 1e-4 --epoch 100 --val_freq 5 \
--train_data_path ../DGDE/gen_data/gen_data_train.json \
--val_data_path ../DGDE/gen_data/gen_data_infer.json
The model will be evaluated periodically during training. You can also run the following command for evaluation only:
python -m torch.distributed.launch --master_port 24281 --nproc_per_node=4 \
main.py --log-dir ./logs/GMW/ \
-b 36 -e \
--resume logs/GMW/checkpoint_epoch_100.pth.tar
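If you only want to inspect a saved GMW checkpoint (for example, to confirm which epoch it holds), the .pth.tar file can be loaded with torch.load; the exact keys stored inside are not documented here, so the snippet only lists them:

import torch

ckpt = torch.load("logs/GMW/checkpoint_epoch_100.pth.tar", map_location="cpu")
print(list(ckpt.keys()))  # key names (e.g. 'state_dict', 'epoch') depend on how GMW saves checkpoints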
You can also use the pre-trained weights of DGDE and GMW (Google Drive).
Acknowledgment
The code is mainly based on MonoFlex and BPnP. Thanks for their great work.