Code for paper "PyMAF-X: Towards Well-aligned Full-body Model Regression from Monocular Images"

Code for paper "PyMAF-X: Towards Well-aligned Full-body Model Regression from Monocular Images"
Abstract: We present PyMAF-X, a regression-based approach to recovering a full-body parametric model from a single image. This task is very challenging since minor parametric deviation may lead to noticeable misalignment between the estimated mesh and the input image. Moreover, when integrating part-specific estimations to the full-body model, existing solutions tend to either degrade the alignment or produce unnatural wrist poses. To address these issues, we propose a Pyramidal Mesh Alignment Feedback (PyMAF) loop in our regression network for well-aligned human mesh recovery and extend it as PyMAF-X for the recovery of expressive full-body models. The core idea of PyMAF is to leverage a feature pyramid and rectify the predicted parameters explicitly based on the mesh-image alignment status. Specifically, given the currently predicted parameters, mesh-aligned evidence will be extracted from finer-resolution features accordingly and fed back for parameter rectification. To enhance the alignment perception, an auxiliary dense supervision is employed to provide mesh-image correspondence guidance while spatial alignment attention is introduced to enable the awareness of the global contexts for our network. When extending PyMAF for full-body mesh recovery, an adaptive integration strategy is proposed in PyMAF-X to produce natural wrist poses while maintaining the well-aligned performance of the part-specific estimations. The efficacy of our approach is validated on several benchmark datasets for body-only and full-body mesh recovery, where PyMAF and PyMAF-X effectively improve the mesh-image alignment and achieve new state-of-the-art results. The project page with code and video results can be found at this https URL.

PyMAF-X: Towards Well-aligned Full-body Model Regression from Monocular Images

Hongwen Zhang · Yating Tian · Yuxiang Zhang · Mengcheng Li · Liang An · Zhenan Sun · Yebin Liu

Project Page | Video | Paper


Frame by frame reconstruction. Video clipped from here.

Reconstruction result on a COCO validation image.
Click Here for More Results

Installation

  • Python 3.8
conda create --no-default-packages -n pymafx python=3.8
conda activate pymafx

packages

conda install pytorch==1.9.0 torchvision==0.10.0 cudatoolkit=11.1 -c pytorch -c conda-forge
pip install "git+https://github.com/facebookresearch/[email protected]"
  • other packages listed in requirements.txt
pip install -r requirements.txt

necessary files

smpl_downsampling.npz & mano_downsampling.npz

  • Run the following script to fetch necessary files.
bash fetch_data.sh

SMPL & SMPL-X model files

Download the partial_mesh files and put it into the ./data/partial_mesh directory.

Download the pre-trained model and put it into the ./data/pretrained_model directory.

After collecting the above necessary files, the directory structure of ./data is expected as follows.

./data
├── J_regressor_extra.npy
├── smpl_mean_params.npz
├── smpl_downsampling.npz
├── mano_downsampling.npz
├── partial_mesh
│   └── ***_vids.npz
├── pretrained_model
│   └── PyMAF-X_model_checkpoint.pt
└── smpl
    ├── SMPLX_NEUTRAL.npz
    ├── SMPL_NEUTRAL.pkl
    └── model_transfer
        └── smplx_to_smpl.pkl

Demo

You can first give it a try on Google Colab using the notebook we have prepared, which is no need to prepare the environment yourself: Open In Colab

Run the demo code.

For image folder input:

python -m apps.demo_smplx --image_folder examples/coco_images --detection_threshold 0.3 --pretrained_model data/pretrained_model/PyMAF-X_model_checkpoint.pt --misc TRAIN.BHF_MODE full_body MODEL.EVAL_MODE True MODEL.PyMAF.HAND_VIS_TH 0.1

For video input:

python -m apps.demo_smplx --vid_file examples/dancer_short.mp4 --pretrained_model data/pretrained_model/PyMAF-X_model_checkpoint.pt --misc TRAIN.BHF_MODE full_body MODEL.EVAL_MODE True MODEL.PyMAF.HAND_VIS_TH 0.1

Results will be saved at ./output. You can set different hyperparamters in the scripts, e.g., --detection_threshold for the person detection threshold and MODEL.PyMAF.HAND_VIS_TH for the hand visibility threshold.

Citation

If this work is helpful in your research, please cite the following papers.

@article{pymafx2022,
  title={PyMAF-X: Towards Well-aligned Full-body Model Regression from Monocular Images},
  author={Zhang, Hongwen and Tian, Yating and Zhang, Yuxiang and Li, Mengcheng and An, Liang and Sun, Zhenan and Liu, Yebin},
  journal={arXiv preprint arXiv:2207.06400},
  year={2022}
}

@inproceedings{pymaf2021,
  title={PyMAF: 3D Human Pose and Shape Regression with Pyramidal Mesh Alignment Feedback Loop},
  author={Zhang, Hongwen and Tian, Yating and Zhou, Xinchi and Ouyang, Wanli and Liu, Yebin and Wang, Limin and Sun, Zhenan},
  booktitle={Proceedings of the IEEE International Conference on Computer Vision},
  year={2021}
}

Acknowledgments

Part of the code is borrowed from the following projects, including DaNet, SPIN, VIBE, SPEC, MeshGraphormer, PIFu, DensePose, HMR, HRNet, pose_resnet. Many thanks to their contributions.

Download Source Code

Download ZIP

Paper Preview

Aug 20, 2022