Abstract: We present PyMAF-X, a regression-based approach to recovering a parametric full-body model from a single image. This task is very challenging since minor parametric deviations may lead to noticeable misalignment between the estimated mesh and the input image. Moreover, when integrating part-specific estimations into the full-body model, existing solutions tend to either degrade the alignment or produce unnatural wrist poses. To address these issues, we propose a Pyramidal Mesh Alignment Feedback (PyMAF) loop in our regression network for well-aligned human mesh recovery, and extend it as PyMAF-X for the recovery of expressive full-body models. The core idea of PyMAF is to leverage a feature pyramid and rectify the predicted parameters explicitly based on the mesh-image alignment status. Specifically, given the currently predicted parameters, mesh-aligned evidence is extracted from finer-resolution features and fed back for parameter rectification. To enhance alignment perception, auxiliary dense supervision is employed to provide mesh-image correspondence guidance, while spatial alignment attention is introduced to make the network aware of global contexts. When extending PyMAF to full-body mesh recovery, PyMAF-X adopts an adaptive integration strategy to produce natural wrist poses while maintaining the well-aligned performance of the part-specific estimations. The efficacy of our approach is validated on several benchmark datasets for body-only and full-body mesh recovery, where PyMAF and PyMAF-X effectively improve mesh-image alignment and achieve new state-of-the-art results. Code and video results are available on the project page linked below.
PyMAF-X: Towards Well-aligned Full-body Model Regression from Monocular Images
Hongwen Zhang · Yating Tian · Yuxiang Zhang · Mengcheng Li · Liang An · Zhenan Sun · Yebin Liu
Project Page | Video | Paper
Frame-by-frame reconstruction. Video clipped from here.
Reconstruction result on a COCO validation image.
Click Here for More Results
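As a quick orientation before the setup steps: PyMAF iteratively rectifies its parameter estimates using features sampled at the projected mesh vertices, as described in the abstract above. Below is a minimal conceptual sketch of this feedback loop in Python; all function and argument names are illustrative placeholders, not this repository's actual API.

# Conceptual sketch of the Pyramidal Mesh Alignment Feedback loop.
# decode_mesh / project / sample_features / regressors are hypothetical
# callables standing in for the real network components.
def pymaf_feedback_loop(feature_pyramid, init_params, regressors,
                        decode_mesh, project, sample_features):
    params = init_params  # e.g., mean SMPL(-X) parameters
    for level, features in enumerate(feature_pyramid):  # coarse to fine
        verts_2d = project(decode_mesh(params))         # project current mesh
        evidence = sample_features(features, verts_2d)  # mesh-aligned evidence
        params = params + regressors[level](evidence)   # rectify parameters
    return params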
Installation
- Python 3.8
conda create --no-default-packages -n pymafx python=3.8
conda activate pymafx
packages
- PyTorch (tested on version 1.9.0)
conda install pytorch==1.9.0 torchvision==0.10.0 cudatoolkit=11.1 -c pytorch -c conda-forge
pip install "git+https://github.com/facebookresearch/[email protected]"
- other packages listed in requirements.txt
pip install -r requirements.txt
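Optionally, sanity-check the installation (assuming the versions above were installed) with:
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"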
necessary files
smpl_downsampling.npz & mano_downsampling.npz
- Run the following script to fetch necessary files.
bash fetch_data.sh
SMPL & SMPL-X model files
- Collect SMPL and SMPL-X model files from https://smpl.is.tue.mpg.de and https://smpl-x.is.tue.mpg.de. Rename the model files (e.g., SMPL_NEUTRAL.pkl and SMPLX_NEUTRAL.npz, matching the directory structure below) and put them into the ./data/smpl directory.
- Download the partial_mesh files and put them into the ./data/partial_mesh directory.
- Download the pre-trained model and put it into the ./data/pretrained_model directory.
After collecting the above necessary files, the directory structure of ./data is expected as follows.
./data
├── J_regressor_extra.npy
├── smpl_mean_params.npz
├── smpl_downsampling.npz
├── mano_downsampling.npz
├── partial_mesh
│ └── ***_vids.npz
├── pretrained_model
│ └── PyMAF-X_model_checkpoint.pt
└── smpl
├── SMPLX_NEUTRAL.npz
├── SMPL_NEUTRAL.pkl
└── model_transfer
└── smplx_to_smpl.pkl
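To double-check the layout, you can run a small script like the one below (the expected paths are taken from the tree above; this helper is not part of the repository):

import os

# Expected files, mirroring the ./data tree above.
expected = [
    './data/J_regressor_extra.npy',
    './data/smpl_mean_params.npz',
    './data/smpl_downsampling.npz',
    './data/mano_downsampling.npz',
    './data/pretrained_model/PyMAF-X_model_checkpoint.pt',
    './data/smpl/SMPLX_NEUTRAL.npz',
    './data/smpl/SMPL_NEUTRAL.pkl',
    './data/smpl/model_transfer/smplx_to_smpl.pkl',
]
missing = [p for p in expected if not os.path.exists(p)]
if not os.path.isdir('./data/partial_mesh'):  # should contain the *_vids.npz files
    missing.append('./data/partial_mesh/')
print('All files in place.' if not missing else 'Missing: %s' % missing)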
Demo
You can first give it a try on Google Colab using the notebook we have prepared, so there is no need to set up the environment yourself.
Run the demo code.
For image folder input:
python -m apps.demo_smplx --image_folder examples/coco_images --detection_threshold 0.3 --pretrained_model data/pretrained_model/PyMAF-X_model_checkpoint.pt --misc TRAIN.BHF_MODE full_body MODEL.EVAL_MODE True MODEL.PyMAF.HAND_VIS_TH 0.1
For video input:
python -m apps.demo_smplx --vid_file examples/dancer_short.mp4 --pretrained_model data/pretrained_model/PyMAF-X_model_checkpoint.pt --misc TRAIN.BHF_MODE full_body MODEL.EVAL_MODE True MODEL.PyMAF.HAND_VIS_TH 0.1
Results will be saved at ./output. You can set different hyperparameters in the scripts, e.g., --detection_threshold for the person detection threshold and MODEL.PyMAF.HAND_VIS_TH for the hand visibility threshold.
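For example, to run the image-folder demo with a stricter person detector and a different hand visibility threshold (the values below are illustrative, not recommended settings):
python -m apps.demo_smplx --image_folder examples/coco_images --detection_threshold 0.5 --pretrained_model data/pretrained_model/PyMAF-X_model_checkpoint.pt --misc TRAIN.BHF_MODE full_body MODEL.EVAL_MODE True MODEL.PyMAF.HAND_VIS_TH 0.3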
Citation
If this work is helpful in your research, please cite the following papers.
@article{pymafx2022,
title={PyMAF-X: Towards Well-aligned Full-body Model Regression from Monocular Images},
author={Zhang, Hongwen and Tian, Yating and Zhang, Yuxiang and Li, Mengcheng and An, Liang and Sun, Zhenan and Liu, Yebin},
journal={arXiv preprint arXiv:2207.06400},
year={2022}
}
@inproceedings{pymaf2021,
title={PyMAF: 3D Human Pose and Shape Regression with Pyramidal Mesh Alignment Feedback Loop},
author={Zhang, Hongwen and Tian, Yating and Zhou, Xinchi and Ouyang, Wanli and Liu, Yebin and Wang, Limin and Sun, Zhenan},
booktitle={Proceedings of the IEEE International Conference on Computer Vision},
year={2021}
}
Acknowledgments
Part of the code is borrowed from the following projects: DaNet, SPIN, VIBE, SPEC, MeshGraphormer, PIFu, DensePose, HMR, HRNet, pose_resnet. Many thanks for their contributions.