Code for paper "DynaST: Dynamic Sparse Transformer for Exemplar-Guided Image Generation"

Abstract: One key challenge of exemplar-guided image generation lies in establishing fine-grained correspondences between input and guided images. Prior approaches, despite the promising results, have relied on either estimating dense attention to compute per-point matching, which is limited to only coarse scales due to the quadratic memory cost, or fixing the number of correspondences to achieve linear complexity, which lacks flexibility. In this paper, we propose a dynamic sparse attention-based Transformer model, termed Dynamic Sparse Transformer (DynaST), to achieve fine-level matching with favorable efficiency. The heart of our approach is a novel dynamic-attention unit, dedicated to covering the variation in the optimal number of tokens one position should focus on. Specifically, DynaST leverages the multi-layer nature of the Transformer structure, and performs the dynamic attention scheme in a cascaded manner to refine matching results and synthesize visually-pleasing outputs. In addition, we introduce a unified training objective for DynaST, making it a versatile reference-based image translation framework for both supervised and unsupervised scenarios. Extensive experiments on three applications, pose-guided person image generation, edge-based face synthesis, and undistorted image style transfer, demonstrate that DynaST achieves superior performance in local details, outperforming the state of the art while reducing the computational cost significantly. Our code is available at https://github.com/Huage001/DynaST.
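
To make the core idea concrete, below is a minimal, illustrative PyTorch sketch of per-query dynamic sparse attention: each query keeps only the keys whose attention weight passes a confidence threshold, so the number of attended tokens varies from position to position. This is a conceptual toy, not the paper's actual module; it still forms the dense score matrix, so it shows the sparsity pattern rather than DynaST's memory savings or its cascaded multi-layer refinement, and the function name and threshold value are our own.

    # Conceptual sketch only: each query attends to a variable number of keys.
    # Unlike the real DynaST layers, this toy still computes the dense score
    # matrix, so it illustrates the dynamic sparsity, not the efficiency gains.
    import torch


    def dynamic_sparse_attention(q, k, v, threshold=1e-2):
        """q, k, v: (batch, tokens, dim) tensors; the threshold is illustrative."""
        scale = q.shape[-1] ** -0.5
        attn = torch.softmax((q @ k.transpose(-2, -1)) * scale, dim=-1)  # (B, Nq, Nk)
        # Keep only confident matches; how many survive differs per query position.
        keep = attn >= threshold
        sparse = attn * keep
        sparse = sparse / sparse.sum(dim=-1, keepdim=True).clamp_min(1e-8)
        return sparse @ v


    if __name__ == '__main__':
        q = torch.randn(1, 16, 64)
        k = torch.randn(1, 32, 64)
        v = torch.randn(1, 32, 64)
        print(dynamic_sparse_attention(q, k, v).shape)  # torch.Size([1, 16, 64])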

DynaST

This is the PyTorch implementation of the following ECCV 2022 paper:

DynaST: Dynamic Sparse Transformer for Exemplar-Guided Image Generation

Songhua Liu, Jingwen Ye, Sucheng Ren, and Xinchao Wang.

Installation

git clone https://github.com/Huage001/DynaST.git
cd DynaST
conda create -n DynaST python=3.6
conda activate DynaST
pip install -r requirements.txt
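
As an optional sanity check, you can verify that PyTorch imports correctly; this assumes requirements.txt installs PyTorch (install it separately otherwise):

    # Optional sanity check; assumes PyTorch was installed by requirements.txt.
    import torch
    print(torch.__version__, torch.cuda.is_available())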

Inference

  1. Prepare the DeepFashion dataset following the instructions of CoCosNet.

  2. Create a directory for checkpoints if it does not exist:

    mkdir -p checkpoints/deepfashion/
  3. Download the pre-trained model from here and move the file to the directory 'checkpoints/deepfashion/'.

  4. Edit the file 'test_deepfashion.sh' and set the argument 'dataroot' to the root of the DeepFashion dataset.

  5. Run:

    bash test_deepfashion.sh
  6. Check the results in the directory 'checkpoints/deepfashion/test/'.
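
Optionally, a minimal Python snippet like the following can be used to browse the generated results; the glob pattern and file extension are assumptions and may need to be adjusted to the actual output filenames (requires Pillow):

    # Hypothetical helper for listing generated results; adjust the glob
    # pattern to whatever filenames the test script actually writes.
    from pathlib import Path

    from PIL import Image

    result_dir = Path('checkpoints/deepfashion/test')
    for path in sorted(result_dir.glob('**/*.png'))[:5]:  # extension is an assumption
        with Image.open(path) as img:
            print(path.name, img.size)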

Training

  1. Create a directory for the pre-trained VGG model if it does not exist:

    mkdir vgg
  2. Download the pre-trained VGG model used for loss computation from here and move the file to the directory 'vgg' (an illustrative sketch of a VGG-style perceptual loss is given after this list).

  3. Edit the file 'train_deepfashion.sh' and set the argument 'dataroot' to the root of the DeepFashion dataset.

  4. Run:

    bash train_deepfashion.sh
  5. Checkpoints and intermediate results are saved in the directory 'checkpoints/deepfashion/'.
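
For context, the VGG model downloaded in step 2 is used for loss computation. Below is an illustrative sketch of a VGG-based perceptual (feature-matching) loss of the kind such weights are typically used for; it is not the repository's actual loss code, and the chosen layers, equal weighting, and use of torchvision's ImageNet weights instead of the downloaded checkpoint are all assumptions.

    # Illustrative VGG perceptual loss; NOT the repository's actual implementation.
    # Layer choice, equal weighting, and torchvision ImageNet weights are assumptions.
    import torch.nn as nn
    import torchvision.models as models


    class VGGPerceptualLoss(nn.Module):
        def __init__(self, layer_ids=(3, 8, 17, 26)):
            super().__init__()
            self.vgg = models.vgg19(pretrained=True).features.eval()
            for p in self.vgg.parameters():
                p.requires_grad = False
            self.layer_ids = set(layer_ids)

        def forward(self, x, y):
            # Compare intermediate VGG features of the generated image x and target y.
            loss = 0.0
            for i, layer in enumerate(self.vgg):
                x, y = layer(x), layer(y)
                if i in self.layer_ids:
                    loss = loss + nn.functional.l1_loss(x, y)
            return loss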

Citation

If you find this project useful in your research, please consider citing:

@inproceedings{liu2022dynast,
    author    = {Liu, Songhua and Ye, Jingwen and Ren, Sucheng and Wang, Xinchao},
    title     = {DynaST: Dynamic Sparse Transformer for Exemplar-Guided Image Generation},
    booktitle = {European Conference on Computer Vision},
    year      = {2022},
}

Acknowledgement

This code borrows heavily from CoCosNet. We also thank the authors of Synchronized Batch Normalization for their implementation.
