Result gallery: layer-wise 3D human generation results (Layer 0 to Layer 3) compared across EG3D, EVA3D, Rodin, and HumanLiff (Ours).
a) The first stage reconstructs 3D representations, i.e., tri-planes, from multi-view images with a shared decoder. To spatially align 3D features, inverse LBS is used to transform humans with different shapes or poses into a canonical space. b) In the second stage, we learn layer-wise human diffusion models on the tri-planes reconstructed in the first stage. To enable sequential conditional generation, a UNet encoder hierarchically fuses tri-plane features from previously generated layers with the diffused tri-plane features that are fed to the denoising network.
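A minimal PyTorch sketch of the stage-b) conditioning scheme described above: a small encoder extracts multi-scale features from the previous layer's tri-plane, and these are added to the corresponding levels of the denoising network's encoder. All module names, channel widths, and the two-level depth are illustrative assumptions; the timestep embedding and UNet skip connections are omitted for brevity, so this is not the paper's exact architecture.

import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.GroupNorm(8, c_out), nn.SiLU()
    )

class CondEncoder(nn.Module):
    """Encodes the previous layer's tri-plane into multi-scale features (assumed design)."""
    def __init__(self, c_in=96, widths=(64, 128)):
        super().__init__()
        self.blocks = nn.ModuleList()
        c = c_in
        for w in widths:
            self.blocks.append(nn.Sequential(conv_block(c, w), nn.AvgPool2d(2)))
            c = w

    def forward(self, x):
        feats = []
        for blk in self.blocks:
            x = blk(x)
            feats.append(x)
        return feats

class LayerwiseDenoiser(nn.Module):
    """Simplified denoiser whose encoder fuses the previous-layer conditioning features."""
    def __init__(self, c_in=96, widths=(64, 128)):
        super().__init__()
        self.enc = nn.ModuleList()
        c = c_in
        for w in widths:
            self.enc.append(nn.Sequential(conv_block(c, w), nn.AvgPool2d(2)))
            c = w
        self.mid = conv_block(widths[-1], widths[-1])
        out_widths = list(reversed(widths[:-1])) + [c_in]
        self.dec = nn.ModuleList(
            nn.Sequential(nn.Upsample(scale_factor=2), conv_block(w_in, w_out))
            for w_in, w_out in zip(reversed(widths), out_widths)
        )

    def forward(self, noisy_triplane, cond_feats):
        # Hierarchical fusion: add previous-layer features at each encoder level.
        x = noisy_triplane
        for blk, cond in zip(self.enc, cond_feats):
            x = blk(x) + cond
        x = self.mid(x)
        for blk in self.dec:
            x = blk(x)
        return x  # predicted noise on the tri-plane of the current layer

# Usage: denoise the tri-plane of layer k conditioned on layer k-1 (shapes are placeholders).
cond_enc, denoiser = CondEncoder(), LayerwiseDenoiser()
prev_triplane = torch.randn(1, 96, 64, 64)    # tri-plane generated for the previous layer
noisy_triplane = torch.randn(1, 96, 64, 64)   # diffused tri-plane at the current timestep
eps_pred = denoiser(noisy_triplane, cond_enc(prev_triplane))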
@article{HumanLiff,
  title={HumanLiff: Layer-wise 3D Human Generation with Diffusion Model},
  author={Hu, Shoukang and Hong, Fangzhou and Hu, Tao and Pan, Liang and Mei, Haiyi and Xiao, Weiye and Yang, Lei and Liu, Ziwei},
  journal={arXiv preprint},
  year={2023}
}