Shoukang Hu

I am a Senior Researcher at Microsoft Research Asia. Before that, I was a Research Scientist at Sony AI and a Research Fellow at MMLab@NTU, working with Prof. Ziwei Liu. I obtained the Ph.D. degree from The Chinese Univeristy of Hong Kong under the supervision of Prof. Xunying Liu, and the B.E. Degree from the University of Electronic Science and Technology of China.

Email: shoukang [dot] hu [at] gmail.com

Google Scholar | GitHub | CV

Research Interests

I am interested in exploring multiple modalities to enhance the advancement of perception, reconstruction, and generation, including 3D Human Modelling, Audio/Speech Processing and Automated Machine Learning, i.e.,

3D Human/Object Reconstruction/Generation
- [HumanGif] [GauHuman] [HumanLiff] [SHERF] [ConsistentNeRF]
Automated Speech Recognition (ASR)
- [Bayesian LF-MMI ASR] [NAS LF-MMI ASR] [CUHK Dysarthric ASR]
Automated Machine Learning (AutoML)
- [GM-NAS][NAS-Analysis][DSNAS]

News

[Jun. 2025] Free4D (ICCV 2025) (proj page, code, and demo) is released.
[May. 2025] We are organizing 2nd workshop on Audio-Visual Generation and Learning at ICCV 2025.
[May. 2025] HumanLiff is accepted by IJCV.
[Feb. 2025] WildAvatar (CVPR 2025) (proj page, code, and demo) is released.
[Feb. 2025] HumanGif (proj page, code, and demo) is released.

Publications

(* indicates equal contribution, + denotes corresponding author)

	Free4D: Tuning-free 4D Scene Generation with Spatial-Temporal Consistency Tianqi Liu^, Zihao Huang^, Zhaoxi Chen, Guangcong Wang, Shoukang Hu, Liao Shen, Huiqiang Sun, Zhiguo Cao, Wei Li⁺, Ziwei Liu⁺ International Conference on Computer Vision (ICCV), 2025. [Paper] [Project Page] [Code] Free4D is a tuning-free framework for 4D scene generation.
	HumanGif: Single-View Human Diffusion with Generative Prior Shoukang Hu, Takuya Narihira, Kazumi Fukuda, Ryosuke Sawata, Takashi Shibuya, Yuki Mitsufuji Preprint [Paper] [Project Page] [Code] HumanGif formulates the single-view-based 3D human novel view and pose synthesis as a single-view-conditioned human diffusion process, utilizing generative priors from foundational diffusion models.
	WildAvatar: Web-scale In-the-wild Video Dataset for 3D Avatar Creation Zihao Huang, Shoukang Hu, Guangcong Wang, Tianqi Liu, Yuhang Zang, Zhiguo Cao, Wei Li, Ziwei Liu. Conference on Computer Vision and Pattern Recognition (CVPR), 2025. [Paper] [Project Page] [Code] WildAvatar is a large-scale dataset from YouTube with 10,000+ human subjects, designed to address the limitations of existing laboratory datasets for avatar creation.
	MVSGaussian: Fast Generalizable Gaussian Splatting Reconstruction from Multi-View Stereo Tianqi Liu, Guangcong Wang, Shoukang Hu, Liao Shen, Xinyi Ye, Yuhang Zang, Zhiguo Cao, Wei Li, Ziwei Liu. European Conference on Computer Vision (ECCV), 2024 [Paper] [Project Page] [Code] MVSGaussian is a Gaussian-based method designed for efficient reconstruction of unseen scenes from sparse views in a single forward pass. It also offers high-quality initialization for fast training and real-time rendering.
	GauHuman: Articulated Gaussian Splatting from Monocular Human Videos Shoukang Hu, Ziwei Liu Conference on Computer Vision and Pattern Recognition (CVPR), 2024. [Paper] [Project Page] [Code] GauHuman learns articulated Gaussian Splatting from monocular videos with both fast training* (1~2 minutes) and real-time rendering (up to 189 FPS).*
	HumanLiff: Layer-wise 3D Human Generation with Diffusion Model Shoukang Hu, Fangzhou Hong, Tao Hu, Liang Pan, Weiye Xiao, Haiyi Mei, Lei Yang, Ziwei Liu International Journal of Computer Vision (IJCV). [Paper] [Project Page] [Code] HumanLiff learns the layer-wise 3D human generative model with a unified diffusion process.
	ConsistentNeRF: Enhancing Neural Radiance Fields with 3D Consistency for Sparse View Synthesis Shoukang Hu, Kaichen Zhou, Kaiyu Li, Longhui Yu, Lanqing Hong, Tianyang Hu, Zhenguo Li, Gim Hee Lee, Ziwei Liu Preprint [Paper] [Project Page] [Code] ConsistentNeRF Enhances Neural Radiance Fields with 3D Consistency for Sparse View Synthesis.
	SHERF: Generalizable Human NeRF from a Single Image Shoukang Hu, Fangzhou Hong, Liang Pan, Haiyi Mei, Lei Yang, Ziwei Liu International Conference on Computer Vision (ICCV), 2023. [Paper] [Project Page] [code] SHERF learns a Generalizable Human NeRF* to animate 3D humans from a single image.*
	Generalizing Few-Shot NAS with Gradient Matching Shoukang Hu, Ruocheng Wang, Lanqing Hong, Zhenguo Li, Cho-Jui Hsieh, Jiashi Feng International Conference on Learning Representations (ICLR), 2022. [Paper] [Code] [Zhihu] GM-NAS formulates supernet partitioning as a graph clustering problem and utilizes gradient matching score as the splitting criterion. Notably, we achieve 80.6% accuracy on ImageNet under 600 flops constraint.
	Neural Architecture Search For LF-MMI Trained Time Delay Neural Networks Shoukang Hu, Xurong Xie, Mingyu Cui, Jiajun Deng, Shansong Liu, Jianwei Yu, Mengzhe Geng, Xunying Liu, Helen Meng International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021. IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP). [Paper] [Code] We achieve 9.9%/11.1%* WER on Hub5'00/Rt03 test sets of 300-Hour Switchboard Task with 10.8M parameters.*
	Understanding the wiring evolution in differentiable neural architecture search Sirui Xie, Shoukang Hu, Xinjiang Wang, Chunxiao Liu, Jianping Shi, Xunying Liu, Dahua Lin International Conference on Artificial Intelligence and Statistics (AISTATS), 2021. [Paper] [Code] [Zhihu] Our analysis focuses on three observed searching patterns of differentiable NAS: 1) they search by growing instead of pruning; 2) wider networks are more preferred than deeper ones; 3) no edges are selected in bi-level optimization.
	DSNAS: Direct Neural Architecture Search without Parameter Retraining Shoukang Hu, Sirui Xie, Hehui Zheng, Chunxiao Liu, Jianping Shi, Xunying Liu, Dahua Lin Conference on Computer Vision and Pattern Recognition (CVPR), 2020. [Paper] [Code] [Zhihu] We propose a new problem definition for NAS, i.e., task-specific end-to-end NAS. Our DSNAS got a final 122* review score.*
	Bayesian Learning of LF-MMI Trained Time Delay Neural Networks for Speech Recognition Shoukang Hu, Xurong Xie, Max W. Y. Lam, Shansong Liu, Jianwei Yu, Zi Ye, Mengzhe Geng, Xunying Liu, Helen Meng International Speech Communication Association (INTERSPEECH), 2018. International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019. International Speech Communication Association (INTERSPEECH), 2019. ISCA Yajie Miao Memorial Grant Winner In IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP). [Paper] [Code] We improve generalization ability of LF-MMI ASR with 10.4%/11.8%* WER on Hub5'00/Rt03 test sets of 300-Hour Switchboard task.*
	Recent Progress in the CUHK Dysarthric Speech Recognition System Shansong Liu, Mengzhe Geng, Shoukang Hu, Xurong Xie, Mingyu Cui, Jianwei Yu, Xunying Liu, Helen Meng In IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP). [Paper] [Demo] [Demo Paper] We report our recent progress in CUHK Dysarthric Speech Recognition System.
	On the Use of Pitch Features for Disordered Speech Recognition Shansong Liu, Shoukang Hu, Xunying Liu, Helen Meng International Speech Communication Association (INTERSPEECH), 2019. [Paper] [Demo] [Demo Paper] We investigate the use of pitch features in Disordered Speech Recognition.
	BLHUC: Bayesian learning of hidden unit contributions for deep neural network speaker adaptation Xurong Xie, Xunying Liu, Tan Lee, Shoukang Hu, Lan Wang International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019. Best Student Paper Award [Paper] [Code] BLHUC achieves 9.7%/10.7%* WER on Hub5'00/Rt03 test sets of 300-Hour Switchboard task.*
	Exploiting Visual Features Using Bayesian Gated Neural Networks for Disordered Speech Recognition Shansong Liu, Shoukang Hu, Yi Wang, Jianwei Yu, Rongfeng Su, Xunying Liu, Helen Meng International Speech Communication Association (INTERSPEECH), 2019. Best Student Paper Award Nomination [Paper] [Demo] [Demo Paper] Bayesian Gated Neural Networks achieves 25.7%* WER on UASpeech corpus.*

Services

Conference PC Member: ICASSP 22-24, INTERSPEECH 21-23, NeurIPS 22-23, ICML 22-23, AAAI 22-23, IJCAI 23, AISTATS 21, SIGGRAPH 23
Journal Reviewer: TASLP, TPAMI, TOG, TVCG, JMLR, IJCV, TNNLS, Neural Networks

Rewards

CUHK Postgraduate Student Scholarship
National Scholarship awarded by the Ministry of Education of China in 2015 & 2016

Good artists copy.