Shoukang Hu
I am a Research Scientist at Sony AI.
Before that, I was a Research Fellow at MMLab@NTU, working with Prof. Ziwei Liu.
I obtained my Ph.D. degree from The Chinese University of Hong Kong under the supervision of Prof. Xunying Liu,
and my B.E. degree from the University of Electronic Science and Technology of China.
Email: shoukang [dot] hu [at] gmail.com
Google Scholar | GitHub | CV
Quote: "The two most powerful warriors are patience and time" - Leo Tolstoy
Research Interests
I am interested in exploring multiple modalities to advance perception, reconstruction, and generation, with a focus on:
- 3D Human/Object Reconstruction/Generation
- Automatic Speech Recognition (ASR)
- Automated Machine Learning (AutoML)
Internships:
I am happy to collaborate with enthusiastic and talented PhD students, and we currently have openings for research interns. If you are interested in working with me, please email me your CV and a brief description of your desired focus during the internship.
News
- [Jul. 2024] Web-scale human video dataset WildAvatar (arXiv'24) is released.
- [Jul. 2024] MVSGaussian (ECCV 2024) is released.
- [Jun. 2024] All-in-one (INTERSPEECH 2024) and AMD (INTERSPEECH 2024) are released.
- [May. 2024] GenWarp (NeurIPS 2024) is released.
- [Apr. 2024] GSTalker (arXiv'24) is released.
- [Mar. 2024] I joined Sony AI to start a new journey!
Publications
(* indicates equal contribution)
GauHuman: Articulated Gaussian Splatting from Monocular Human Videos
Shoukang Hu,
Ziwei Liu
Conference on Computer Vision and Pattern Recognition (CVPR), 2024.
[Paper] [Project Page] [Code]
GauHuman learns articulated Gaussian Splatting from monocular videos with both fast training (1~2 minutes) and real-time rendering (up to 189 FPS).
HumanLiff: Layer-wise 3D Human Generation with Diffusion Model
Shoukang Hu,
Fangzhou Hong,
Tao Hu,
Liang Pan,
Weiye Xiao,
Haiyi Mei,
Lei Yang,
Ziwei Liu
Preprint
[Paper] [Project Page] [Code]
HumanLiff learns a layer-wise 3D human generative model with a unified diffusion process.
ConsistentNeRF: Enhancing Neural Radiance Fields with 3D Consistency for Sparse View Synthesis
Shoukang Hu,
Kaichen Zhou,
Kaiyu Li,
Longhui Yu,
Lanqing Hong,
Tianyang Hu,
Zhenguo Li,
Gim Hee Lee,
Ziwei Liu
Preprint
[Paper] [Project Page] [Code]
ConsistentNeRF enhances neural radiance fields with 3D consistency for sparse-view synthesis.
SHERF: Generalizable Human NeRF from a Single Image
Shoukang Hu*,
Fangzhou Hong*,
Liang Pan,
Haiyi Mei,
Lei Yang,
Ziwei Liu
International Conference on Computer Vision (ICCV), 2023.
[Paper] [Project Page] [Code]
SHERF learns a Generalizable Human NeRF to animate 3D humans from a single image.
Generalizing Few-Shot NAS with Gradient Matching
Shoukang Hu*,
Ruocheng Wang*,
Lanqing Hong,
Zhenguo Li,
Cho-Jui Hsieh,
Jiashi Feng
International Conference on Learning Representations (ICLR), 2022.
[Paper] [Code] [Zhihu]
GM-NAS formulates supernet partitioning as a graph clustering problem and uses the gradient matching score as the splitting criterion. Notably, we achieve 80.6% accuracy on ImageNet under a 600M FLOPs constraint.
Neural Architecture Search for LF-MMI Trained Time Delay Neural Networks
Shoukang Hu,
Xurong Xie,
Mingyu Cui*,
Jiajun Deng*,
Shansong Liu,
Jianwei Yu,
Mengzhe Geng,
Xunying Liu,
Helen Meng
International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021.
IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP).
[Paper] [Code]
We achieve 9.9%/11.1% WER on the Hub5'00/Rt03 test sets of the 300-hour Switchboard task with 10.8M parameters.
Understanding the Wiring Evolution in Differentiable Neural Architecture Search
Sirui Xie*,
Shoukang Hu*,
Xinjiang Wang,
Chunxiao Liu,
Jianping Shi,
Xunying Liu,
Dahua Lin
International Conference on Artificial Intelligence and Statistics (AISTATS), 2021.
[Paper] [Code] [Zhihu]
Our analysis focuses on three observed search patterns of differentiable NAS: 1) it searches by growing instead of pruning; 2) wider networks are preferred over deeper ones; 3) no edges are selected in bi-level optimization.
DSNAS: Direct Neural Architecture Search without Parameter Retraining
Shoukang Hu*,
Sirui Xie*,
Hehui Zheng,
Chunxiao Liu,
Jianping Shi,
Xunying Liu,
Dahua Lin
Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
[Paper] [Code] [Zhihu]
We propose a new problem definition for NAS, i.e., task-specific end-to-end NAS. Our DSNAS received a final review score of 122.
Bayesian Learning of LF-MMI Trained Time Delay Neural Networks for Speech Recognition
Shoukang Hu,
Xurong Xie,
Max W. Y. Lam,
Shansong Liu,
Jianwei Yu,
Zi Ye,
Mengzhe Geng,
Xunying Liu,
Helen Meng
Annual Conference of the International Speech Communication Association (INTERSPEECH), 2018.
International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019.
Annual Conference of the International Speech Communication Association (INTERSPEECH), 2019. ISCA Yajie Miao Memorial Grant Winner
IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP).
[Paper] [Code]
We improve the generalization ability of LF-MMI trained ASR systems, achieving 10.4%/11.8% WER on the Hub5'00/Rt03 test sets of the 300-hour Switchboard task.
Recent Progress in the CUHK Dysarthric Speech Recognition System
Shansong Liu*,
Mengzhe Geng*,
Shoukang Hu*,
Xurong Xie*,
Mingyu Cui,
Jianwei Yu,
Xunying Liu,
Helen Meng
IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP).
[Paper] [Demo] [Demo Paper]
We report our recent progress on the CUHK dysarthric speech recognition system.
On the Use of Pitch Features for Disordered Speech Recognition
Shansong Liu*,
Shoukang Hu*,
Xunying Liu,
Helen Meng
Annual Conference of the International Speech Communication Association (INTERSPEECH), 2019.
[Paper] [Demo] [Demo Paper]
We investigate the use of pitch features for disordered speech recognition.
BLHUC: Bayesian learning of hidden unit contributions for deep neural network speaker adaptation
Xurong Xie,
Xunying Liu,
Tan Lee,
Shoukang Hu,
Lan Wang
International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019. Best Student Paper Award
[Paper] [Code]
BLHUC achieves 9.7%/10.7% WER on the Hub5'00/Rt03 test sets of the 300-hour Switchboard task.
Exploiting Visual Features Using Bayesian Gated Neural Networks for Disordered Speech Recognition
Shansong Liu,
Shoukang Hu,
Yi Wang,
Jianwei Yu,
Rongfeng Su,
Xunying Liu,
Helen Meng
Annual Conference of the International Speech Communication Association (INTERSPEECH), 2019. Best Student Paper Award Nomination
[Paper] [Demo] [Demo Paper]
Bayesian gated neural networks achieve a 25.7% WER on the UASpeech corpus.
Services
- Conference PC Member: ICASSP 22-24, INTERSPEECH 21-23, NeurIPS 22-23, ICML 22-23, AAAI 22-23, IJCAI 23, AISTATS 21, SIGGRAPH 23
- Journal Reviewer: TASLP, TPAMI, TOG, TVCG, JMLR, IJCV, TNNLS, Neural Networks
Awards
- CUHK Postgraduate Student Scholarship
- National Scholarship awarded by the Ministry of Education of China in 2015 & 2016