Yu Rong

I am currently a Research Engineer at (Meta) Reality Labs Research. Prior to that, I obtained my Ph.D. degree from the Multimedia Laboratory at The Chinese University of Hong Kong in September 2021, advised by Prof. Xiaoou Tang and Prof. Chen Change Loy. I received my B.E. from the Department of Computer Science and Technology at Tsinghua University in 2016.

I interned at Facebook AI Research (FAIR) with Dr. Hanbyul Joo and Dr. Takaaki Shiratori in Spring 2020. Earlier, I worked as a computer vision researcher at SenseTime from Apr. 2015 to Aug. 2017.

My research interests include computer vision, computer graphics, and machine learning. In particular, I am interested in virtual humans, including 3D face, hand, and body reconstruction.

Email / Github / Google Scholar / LinkedIn / CV


  • NEW!  One paper accepted to CVPR 2022! Coming Soon!
  • The code of IHMR (3DV 2021) has been released. Please check GitHub for more details.
  • Our paper, IHMR, has been accepted to 3DV 2021.

Industry Experience
Mar. 2022 - Now, (Meta) Reality Labs Research
Research Engineer. Pittsburgh, PA, U.S.

Currently, I am working at (Meta) Reality Labs Research on building 3D face avatars for AR/VR devices.

Jan. 2020 - May 2020, Facebook AI Research (FAIR)
Research Intern. Menlo Park, CA, U.S.

We use SMPL-X to represent 3D hands and bodies, and first predict hand and body motion independently with separate modules. The two predictions are then combined and fine-tuned into a unified whole-body motion result. Our model runs 10x faster than previous methods and performs better on challenging in-the-wild scenarios with motion blur.

Apr. 2015 - Jul. 2017, SenseTime Research
Computer Vision Researcher. Beijing, China.

During my time at SenseTime, I mainly took part in research projects on OCR and face recognition. These projects include, but are not limited to: ID card recognition (given an image of an ID card, recognize important information such as name and address); text detection for in-the-wild scenes (using a modified RPN); and a remote training system that trains and evaluates automatically to produce a face verification model.


Jul. 2017 - Sep. 2021, The Chinese University of Hong Kong
Department of Information Engineering
Doctor of Philosophy


Aug. 2012 - Jul. 2016, Tsinghua University
Department of Computer Science and Technology
Bachelor of Engineering

Selected Publications [Full Publication List]

Towards Diverse and Natural Scene-aware 3D Human Motion Synthesis
Jingbo Wang, Yu Rong, Jingyuan Liu, Sijie Yan, Dahua Lin, Bo Dai
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022
[PDF]  [Demo]

We present a multi-stage framework to synthesize natural and diverse human motions interacting with given scenes under the guidance of action labels.


Monocular 3D Reconstruction of Interacting Hands via Collision-Aware Factorized Refinements
Yu Rong, Jingbo Wang, Ziwei Liu, Chen Change Loy
International Conference on 3D Vision (3DV), 2021
[PDF]  [Project Page]  [Code]  [Demo]

We present a two-stage framework for reconstructing collision-aware 3D interacting hands from single monocular images. The first stage uses a CNN to generate initial 3D hands and 2D/3D joints. The second stage refines the initial results to reduce collisions via factorized refinement.


VoteHMR: Occlusion-Aware Voting Network for Robust 3D Human Mesh Recovery from Partial Point Clouds
Guanze Liu, Yu Rong, Lu Sheng
ACM Multimedia (MM), 2021, Oral
[PDF]  [Code]  [Demo]

We design a framework, named VoteHMR, to reconstruct reliable 3D human poses and shapes from single-frame partial point clouds obtained from commercial depth sensors such as Kinect.


FrankMocap: A Monocular 3D Whole-Body Pose Estimation System via Regression and Integration
Yu Rong, Takaaki Shiratori, Hanbyul Joo
International Conference on Computer Vision Workshops (ICCVW), 2021
[PDF]  [Project Page]  [Code]  [Demo]

We present a framework, named FrankMocap, to simultaneously capture 3D whole-body motion (body, face, and hands) from monocular RGB inputs.


Chasing the Tail in Monocular 3D Human Reconstruction with Prototype Memory
Yu Rong, Ziwei Liu, Chen Change Loy
IEEE Transactions on Image Processing (TIP), 2022
[PDF]  [Project Page]  [Code]

We design a novel framework to improve the accuracy of 3D human motion capture on challenging poses.


Delving Deep into Hybrid Annotations for 3D Human Recovery in the Wild
Yu Rong, Ziwei Liu, Cheng Li, Kaidi Cao, Chen Change Loy
International Conference on Computer Vision (ICCV), 2019
[PDF]  [Project Page]  [Code]

We provide a thorough investigation of annotation design for in-the-wild 3D human reconstruction.


Pose-Robust Face Recognition via Deep Residual Equivariant Mapping
Kaidi Cao*, Yu Rong*, Cheng Li, Xiaoou Tang, Chen Change Loy
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018
[PDF]  [Project Page]  [Code]

We present a Deep Residual EquivAriant Mapping (DREAM) block to improve the performance of face recognition on profile faces.

Academic Activities