FrankMocap: A Fast Monocular 3D Hand and Body Motion Capture by Regression and Integration

Yu Rong (1, 3)
Takaaki Shiratori (2)
Hanbyul Joo (3)

(1) The Chinese University of Hong Kong
(2) Facebook Reality Labs
(3) Facebook AI Research


We present a motion capture system that estimates 3D hand and body motion from monocular videos. Our method runs in near real-time (9.5 fps) and outputs 3D body and 3D hand motion in a unified parametric model structure. Although the essential nuance of human motion is often conveyed through a combination of body movements and hand gestures, existing monocular motion capture approaches mostly focus either on body motion capture while ignoring the hands, or on hand motion capture without considering body motion. Our method aims to capture 3D body and 3D hand motion simultaneously from challenging in-the-wild monocular videos, faster and more accurately than previous work. To achieve this ambitious goal, we build a state-of-the-art monocular 3D "hand" motion capture method by taking the hand part of the whole-body parametric model (SMPL-X). The resulting 3D hand motion capture output can be directly combined with monocular body motion capture output, producing whole-body motion output in a unified form. We demonstrate the state-of-the-art performance of our hand motion capture system on public benchmarks, and show the high quality of our whole-body motion capture results in various challenging real-world scenarios, including a real-time demo.
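To illustrate what combining hand and body output "in a unified form" can look like, here is a minimal sketch. It assumes SMPL-X-style axis-angle pose parameters with 21 body joints and 15 joints per hand (3 values per joint). The function names `combine_body_and_hands` and `to_flat_vector` are hypothetical, and the code is not the authors' integration procedure: it only shows separately estimated body and hand poses being placed into one parameter set.

```python
from typing import Dict, List, Sequence

# Assumed SMPL-X-style layout: axis-angle rotations, 3 values per joint.
NUM_BODY_JOINTS = 21  # body joints, excluding the global (root) orientation
NUM_HAND_JOINTS = 15  # finger joints per hand


def _check(values: Sequence[float], num_joints: int, name: str) -> List[float]:
    """Convert to a list of floats and verify the expected length."""
    out = [float(v) for v in values]
    if len(out) != num_joints * 3:
        raise ValueError(f"{name}: expected {num_joints * 3} values, got {len(out)}")
    return out


def combine_body_and_hands(
    body_pose: Sequence[float],
    left_hand_pose: Sequence[float],
    right_hand_pose: Sequence[float],
    global_orient: Sequence[float] = (0.0, 0.0, 0.0),
) -> Dict[str, List[float]]:
    """Hypothetical helper: gather per-frame body and hand poses in one dict."""
    return {
        "global_orient": _check(global_orient, 1, "global_orient"),
        "body_pose": _check(body_pose, NUM_BODY_JOINTS, "body_pose"),
        "left_hand_pose": _check(left_hand_pose, NUM_HAND_JOINTS, "left_hand_pose"),
        "right_hand_pose": _check(right_hand_pose, NUM_HAND_JOINTS, "right_hand_pose"),
    }


def to_flat_vector(params: Dict[str, List[float]]) -> List[float]:
    """Flatten the parameter dict in a fixed key order (an illustrative choice)."""
    order = ("global_orient", "body_pose", "left_hand_pose", "right_hand_pose")
    return [v for key in order for v in params[key]]


if __name__ == "__main__":
    # Placeholder values standing in for the outputs of separate body and hand estimators.
    frame = combine_body_and_hands([0.0] * 63, [0.1] * 45, [0.2] * 45)
    print(len(to_flat_vector(frame)))
```

Because both estimates share one parameterization in this sketch, the merge reduces to bookkeeping. A real pipeline would also need to reconcile the wrist rotations predicted by the two estimators, which this sketch does not attempt.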


Y. Rong, T. Shiratori, H. Joo.

FrankMocap: A Fast Monocular 3D Hand and Body Motion Capture by Regression and Integration.

[pdf]     [Bibtex]


Results (Video)

