Title

Exploring Dense Depth Predictions as a Supervision Source for Human Pose and Shape Estimation

Abstract

This thesis examines the effectiveness of dense depth information obtained from state-of-the-art depth estimation models as a supervision source for learning 3D human pose and shape (HPS) estimation from a monocular image. Collecting ground-truth data to supervise HPS estimation is costly and constrained to controlled lab environments. To mitigate this scarcity of supervision, researchers have used priors based on body structure or kinematics, cues obtained from other vision tasks such as optical flow and segmentation, and self-supervised tasks. Despite its apparent potential in this context, monocular depth estimation has yet to be explored. We address this gap by first defining a dense mapping and alignment between points on the surface of the human mesh and points reconstructed from the predicted depth. We then propose and extensively evaluate several loss functions. We further introduce Camera Pretraining, a novel learning strategy in which, instead of estimating all parameters simultaneously, the camera parameters are learned first, before the pose and shape parameters, to avoid unwanted local minima. Our experiments on the Human3.6M and 3DPW datasets show that the proposed mapping, alignment, and loss-calculation pipeline, together with Camera Pretraining, significantly improves HPS estimation performance over using 2D keypoint supervision alone or combined 2D and 3D supervision.
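The abstract names its two technical ingredients, the depth-based alignment loss and Camera Pretraining, without detail; the short PyTorch sketch below illustrates one plausible reading under stated assumptions. A predicted depth map is back-projected into a point cloud using the camera intrinsics and aligned to the mesh vertices with a symmetric Chamfer distance, and the camera parameters get their own optimization stage before pose and shape are unfrozen. All names and sizes here (backproject_depth, chamfer_loss, a weak-perspective camera vector, SMPL-sized parameter counts) are illustrative assumptions, not the thesis's actual implementation.

    import torch

    def backproject_depth(depth, K):
        """Lift a dense depth map (H, W) into a 3D point cloud with intrinsics K (3, 3)."""
        H, W = depth.shape
        v, u = torch.meshgrid(torch.arange(H, dtype=depth.dtype),
                              torch.arange(W, dtype=depth.dtype), indexing="ij")
        pix = torch.stack([u, v, torch.ones_like(u)], dim=-1).reshape(-1, 3)  # (u, v, 1)
        rays = pix @ torch.linalg.inv(K).T      # unit-depth camera rays, one per pixel
        return rays * depth.reshape(-1, 1)      # scale each ray by its depth -> (H*W, 3)

    def chamfer_loss(pred_pts, target_pts):
        """Symmetric Chamfer distance between point sets of shape (N, 3) and (M, 3)."""
        d = torch.cdist(pred_pts, target_pts)   # (N, M) pairwise Euclidean distances
        return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()

    # Dummy data, purely for shape-checking the sketch.
    mesh_verts = torch.rand(6890, 3)            # SMPL-sized vertex set (assumption)
    depth = torch.rand(64, 64) + 1.0            # stand-in for a dense depth prediction
    K = torch.tensor([[60., 0., 32.], [0., 60., 32.], [0., 0., 1.]])
    loss = chamfer_loss(mesh_verts, backproject_depth(depth, K))

    # "Camera Pretraining" read as staged optimization (illustrative): fit the
    # camera first while pose/shape stay frozen, then optimize all jointly.
    cam = torch.zeros(3, requires_grad=True)          # e.g., weak-perspective (s, tx, ty)
    pose_shape = torch.zeros(82, requires_grad=True)  # 72 pose + 10 shape (SMPL-sized)
    stage1 = torch.optim.Adam([cam], lr=1e-3)         # camera parameters only
    stage2 = torch.optim.Adam([cam, pose_shape], lr=1e-4)  # then everything jointly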

Supervisor(s)

BATUHAN KARAGOZ

Date and Location

2024-01-23, 11:30

Category

PhD Thesis