Diversity and Size
• 3.6 million 3D human poses and corresponding images
• 11 professional actors (6 male, 5 female)
• 17 scenarios (discussion, smoking, taking photo, talking on the phone, ...)
Accurate Capture and Synchronization
• High-resolution 50 Hz video from 4 calibrated cameras
• Accurate 3D joint positions and joint angles from a high-speed motion capture system
• Pixel-level labels for 24 body parts in each configuration
• Time-of-flight range data
• 3D laser scans of the actors
• Accurate background subtraction and person bounding boxes
Support for Development
• Precomputed image descriptors
• Software for visualization and discriminative human pose prediction
• Performance evaluation on a withheld test set
References
The datasets, large-scale learning techniques, and related experiments are described in:
Catalin Ionescu, Dragos Papava, Vlad Olaru and Cristian Sminchisescu, Human3.6M: Large Scale Datasets and Predictive Methods for 3D Human Sensing in Natural Environments, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 36, no. 7, July 2014 [pdf][bibtex]

Catalin Ionescu, Fuxin Li and Cristian Sminchisescu, Latent Structured Models for Human Pose Estimation, International Conference on Computer Vision, 2011 [pdf][bibtex]
The data usage license agreement requires citation of the two papers above. Note that citing the dataset URL instead of the publications is not compliant with this license agreement.
Besides the laboratory test sets, we also focused on providing test data that covers variations in clothing, complex backgrounds, camera motion, and occlusion. We are not aware of any test setting of this level of difficulty in the literature. Real images contain people in complex poses, but the diverse backgrounds, scene illumination, and occlusions can vary independently and represent important nuisance factors that vision systems should be robust against. Although approaches to handle such cases exist in principle, real images remain difficult to annotate. This section of our dataset was designed specifically to address these issues.
We create movies by inserting high-quality 3D rigged animation models into real videos, yielding realistic, complex backgrounds, good-quality image data, and very accurate 3D pose information. The mixed-reality movies were created by inserting and rendering 3D models of a fully clothed man and woman into real videos. The poses used for animating the models were extracted directly from our laboratory test set. The insertion required solving for the camera motion of the background footage, as well as its internal parameters, in order to render at good quality. The scene was set up and rendered using the mental ray (ray-tracing) renderer, with several well-placed area lights and skylights. To improve quality, we placed a transparent plane on the ground to receive shadows. Scenes with occlusion were also created. The dataset contains 5 different dynamic backgrounds obtained with a moving camera, for a total of 10,350 examples, of which 1,270 frames contain various degrees of occlusion.
Code
Visualization: For inspecting the data we provide visualization code, available once you log in.
Baseline pose estimation: We provide implementations of a set of baseline prediction methods. This also includes code for data manipulation, feature extraction, and large-scale discriminative learning methods based on Fourier approximations.
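The Fourier approximations mentioned above refer to approximating a shift-invariant kernel with an explicit random feature map, so that large-scale kernel regression reduces to linear regression. A minimal sketch of this general technique (random Fourier features for an RBF kernel, with illustrative function names and synthetic data, not the dataset's actual baseline code):

```python
import numpy as np

def random_fourier_features(X, n_features=256, gamma=1.0, rng=None):
    """Random feature map z(x) such that z(x) @ z(y) approximates
    the RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    rng = np.random.default_rng(rng)
    d = X.shape[1]
    # Frequencies sampled from the kernel's spectral density.
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

def ridge_fit(Z, y, lam=1e-3):
    """Linear ridge regression in the random feature space; this
    approximates kernel ridge regression at a fraction of the cost."""
    A = Z.T @ Z + lam * np.eye(Z.shape[1])
    return np.linalg.solve(A, Z.T @ y)
```

Because the feature map is explicit, training scales linearly in the number of examples rather than quadratically as with an exact kernel matrix, which is what makes such methods practical at the scale of 3.6 million poses.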
Precomputed Segments
We provide two types of segmentation for our video data. These are precomputed on the raw image data for the most accurate results, and are available in the download section of this website.
Bounding Box: To obtain very accurate bounding boxes, we reproject our calibrated 3D poses into each view and fit rectangles around the projections.
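The reprojection step above can be sketched as follows, assuming a standard 3x4 pinhole projection matrix (a hypothetical camera model for illustration; the dataset ships its own calibration data and code):

```python
import numpy as np

def project_points(P, X):
    """Project Nx3 world points X through a 3x4 camera matrix P."""
    Xh = np.hstack([X, np.ones((X.shape[0], 1))])   # homogeneous coordinates
    x = Xh @ P.T                                    # Nx3 image-plane points
    return x[:, :2] / x[:, 2:3]                     # perspective divide

def bbox_from_joints(P, joints_3d, margin=10):
    """Fit an axis-aligned rectangle around the projected joints,
    padded by a small margin in pixels."""
    uv = project_points(P, joints_3d)
    x0, y0 = uv.min(axis=0) - margin
    x1, y1 = uv.max(axis=0) + margin
    return x0, y0, x1, y1
```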
Background Subtraction: Background models are built from image data acquired separately for this purpose. A graph cut, with the unary potential given by this background model and the binary potential given by image edges, is used to obtain the final background subtraction result. The graph cut is performed only inside the bounding box.
Precomputed Features
For both segmentations (bounding boxes and background subtraction) we provide precomputed pyramid HoG features with different parameter settings, extracted both on the silhouettes alone and on the silhouettes with internal edges.
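The idea behind pyramid HoG can be sketched as follows: orientation histograms are computed over grids of cells at several resolutions and concatenated. This is a simplified illustration (no block normalization or bin interpolation, and the grid levels are assumptions, not the dataset's actual parameter settings):

```python
import numpy as np

def hog_cells(img, n_bins=9, cells=(2, 2)):
    """Gradient-orientation histograms over a grid of cells
    (simplified HoG descriptor)."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)  # unsigned orientation in [0, pi)
    bins = np.minimum((ang / np.pi * n_bins).astype(int), n_bins - 1)
    H, W = img.shape
    feats = []
    for i in range(cells[0]):
        for j in range(cells[1]):
            ys = slice(i * H // cells[0], (i + 1) * H // cells[0])
            xs = slice(j * W // cells[1], (j + 1) * W // cells[1])
            h = np.bincount(bins[ys, xs].ravel(),
                            weights=mag[ys, xs].ravel(),
                            minlength=n_bins)
            feats.append(h / (np.linalg.norm(h) + 1e-6))  # L2-normalize per cell
    return np.concatenate(feats)

def pyramid_hog(img, levels=(1, 2, 4), n_bins=9):
    """Concatenate cell histograms over a spatial pyramid
    (1x1, 2x2, and 4x4 grids by default)."""
    return np.concatenate([hog_cells(img, n_bins, (l, l)) for l in levels])
```

Running this on a silhouette image yields one fixed-length descriptor whose coarse levels capture global shape and whose fine levels capture local edge structure.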