==========================================================================================================
Percept3D - relates human visual perception (re-enactment and eye-movement recordings) with 3D ground truth.
==========================================================================================================

==================
Recording Details:
==================
The motion capture data was recorded using a Vicon system with a sample rate of 200 Hz.
The eye-tracking data was recorded using a mobile SMI HED eye-tracking system with a sample rate of 200 Hz.

==================
Subject Details:
==================
There are 10 subjects in total. Subjects S1, S5, S7, S9 and S10 are female; S2, S3, S4, S6 and S8 are male.
For subject S7, the eye-tracking data was recorded from the left eye, which was her dominant eye. For all other subjects, the dominant eye was the right eye.

==================
Notation Details:
==================
The 29 joints for which position information is given in this dataset are listed below, in this order:
-PositionsLabels = {'LHip', 'LKnee', 'LAnkle', 'LToes', 'LToesEnd', 'RHip', 'RKnee', 'RAnkle', 'RToes', 'RToesEnd', 'LowerBack', 'MiddleBack', 'Neck', 'Head', 'HeadEnd', 'LClavicle', 'LShoulder', 'LElbow', 'LWrist', 'LThumb', 'LThumbEnd', 'LWristEnd', 'RClavicle', 'RShoulder', 'RElbow', 'RWrist', 'RThumb', 'RThumbEnd', 'RWristEnd'};
-In the results presented in our ICCV 2013 paper ("Pictorial Human Spaces: How well do humans perceive a 3D articulated pose?"), we used only a subset of 17 joints.
We removed the extremities and focused the analysis on the main joints:
PositionsLabelsSubset = {'LHip', 'LKnee', 'LAnkle', 'RHip', 'RKnee', 'RAnkle', 'LowerBack', 'MiddleBack', 'Neck', 'Head', 'HeadEnd', 'LShoulder', 'LElbow', 'LWrist', 'RShoulder', 'RElbow', 'RWrist'}

The list of angles is the following:
-AngleLabels = {'LHip', 'LKnee', 'LAnkle', 'LToes', 'RHip', 'RKnee', 'RAnkle', 'RToes', 'LowerBack', 'MiddleBack', 'Neck', 'Head', 'LClavicle', 'LShoulder', 'LElbow', 'LWrist', 'RClavicle', 'RShoulder', 'RElbow', 'RWrist'}
-Their corresponding degrees of freedom are:
DOF = [3 1 3 1 3 1 3 1 3 3 3 3 3 3 1 3 3 3 1 3]
-In the results presented in our ICCV 2013 paper, we used a subset of 16 angles:
AngleLabelsSubset = {'LHip', 'LKnee', 'LAnkle', 'RHip', 'RKnee', 'RAnkle', 'LowerBack', 'MiddleBack', 'Neck', 'Head', 'LShoulder', 'LElbow', 'LWrist', 'RShoulder', 'RElbow', 'RWrist'}

===========================================================================================================
The dataset is organized as follows:

==========================
Pose Images
==========================
It contains 2 sub-folders: easyPoses and hardPoses.
-easyPoses contains 100 .png files (named 1.png, 2.png, ..., 100.png) representing the images of easy poses shown to subjects.
-hardPoses contains 20 .png files (named 101.png, 102.png, ..., 120.png) representing the images of hard poses shown to subjects.
The images provided are padded with a border composed of 3 stripes, in the following order: 10px black, 16px green/blue, 10px black.
The mappings of eye-tracking data from the scene image to the original image, as well as the 2D joint positions, are given relative to the coordinates of these images, padded as described above.

==========================
Pose Ground Truth
==========================
It contains 120 .mat files named pose1.mat, pose2.mat, ..., pose120.mat, which correspond to the 120 poses shown to subjects.
Each file pose*.mat contains 4 matrices:
-Joints_2Dpos
-Joints_pos
-Joints_rot
-Joints_visibility

Joints_2Dpos is a 2x29 matrix whose columns contain the x and y coordinates of each joint in the corresponding 2D image shown to subjects.
Joints_pos is a 1x87 array that contains the 3D position of each joint, in the order x1, y1, z1, x2, y2, z2, ..., x29, y29, z29.
Joints_rot is a 1x48 array that contains the ZXY Euler angles between joints. Note that there are only 20 angles (some with 3 degrees of freedom and some with 1 degree of freedom), as some of the joints are terminal joints (see the Notation Details section). For joints with 3 degrees of freedom, the angles are stored in the order z, x, y.
Joints_visibility is a 1x29 binary array that specifies whether joint i is visible in the current image (Joints_visibility[i]=1) or not (Joints_visibility[i]=0).

============================
Mocap Data
============================
It contains 10 sub-folders named S1, S2, ..., S10. Each folder corresponds to one subject. Folder Si contains 120 .mat files representing the motion capture recordings made while subject i re-enacted each of the 120 images shown.
Each .mat file contains the following 3 structures:
-Angles - matrix of dimensions (noFrames x 48), where noFrames is the number of frames captured, giving the values of the 20 angles as ZXY Euler angles. Note that some of the angles have 3 d.o.f. and others 1 d.o.f., as explained in the Notation Details section.
-Position - matrix of dimensions (noFrames x 87), representing the 3D position of each joint in the order x1, y1, z1, x2, y2, z2, ..., x29, y29, z29.
-World_Root, which contains:
  -Rotation - matrix of dimensions (noFrames x 3) giving the rotation of the root joint in the skeleton hierarchy
  -Position - matrix of dimensions (noFrames x 3) giving the position of the root joint in the skeleton hierarchy

============================
Eye Movements
============================
It contains 10 sub-folders named S1, S2, ..., S10. Each folder corresponds to one subject. Folder Si contains 120 .mat files named mappedRawData_1, mappedRawData_2, ..., mappedRawData_120. Each of the 120 files represents mappings of raw eye-tracking data from the scene camera image (recorded with the eye tracker's scene camera) to the original projected image, in the following format:
Each mappedRawData_i is a cell structure of size noSampleData, where noSampleData is the number of eye-tracking samples recorded. Each sample has the following attributes:
posX - X coordinate of the sample in the scene video frame
posY - Y coordinate of the sample in the scene video frame
frameTime - time of the current frame
eventType - "S" for saccades, "B" for blinks and "F" for fixations
frameNo - number of the video frame
mappedX - X coordinate of the sample in the original projected image i (see Pose Images)
mappedY - Y coordinate of the sample in the original projected image i (see Pose Images)
Note: the mapping is done only for the 5 seconds when the image was visible on the projector.
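As a small sketch of how such a file can be read in Python: the code below loads one mappedRawData_i file with SciPy and keeps only fixation samples that were successfully mapped (mappedX and mappedY of -1 mark unmapped samples, as detailed below). The variable name 'mappedRawData' inside the .mat file is an assumption; check the file contents (e.g. with scipy.io.whosmat) if your copy differs.

```python
from scipy.io import loadmat


def load_fixations(path):
    """Return (mappedX, mappedY) pairs of valid fixation samples.

    Assumes the cell structure is stored under the variable name
    'mappedRawData' (an assumption, not confirmed by the dataset docs).
    """
    mat = loadmat(path, squeeze_me=True, struct_as_record=False)
    samples = mat['mappedRawData']  # cell structure -> object array of structs
    fixations = []
    for s in samples:
        # 'F' marks fixations; -1 in mappedX/mappedY marks samples outside
        # the 5-second window or samples that could not be mapped
        if s.eventType == 'F' and s.mappedX != -1 and s.mappedY != -1:
            fixations.append((float(s.mappedX), float(s.mappedY)))
    return fixations


# Example (path is illustrative):
# fixations = load_fixations('Eye Movements/S1/mappedRawData_1.mat')
```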
If the values of mappedX and mappedY are equal to -1, it means that either:
-the selected sample is outside the 5-second range, or
-the sample was invalid and could not be mapped (outside the image borders, outside the range of the eye-tracking system, etc.)

============================
Synchronized Frames
============================
It is a 10x120 cell matrix that defines, for each subject and for each pose shown, the following attributes:
syncEyeTracker - the index of the last frame in which the projected image is visible in the eye-tracking recordings;
syncMocap - the corresponding index of the last frame of the projected image as recorded by the motion capture cameras;
endMocap - the index of the frame at which the subject considered the re-enactment process complete.
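As a closing sketch, the synchronization indices can be used to cut a mocap recording down to the re-enactment segment (from the last frame of the projected image to the frame where the subject finished). The index convention (1-based, as usual in MATLAB) and the way the indices are obtained are assumptions; adapt to your copy of the dataset.

```python
from scipy.io import loadmat


def reenactment_segment(mocap_path, sync_mocap, end_mocap):
    """Slice the Angles and Position matrices of one mocap .mat file to the
    frames between syncMocap and endMocap (both assumed 1-based, inclusive).
    """
    mat = loadmat(mocap_path)
    angles = mat['Angles']      # (noFrames x 48) ZXY Euler angles
    position = mat['Position']  # (noFrames x 87) 3D joint positions
    sl = slice(sync_mocap - 1, end_mocap)  # 1-based -> 0-based slice
    return angles[sl], position[sl]


# Example (paths and indices are illustrative):
# angles_seg, pos_seg = reenactment_segment('Mocap Data/S1/1.mat', 120, 900)
```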