APE Dataset

Dataset for Unconstrainted 3D Human Pose Estimation
Download the APE Dataset (3.5 GB) Download the Matlab scripts (1.4 KB)


The Cambridge-Imperial APE (Action-Pose-Estimation) dataset is collected for 3D human pose estimation. Different from other public datasets, the APE dataset provides multimodal data:

  • RGB Images
  • Depth-maps
  • 3D human pose
  • Human action label

The APE dataset contains 245 sequences from 7 subjsects performing 7 different categories of actions. Videos of each subject were recorded in different unconstrained environments, changing camera poses and moving background objects.

The settings is considered challenging for traditional 3D human pose estimation because:

  1. no scene-dependent cues, such as foreground segmentation, can be used
  2. testing is done in completely unseen environments.


Download the APE Dataset (3.5 GB) Download the Matlab scripts (1.4 KB)

In citing the APE dataset, please refer to:
Unconstrained Monocular 3D Human Pose Estimation by Action Detection and Cross-modality Regression Forest
Tsz-Ho Yu, Tae-Kyun Kim, Roberto Cipolla
In proceedings of 26th IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, 2013
[IEEE Xplore Website]

Action Classes

The seven action classes of the APE dataset are illustrated as follows:


Action Index File

The action label of each sequence is stored in activityLabel.txt, which assigns an action class of each folder.

Image Data

The APE dataset provides Both RGB images and depth-maps of the video sequences. They are stored in folders indicated by the action class and subject, e.g. clap04 is the folder that contains the "clapping" sequences performed by subject 4.

Inside each sequence folder, RGB frames and depth-maps are named as "imageX.jpg" and "depthX.jpg" respectively, where X is the frame index. For example, depth00567.jpg is the depth-map for frame 568.

3D Pose Data

3D pose data are stored in text files indicated by its corresponding folder, e.g. clap04.txt contains the 3D pose data for the clapping action performed by subject 4, corresponding to folder "clap04".

The format of the pose file is described below:

[N (# of frame)]
[No use] [No use] [joint 1] [joint 2] ... [joint 15]
(N lines)
[No use] [No use] [joint 1] [joint 2] ... [joint 15]

The first line contains the number of frame N in the video sequence, which is followed by N lines describing the 3D pose of the subject in each frame. The first two numbers in each line are place holders and not used at this moment, then each joint is represetned by a 14 numbers:

[joint 1] := [X] [Y] [Z] [location_conf] [R00] [R01] [R02] [R10] [R11] [R12] [R20] [R21] [R22] [rotation_conf]

[X,Y,Z] is the 3D spatial location of the joint. [location_conf] indicates the confidence of its location, a joint is not seen when its confidence is zero (occlusions, missing from data, etc.). The orientation of teh joint is represented by a 3x3 rotation matrix with elements [R00 to R22], there rotation-conf is the confidence of the joint's orientation.

Matlab Scripts
Please use thyReadData.m to load the APE dataset to Matlab. thyReadSkel.m is used to read the 3D pose data file, and rot2euler is used to convert rotation matrices to Euler angles.


The APE dataset is released under the MIT license.


For any inquiries or feedback regarding the APE dataset, please contact:

Designed and written by Tsz-Ho Yu 2015