Getting pose ground truth data for YOLO

Richard Price-Jones
3 min read · Feb 19, 2018

Week 2: as part of my major project YOPO (You Only Pose Once), I’ve been looking at the ground truth format the YOLO algorithm requires, and trying to map the MPII pose dataset’s ground truth onto it.

YOLOv2 wants every dimension given relative to the dimensions of the image, as shown below:

[category number] [object center in X] [object center in Y] [object width in X] [object width in Y]
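To make that concrete, here is a minimal sketch of the normalisation; the image size and box coordinates below are made-up numbers, purely for illustration:

import sys

# Hypothetical example: a person box in pixel coordinates
# inside a 1280x720 image, converted to YOLO's relative format.
img_w, img_h = 1280, 720
x_min, y_min, x_max, y_max = 600, 100, 710, 400  # made-up box

center_x = ((x_min + x_max) / 2) / img_w   # object center in X
center_y = ((y_min + y_max) / 2) / img_h   # object center in Y
width    = (x_max - x_min) / img_w         # object width in X
height   = (y_max - y_min) / img_h         # object width in Y

category = 0  # e.g. "person"
sys.stdout.write(f"{category} {center_x:.6f} {center_y:.6f} {width:.6f} {height:.6f}\n")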

The training data from the MPII pose dataset is stored in MATLAB format, which is heavily nested and quite hard to read: loading it into Python and printing the nested arrays to the console is close to unreadable. So I wanted to parse it into a more readable format.
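For context, here is a minimal sketch of what inspecting the raw annotations looks like with SciPy; the annotation filename assumes the v1 u12 release of MPII, so adjust the path to wherever you extracted it:

import scipy.io

# Load the MPII annotations; struct_as_record/squeeze_me make the
# nested MATLAB structs navigable as Python objects.
mat = scipy.io.loadmat("mpii_human_pose_v1_u12_1.mat",
                       struct_as_record=False, squeeze_me=True)
release = mat["RELEASE"]

# Even so, the structure is deeply nested, e.g.:
first = release.annolist[0]
print(first.image.name)   # image filename
# first.annorect holds per-person rects and joints, nested further still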

I’m using a MATLAB parsing function written in Python, taken from here:

https://github.com/bearpaw/pytorch-pose/blob/master/pose/datasets/mpii.py

Fortunately, the parser returns the ground truth labels as an array of JSON objects, one per annotated example.

An example of the ground truth for a single image is as follows:

{
    "train": 1,
    "is_visible": {
        "8": 0,
        "13": 1,
        "12": 1,
        "10": 1,
        "5": 1,
        "2": 1,
        "15": 1,
        "6": 0,
        "4": 1,
        "14": 1,
        "3": 0,
        "7": 1,
        "9": 0,
        "11": 1,
        "0": 1,
        "1": 1
    },
    "head_rect": [627, 100, 706, 198],
    "joint_pos": {
        "8": [637.0201, 189.8183],
        "13": [692, 185],
        "12": [601, 167],
        "10": [606, 217],
        "5": [656, 231],
        "2": [573, 185],
        "15": [688, 313],
        "6": [610, 187],
        "4": [661, 221],
        "14": [693, 240],
        "3": [647, 188],
        "7": [647, 176],
        "9": [695.9799, 108.1817],
        "11": [553, 161],
        "0": [620, 394],
        "1": [616, 269]
    },
    "filename": "015601864.jpg"
}
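Once in this form, the labels are straightforward to work with in Python. A minimal sketch, where "labels.json" is my own placeholder for wherever the parsed output gets saved:

import json

# Load the parsed ground truth (one JSON object per example).
with open("labels.json") as f:
    labels = json.load(f)

example = labels[0]
print(example["filename"])                  # e.g. "015601864.jpg"
for joint_id, (x, y) in example["joint_pos"].items():
    visible = example["is_visible"].get(joint_id, 0)
    print(f"joint {joint_id}: ({x}, {y}) visible={visible}")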

As mentioned above, YOLO requires the ground truth in a particular format. So now we need to figure out a way to map the ground truths across, or failing that, we’ll need to change the YOLO network to use the provided ground truths.

YOLO

[category number] [object center in X] [object center in Y] [object width in X] [object width in Y]

MPII

JSON label:

[joint visibility] [filename] [coordinates of the head rectangle] [coordinates of each joint, with the joint's id and its visibility]

Full Label:

[Image: MPII pose dataset original labels]

As you can see, the original labels carry a lot more information than the JSON version. This is because I’ve stripped out everything I thought wasn’t needed for YOLO at the moment; if I discover that isn’t the case, I can always change the Python parser to add those fields back into the JSON labels.

Creating Labels for YOLO from MPII

One option is to create a YOLO label for each individual joint, classify those in the image, and then draw the skeleton using computer vision techniques or another processing layer in the network. I will be discussing other options with my major project tutor at our next meeting (Tuesday 20th February 2018).
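As a starting point, here is a minimal sketch of that per-joint idea, reusing the example object loaded above: each joint becomes a small fixed-size box in YOLO format. The box size and image dimensions are assumptions I’ve made purely for illustration; real values would come from the actual images.

# Sketch: turn each visible MPII joint into a small YOLO-format box,
# with the joint id doubling as the YOLO category number.
def joints_to_yolo(example, img_w, img_h, box_size=20):
    lines = []
    for joint_id, (x, y) in example["joint_pos"].items():
        if not example["is_visible"].get(joint_id, 0):
            continue  # skip occluded joints
        lines.append(
            f"{joint_id} {x / img_w:.6f} {y / img_h:.6f} "
            f"{box_size / img_w:.6f} {box_size / img_h:.6f}"
        )
    return lines

print("\n".join(joints_to_yolo(example, img_w=1280, img_h=720)))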

Challenges faced this week

The MPII pose dataset in its complete form is 425GB, which obviously takes a lot of time to download, and I needed additional hard drives to store it. Luckily, Aberystwyth University has agreed to download the training data for me over a very fast connection.

Mapping the ground truths from the MPII pose data to the YOLO label requirements. I’m still not sure how I’m going to do this yet, but at least I’ve got the ground truth data into a usable format.

Another pose dataset I’m currently looking at is the VGG Human Pose Estimation datasets. I’m still examining their ground truths and changing the parser to support the new dataset’s labels.

YOLO still requires its labels in .txt format; however, until I’ve decided what these labels are going to contain, I won’t bother finding or writing a JSON-to-text converter just yet.
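When the time comes, the converter should be simple. A minimal sketch, reusing joints_to_yolo from above, writing one .txt file per image named after it (the darknet convention); "labels.json", the output directory, and the fixed image size remain illustrative placeholders:

import json
import os

with open("labels.json") as f:
    labels = json.load(f)

os.makedirs("yolo_labels", exist_ok=True)
for ex in labels:
    stem = os.path.splitext(ex["filename"])[0]
    with open(os.path.join("yolo_labels", stem + ".txt"), "w") as out:
        out.write("\n".join(joints_to_yolo(ex, img_w=1280, img_h=720)))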

