Getting pose ground truth data for YOLO
So YOLO requires a ground truth text file that has the class number and the box dimensions: centre point (x, y), width and height.
What do we have? Well…
All 16 joints with their image coordinates and whether or not they are visible, along with a bounding box for the head.
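As a rough illustration of the data I'm working from (the field names below are my own, not the dataset's actual keys), a single annotated person might be represented like this:

```python
# Hypothetical representation of one annotated person from the MPII-style data.
# Field names and values are illustrative only.
person = {
    "image": "image.jpg",
    "scale": 1.8,                       # rough scale of the person in the image
    "head_box": (600.0, 280.0, 660.0, 340.0),  # head bounding box in pixels
    "joints": [
        # (joint_id, x, y, visible)
        (0, 620.0, 394.0, 1),           # e.g. right ankle in MPII's joint ordering
        (1, 615.0, 310.0, 1),           # e.g. right knee
        # ... up to 16 joints in total
    ],
}
```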

Attempt One
Draw rough bounding boxes around each joint, with the joint coordinate as the centre point and a box width and height of, let's say, 50px. It should look something like this:

However, each image might be a different size, and the pose itself may take up a different proportion of the image, so a fixed box size won't scale well.
Creating Ground Truth
Each image, e.g. image.jpg, requires a text file called image.txt describing all the classes and bounding boxes inside the image.
For my first attempt I drew 50 by 50 pixel bounding boxes around each of the joints and labelled each joint with its joint id. For this initial run I used 1000 images with 200 test images. I didn't expect good results, but I wanted to get the network working and make sure all my GPU drivers were working correctly.
As for the ground truth text file itself, I started with a joint's centre point, a box width and height, and its joint id. YOLO expects these values normalised by the image dimensions.
Below I divide each value by the corresponding image dimension, so:
x = x / img_width
y = y / img_height
box_w = (25 * pose_scale) / img_width
box_h = (25 * pose_scale) / img_height
Note on box_w and box_h: these values are just for testing with the MPII Human Pose dataset. They will need to be changed based on the dataset you are using.
This produces numbers between 0 and 1, giving output something like this:
CLASS X Y WIDTH HEIGHT
1 0.546875 0.7861111111111111 0.0390625 0.06944444444444445
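A minimal Python sketch of that conversion might look like the following. The function names, the `joints` tuple layout and `pose_scale` are my own assumptions, not part of the dataset or of YOLO itself; the 25px base size matches the numbers above.

```python
def joint_to_yolo_line(joint_id, x, y, pose_scale, img_w, img_h, base_size=25):
    """Convert one joint annotation into a single YOLO ground-truth line.

    x, y are the joint's pixel coordinates, pose_scale is the rough scale
    of the person, and base_size is the un-scaled box size in pixels.
    """
    cx = x / img_w
    cy = y / img_h
    box_w = (base_size * pose_scale) / img_w
    box_h = (base_size * pose_scale) / img_h
    return f"{joint_id} {cx} {cy} {box_w} {box_h}"


def write_label_file(label_path, joints, pose_scale, img_w, img_h):
    """Write one image.txt file with a line for every visible joint.

    `joints` is assumed to be a list of (joint_id, x, y, visible) tuples.
    """
    lines = [
        joint_to_yolo_line(jid, x, y, pose_scale, img_w, img_h)
        for jid, x, y, visible in joints
        if visible
    ]
    with open(label_path, "w") as f:
        f.write("\n".join(lines) + "\n")
```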
Then you want a folder structure something like this:

As you can see here, each image has a text file describing each class in the image. For YOPO each file would have up to 16 classes; it would have fewer if a joint was not visible in the image.
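For reference, the layout I'm assuming is the usual darknet-style one: each label file sits next to its image with the same name, and a train.txt lists the image paths, one per line. A small sketch for generating that list (the directory names here are illustrative):

```python
from pathlib import Path

# Assumed layout: data/images/image.jpg with a matching data/images/image.txt
# label file alongside it. Darknet then reads a train.txt of image paths.
image_dir = Path("data/images")
image_paths = sorted(
    p for p in image_dir.glob("*.jpg") if p.with_suffix(".txt").exists()
)

with open("train.txt", "w") as f:
    f.write("\n".join(str(p) for p in image_paths) + "\n")
```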
Training on images with upright poses, which map well onto these bounding boxes, worked fairly well for a network trained on only around 150 images; see below:

However, on images where the poses don't map well onto upright-oriented bounding boxes, it yields very poor results, and in some cases no boxes are found at all. The network was trained for two days on an Nvidia GTX 1070, which resulted in 20,000 training iterations, with an average IOU of around 3 that wasn't improving. Around 25,000 training iterations the average loss was bouncing up and down, which isn't ideal, because the average loss should decrease over time towards a target value of 0.6.
So for next week I need to look into why the loss isn't converging, and consider a different way of approaching the problem.