Convert Universal Data Tool files to the YOLO format
Converts files in the Universal Data Tool format to the
YOLOv1.1 format.
Note: We use a variation on the YOLO format. Each sample gets its own subdirectory, which is itself a valid YOLO
dataset. We may change this in the future.
You’ll need npm and ffmpeg installed. Then run the command below to
convert the UDT file into the yolo directory.
npx udt-to-yolo ./dataset.udt.json -o yolo-dir
There doesn’t seem to be a formal spec for the YOLOv1.1 format, but the directory
structure is simple enough that we can describe it based on the output of programs
like CVAT.
The YOLOv1.1 format is a directory organized like this:
.
├── obj.data            ini-like file with dataset stats and paths
├── obj.names           labels, one per line
├── train.txt           lists all the images
└── obj_train_data      directory containing images and bounding box txt files
    ├── frame_000000.PNG    image
    ├── frame_000000.txt    bounding boxes for image with same name
    ├── frame_000001.PNG    etc.
    ├── frame_000001.txt
    ├── frame_000002.PNG
    ├── frame_000002.txt
    ├── frame_000003.PNG
    └── frame_000003.txt
obj.data contains key-value pairs.
Key | Description |
---|---|
classes | Number of labels |
train | path to train.txt (relative to “data”, the main directory) |
names | path to label names (relative to “data”, the main directory) |
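backup | Directory where training tools such as Darknet save weight checkpoints (not needed to read the dataset) |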
classes = 3
train = data/train.txt
names = data/obj.names
backup = backup/
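Reading these key-value pairs takes only a few lines. The following is a minimal sketch, not part of udt-to-yolo; the function name `parse_obj_data` is hypothetical, and it assumes every meaningful line has the form `key = value`.

```python
def parse_obj_data(text):
    """Parse the 'key = value' lines of an obj.data file into a dict."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue  # skip blank or malformed lines
        key, value = line.split("=", 1)
        entries[key.strip()] = value.strip()
    return entries


example = """classes = 3
train = data/train.txt
names = data/obj.names
backup = backup/
"""

config = parse_obj_data(example)
print(int(config["classes"]), config["train"], config["names"])
```

All values come back as strings, so `classes` needs an explicit `int(...)` conversion before use.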
obj.names lists each label that appears in the dataset, one per line.
label1
label2
label3
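To turn a label index from an annotation file back into a name, the label list can be loaded in order. This is a sketch with hypothetical helper names, and it assumes the 1-based label index used in this document.

```python
def load_label_names(text):
    """Return the labels of an obj.names file, in file order."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def label_name(names, label_index):
    """Look up a label by its index, assuming indices start at 1."""
    if not 1 <= label_index <= len(names):
        raise ValueError(f"label index {label_index} is out of range")
    return names[label_index - 1]


names = load_label_names("label1\nlabel2\nlabel3\n")
print(label_name(names, 2))  # label2
```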
train.txt lists the path to every image frame, relative to “data” (the main directory).
data/obj_train_data/frame_000000.PNG
data/obj_train_data/frame_000001.PNG
data/obj_train_data/frame_000002.PNG
data/obj_train_data/frame_000003.PNG
data/obj_train_data/frame_000004.PNG
data/obj_train_data/frame_000005.PNG
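Because each image has a bounding box file with the same name, the annotation path can be derived from each entry in train.txt. The sketch below assumes forward-slash paths as shown above; `annotation_path` is a hypothetical helper name.

```python
from pathlib import PurePosixPath


def annotation_path(image_path):
    """Return the path of the txt file that sits next to an image."""
    return str(PurePosixPath(image_path).with_suffix(".txt"))


train_txt = """data/obj_train_data/frame_000000.PNG
data/obj_train_data/frame_000001.PNG
"""

# Pair every listed image with its bounding box file.
pairs = [
    (line.strip(), annotation_path(line.strip()))
    for line in train_txt.splitlines()
    if line.strip()
]
for image, boxes in pairs:
    print(image, "->", boxes)
```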
obj_train_data contains each frame of the video, or each image of the dataset.
Each image has a txt file with the same name that lists all of its bounding boxes, one per line. Each line has the form
<label index (starting at 1)> <leftmost X position> <topmost Y position> <width> <height>.
X, Y, width and height are all expressed as fractions of the image dimensions. For example, suppose an
image is 1000 pixels wide and 800 pixels high, and has a bounding box that starts 100px from the left and 200px from the top, with a width of 250 pixels and a height of 300 pixels. If this box uses the second label, you would have the following YOLO line:
2 0.1 0.25 0.25 0.375
Here’s the step in-between:
2 100/1000 200/800 250/1000 300/800
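The same arithmetic can be written as a small function. This is a sketch that follows the layout described in this document (1-based label index, top-left corner); `to_yolo_line` is a hypothetical name, not a function exported by udt-to-yolo. Other YOLO tooling commonly expects zero-based indices and box centers, so check what your consumer expects.

```python
def to_yolo_line(label_index, left, top, width, height, image_width, image_height):
    """Format one bounding box, given in pixels, as a line of fractions."""
    fractions = [
        left / image_width,     # X as a fraction of the image width
        top / image_height,     # Y as a fraction of the image height
        width / image_width,    # box width as a fraction of the image width
        height / image_height,  # box height as a fraction of the image height
    ]
    return " ".join([str(label_index)] + [f"{value:g}" for value in fractions])


line = to_yolo_line(2, 100, 200, 250, 300, image_width=1000, image_height=800)
print(line)  # 2 0.1 0.25 0.25 0.375
```

The `g` format keeps six significant digits, which matches the precision of typical annotation files.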
A bounding box file with two boxes looks like this:
1 0.813552 0.562875 0.033875 0.035104
2 0.813552 0.562875 0.033875 0.035104
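Going the other way, an annotation line can be turned back into pixel coordinates once the image size is known. The sketch below assumes the field order described in this document and rounds to whole pixels; `parse_yolo_line` is a hypothetical name.

```python
def parse_yolo_line(line, image_width, image_height):
    """Convert one annotation line back into a pixel-space box."""
    label, x, y, w, h = line.split()
    return {
        "label_index": int(label),
        "left": round(float(x) * image_width),
        "top": round(float(y) * image_height),
        "width": round(float(w) * image_width),
        "height": round(float(h) * image_height),
    }


box = parse_yolo_line("2 0.1 0.25 0.25 0.375", image_width=1000, image_height=800)
print(box)
```

For the 1000 by 800 example used earlier, this recovers the original 100, 200, 250 and 300 pixel values.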