Create MS COCO style dataset


I'm working on a Python library that has many useful classes and functions for doing this. It's called Image Semantics (imported as imantics).

Here is an example of adding masks and exporting them in COCO format:

    from imantics import Mask, Image, Category

    image = Image.from_path('path/to/image.png')
    mask = Mask(mask_array)
    image.add(mask, category=Category("Category Name"))

    # dict of coco
    coco_json = image.export(style='coco')

    # Saves to file
    image.save('coco/annotation.json', style='coco')


You can try to use pycococreator, which includes a set of tools to convert binary masks to the polygon and RLE formats that COCO uses.

https://github.com/waspinator/pycococreator/

Here is an example of how you could use it to create annotation information from a binary mask:

    annotation_info = pycococreatortools.create_annotation_info(
        segmentation_id, image_id, category_info, binary_mask,
        image.size, tolerance=2)

You can read more details about how to use pycococreator here: https://patrickwasp.com/create-your-own-coco-style-dataset/
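For reference, here is a rough idea of what the uncompressed RLE form looks like. This is a minimal pure-Python sketch, not pycococreator's implementation; the function name binary_mask_to_uncompressed_rle is made up for illustration. It assumes the usual COCO convention that run lengths are counted column by column and that the first count always refers to zeros.

```python
def binary_mask_to_uncompressed_rle(mask):
    """Encode a 2D binary mask (rows of 0/1 values) as COCO-style uncompressed RLE.

    Hypothetical helper for illustration only.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    counts = []
    previous = 0    # runs alternate 0s, 1s, 0s, ... and always start with 0s
    run_length = 0
    # Walk the mask in column-major order: down each column, then to the next one.
    for x in range(width):
        for y in range(height):
            value = 1 if mask[y][x] else 0
            if value != previous:
                counts.append(run_length)
                run_length = 0
                previous = value
            run_length += 1
    counts.append(run_length)
    return {"size": [height, width], "counts": counts}


rle = binary_mask_to_uncompressed_rle([[0, 1],
                                       [1, 1]])
```

If the mask starts with a 1, the first count is 0, so a decoder can always assume the first run describes background pixels.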


In order to convert a mask array of 0's and 1's into a polygon similar to the COCO-style dataset, use skimage.measure.find_contours, thanks to code by waleedka.

    import numpy as np
    from matplotlib.collections import PatchCollection
    from matplotlib.patches import Polygon
    from skimage.measure import find_contours

    # Binary mask of 0s and 1s. NumPy indexes arrays as (row, column), so the
    # shape is (height, width); both are assumed to be defined elsewhere.
    mask = np.zeros((height, width), dtype=np.uint8)
    mask_polygons = []  # Mask polygons

    # Pad to ensure proper polygons for masks that touch image edges.
    padded_mask = np.zeros((mask.shape[0] + 2, mask.shape[1] + 2), dtype=np.uint8)
    padded_mask[1:-1, 1:-1] = mask
    contours = find_contours(padded_mask, 0.5)
    for verts in contours:
        # Subtract the padding and flip (y, x) to (x, y)
        verts = np.fliplr(verts) - 1
        pat = PatchCollection([Polygon(verts, closed=True)], facecolor='green', linewidths=0, alpha=0.6)
        mask_polygons.append(pat)
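The snippet above builds matplotlib patches for display. For the annotation file itself, COCO stores each polygon as a flat list [x1, y1, x2, y2, ...]. A minimal sketch of that conversion follows, assuming each contour is a sequence of (row, column) vertices as returned by find_contours on the padded mask. The helper name contour_to_coco_polygon and the clamping of negative values to 0 are my own choices, not part of the code credited above.

```python
def contour_to_coco_polygon(verts, padding=1):
    """Turn (row, col) contour vertices into a flat [x1, y1, x2, y2, ...] list.

    Hypothetical helper for illustration only.
    """
    polygon = []
    for row, col in verts:
        # Undo the padding offset, then clamp: contours that hug the image
        # edge can end up slightly below zero after the subtraction.
        x = max(float(col) - padding, 0.0)
        y = max(float(row) - padding, 0.0)
        # COCO expects x before y, so the (row, col) order is flipped here.
        polygon.extend([x, y])
    return polygon


flat = contour_to_coco_polygon([(1.5, 2.0), (3.0, 0.5), (2.0, 4.0)])
```

Each flat list then becomes one entry in an annotation's "segmentation" field.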

To generate the JSON file for a COCO-style dataset, look into Python's built-in json module. Beyond that, it is just a matter of matching the format used by the COCO dataset's JSON file.
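As a rough illustration, here is a minimal sketch of such a file built with the json module. The three top-level keys and the annotation fields follow the COCO instance-annotation layout; the file name, image size, category name and polygon are placeholder values invented for this example.

```python
import json

# Placeholder values throughout; only the key names follow the COCO layout.
coco = {
    "images": [
        {"id": 1, "file_name": "example.png", "width": 640, "height": 480}
    ],
    "categories": [
        {"id": 1, "name": "Category Name", "supercategory": "none"}
    ],
    "annotations": [
        {
            "id": 1,
            "image_id": 1,        # refers to images[].id
            "category_id": 1,     # refers to categories[].id
            # One polygon as a flat [x1, y1, x2, y2, ...] list.
            "segmentation": [[10.0, 10.0, 60.0, 10.0, 60.0, 40.0, 10.0, 40.0]],
            "bbox": [10.0, 10.0, 50.0, 30.0],  # [x, y, width, height]
            "area": 1500.0,
            "iscrowd": 0,
        }
    ],
}

coco_text = json.dumps(coco, indent=2)
```

To write the result to disk, json.dump(coco, f) with an open file handle does the same thing without the intermediate string.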

You should take a look at my COCO style dataset generator GUI repo. I built a very simple tool to create COCO-style datasets.

The specific file you're interested in is create_json_file.py, which takes matplotlib polygon coordinates in the form (x1, y1, x2, y2 ...) for every polygon annotation and converts them into a JSON annotation file quite similar to COCO's default format.
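To give an idea of what that conversion involves, here is a minimal sketch that turns one flat coordinate list into a COCO-style annotation entry. It is not the code from create_json_file.py; the function name polygon_to_annotation and its parameters are hypothetical. The area comes from the shoelace formula, so it assumes a simple (non-self-intersecting) polygon.

```python
def polygon_to_annotation(flat_coords, annotation_id, image_id, category_id):
    """Build a COCO-style annotation dict from [x1, y1, x2, y2, ...].

    Hypothetical helper for illustration only.
    """
    xs = flat_coords[0::2]
    ys = flat_coords[1::2]

    # Shoelace formula: sum the cross products of consecutive vertices.
    twice_area = 0.0
    for i in range(len(xs)):
        j = (i + 1) % len(xs)  # wrap around to close the polygon
        twice_area += xs[i] * ys[j] - xs[j] * ys[i]
    area = abs(twice_area) / 2.0

    # COCO bounding boxes are [x, y, width, height] from the top-left corner.
    x_min, y_min = min(xs), min(ys)
    bbox = [x_min, y_min, max(xs) - x_min, max(ys) - y_min]

    return {
        "id": annotation_id,
        "image_id": image_id,
        "category_id": category_id,
        "segmentation": [list(flat_coords)],
        "bbox": bbox,
        "area": area,
        "iscrowd": 0,
    }


annotation = polygon_to_annotation(
    [10.0, 10.0, 60.0, 10.0, 60.0, 40.0, 10.0, 40.0],
    annotation_id=1, image_id=1, category_id=1)
```

Appending one such dict per polygon to the "annotations" list and serializing the whole structure gives the annotation file.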