PubLayNet

PubLayNet is a large dataset of document images whose layout is annotated with both bounding boxes and polygonal segmentations. For more information, see the original PubLayNet project.
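To make the two annotation types concrete, here is a minimal sketch of how a polygon relates to its bounding box. It assumes COCO-style records (a flat `[x1, y1, x2, y2, ...]` polygon and an `[x, y, width, height]` box). The record and its values are invented for illustration, and `polygon_to_bbox` is a hypothetical helper, not part of this repository.

```python
def polygon_to_bbox(polygon):
    """Convert a flat [x1, y1, x2, y2, ...] polygon to [x, y, width, height]."""
    xs = polygon[0::2]  # even positions hold x coordinates
    ys = polygon[1::2]  # odd positions hold y coordinates
    x_min, y_min = min(xs), min(ys)
    return [x_min, y_min, max(xs) - x_min, max(ys) - y_min]


# Made-up annotation in an assumed COCO-style layout (not real dataset values).
annotation = {
    "category_id": 1,
    "segmentation": [[50.0, 40.0, 250.0, 40.0, 250.0, 120.0, 50.0, 120.0]],
    "bbox": [50.0, 40.0, 200.0, 80.0],
}

# The box is the tightest axis-aligned rectangle around the polygon.
assert polygon_to_bbox(annotation["segmentation"][0]) == annotation["bbox"]
```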


Example images: PMC4334925_00006.jpg, PMC538274_00004.jpg

Recent updates

15/Sept/2020 - Add training code.

29/Feb/2020 - Add benchmarking for maskrcnn_resnet50_fpn.

22/Feb/2020 - Pre-trained Mask-RCNN model (PyTorch) released.

Benchmarking

Architecture	Iter num (x16)	AP	AP50	AP75	AP Small	AP Medium	AP Large	MD5SUM
MaskRCNN-Resnet50-FPN	196k	0.91	0.98	0.96	0.41	0.76	0.95	393e6700095a673065fcecf5e8f264f7
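The column names follow the usual COCO convention, under which AP50 and AP75 are the average precision at IoU thresholds of 0.50 and 0.75, and AP averages over thresholds from 0.50 to 0.95. Assuming that convention applies here, the sketch below shows the IoU computation for two boxes. `box_iou` is an illustrative helper, not code from this repository, and mask-based evaluation would compare pixel overlap instead of box overlap.

```python
def box_iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap region, clamped at zero when the boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


# A prediction counts as a match at a given threshold only if its IoU reaches it.
iou = box_iou((0, 0, 10, 10), (0, 0, 10, 8))
print(iou >= 0.50, iou >= 0.75)
```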

Demo

Download the trained weights listed in the Benchmarking section above and place them in the maskrcnn directory.
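Before running inference, the downloaded file can be checked against the MD5SUM from the Benchmarking table. This is a minimal sketch using only the standard library. The path `maskrcnn/model.pth` is a placeholder, so substitute the actual file name of the downloaded weights.

```python
import hashlib
from pathlib import Path

# MD5SUM listed in the Benchmarking table for MaskRCNN-Resnet50-FPN.
EXPECTED_MD5 = "393e6700095a673065fcecf5e8f264f7"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def weights_match(path, expected=EXPECTED_MD5):
    """True if the file's MD5 digest equals the expected checksum."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Placeholder location (an assumption, not a path defined by the repository).
    weights = Path("maskrcnn") / "model.pth"
    if weights.exists():
        print("checksum OK" if weights_match(weights) else "checksum mismatch")
    else:
        print(f"{weights} not found")
```

A mismatch usually means the download was truncated or a different file was fetched.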

Run

cd maskrcnn
python infer.py --image_path="document_image_dir/image.jpg" --model_path="mrcnn_model_dir/model.pth" --output_path="model_segmentation_output_dir/"

Average Precision during validation (via TensorBoard)

Training

Please take a look at the training_code directory. Sorry for the messy code, but I really don’t have time to refactor it :D