In conjunction with this workshop, we will hold three challenges this year.


Weakly-supervised Semantic Segmentation

This track targets learning to perform object semantic segmentation using image-level annotations as supervision [1, 2, 3]. The dataset is built upon the image detection track of the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) [4], which includes 456,567 training images from 200 categories in total. We provide pixel-level annotations of 15K images (validation/testing: 5,000/10,000) for evaluation.


Weakly-supervised Scene Parsing

This track targets learning to perform scene parsing using point-based annotations as supervision. The dataset is built upon the ADE20K dataset [5]. There are 20,210 images in the training set, 2,000 images in the validation set, and 3,000 images in the testing set. We provide additional point-based annotations on the training set [6].


Weakly-supervised Object Localization

This track targets equipping classification networks with the ability to localize objects [7, 8, 9]. The dataset is built upon the image classification/localization track of the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), which includes 1.2 million training images from 1,000 categories in total. We provide pixel-level annotations of 44,271 images (validation/testing: 23,151/21,120) for evaluation.

  • Evaluation: IoU curve. Given the predicted object localization map, we calculate the IoU scores between the foreground pixels and the ground-truth masks under different thresholds. In the ideal curve, the highest IoU score is expected to be close to 1.0. The threshold value corresponding to the highest IoU score is expected to be 255, since higher threshold values reflect a higher contrast between the target object and the background.
  • Download: the validation dataset, test list, and evaluation scripts are available at Baidu Drive (pwd: z5yp) and Google Drive.
  • Submission:
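The IoU-curve evaluation described in the list above can be sketched as follows. This is only an illustration, not the official evaluation script: the function name `iou_curve`, the 8-bit value range (0 to 255), and the convention that a pixel counts as foreground when its value is greater than or equal to the threshold are all assumptions. The sketch also scores a single image, whereas the official scripts may aggregate over the dataset differently.

```python
import numpy as np

def iou_curve(loc_map, gt_mask, thresholds=range(256)):
    """IoU between the thresholded localization map and the ground-truth
    mask, one score per threshold (assumed convention: pixel >= t is foreground)."""
    loc_map = np.asarray(loc_map)
    gt = np.asarray(gt_mask).astype(bool)
    scores = []
    for t in thresholds:
        pred = loc_map >= t                      # predicted foreground at threshold t
        inter = np.logical_and(pred, gt).sum()   # pixels in both prediction and mask
        union = np.logical_or(pred, gt).sum()    # pixels in either prediction or mask
        scores.append(inter / union if union > 0 else 0.0)
    return np.array(scores)

# Toy example (hypothetical data): a high-contrast map that saturates on the object.
toy_map = np.array([[255, 255, 10],
                    [255,  40,  0]], dtype=np.uint8)
toy_gt = np.array([[1, 1, 0],
                   [1, 0, 0]])
curve = iou_curve(toy_map, toy_gt)
best_threshold = int(np.argmax(curve))  # first threshold reaching the peak IoU
```

In this toy case the object pixels are fully saturated, so the curve stays at its peak all the way up to threshold 255, which is the high-contrast behaviour the evaluation rewards.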


This year, we have two strict rules for all competitors.

  1. For training, only the images provided in the training set are permitted. Competitors may use classification models pre-trained on the training set of ILSVRC CLS-LOC to initialize parameters but CANNOT leverage any datasets with pixel-level annotations. In particular, for Track 1 and Track 3, only the image-level annotations of the training images may be used for supervision; bounding-box annotations are NOT permitted.
  2. We encourage competitors to design elegant and effective models that compete in all the tracks rather than ensembling multiple models. Therefore, the parameter size of the inference model(s) must be LESS than 150M (slightly more than two DeepLab V3+ [10] models using ResNet-101 as the backbone). Competitors ranked in the top 3 are required to submit their inference code for verification.
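As a rough way to check the limit in rule 2 before submitting, the parameters of all inference models can be summed from their tensor shapes. The sketch below reads "150M" as 150 million parameters, which is an assumption, and the helper names and example shapes are hypothetical. In practice the shapes would come from the framework's own parameter listing.

```python
from math import prod

PARAM_LIMIT = 150_000_000  # assumed reading of the "150M" limit in rule 2

def count_parameters(shapes):
    """Sum the element counts of every parameter tensor in one model.

    `shapes` maps a parameter name to its tensor shape (a tuple of ints).
    """
    return sum(prod(shape) for shape in shapes.values())

def within_budget(models, limit=PARAM_LIMIT):
    """Return (total parameter count, True if strictly below the limit)."""
    total = sum(count_parameters(shapes) for shapes in models)
    return total, total < limit

# Hypothetical two-model submission; the shapes are made up for illustration.
example_model = {
    "conv1.weight": (64, 3, 7, 7),
    "conv1.bias": (64,),
}
total, ok = within_budget([example_model, example_model])
```

The comparison is strict because the rule asks for LESS than the limit, and the count covers every model used at inference time rather than each model separately.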


This year, Baidu Inc. will provide cash awards to the winners of each track. Participants are encouraged to submit inference code based on the deep learning platform PaddlePaddle, especially the semantic segmentation toolkit PaddleSeg. Winners will receive a cash award of USD 2,000 if they use the PaddlePaddle platform, or USD 500 if they use another deep learning platform.


[1] George Papandreou, Liang-Chieh Chen, Kevin Murphy, and Alan L Yuille. Weakly-and semi-supervised learning of a DCNN for semantic image segmentation. In ICCV, 2015.

[2] Yunchao Wei, Huaxin Xiao, Honghui Shi, Zequn Jie, Jiashi Feng, and Thomas S Huang. Revisiting dilated convolution: A simple approach for weakly-and semi-supervised semantic segmentation. In CVPR, 2018.

[3] Peng-Tao Jiang, Qibin Hou, Yang Cao, Ming-Ming Cheng, Yunchao Wei, and Hong-Kai Xiong. Integral object mining via online attention accumulation. In ICCV, 2019.

[4] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: a large-scale hierarchical image database. In CVPR, 2009.

[5] Bolei Zhou, Hang Zhao, Xavier Puig, Sanja Fidler, Adela Barriuso, and Antonio Torralba. Scene parsing through ADE20K dataset. In CVPR, 2017.

[6] Rui Qian, Yunchao Wei, Honghui Shi, Jiachen Li, Jiaying Liu, and Thomas Huang. Weakly supervised scene parsing with point-based distance metric learning. In AAAI, 2019.

[7] Bolei Zhou, Aditya Khosla, Agata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In CVPR, 2016.

[8] Xiaolin Zhang, Yunchao Wei, Jiashi Feng, Yi Yang, and Thomas Huang. Adversarial complementary learning for weakly supervised object localization. In CVPR, 2018.

[9] Xiaolin Zhang, Yunchao Wei, Guoliang Kang, Yi Yang, and Thomas Huang. Self-produced guidance for weakly-supervised object localization. In ECCV, 2018.

[10] Liang-Chieh Chen, Yukun Zhu, George Papandreou, Florian Schroff, and Hartwig Adam. Encoder-decoder with atrous separable convolution for semantic image segmentation. In ECCV, 2018.