Visual Genome contains Visual Question Answering data in a multiple-choice setting. It consists of 101,174 images from MSCOCO with 1.7 million QA pairs, an average of 17 questions per image. Compared to the Visual Question Answering (VQA) dataset, Visual Genome has a more balanced distribution over six question types: What, Where, When, Who, Why, and How. The Visual Genome dataset also provides 108K images densely annotated with objects, attributes, and relationships.
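To make the six question types concrete, the following is a minimal Python sketch of how a question-type distribution could be tallied by looking at the first word of each question. The function names (`question_type`, `type_distribution`) and the sample questions are illustrative assumptions, not part of the Visual Genome tooling or data, and first-word matching is only a rough heuristic.

```python
from collections import Counter

# The six question types named in the description above.
QUESTION_TYPES = ("what", "where", "when", "who", "why", "how")


def question_type(question: str) -> str:
    """Return the question type based on its first word, or 'other'."""
    words = question.strip().lower().split()
    if not words:
        return "other"
    # Drop punctuation that may be attached to the first word.
    first = words[0].strip("?,.!:;\"'")
    return first if first in QUESTION_TYPES else "other"


def type_distribution(questions):
    """Count how many questions fall into each type."""
    return Counter(question_type(q) for q in questions)


# Made-up example questions (hypothetical, not taken from the dataset).
sample_questions = [
    "What color is the bus?",
    "Where is the dog sitting?",
    "When was the photo taken?",
    "Who is holding the umbrella?",
    "Why is the man smiling?",
    "How many people are there?",
    "Is it raining?",
]

distribution = type_distribution(sample_questions)
for qtype, count in sorted(distribution.items()):
    print(f"{qtype}: {count}")
```

Applied to a real set of QA pairs, a tally like this would give one way to check how evenly the questions are spread across the six types; questions that do not begin with one of the six words are counted as `other`.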