2820

WDW数据集

Who-did-What Dataset

阅读理解 WDW 文本数据集 自然语言处理 文本分析

一个阅读理解数据集

免积分下载
数据集市
2020年11月30日
26 GB

相关数据

Twitter情感分析训练语料库
Twitter情感分析训练语料库
该情感分析数据集 包含1,578,627条分类推文,每行标记... 免积分下载
多领域情感评论文本数据集
多领域情感评论文本数据集
多领域情感数据集包含从Amazon.com获取的部分产品评论... 免积分下载
Euler图学习开源数据集
Euler图学习开源数据集
Euler图学习平台自研算法对应的开源图数据与样本数据 免积分下载

数据介绍

We have constructed a new "Who-did-What" dataset of over 200,000 fill-in-the-gap (cloze) multiple choice reading comprehension problems constructed from the LDC English Gigaword newswire corpus. The WDW dataset has a variety of novel features. First, in contrast with the CNN and Daily Mail datasets (Hermann et al., 2015) we avoid using article summaries for question formation. Instead, each problem is formed from two independent articles --- an article given as the passage to be read and a separate article on the same events used to form the question. Second, we avoid anonymization --- each choice is a person named entity. Third, the problems have been filtered to remove a fraction that are easily solved by simple baselines, while remaining 84% solvable by humans. We report performance benchmarks of standard systems and propose the WDW dataset as a challenge task for the community. ( ARTICLE HERE )

数据规格

发布时间 2011年6月17日
还没有任何文件记录.