Please use this identifier to cite or link to this item: http://hdl.handle.net/10995/103284
Title: RuBQ: A Russian Dataset for Question Answering over Wikidata
Authors: Korablinov, V.; Braslavski, P.
Issue Date: 2020
Publisher: Springer Science and Business Media Deutschland GmbH
Citation: Korablinov V. RuBQ: A Russian Dataset for Question Answering over Wikidata / V. Korablinov, P. Braslavski. — DOI 10.1007/978-3-030-62466-8_7 // Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). — 2020. — Vol. 12507 LNCS. — P. 97-110.
Abstract: The paper presents RuBQ, the first Russian knowledge base question answering (KBQA) dataset. The high-quality dataset consists of 1,500 Russian questions of varying complexity, their English machine translations, SPARQL queries to Wikidata, reference answers, as well as a Wikidata sample of triples containing entities with Russian labels. The dataset creation started with a large collection of question-answer pairs from online quizzes. The data underwent automatic filtering, crowd-assisted entity linking, automatic generation of SPARQL queries, and their subsequent in-house verification. The freely available dataset will be of interest for a wide community of researchers and practitioners in the areas of Semantic Web, NLP, and IR, especially for those working on multilingual question answering. The proposed dataset generation pipeline proved to be efficient and can be employed in other data annotation projects. © 2020, Springer Nature Switzerland AG.
Keywords: EVALUATION; KNOWLEDGE BASE QUESTION ANSWERING; RUSSIAN LANGUAGE RESOURCES; SEMANTIC PARSING; KNOWLEDGE BASED SYSTEMS; LARGE DATASET; NATURAL LANGUAGE PROCESSING SYSTEMS; AUTOMATIC FILTERING; AUTOMATIC GENERATION; DATA ANNOTATION; KNOWLEDGE BASE; MACHINE TRANSLATIONS; QUESTION ANSWERING; QUESTION-ANSWER PAIRS; SPARQL QUERIES; SEMANTIC WEB
URI: http://hdl.handle.net/10995/103284
Access: info:eu-repo/semantics/openAccess
SCOPUS ID: 85096596949
PURE ID: 20220236
ISSN: 0302-9743
ISBN: 9783030624651
DOI: 10.1007/978-3-030-62466-8_7
Sponsorship: We thank Mikhail Galkin, Svitlana Vakulenko, Daniil Sorokin, Vladimir Kovalenko, Yaroslav Golubev, and Rishiraj Saha Roy for their valuable comments and fruitful discussion on the paper draft. We also thank Pavel Bakhvalov, who helped collect RuWikidata8M sample and contributed to the first version of the entity linking tool. We are grateful to Yandex.Toloka for their data annotation grant. PB acknowledges support by Ural Mathematical Center under agreement No. 075-02-2020-1537/1 with the Ministry of Science and Higher Education of the Russian Federation.
Appears in Collections: Scientific publications indexed in SCOPUS and WoS CC

Files in This Item:
File | Size | Format
2-s2.0-85096596949.pdf | 544.34 kB | Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.