Please use this identifier to cite or link to this item: http://hdl.handle.net/10995/101404
Title: SberQuAD – Russian Reading Comprehension Dataset: Description and Analysis
Authors: Efimov, P.
Chertok, A.
Boytsov, L.
Braslavski, P.
Issue Date: 2020
Publisher: Springer Science and Business Media Deutschland GmbH
Citation: SberQuAD – Russian Reading Comprehension Dataset: Description and Analysis / P. Efimov, A. Chertok, L. Boytsov, et al. — DOI 10.1007/978-3-030-58219-7_1 // Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). — 2020. — Vol. 12260 LNCS. — P. 3-15.
Abstract: The paper presents SberQuAD – a large Russian reading comprehension (RC) dataset created similarly to English SQuAD. SberQuAD contains about 50K question-paragraph-answer triples and is seven times larger compared to the next competitor. We provide its description, thorough analysis, and baseline experimental results. We scrutinized various aspects of the dataset that can have impact on the task performance: question/paragraph similarity, misspellings in questions, answer structure, and question types. We applied five popular RC models to SberQuAD and analyzed their performance. We believe our work makes an important contribution to research in multilingual question answering. © 2020, Springer Nature Switzerland AG.
Keywords: EVALUATION
MULTILINGUAL QUESTION ANSWERING
READING COMPREHENSION
RUSSIAN LANGUAGE RESOURCES
ASSOCIATION REACTIONS
NATURAL LANGUAGE PROCESSING SYSTEMS
QUESTION ANSWERING
QUESTION TYPE
RC MODELS
TASK PERFORMANCE
LARGE DATASET
URI: http://hdl.handle.net/10995/101404
Access: info:eu-repo/semantics/openAccess
SCOPUS ID: 85092191483
PURE ID: 14123133
5932df21-7a79-4285-ae44-5931887a552d
ISSN: 0302-9743
ISBN: 9783030582180
DOI: 10.1007/978-3-030-58219-7_1
Sponsorship: We thank Peter Romov, Vladimir Suvorov, and Ekaterina Artemova (Chernyak) for providing us with details about SberQuAD preparation. We also thank Natasha Murashkina for initial data processing. PB acknowledges support by Ural Mathematical Center under agreement No. 075-02-2020-1537/1 with the Ministry of Science and Higher Education of the Russian Federation.
Appears in Collections: Scientific publications indexed in SCOPUS and WoS CC

Files in This Item:
File: 2-s2.0-85092191483.pdf
Size: 322.02 kB
Format: Adobe PDF