Please use this identifier to cite or link to this item: http://elar.urfu.ru/handle/10995/103128
Full metadata record
DC Field | Value | Language
dc.contributor.author | Tarasov, D. | en
dc.date.accessioned | 2021-08-31T15:07:39Z | -
dc.date.available | 2021-08-31T15:07:39Z | -
dc.date.issued | 2020 | -
dc.identifier.citation | Tarasov D. Language attribution of an unmarked text corpus / D. Tarasov. — DOI 10.37394/23203.2020.15.75 // WSEAS Transactions on Systems and Control. — 2020. — Vol. 15. — P. 754-759. | en
dc.identifier.issn | 19918763 | -
dc.identifier.other | Final | 2
dc.identifier.other | All Open Access, Bronze | 3
dc.identifier.other | https://www.scopus.com/inward/record.uri?eid=2-s2.0-85099953727&doi=10.37394%2f23203.2020.15.75&partnerID=40&md5=ae68a5cc35e2b5ade3ad4cb1ee1346bf |
dc.identifier.other | https://doi.org/10.37394/23203.2020.15.75 | m
dc.identifier.uri | http://elar.urfu.ru/handle/10995/103128 | -
dc.description.abstract | Unmarked text corpora will appear increasingly often as the volume of information on the web grows. Automated analysis of Big Data in search engines and in scientific and commercial applications requires detailed information about the object under study. For text corpora, information on the language of the documents is extremely important. With scanned texts, the situation is even more complicated. In this paper, the idea of using fractal-inspired irregularity to attribute the language of a text is developed further. A methodology for the attribution is proposed, and an experiment based on 10 European languages is conducted. The proposed approach has shown its effectiveness and promise. A sample of approximately 4000 characters (1 page of text) is sufficient to attribute the language of the text uniquely. © 2020, World Scientific and Engineering Academy and Society. All rights reserved. | en
dc.format.mimetype | application/pdf | en
dc.language.iso | en | en
dc.publisher | World Scientific and Engineering Academy and Society | en
dc.rights | info:eu-repo/semantics/openAccess | en
dc.source | WSEAS Trans. Syst. Control | 2
dc.source | WSEAS Transactions on Systems and Control | en
dc.subject | BIG DATA | en
dc.subject | FRACTAL | en
dc.subject | IRREGULARITY | en
dc.subject | LANGUAGE | en
dc.title | Language attribution of an unmarked text corpus | en
dc.type | Article | en
dc.type | info:eu-repo/semantics/article | en
dc.type | info:eu-repo/semantics/publishedVersion | en
dc.identifier.doi | 10.37394/23203.2020.15.75 | -
dc.identifier.scopus | 85099953727 | -
local.contributor.employee | Tarasov, D., Department of IT and Automation, Ural Federal University, Mira 32 – R041, Ekaterinburg, 620002, Russian Federation |
local.description.firstpage | 754 | -
local.description.lastpage | 759 | -
local.volume | 15 | -
local.contributor.department | Department of IT and Automation, Ural Federal University, Mira 32 – R041, Ekaterinburg, 620002, Russian Federation |
local.identifier.pure | 20889886 | -
local.identifier.pure | e5a0334b-66ba-49c0-8a1e-1d903bc266fe | uuid
local.identifier.eid | 2-s2.0-85099953727 | -
Appears in collections: Scientific publications of UrFU researchers indexed in SCOPUS and WoS CC

Files in this item:
File | Description | Size | Format
2-s2.0-85099953727.pdf | | 1.88 MB | Adobe PDF


All items in the electronic resource archive are protected by copyright, with all rights reserved.