Please use this identifier to cite or link to this item:
http://elar.urfu.ru/handle/10995/131416
Title: | Natural language processing for clusterization of genes according to their functions |
Authors: | Dordiuk, V.; Demicheva, E.; Espino, F. P.; Ushenin, K. |
Publication date: | 2022 |
Publisher: | Institute of Electrical and Electronics Engineers Inc. |
Citation: | Dordiuk, V, Demicheva, E, Espino, FP & Ushenin, K 2022, Natural language processing for clusterization of genes according to their functions. in Proceedings - 2022 Ural-Siberian Conference on Computational Technologies in Cognitive Science, Genomics and Biomedicine, CSGB 2022. Proceedings - 2022 Ural-Siberian Conference on Computational Technologies in Cognitive Science, Genomics and Biomedicine, CSGB 2022, Institute of Electrical and Electronics Engineers Inc., pp. 1-4. https://doi.org/10.1109/CSGB56354.2022.9865330; Dordiuk, V., Demicheva, E., Espino, F. P., & Ushenin, K. (2022). Natural language processing for clusterization of genes according to their functions. In Proceedings - 2022 Ural-Siberian Conference on Computational Technologies in Cognitive Science, Genomics and Biomedicine, CSGB 2022 (pp. 1-4). (Proceedings - 2022 Ural-Siberian Conference on Computational Technologies in Cognitive Science, Genomics and Biomedicine, CSGB 2022). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/CSGB56354.2022.9865330 |
Abstract: | There are hundreds of methods for analyzing mRNA-sequencing data, but most of them focus on a small number of genes. In this study, we propose an approach that reduces the analysis of several thousand genes to the analysis of several clusters. The list of genes is enriched with functional descriptions from open databases. The descriptions are then encoded as vectors using a pretrained language model (BERT) and several text processing approaches. The encoded gene functions are passed through dimensionality reduction and clustering. To find the most efficient pipeline, we analyzed 180 pipeline variants that combine different methods at the major pipeline steps. Performance was evaluated with clustering indices and expert review of the results. © 2022 IEEE. |
Keywords: | BERT; CLUSTERIZATION; DIFFERENTIAL GENE EXPRESSION ANALYSIS; GENE EXPRESSION; GENE ONTOLOGY; NATURAL LANGUAGE PROCESSING; SEMANTIC ANALYSIS; NATURAL LANGUAGE PROCESSING SYSTEMS; PIPELINES; SEMANTICS; TEXT PROCESSING; DIFFERENTIAL GENE EXPRESSIONS; GENE EXPRESSION ANALYSIS; GENES EXPRESSION; LANGUAGE PROCESSING; NATURAL LANGUAGES |
URI: | http://elar.urfu.ru/handle/10995/131416 |
Access rights: | info:eu-repo/semantics/openAccess |
Conference/seminar: | 2022 Ural-Siberian Conference on Computational Technologies in Cognitive Science, Genomics and Biomedicine, CSGB 2022 |
Conference/seminar dates: | 7 July 2022 through 8 July 2022 |
Scopus ID: | 85138478040 |
Pure ID: | 30979989; 09b7ecd7-fbc7-4fce-a67d-a7d6086d37f3 |
ISBN: | 978-166545288-5 |
DOI: | 10.1109/CSGB56354.2022.9865330 |
Appears in collections: | Scientific publications of UrFU researchers indexed in Scopus and WoS CC |
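The abstract describes a five-stage pipeline: gene descriptions are encoded as vectors, the vectors are reduced in dimension and then clustered, and the clusters are scored with a clustering index. The code below is a minimal, dependency-free sketch of that flow. It is not the authors' implementation, and several assumptions apply:

- A plain bag-of-words encoder stands in for the pretrained BERT model.
- PCA, k-means and the silhouette index are one example combination. This record does not say which methods made up the 180 variants.
- The gene names and descriptions are hypothetical toy inputs, not data from the paper.

```python
import math
import re
from collections import Counter


def embed(texts):
    """Stand-in for BERT (assumption): L2-normalised bag-of-words vectors."""
    tokens = [re.findall(r"[a-z]+", t.lower()) for t in texts]
    vocab = sorted({w for toks in tokens for w in toks})
    vectors = []
    for toks in tokens:
        counts = Counter(toks)
        v = [float(counts[w]) for w in vocab]
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        vectors.append([x / norm for x in v])
    return vectors


def pca(X, n_components=2, iters=200):
    """Project centred rows of X onto top principal components (power iteration)."""
    n, d = len(X), len(X[0])
    mean = [sum(row[j] for row in X) / n for j in range(d)]
    Xc = [[row[j] - mean[j] for j in range(d)] for row in X]
    cov = [[sum(r[i] * r[j] for r in Xc) / n for j in range(d)] for i in range(d)]
    components = []
    for _ in range(n_components):
        v = [1.0 / math.sqrt(d)] * d
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w)) or 1.0
            v = [x / norm for x in w]
        lam = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        # Deflate: remove the found component so the next pass finds the next one.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
        components.append(v)
    return [[sum(r[j] * c[j] for j in range(d)) for c in components] for r in Xc]


def dist2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(X, k, iters=50):
    """Lloyd's k-means with deterministic farthest-point initialisation."""
    centers = [X[0]]
    while len(centers) < k:
        centers.append(max(X, key=lambda p: min(dist2(p, c) for c in centers)))
    labels = [0] * len(X)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist2(p, centers[c])) for p in X]
        for c in range(k):
            members = [p for p, lab in zip(X, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def silhouette(X, labels):
    """Mean silhouette coefficient (assumes at least two clusters)."""
    scores = []
    for i, p in enumerate(X):
        same = [math.sqrt(dist2(p, q)) for j, q in enumerate(X)
                if labels[j] == labels[i] and j != i]
        if not same:  # singleton cluster: silhouette defined as 0
            scores.append(0.0)
            continue
        a = sum(same) / len(same)
        b = min(
            sum(math.sqrt(dist2(p, q)) for j, q in enumerate(X) if labels[j] == c)
            / labels.count(c)
            for c in set(labels) if c != labels[i]
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy gene descriptions; in the paper they come from open databases.
descriptions = {
    "GENE_A": "DNA repair, double strand break",
    "GENE_B": "DNA repair, break response",
    "GENE_C": "DNA damage repair, recombination",
    "GENE_D": "glucose membrane transport",
    "GENE_E": "glucose uptake, transport",
    "GENE_F": "membrane glucose transport, sugar",
}
genes = list(descriptions)
X = pca(embed([descriptions[g] for g in genes]), n_components=2)
labels = kmeans(X, k=2)
print(dict(zip(genes, labels)))
print("silhouette:", round(silhouette(X, labels), 3))
```

Each stage is a separate function, so any one of them (encoder, reducer, clusterer or index) can be swapped without touching the others. That modularity is what allows many pipeline variants to be compared side by side. In a real run, `embed` would be replaced by BERT sentence embeddings.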
Files in this item:
File | Description | Size | Format |
---|---|---|---|
2-s2.0-85138478040.pdf | | 1.86 MB | Adobe PDF |
All items in the electronic archive are protected by copyright, with all rights reserved.