Development of Cross-Language Embeddings for Extracting Chemical Structures from Texts in Russian and English
References
T. Mikolov, K. Chen, G. Corrado, J. Dean. Efficient Estimation of Word Representations in Vector Space. arXiv preprint arXiv:1301.3781, 2013. https://doi.org/10.48550/arXiv.1301.3781.
Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, L. D. Jackel. Backpropagation Applied to Handwritten Zip Code Recognition. Neural Computation, 1(4):541-551, Winter 1989.
H. Sak, A. Senior, F. Beaufays. Long Short-Term Memory Based Recurrent Neural Network Architectures for Large Vocabulary Speech Recognition. arXiv preprint arXiv:1402.1128, 2014. https://doi.org/10.48550/arXiv.1402.1128.
A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, I. Polosukhin. Attention Is All You Need. arXiv preprint arXiv:1706.03762, 2017. https://doi.org/10.48550/arXiv.1706.03762.
Sciapp [Electronic resource]. – URL: https://sciapp.ru/ (accessed: 19.09.2024)
O. Taboureau et al. ChemProt: a disease chemical biology database. Nucleic Acids Research, 2010, Vol. 39, No. suppl_1, pp. D367-D372.
mBERT base model [Electronic resource]. – URL: https://huggingface.co/google-bert/bert-base-multilingual-cased (accessed: 19.09.2024)
B. Li, Y. He, W. Xu. Cross-lingual Named Entity Recognition Using Parallel Corpus: A New Approach Using XLM-RoBERTa Alignment. arXiv preprint arXiv:2101.11112, 2021.
H. A. Chipman et al. mBART: Multidimensional Monotone BART. Bayesian Analysis, 2022, Vol. 17, No. 2, pp. 515-544.
F. Feng, Y. Yang, D. Cer, N. Arivazhagan, W. Wang. Language-agnostic BERT Sentence Embedding. arXiv preprint arXiv:2007.01852, 2020. https://doi.org/10.48550/arXiv.2007.01852.
LaBSE base model [Electronic resource]. – URL: https://huggingface.co/cointegrated/LaBSE-en-ru (accessed: 19.09.2024)
X. Ouyang, S. Wang, C. Pang, Y. Sun, H. Tian, H. Wu. ERNIE-M: Enhanced Multilingual Representation by Aligning Cross-lingual Semantics with Monolingual Corpora, 2020, https://doi.org/10.48550/arXiv.2012.15674.
M. Artetxe, H. Schwenk. Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond. Transactions of the Association for Computational Linguistics, 2019, 7:597-610. https://doi.org/10.1162/tacl_a_00288.
F. Luo, W. Wang, J. Liu, Y. Liu, B. Bi. VECO: Variable and Flexible Cross-lingual Pre-training for Language Understanding and Generation. 2020, https://doi.org/10.48550/arXiv.2010.16046.
Y. Fang, S. Wang, Z. Gan, S. Sun, J. Liu. FILTER: An Enhanced Fusion Method for Cross-lingual Language Understanding. 2020, https://doi.org/10.48550/arXiv.2009.05166.
H. Huang, Y. Liang, N. Duan, M. Gong, L. Shou. Unicoder: A Universal Language Encoder by Pre-training with Multiple Cross-lingual Tasks. 2019, https://doi.org/10.48550/arXiv.1909.00964.
Aroca-Ouellette, S., and Rudzicz, F. (2020). "On Losses for Modern Language Models," in Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) (Association for Computational Linguistics), 4970–4981. Available online at: https://www.aclweb.org/anthology/2020.emnlp-main.403
Lukashkina Yu. N., Vorontsov K. V. Assessing Stability and Completeness of Topic Models of Multidisciplinary Text Collections. [Electronic resource]. – URL: http://www.machinelearning.ru/wiki/images/4/4b/Lukashkina2017MSc.pdf (accessed: 19.10.2024)
J. Lee, W. Yoon, S. Kim, D. Kim, S. Kim, C. H. So, J. Kang. BioBERT: A Pre-trained Biomedical Language Representation Model for Biomedical Text Mining. Bioinformatics, 2020, 36(4):1234-1240.
S. Chithrananda, B. Ramsundar, G. Grand. ChemBERTa: Large-Scale Self-Supervised Pretraining for Molecular Property Prediction. arXiv preprint arXiv:2010.09885, 2020. https://doi.org/10.48550/arXiv.2010.09885.
BERT base model [Electronic resource]. – URL: https://huggingface.co/google-bert/bert-base-uncased