An investigation of early and late collapse of language models in medical applications

E.V. Bobrova, K.S. Zaytsev, D.K. Sviridenko, D.V. Kholod, E.V. Dyuldin

Abstract


The aim of this work is a comprehensive analysis of the mechanisms of early collapse in language models working with medical texts during recursive training, using the Mistral-7B and LLaMA-3 architectures as examples. An experimental study was conducted of the dynamics of perplexity, BLEU and ROUGE metrics, and token probability distributions over multiple generations of training on synthetic data. Two types of model collapse are identified: early collapse, characterized by rapid degradation of the probability distributions, and late collapse, marked by a gradual decrease in generation diversity. It is established that the Mistral model is more resistant to data collapse than LLaMA, which is attributed to its architecture's sliding-window attention mechanism. The paper proposes a new methodological approach to quantifying the degradation of language models and formulates practical recommendations for preventing the loss of model diversity during recursive training. The study was conducted on textual cytology data used in the diagnosis of thyroid diseases.
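The two quantities tracked across generations can be illustrated with a minimal sketch (not the authors' code): perplexity computed from per-token log-probabilities, and the Shannon entropy of the empirical token distribution, whose decline across synthetic generations signals the loss of diversity described above. The toy token sequences below are hypothetical examples, not data from the study.

```python
import math
from collections import Counter

def perplexity(log_probs):
    """Perplexity = exp(-mean log-probability of the observed tokens)."""
    return math.exp(-sum(log_probs) / len(log_probs))

def token_entropy(tokens):
    """Shannon entropy (bits) of the empirical token distribution.
    Falling entropy from one synthetic generation to the next is a
    simple proxy for collapsing diversity."""
    counts = Counter(tokens)
    n = len(tokens)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Hypothetical illustration: a later synthetic generation repeats
# tokens more often, so its empirical entropy is lower.
gen0 = "cells show follicular pattern with colloid present".split()
gen5 = "cells cells show show pattern pattern colloid colloid".split()
assert token_entropy(gen5) < token_entropy(gen0)
```

In the paper's setting these statistics would be computed on model outputs at each generation of recursive training; here they serve only to make the measured quantities concrete.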


References


Cooper N., Scholak T. Perplexed: Understanding when large language models are confused // arXiv preprint arXiv:2404.06634. – 2024.

Mezzoudj F., Benyettou A. An empirical study of statistical language models: n-gram language models vs. neural network language models // International Journal of Innovative Computing and Applications. – 2018. – Vol. 9. – No. 4. – pp. 189-202.

Gritsai G.M., Khabutdinov I.A., Grabovoy A.V. Stack More LLM's: Efficient Detection of Machine-Generated Texts via Perplexity Approximation // Doklady Mathematics. – 2024. – Vol. 110. – Suppl. 1. – pp. S203-S211. doi: https://doi.org/10.1134/S1064562424602075

Canvas4Everyone. Unraveling the Mystery of Perplexity: A Deep Dive into Likelihood Scores [Online]. URL: https://canvas4everyone.com/blogs/news/unraveling-the-mystery-of-perplexity-a-deep-dive-into-likelihood-scores (accessed: 27.03.2025).

Chang Y. et al. A survey on evaluation of large language models // ACM Transactions on Intelligent Systems and Technology. – 2024. – Vol. 15. – No. 3. – pp. 1-45.

UpTrain Blog. Decoding Perplexity and Its Significance in LLMs [Online]. URL: https://blog.uptrain.ai/decoding-perplexity-and-its-significance-in-llms/ (accessed: 27.03.2025).

Madala S. Introduction to Probability Theory in NLP [Online] // Scaler Topics. URL: https://www.scaler.com/topics/nlp/probability-theory-nlp/ (accessed: 27.03.2025).

Gu J. et al. Do LLMs Play Dice? Exploring Probability Distribution Sampling in Large Language Models for Behavioral Simulation // arXiv preprint arXiv:2404.09043. – 2024.

Ali S.Z., Cibas E.S. (eds.) The Bethesda System for Reporting Thyroid Cytopathology. – Cham: Springer International Publishing, 2018. doi: https://doi.org/10.1007/978-3-319-60570-8

Ali S.Z., Baloch Z.W., Cochand-Priollet B., Schmitt F.C., Vielh P., VanderLaan P.A. The 2023 Bethesda System for Reporting Thyroid Cytopathology // Thyroid. – July 2023. doi: https://doi.org/10.1089/thy.2023.0141

Papineni K. et al. BLEU: a method for automatic evaluation of machine translation // Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics. – 2002. – pp. 311-318.

Lin C.Y. ROUGE: A package for automatic evaluation of summaries // Text Summarization Branches Out. – 2004. – pp. 74-81.

Shumailov I. et al. AI models collapse when trained on recursively generated data // Nature. – 2024. – Vol. 631. – No. 8022. – pp. 755-759.

Allen-Zhu Z., Li Y. Physics of language models: Part 3.3, knowledge capacity scaling laws // arXiv preprint arXiv:2404.05405. – 2024.





ISSN: 2307-8162