Modern Methods for Training Large Language Models with Minimal Data: From One Example to Absolute Zero – An Academic Review

Alexey Pichugov, Dmitry Namiot, Elena Zubareva

Abstract


This academic review systematizes and analyzes breakthrough approaches to training large language models (LLMs) developed between 2022 and 2025, with an emphasis on radically reducing or eliminating reliance on human-labeled data. The paper examines in detail methodologies such as reinforcement learning from a single training example, the fully autonomous self-play paradigm of the Absolute Zero Reasoner, test-time reinforcement learning, data-efficient alignment from a small number of examples, and self-generated curricula built through recursive problem decomposition. The key results of these approaches on standard benchmarks are analyzed, and the emergent cognitive properties of the models, cross-domain effects, and associated risks and ethical aspects are discussed. The review is intended for students, educators, researchers, and practitioners in artificial intelligence and natural language processing who seek to understand the current frontier of research on data-efficient LLM training.
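To make one of the surveyed ideas concrete, the following minimal sketch (plain Python, not taken from any of the reviewed papers; the function name and structure are illustrative assumptions) shows the pseudo-reward at the core of test-time reinforcement learning: with no ground-truth labels available at test time, the majority answer among a model's own samples is treated as a proxy label, and samples that agree with it receive reward.

from collections import Counter

def majority_vote_rewards(sampled_answers):
    # Sketch of a TTRL-style pseudo-reward: the most frequent final answer
    # among several samples for the same unlabeled question acts as a proxy
    # label; each sample is rewarded 1.0 if it matches that label, else 0.0.
    pseudo_label, _ = Counter(sampled_answers).most_common(1)[0]
    rewards = [1.0 if answer == pseudo_label else 0.0 for answer in sampled_answers]
    return rewards, pseudo_label

# Example: four sampled answers to one math question, no reference answer needed.
rewards, label = majority_vote_rewards(["42", "42", "17", "42"])
# rewards == [1.0, 1.0, 0.0, 1.0], label == "42"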


References


Y. Bai, S. Kadavath, S. Kundu, A. Askell, J. Kernion, A. Jones, et al., "Constitutional AI: Harmlessness from AI Feedback," 2022, arXiv:2212.08073, doi: 10.48550/arXiv.2212.08073.

E. Ben Zaken, S. Ravfogel, and Y. Goldberg, "BitFit: Simple Parameter-efficient Fine-tuning for Transformer-based Masked Language-models," 2022, arXiv:2106.10199, doi: 10.48550/arXiv.2106.10199.

X. Chen, Y. Deng, M. Wang, and Y. Zhang, "Skeleton-of-Thought: Prompting Large Language Models for Efficient Parallel Generation," 2024, arXiv:2307.15337, doi: 10.48550/arXiv.2307.15337.

T. Dettmers, A. Pagnoni, A. Holtzman, and L. Zettlemoyer, "QLoRA: Efficient Finetuning of Quantized Large Language Models," 2023, arXiv:2305.14314, doi: 10.48550/arXiv.2305.14314.

Z. Shi and A. Lipani, "DePT: Decomposed Prompt Tuning for Parameter-Efficient Fine-tuning," 2023, arXiv:2309.05173, doi: 10.48550/arXiv.2309.05173.

E. J. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang, and W. Chen, "LoRA: Low-Rank Adaptation of Large Language Models," 2021, arXiv:2106.09685, doi: 10.48550/arXiv.2106.09685.

T. Kojima, S. S. Gu, M. Reid, Y. Matsuo, and Y. Iwasawa, "Large Language Models are Zero-Shot Reasoners," 2023, arXiv:2205.11916, doi: 10.48550/arXiv.2205.11916.

B. Lester, R. Al-Rfou, and N. Constant, "The Power of Scale for Parameter-Efficient Prompt Tuning," in Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Nov. 2021, pp. 3045-3059, doi: 10.18653/v1/2021.emnlp-main.243.

X. L. Li and P. Liang, "Prefix-Tuning: Optimizing Continuous Prompts for Generation," in Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Aug. 2021, pp. 4582-4597, doi: 10.18653/v1/2021.acl-long.353.

A. Madaan, N. Tandon, P. Gupta, S. Hallinan, L. Gao, S. Wiegreffe, et al., "Self-Refine: Iterative Refinement with Self-Feedback," in Proceedings of the 37th International Conference on Neural Information Processing Systems (NIPS '23), Dec. 2023, Art. no. 2019, pp. 46534-46594. Available: https://papers.nips.cc/paper_files/paper/2023/file/91edff07232fb1b55a505a9e9f6c0ff3-Paper-Conference.pdf

S. Min, M. Lewis, L. Zettlemoyer, and H. Hajishirzi, "MetaICL: Learning to Learn In Context," in Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Jul. 2022, pp. 2791-2809, doi: 10.18653/v1/2022.naacl-main.201.

N. Ding, Y. Qin, G. Yang, F. Wei, Z. Yang, Y. Su, et al., "Parameter-efficient fine-tuning of large-scale pre-trained language models," Nature Machine Intelligence, vol. 5, no. 3, pp. 220-235, Mar. 2023, doi: 10.1038/s42256-023-00626-4.

L. Ouyang, J. Wu, X. Jiang, D. Almeida, C. Wainwright, P. Mishkin, et al., "Training language models to follow instructions with human feedback," in Proceedings of the 36th Conference on Neural Information Processing Systems (NeurIPS 2022), Oct. 2022, pp. 1-15. Available: https://openreview.net/pdf?id=TG8KACxEON

Z. Han, C. Gao, J. Liu, J. Zhang, and S. Q. Zhang, "Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey," 2024, arXiv:2403.14608, doi: 10.48550/arXiv.2403.14608.

R. Taori, I. Gulrajani, T. Zhang, Y. Dubois, X. Li, C. Guestrin, P. Liang, and T. B. Hashimoto, "Alpaca: A Strong, Replicable Instruction-Following Model," Stanford CRFM Tech. Rep., 2023. Available: https://crfm.stanford.edu/2023/03/13/alpaca.html

X. Wang, J. Wei, D. Schuurmans, Q. Le, E. Chi, S. Narang, A. Chowdhery, and D. Zhou, "Self-Consistency Improves Chain of Thought Reasoning in Language Models," 2023, arXiv:2203.11171, doi: 10.48550/arXiv.2203.11171.

Y. Wang, Q. Yang, Z. Zeng, L. Ren, L. Liu, B. Peng, et al., "Reinforcement Learning for Reasoning in Large Language Models with One Training Example," 2025, arXiv:2504.20571, doi: 10.48550/arXiv.2504.20571.

J. Wei, X. Wang, D. Schuurmans, M. Bosma, B. Ichter, F. Xia, Ed H. Chi, Q. V. Le, and D. Zhou, "Chain-of-thought prompting elicits reasoning in large language models," in Proceedings of the 36th International Conference on Neural Information Processing Systems (NIPS '22), Nov. 2022, Art. no. 1800, pp. 24824-24837. Available: https://proceedings.neurips.cc/paper_files/paper/2022/file/9d5609613524ecf4f15af0f7b31abca4-Paper-Conference.pdf

Y. Wu, et al., "Self-Play Preference Optimization for Language Model Alignment," 2024, arXiv:2405.00675, doi: 10.48550/arXiv.2405.00675.

S. Yao, J. Zhao, D. Yu, N. Du, I. Shafran, K. Narasimhan, and Y. Cao, "ReAct: Synergizing Reasoning and Acting in Language Models," 2023, arXiv:2210.03629, doi: 10.48550/arXiv.2210.03629.

H. Lee, et al., "RLAIF vs. RLHF: Scaling Reinforcement Learning from Human Feedback with AI Feedback," 2023, arXiv:2309.00267, doi: 10.48550/arXiv.2309.00267.

B. Lester, et al., "Reducing Retraining by Recycling Parameter-Efficient Prompts," 2022, arXiv:2208.05577, doi: 10.48550/arXiv.2208.05577.

N. Ding, et al., "Delta Tuning: A Comprehensive Study of Parameter Efficient Methods for Pre-trained Language Models," 2022, arXiv:2203.06904, doi: 10.48550/arXiv.2203.06904.

W. Yuan, R. Y. Pang, K. Cho, X. Li, S. Sukhbaatar, J. Xu, and J. Weston, "Self-Rewarding Language Models," in Proceedings of the 41st International Conference on Machine Learning (ICML'24), Jul. 2024, Art. no. 2389, pp. 57905-57923. Available: https://proceedings.mlr.press/v235/yuan24d.html

R. Rafailov, et al., "Direct Preference Optimization: Your Language Model is Secretly a Reward Model," 2024, arXiv:2305.18290, doi: 10.48550/arXiv.2305.18290.

A. Zhao, Y. Wu, Y. Yue, T. Wu, Q. Xu, Y. Yue, et al., "Absolute Zero: Reinforced Self-play Reasoning with Zero Data," 2025, arXiv:2505.03335, doi: 10.48550/arXiv.2505.03335.

C. Zhou, P. Liu, P. Xu, S. Iyer, J. Sun, Y. Mao, et al., "LIMA: Less Is More for Alignment," in Proceedings of the 37th International Conference on Neural Information Processing Systems (NIPS '23), Dec. 2023, Art. no. 2400, pp. 55006-55021. Available: https://openreview.net/pdf?id=KBMOKmX2he

Z. Sun, et al., "Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision," in Proceedings of the 37th International Conference on Neural Information Processing Systems (NIPS '23), Dec. 2023, Art. no. 115, pp. 2511-2565. Available: https://openreview.net/pdf?id=p40XRfBX96

Y. Zuo, K. Zhang, L. Sheng, S. Qu, G. Cui, X. Zhu, et al., "TTRL: Test-Time Reinforcement Learning," 2025, arXiv:2504.16084, doi: 10.48550/arXiv.2504.16084.

T. Simonds and A. Yoshiyama, "LADDER: Self-Improving LLMs Through Recursive Problem Decomposition," 2025, arXiv:2503.00735, doi: 10.48550/arXiv.2503.00735.




