Modern Methods for Training Large Language Models with Minimal Data: From One Example to Absolute Zero – an Academic Review
Abstract
Full Text: PDF (Russian)
References
Y. Bai, S. Kadavath, S. Kundu, A. Askell, J. Kernion, A. Jones, et al., "Constitutional AI: Harmlessness from AI Feedback," 2022, arXiv:2212.08073, doi: 10.48550/arXiv.2212.08073.
E. Ben Zaken, S. Ravfogel, and Y. Goldberg, "BitFit: Simple Parameter-efficient Fine-tuning for Transformer-based Masked Language-models," 2022, arXiv:2106.10199, doi: 10.48550/arXiv.2106.10199.
X. Chen, Y. Deng, M. Wang, and Y. Zhang, "Skeleton-of-Thought: Prompting Large Language Models for Efficient Parallel Generation," 2024, arXiv:2307.15337, doi: 10.48550/arXiv.2307.15337.
T. Dettmers, A. Pagnoni, A. Holtzman, and L. Zettlemoyer, "QLoRA: Efficient Finetuning of Quantized LLMs," 2023, arXiv:2305.14314, doi: 10.48550/arXiv.2305.14314.
Z. Shi and A. Lipani, "DePT: Decomposed Prompt Tuning for Parameter-Efficient Fine-tuning," 2023, arXiv:2309.05173, doi: 10.48550/arXiv.2309.05173.
E. J. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang, and W. Chen, "LoRA: Low-Rank Adaptation of Large Language Models," 2021, arXiv:2106.09685, doi: 10.48550/arXiv.2106.09685.
T. Kojima, S. S. Gu, M. Reid, Y. Matsuo, and Y. Iwasawa, "Large Language Models are Zero-Shot Reasoners," 2023, arXiv:2205.11916, doi: 10.48550/arXiv.2205.11916.
B. Lester, R. Al-Rfou, and N. Constant, "The Power of Scale for Parameter-Efficient Prompt Tuning," in Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Nov. 2021, pp. 3045-3059, doi: 10.18653/v1/2021.emnlp-main.243.
X. L. Li and P. Liang, "Prefix-Tuning: Optimizing Continuous Prompts for Generation," in Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Aug. 2021, pp. 4582-4597, doi: 10.18653/v1/2021.acl-long.353.
A. Madaan, N. Tandon, P. Gupta, S. Hallinan, L. Gao, S. Wiegreffe, et al., "SELF-REFINE: iterative refinement with self-feedback," in Proceedings of the 37th International Conference on Neural Information Processing Systems (NIPS '23), Dec. 2023, Art. no. 2019, pp. 46534-46594. Available: https://papers.nips.cc/paper_files/paper/2023/file/91edff07232fb1b55a505a9e9f6c0ff3-Paper-Conference.pdf
S. Min, M. Lewis, L. Zettlemoyer, and H. Hajishirzi, "MetaICL: Learning to Learn In Context," in Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Jul. 2022, pp. 2791-2809, doi: 10.18653/v1/2022.naacl-main.201.
N. Ding, Y. Qin, G. Yang, F. Wei, Z. Yang, Y. Su, et al., "Parameter-efficient fine-tuning of large-scale pre-trained language models," Nature Machine Intelligence, vol. 5, no. 3, pp. 220-235, Mar. 2023, doi: 10.1038/s42256-023-00626-4.
L. Ouyang, J. Wu, X. Jiang, D. Almeida, C. Wainwright, P. Mishkin, et al., "Training language models to follow instructions with human feedback," in Proceedings of the 36th Conference on Neural Information Processing Systems (NeurIPS 2022), Oct. 2022, pp. 1-15. Available: https://openreview.net/pdf?id=TG8KACxEON
Z. Han, C. Gao, J. Liu, J. Zhang, and S. Q. Zhang, "Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey," 2024, arXiv:2403.14608, doi: 10.48550/arXiv.2403.14608.
R. Taori, I. Gulrajani, T. Zhang, Y. Dubois, X. Li, C. Guestrin, P. Liang, and T. B. Hashimoto, "Alpaca: A Strong, Replicable Instruction-Following Model," Stanford CRFM Tech. Rep., 2023. Available: https://crfm.stanford.edu/2023/03/13/alpaca.html
X. Wang, J. Wei, D. Schuurmans, Q. Le, E. Chi, S. Narang, A. Chowdhery, and D. Zhou, "Self-Consistency Improves Chain of Thought Reasoning in Language Models," 2023, arXiv:2203.11171, doi: 10.48550/arXiv.2203.11171.
Y. Wang, Q. Yang, Z. Zeng, L. Ren, L. Liu, B. Peng, et al., "Reinforcement Learning for Reasoning in Large Language Models with One Training Example," 2025, arXiv:2504.20571, doi: 10.48550/arXiv.2504.20571.
J. Wei, X. Wang, D. Schuurmans, M. Bosma, B. Ichter, F. Xia, E. H. Chi, Q. V. Le, and D. Zhou, "Chain-of-thought prompting elicits reasoning in large language models," in Proceedings of the 36th International Conference on Neural Information Processing Systems (NIPS '22), Nov. 2022, Art. no. 1800, pp. 24824-24837. Available: https://proceedings.neurips.cc/paper_files/paper/2022/file/9d5609613524ecf4f15af0f7b31abca4-Paper-Conference.pdf
Y. Wu, et al., "Self-Play Preference Optimization for Language Model Alignment," 2024, arXiv:2405.00675, doi: 10.48550/arXiv.2405.00675.
S. Yao, J. Zhao, D. Yu, N. Du, I. Shafran, K. Narasimhan, and Y. Cao, "ReAct: Synergizing Reasoning and Acting in Language Models," 2023, arXiv:2210.03629, doi: 10.48550/arXiv.2210.03629.
H. Lee, et al., "RLAIF vs. RLHF: Scaling Reinforcement Learning from Human Feedback with AI Feedback," 2023, arXiv:2309.00267, doi: 10.48550/arXiv.2309.00267.
B. Lester, et al., "Reducing Retraining by Recycling Parameter-Efficient Prompts," 2022, arXiv:2208.05577, doi: 10.48550/arXiv.2208.05577.
N. Ding, et al., "Delta Tuning: A Comprehensive Study of Parameter Efficient Methods for Pre-trained Language Models," 2022, arXiv:2203.06904, doi: 10.48550/arXiv.2203.06904.
W. Yuan, R. Y. Pang, K. Cho, X. Li, S. Sukhbaatar, J. Xu, and J. Weston, "Self-rewarding language models," in Proceedings of the 41st International Conference on Machine Learning (ICML'24), Jul. 2024, Art. no. 2389, pp. 57905-57923. Available: https://proceedings.mlr.press/v235/yuan24d.html
R. Rafailov, et al., "Direct Preference Optimization: Your Language Model is Secretly a Reward Model," 2024, arXiv:2305.18290, doi: 10.48550/arXiv.2305.18290.
A. Zhao, Y. Wu, Y. Yue, T. Wu, Q. Xu, Y. Yue, et al., "Absolute Zero: Reinforced Self-play Reasoning with Zero Data," 2025, arXiv:2505.03335, doi: 10.48550/arXiv.2505.03335.
C. Zhou, P. Liu, P. Xu, S. Iyer, J. Sun, Y. Mao, et al., "LIMA: less is more for alignment," in Proceedings of the 37th International Conference on Neural Information Processing Systems (NIPS '23), Dec. 2023, Art. no. 2400, pp. 55006-55021. Available: https://openreview.net/pdf?id=KBMOKmX2he
Z. Sun, et al., "Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision," in Proceedings of the 37th International Conference on Neural Information Processing Systems (NIPS '23), Dec. 2023, Art. no. 115, pp. 2511-2565. Available: https://openreview.net/pdf?id=p40XRfBX96
Y. Zuo, K. Zhang, L. Sheng, S. Qu, G. Cui, X. Zhu, et al., "TTRL: Test-Time Reinforcement Learning," 2025, arXiv:2504.16084, doi: 10.48550/arXiv.2504.16084.
T. Simonds and A. Yoshiyama, "LADDER: Self-Improving LLMs Through Recursive Problem Decomposition," 2025, arXiv:2503.00735, doi: 10.48550/arXiv.2503.00735.