Application of machine learning methods to classify construction works titles for data normalization and predictive modeling

V.V. Konkov, V.I. Shirokov, V.A. Sychev

Abstract


Exceeding planned deadlines in construction projects remains one of the most pressing problems reducing the efficiency of the Russian construction industry. The key obstacle to applying historical data and machine learning methods is the extreme fragmentation and lack of normalization of the source data, in particular the high variability and non-standardized naming of construction works. The authors propose a methodology for automatically unifying variant names of construction works using machine learning. This unification is treated as a critical step in building deadline forecasting systems within the framework of the state strategy to reduce construction cycles by 30% by 2030. The methodology comprises data preprocessing, text vectorization and automatic unification. Two approaches are studied: clustering (K-Means, DBSCAN) to identify groups of similar works, and classification (including logistic regression, SVM, Random Forest, XGBoost and LSTM) to assign descriptions to unified classes. The algorithms are compared by accuracy, recall and F1 score. Machine learning methods, especially XGBoost, Random Forest and LSTM, were shown to be highly effective for unification, which makes it possible to form a structured array of historical data. A comprehensive solution is proposed for generating recommendations on optimizing calendar schedules based on ML analysis of similar historical projects. A structured array of historical data was formed, including the names of works and their durations; it will be used to implement solutions that produce well-grounded deadline forecasts, adjust the planned duration of construction work and minimize the risk of schedule failure, thereby increasing the efficiency of project management.
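
The three stages named in the abstract (preprocessing, text vectorization, assignment to a unified class) can be illustrated with a minimal sketch. It is not the authors' implementation: it uses only the Python standard library, character-trigram TF-IDF as the vectorizer, and a nearest-neighbour cosine match as a simple stand-in for the classifiers studied in the paper (logistic regression, SVM, Random Forest, XGBoost, LSTM). The work titles, class labels and all function names below are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter


def normalize(title):
    """Preprocessing: lowercase, drop digits/punctuation, collapse spaces."""
    cleaned = re.sub(r"[^a-zа-яё ]+", " ", title.lower())
    return " ".join(cleaned.split())


def char_ngrams(text, n=3):
    """Split a normalized title into overlapping character n-grams."""
    padded = f" {text} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def fit_idf(docs):
    """Smoothed inverse document frequency for every n-gram seen in docs."""
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(char_ngrams(doc)))
    total = len(docs)
    return {g: math.log((1 + total) / (1 + c)) + 1.0 for g, c in doc_freq.items()}


def vectorize(text, idf):
    """Vectorization: L2-normalized TF-IDF vector stored as a sparse dict."""
    counts = Counter(g for g in char_ngrams(text) if g in idf)
    vec = {g: c * idf[g] for g, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {g: v / norm for g, v in vec.items()} if norm else {}


def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors."""
    return sum(v * b.get(g, 0.0) for g, v in a.items())


def classify(title, labeled_vectors, idf):
    """Unification: return the class of the most similar labeled title."""
    query = vectorize(normalize(title), idf)
    best_label, best_score = None, -1.0
    for vec, label in labeled_vectors:
        score = cosine(query, vec)
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score


# Hypothetical labeled examples (assumed data, not taken from the paper).
TRAIN = [
    ("Pouring concrete foundation slab", "concrete works"),
    ("Concreting of foundations, section 2", "concrete works"),
    ("Brick masonry of external walls", "masonry"),
    ("Masonry of brick partitions", "masonry"),
    ("Installation of roof waterproofing", "roofing"),
    ("Roofing membrane installation", "roofing"),
]

IDF = fit_idf([normalize(t) for t, _ in TRAIN])
LABELED = [(vectorize(normalize(t), IDF), label) for t, label in TRAIN]

for raw in ["Concrete pouring for slab #3", "External wall brick masonry"]:
    label, score = classify(raw, LABELED, IDF)
    print(f"{raw!r} -> {label} (similarity {score:.2f})")
```

Character n-grams are used here because they tolerate the typos, abbreviations and word-order changes typical of free-text work titles; in a real pipeline the nearest-neighbour step would be replaced by a trained classifier and evaluated on held-out data with the metrics listed in the abstract.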


Full Text:

PDF (Russian)



ISSN: 2307-8162