Optimizations of two lock queue
Abstract
Real-time parallelism has become commonplace for modern devices. But the effectiveness of parallelization still remains a problem. Architecture on mobile devices also adds uncertainty to this problem. The parallelization tool is one of the most important parts of any multithreaded program. The most popular tool is a thread pool. Its main element is a thread-safe queue. This means that the time between the creation of a task and start of its execution directly depends on the efficiency of the internal queue of the thread pool. This article discusses the optimization of two lock queue in a thread pool scenario. We have provided a number of optimizations based on different parameters of thread-safe containers such as the duration of the critical section and the dependence of different methods on each other. We have tested all optimizations individually on device and presented the most effective combinations of these optimizations. In the end, we compared these combinations with the most сommon implementations.
Full Text:
PDF (Russian)References
Zhu Y., Reddi V. J. High-performance and energy-efficient mobile web browsing on big/little systems //2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA). – IEEE, 2013. – С. 13-24. URL: https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=cf64cdc889a4edaf641a307aa2b11d89d4d10a09
Stevens A. Introduction to AMBA® 4 ACE™ and big. LITTLE™ Processing Technology //ARM White Paper, CoreLink Intelligent System IP by ARM. – 2011. URL: https://picture.iczhiku.com/resource/paper/WyIderGepoDsTXXb.pdf
Hunt N., Sandhu P. S., Ceze L. Characterizing the performance and energy efficiency of lock-free data structures //2011 15th Workshop on Interaction between Compilers and Computer Architectures. – IEEE, 2011. – С. 63-70. URL: https://www.cs.fsu.edu/~xyuan/INTERACT-15/papers/paper06.pdf
Xu D. Performance study and dynamic optimization design for thread pool systems. – Ames Lab., Ames, IA (United States), 2004. – №. IS-T 2359. URL: https://www.osti.gov/servlets/purl/835380
Shiina S. et al. Lightweight preemptive user-level threads //Proceedings of the 26th ACM SIGPLAN symposium on principles and practice of parallel programming. – 2021. – С. 374-388. URL: https://dl.acm.org/doi/pdf/10.1145/3437801.3441610
Sai A. M. A. et al. Producer-Consumer problem using Thread pool //2022 3rd international conference for emerging technology (incet). – IEEE, 2022. – С. 1-5. URL: https://iopscience.iop.org/article/10.1088/1742-6596/1616/1/012073/pdf
Michael M. M., Scott M. L. Simple, fast, and practical non-blocking and blocking concurrent queue algorithms //Proceedings of the fifteenth annual ACM symposium on Principles of distributed computing. – 1996. – С. 267-275. URL: https://dl.acm.org/doi/pdf/10.1145/248052.248106
Nikolaev R., Ravindran B. Wcq: A fast wait-free queue with bounded memory usage //Proceedings of the 34th ACM Symposium on Parallelism in Algorithms and Architectures. – 2022. – С. 307-319. URL: https://dl.acm.org/doi/pdf/10.1145/3490148.3538572
Koval N., Alistarh D., Elizarov R. Fast and scalable channels in Kotlin coroutines //Proceedings of the 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming. – 2023. – С. 107-118. URL: https://arxiv.org/pdf/2211.04986
Dechev D. The ABA problem in multicore data structures with collaborating operations //7th International Conference on Collaborative Computing: Networking, Applications and Worksharing (CollaborateCom). – IEEE, 2011. – С. 158-167. URL: https://www.osti.gov/servlets/purl/1118172
Maffione V., Lettieri G., Rizzo L. Cache‐aware design of general‐purpose Single‐Producer–Single‐Consumer queues //Software: Practice and Experience. – 2019. – Т. 49. – №. 5. – С. 748-779. URL: https://arpi.unipi.it/bitstream/11568/942147/7/master.pdf
Pöter M., Träff J. L. Memory models for C/C++ programmers //arXiv preprint arXiv:1803.04432. – 2018. URL: https://arxiv.org/pdf/1803.04432
Bolosky W. J., Scott M. L. False sharing and its e ect on shared memory performance //4th symposium on experimental distributed and multiprocessor systems. – 1993. – С. 57-71. URL: https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=8090a0702dae2a90bb614e6ef8de4f049e596233
Jeremiassen T. E., Eggers S. J. Reducing false sharing on shared memory multiprocessors through compile time data transformations //ACM SIGPLAN Notices. – 1995. – Т. 30. – №. 8. – С. 179-188. URL: https://dl.acm.org/doi/pdf/10.1145/209937.209955
Liu N., Zang B., Chen H. No barrier in the road: a comprehensive study and optimization of ARM barriers //Proceedings of the 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. – 2020. – С. 348-361. URL: https://ipads.se.sjtu.edu.cn/zh/publications/LiuPPoPP20.pdf
Zhou J. et al. MemPerf: Profiling Allocator-Induced Performance Slowdowns //Proceedings of the ACM on Programming Languages. – 2023. – Т. 7. – №. OOPSLA2. – С. 1418-1441. URL: https://dl.acm.org/doi/pdf/10.1145/3622848
Michael M. M., Scott M. L. Nonblocking algorithms and preemption-safe locking on multiprogrammed shared memory multiprocessors //journal of parallel and distributed computing. – 1998. – Т. 51. – №. 1. – С. 1-26. ULR: https://www.sciencedirect.com/science/article/pii/S0743731598914460/pdf?md5=799575cda60e46da338868a2c9635da2&pid=1-s2.0-S0743731598914460-main.pdf&_valck=1
Michael M. M. Scalable lock-free dynamic memory allocation //Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation. – 2004. – С. 35-46. URL: https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=068d0b393db03678ea1d346ee01871e91e88c560
Leite R., Rocha R. LRMalloc: A modern and competitive lock-free dynamic memory allocator //International Conference on Vector and Parallel Processing. – Cham : Springer International Publishing, 2018. – С. 230-243. URL: https://www.dcc.fc.up.pt/~ricroc/homepage/publications/2018-VECPAR.pdf
Purohit A. et al. Improving application performance through system call composition. – Technical Report FSL-02-01, Computer Science Department, Stony Brook University, 2003. URL: https://www.am-utils.org/docs/cosy-perf/cosy.pdf
Scott M. L., Scherer W. N. Scalable queue-based spin locks with timeout //ACM SIGPLAN Notices. – 2001. – Т. 36. – №. 7. – С. 44-52. URL: https://urresearch.rochester.edu/fileDownloadForInstitutionalItem.action?itemId=1325&itemFileId=1613
Anderson T. E. The performance of spin lock alternatives for shared-memory multiprocessors //IEEE Transactions on Parallel and Distributed Systems. – 1990. – Т. 1. – №. 1. – С. 6-16. URL: https://pdos.csail.mit.edu/6.828/2010/readings/anderson-locks.pdf
Baier C. et al. Waiting for locks: How long does it usually take? //International Workshop on Formal Methods for Industrial Critical Systems. – Berlin, Heidelberg : Springer Berlin Heidelberg, 2012. – С. 47-62. URL: https://wwwtcs.inf.tu-dresden.de/ALGI/spinlock-FMICS2012.pdf
Mueller F. et al. A Library Implementation of POSIX Threads under UNIX //Usenix winter. – 1993. – С. 29-42. URL: https://arcb.csc.ncsu.edu/~mueller/ftp/pub/PART/pthreads_usenix93.pdf
Bellassai D. et al. Supporting logical execution time in multi-core POSIX systems //Journal of Systems Architecture. – 2023. – Т. 144. – С. 102987. URL: http://retis.sssup.it/~a.biondi/papers/LET_JSS23.pdf
Refbacks
- There are currently no refbacks.
Abava Кибербезопасность ИТ конгресс СНЭ
ISSN: 2307-8162