Optimizations of two lock queue

Mikhail Molotkov

Abstract


Real-time parallelism has become commonplace for modern devices. But the effectiveness of parallelization still remains a problem. Architecture on mobile devices also adds uncertainty to this problem. The parallelization tool is one of the most important parts of any multithreaded program. The most popular tool is a thread pool. Its main element is a thread-safe queue. This means that the time between the creation of a task and start of its execution directly depends on the efficiency of the internal queue of the thread pool. This article discusses the optimization of two lock queue in a thread pool scenario. We have provided a number of optimizations based on different parameters of thread-safe containers such as the duration of the critical section and the dependence of different methods on each other. We have tested all optimizations individually on device and presented the most effective combinations of these optimizations. In the end, we compared these combinations with the most сommon implementations. 


Full Text:

PDF (Russian)

References


Zhu Y., Reddi V. J. High-performance and energy-efficient mobile web browsing on big/little systems //2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA). – IEEE, 2013. – С. 13-24. URL: https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=cf64cdc889a4edaf641a307aa2b11d89d4d10a09

Stevens A. Introduction to AMBA® 4 ACE™ and big. LITTLE™ Processing Technology //ARM White Paper, CoreLink Intelligent System IP by ARM. – 2011. URL: https://picture.iczhiku.com/resource/paper/WyIderGepoDsTXXb.pdf

Hunt N., Sandhu P. S., Ceze L. Characterizing the performance and energy efficiency of lock-free data structures //2011 15th Workshop on Interaction between Compilers and Computer Architectures. – IEEE, 2011. – С. 63-70. URL: https://www.cs.fsu.edu/~xyuan/INTERACT-15/papers/paper06.pdf

Xu D. Performance study and dynamic optimization design for thread pool systems. – Ames Lab., Ames, IA (United States), 2004. – №. IS-T 2359. URL: https://www.osti.gov/servlets/purl/835380

Shiina S. et al. Lightweight preemptive user-level threads //Proceedings of the 26th ACM SIGPLAN symposium on principles and practice of parallel programming. – 2021. – С. 374-388. URL: https://dl.acm.org/doi/pdf/10.1145/3437801.3441610

Sai A. M. A. et al. Producer-Consumer problem using Thread pool //2022 3rd international conference for emerging technology (incet). – IEEE, 2022. – С. 1-5. URL: https://iopscience.iop.org/article/10.1088/1742-6596/1616/1/012073/pdf

Michael M. M., Scott M. L. Simple, fast, and practical non-blocking and blocking concurrent queue algorithms //Proceedings of the fifteenth annual ACM symposium on Principles of distributed computing. – 1996. – С. 267-275. URL: https://dl.acm.org/doi/pdf/10.1145/248052.248106

Nikolaev R., Ravindran B. Wcq: A fast wait-free queue with bounded memory usage //Proceedings of the 34th ACM Symposium on Parallelism in Algorithms and Architectures. – 2022. – С. 307-319. URL: https://dl.acm.org/doi/pdf/10.1145/3490148.3538572

Koval N., Alistarh D., Elizarov R. Fast and scalable channels in Kotlin coroutines //Proceedings of the 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming. – 2023. – С. 107-118. URL: https://arxiv.org/pdf/2211.04986

Dechev D. The ABA problem in multicore data structures with collaborating operations //7th International Conference on Collaborative Computing: Networking, Applications and Worksharing (CollaborateCom). – IEEE, 2011. – С. 158-167. URL: https://www.osti.gov/servlets/purl/1118172

Maffione V., Lettieri G., Rizzo L. Cache‐aware design of general‐purpose Single‐Producer–Single‐Consumer queues //Software: Practice and Experience. – 2019. – Т. 49. – №. 5. – С. 748-779. URL: https://arpi.unipi.it/bitstream/11568/942147/7/master.pdf

Pöter M., Träff J. L. Memory models for C/C++ programmers //arXiv preprint arXiv:1803.04432. – 2018. URL: https://arxiv.org/pdf/1803.04432

Bolosky W. J., Scott M. L. False sharing and its e ect on shared memory performance //4th symposium on experimental distributed and multiprocessor systems. – 1993. – С. 57-71. URL: https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=8090a0702dae2a90bb614e6ef8de4f049e596233

Jeremiassen T. E., Eggers S. J. Reducing false sharing on shared memory multiprocessors through compile time data transformations //ACM SIGPLAN Notices. – 1995. – Т. 30. – №. 8. – С. 179-188. URL: https://dl.acm.org/doi/pdf/10.1145/209937.209955

Liu N., Zang B., Chen H. No barrier in the road: a comprehensive study and optimization of ARM barriers //Proceedings of the 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. – 2020. – С. 348-361. URL: https://ipads.se.sjtu.edu.cn/zh/publications/LiuPPoPP20.pdf

Zhou J. et al. MemPerf: Profiling Allocator-Induced Performance Slowdowns //Proceedings of the ACM on Programming Languages. – 2023. – Т. 7. – №. OOPSLA2. – С. 1418-1441. URL: https://dl.acm.org/doi/pdf/10.1145/3622848

Michael M. M., Scott M. L. Nonblocking algorithms and preemption-safe locking on multiprogrammed shared memory multiprocessors //journal of parallel and distributed computing. – 1998. – Т. 51. – №. 1. – С. 1-26. ULR: https://www.sciencedirect.com/science/article/pii/S0743731598914460/pdf?md5=799575cda60e46da338868a2c9635da2&pid=1-s2.0-S0743731598914460-main.pdf&_valck=1

Michael M. M. Scalable lock-free dynamic memory allocation //Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation. – 2004. – С. 35-46. URL: https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=068d0b393db03678ea1d346ee01871e91e88c560

Leite R., Rocha R. LRMalloc: A modern and competitive lock-free dynamic memory allocator //International Conference on Vector and Parallel Processing. – Cham : Springer International Publishing, 2018. – С. 230-243. URL: https://www.dcc.fc.up.pt/~ricroc/homepage/publications/2018-VECPAR.pdf

Purohit A. et al. Improving application performance through system call composition. – Technical Report FSL-02-01, Computer Science Department, Stony Brook University, 2003. URL: https://www.am-utils.org/docs/cosy-perf/cosy.pdf

Scott M. L., Scherer W. N. Scalable queue-based spin locks with timeout //ACM SIGPLAN Notices. – 2001. – Т. 36. – №. 7. – С. 44-52. URL: https://urresearch.rochester.edu/fileDownloadForInstitutionalItem.action?itemId=1325&itemFileId=1613

Anderson T. E. The performance of spin lock alternatives for shared-memory multiprocessors //IEEE Transactions on Parallel and Distributed Systems. – 1990. – Т. 1. – №. 1. – С. 6-16. URL: https://pdos.csail.mit.edu/6.828/2010/readings/anderson-locks.pdf

Baier C. et al. Waiting for locks: How long does it usually take? //International Workshop on Formal Methods for Industrial Critical Systems. – Berlin, Heidelberg : Springer Berlin Heidelberg, 2012. – С. 47-62. URL: https://wwwtcs.inf.tu-dresden.de/ALGI/spinlock-FMICS2012.pdf

Mueller F. et al. A Library Implementation of POSIX Threads under UNIX //Usenix winter. – 1993. – С. 29-42. URL: https://arcb.csc.ncsu.edu/~mueller/ftp/pub/PART/pthreads_usenix93.pdf

Bellassai D. et al. Supporting logical execution time in multi-core POSIX systems //Journal of Systems Architecture. – 2023. – Т. 144. – С. 102987. URL: http://retis.sssup.it/~a.biondi/papers/LET_JSS23.pdf


Refbacks

  • There are currently no refbacks.


Abava  Кибербезопасность ИТ конгресс СНЭ

ISSN: 2307-8162