Quick test on a simple scenario.
Description
I have a shared integer that every thread tries to increment multiple times. The information I need is the final value, the sum of all these increments.
I prevent oversubscription and undersubscription by having as many software threads as logical cores.
I use basic thread functions, so there is no particular overhead (no task scheduling or anything of the kind).
Proposed solutions
solution 1 : use a CRITICAL_SECTION to protect the increment operation
solution 2 : use a SRWLOCK to protect the increment operation
solution 3 : use a custom SpinLock to protect the increment operation
solution 4 : use an atomic operation and no lock
solution 5 : use TLS (thread local storage) to have a local counter per thread, and sync the results with an atomic operation in the end
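To make solution 5 concrete, here is a minimal sketch in portable C++ rather than the Win32 calls used in the test. It is not the benchmarked code: the function name `count_with_local_counters` and its parameters are hypothetical, and a plain stack variable stands in for the TLS slot. Each thread counts privately and touches the shared integer only once, at the end.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical helper (not the original benchmark code): every thread
// increments a private counter, then merges it into the shared total
// with a single atomic add.
std::uint64_t count_with_local_counters(unsigned num_threads,
                                        std::uint64_t increments_per_thread) {
    std::atomic<std::uint64_t> shared_total{0};
    std::vector<std::thread> workers;
    workers.reserve(num_threads);

    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&shared_total, increments_per_thread] {
            // Assumption: a stack local plays the role of the TLS counter.
            std::uint64_t local = 0;
            for (std::uint64_t i = 0; i < increments_per_thread; ++i) {
                ++local;  // no other thread ever sees this variable
            }
            // The only synchronisation point for this thread.
            shared_total.fetch_add(local, std::memory_order_relaxed);
        });
    }
    for (auto& worker : workers) {
        worker.join();
    }
    return shared_total.load();
}
```

Calling `count_with_local_counters(8, 10000000)` would mirror the 8-thread, 80M-increment setup. Note that an optimising compiler may collapse the inner loop entirely, so a sketch like this is only meaningful for timing if the per-iteration work is something the compiler cannot fold away.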
Results
I ran the test multiple times, and always got about the same numbers. Here are the results.
#######################################################################
Infos summary :
System w/ 8 CPU cores
Nb 'tasks' : 80 000 000
Nb (worker) threads : 8
Average work load per thread : 10 000 000 tasks
=============================================================
Results summary :
TEST_TLS_ATOMIC          0.295073 s
TEST_SRWLOCK             2.548695 s
TEST_ATOMIC              2.701041 s
TEST_CRITICAL_SECTION    7.383443 s
TEST_SPINLOCK           18.363033 s
#######################################################################
Explanations
Unsurprisingly, solution 5 (TLS) wins, since there is almost no contention.
On the other hand, I was quite surprised to see that the SRWLOCK solution was about the same speed as the atomic increment one (sometimes slightly faster, sometimes slightly slower).
The only explanation I have is that each atomic operation incurs a cache update & flush, so here we have 80M of these.
Using SRWLOCK, by contrast, might lead to fewer cache updates & flushes, e.g. if the same thread increments the shared value multiple times before another thread manages to grab the lock.
And the SpinLock... damn, this is so slow. I wonder if my (naive) implementation is right >.<.
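The SpinLock used in the test is not shown, so there is nothing to check it against. For comparison, here is a minimal sketch of one common variant, a "test and test-and-set" spin lock, in portable C++. The class and function names are hypothetical, and yielding while waiting is a design assumption, not something taken from the test.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical spin lock sketch (not the implementation that was measured).
class SpinLock {
public:
    void lock() {
        for (;;) {
            // Attempt to take the lock; acquire pairs with the release in unlock().
            if (!locked_.exchange(true, std::memory_order_acquire)) {
                return;
            }
            // While the lock is held, wait on plain loads instead of
            // hammering the flag with read-modify-write operations.
            while (locked_.load(std::memory_order_relaxed)) {
                std::this_thread::yield();  // assumption: let other threads run
            }
        }
    }

    void unlock() {
        locked_.store(false, std::memory_order_release);
    }

private:
    std::atomic<bool> locked_{false};
};

// Same scenario as the test: every thread increments one shared integer.
std::uint64_t count_with_spinlock(unsigned num_threads,
                                  std::uint64_t increments_per_thread) {
    SpinLock lock;
    std::uint64_t counter = 0;  // protected by `lock`
    std::vector<std::thread> workers;
    workers.reserve(num_threads);

    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&lock, &counter, increments_per_thread] {
            for (std::uint64_t i = 0; i < increments_per_thread; ++i) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    }
    for (auto& worker : workers) {
        worker.join();
    }
    return counter;
}
```

A naive version that loops directly on the exchange makes every waiting thread write to the flag on each attempt, which is one plausible source of heavy slowdown under contention. Whether that applies to the numbers above cannot be told without seeing the original implementation.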
Various interesting benchmarks (though as usual you have to know exactly the context in which code runs to understand the numbers):
http://nasutechtips.blogspot.fr/2010/11/slim-read-write-srw-locks.html
http://tokyocabinetwiki.pbworks.com/w/page/24174719/43_BenchmarkingVariousLocksPartOne