Tuesday, April 17, 2012

MultiCore -- Benchmark 01 - Increment shared variable



Quick test on a simple scenario.

Description 

I have a shared integer that every thread tries to increment many times.
The information I need is the final value, i.e. the sum of all these increments.

I prevent oversubscription and undersubscription by spawning exactly as many software threads as there are logical cores.
I use basic thread functions, so there is no extra overhead (no task scheduler or the like).

Proposed solutions 

solution 1 : use a CRITICAL_SECTION to protect the increment operation
solution 2 : use an SRWLOCK to protect the increment operation
solution 3 : use a custom SpinLock to protect the increment operation
solution 4 : use an atomic operation and no lock
solution 5 : use TLS (thread-local storage) to keep a local counter per thread, and merge the results with a single atomic operation at the end

Results 

I ran the test multiple times, and always had about the same numbers.
Here are the results.

#######################################################################
 Infos summary :
        System w/ 8 CPU cores
        Nb 'tasks' : 80 000 000
        Nb (worker) threads : 8
        Average work load per thread : 10 000 000 tasks
=============================================================  
Results summary :
TEST_TLS_ATOMIC 0.295073 s
TEST_SRWLOCK 2.548695 s
TEST_ATOMIC 2.701041 s
TEST_CRITICAL_SECTION 7.383443 s
TEST_SPINLOCK 18.363033 s

#######################################################################

Explanations 

Unsurprisingly, solution 5 (TLS) wins: each thread works on its own counter, so there is almost no contention.

On the contrary, I was quite surprised to see that the SRWLOCK solution was about the same speed as the atomic-increment one (sometimes slightly faster, sometimes slightly slower).
The only explanation I have is that each atomic operation incurs a cache-line update and flush, and here we have 80M of these.
On the other hand, using an SRWLOCK might lead to fewer cache updates and flushes, e.g. when the same thread increments the shared value several times in a row before another thread manages to grab the lock.

And the SpinLock... damn, it is so slow. I wonder whether my (naive) implementation is even correct >.<


1 comment:

  1. Various interesting benchmarks
    (though, as usual, you have to know exactly the context in which the code runs to understand the numbers):

    http://nasutechtips.blogspot.fr/2010/11/slim-read-write-srw-locks.html

    http://tokyocabinetwiki.pbworks.com/w/page/24174719/43_BenchmarkingVariousLocksPartOne
