Tuesday, April 17, 2012

MultiCore -- Benchmark 01 - Increment shared variable



Quick test on a simple scenario.

Description 

I have a shared integer that every thread tries to increment many times.
The information I need is the final value, i.e. the sum of all these increments.

I prevent oversubscription and undersubscription by spawning exactly as many software threads as there are logical cores.
I use basic thread functions, so there is no extra overhead (no task scheduler or the like).

Proposed solutions 

solution 1 : use a CRITICAL_SECTION to protect the increment operation
solution 2 : use an SRWLOCK to protect the increment operation
solution 3 : use a custom SpinLock to protect the increment operation
solution 4 : use an atomic operation and no lock
solution 5 : use TLS (thread-local storage) to keep a local counter per thread, and merge the results with a single atomic operation at the end

Results 

I ran the test multiple times, and always had about the same numbers.
Here are the results.

#######################################################################
 Infos summary :
        System w/ 8 CPU cores
        Nb 'tasks' : 80 000 000
        Nb (worker) threads : 8
        Average work load per thread : 10 000 000 tasks
=============================================================  
Results summary :
TEST_TLS_ATOMIC 0.295073 s
TEST_SRWLOCK 2.548695 s
TEST_ATOMIC 2.701041 s
TEST_CRITICAL_SECTION 7.383443 s
TEST_SPINLOCK 18.363033 s

#######################################################################

Explanations 

Unsurprisingly, solution 5 (TLS) wins: each thread works on its own counter, so there is almost no contention.

On the contrary, I was quite surprised to see that the SRWLOCK solution was about the same speed as the atomic-increment one (sometimes slightly faster, sometimes slightly slower).
The only explanation I have is that each atomic operation incurs a cache-line update and flush, and here we have 80M of these.
On the other hand, using an SRWLOCK might lead to fewer cache updates and flushes, e.g. when the same thread increments the shared value several times in a row before another thread manages to grab the lock.

And the SpinLock... damn, it is so slow. I wonder whether my (naive) implementation is even correct >.<


1 comment:

  1. Various interesting benchmarks
    (though, as usual, you have to know exactly the context in which the code runs to understand the numbers):

    http://nasutechtips.blogspot.fr/2010/11/slim-read-write-srw-locks.html

    http://tokyocabinetwiki.pbworks.com/w/page/24174719/43_BenchmarkingVariousLocksPartOne
