Note: the Vietnamese version of this article is available at the link below.
https://duongnt.com/interlocked-synchronization-vie
Synchronization is a crucial part of concurrent programming. The .NET standard library provides multiple constructs to synchronize threads. Acquiring a lock on an object is perhaps the most common solution. But when possible, the methods inside the Interlocked class can achieve superior performance compared to a lock.
You can download all sample code from the link below.
https://github.com/duongntbk/InterlockedDemo
A simple scenario that requires synchronization
Let's say we have a list of the first 10,000 integers.
var src = Enumerable.Range(1, 10_000);
A naive solution to sum that list in parallel might look like this.
long sum = 0;
Parallel.ForEach(src, n =>
{
    // Other code if needed...
    sum += n;
});
// The result most certainly IS NOT 50,005,000
Of course, if we run the code above, the result will be way off. Every time a thread tries to update sum, it first has to read the current value of sum. But at the same time, another thread might update sum. And when the first thread writes back to sum, it will overwrite the second thread's result.
Thread synchronization using lock
As mentioned in the opening section, we can synchronize access to the sum variable by using a lock.
private static readonly object _lockObj = new object();

long sum = 0;
Parallel.ForEach(src, n =>
{
    // Other code if needed...
    lock (_lockObj)
    {
        sum += n;
    }
});
// The result is 50,005,000
However, acquiring a lock every time we want to update sum can be costly. Not to mention the risk of deadlock if we misuse _lockObj.
Atomically add two numbers with Interlocked.Add
As mentioned in the previous section, we need synchronization because one thread can change the value of sum while another thread is in the middle of updating it. Thus, if we can guarantee that the addition itself is atomic, no explicit synchronization is needed. This is exactly the purpose of the Interlocked.Add method. Specifically, we will use the overload below.
// Atomically add "value" to "location1",
// then replace the value in "location1" with the result.
public static long Add(ref long location1, long value);
The code to sum our list using Interlocked.Add looks like this.
long sum = 0;
Parallel.ForEach(src, n =>
{
    // Other code if needed...
    Interlocked.Add(ref sum, n);
});
// The result IS 50,005,000
Benchmark result
Below is a comparison of lock and Interlocked.Add.
Method | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
---|---|---|---|---|---|---|---|
NoSynchronize | 55.01 μs | 0.642 μs | 0.601 μs | 6.7749 | 0.2441 | – | 34 KB |
Lock | 235.25 μs | 10.391 μs | 30.638 μs | 14.1602 | 1.4648 | – | 74 KB |
LockLocalVar | 53.87 μs | 0.754 μs | 0.706 μs | 6.3477 | 0.3052 | – | 32 KB |
Interlocked | 69.77 μs | 0.766 μs | 0.717 μs | 7.3242 | 0.2441 | – | 37 KB |
InterlockedLocalVar | 53.52 μs | 0.526 μs | 0.439 μs | 6.3477 | 0.2441 | – | 33 KB |
We can see that without thread-local variables, Interlocked.Add is more than 3 times faster than lock. With thread-local variables, lock and Interlocked.Add have similar processing times. This makes sense because with thread-local variables, each thread only needs to synchronize once, when merging its subtotal.
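The LockLocalVar and InterlockedLocalVar benchmarks rely on thread-local subtotals. Below is a minimal sketch of that approach (the exact benchmark code lives in the linked repository) using the Parallel.ForEach overload that takes localInit and localFinally delegates:

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

class ThreadLocalSum
{
    static void Main()
    {
        var src = Enumerable.Range(1, 10_000);
        long sum = 0;

        Parallel.ForEach(
            src,
            () => 0L,                                   // localInit: per-thread subtotal
            (n, loop, local) => local + n,              // body: no synchronization needed
            local => Interlocked.Add(ref sum, local));  // localFinally: merge once per thread

        Console.WriteLine(sum); // 50005000
    }
}
```

Because each thread adds into its own subtotal, the only synchronized operation is the final merge, which runs once per worker thread instead of once per element.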
Other methods in Interlocked class
Interlocked.Add is not the only method in the Interlocked class. Below, we will look at some other synchronization scenarios that can be improved by using Interlocked.
Count elements in a collection with Interlocked.Increment
The Interlocked.Increment method atomically increases a number by one. We can use it to count the elements in a collection that satisfy a condition.
The code below counts the even numbers in a collection. It uses lock to synchronize threads.
long sum = 0;
Parallel.ForEach(src, n =>
{
    if (n % 2 == 0)
    {
        lock (_lockObj)
        {
            sum++;
        }
    }
});
return sum;
And below is the same code, but using Interlocked.Increment.
long sum = 0;
Parallel.ForEach(src, n =>
{
    if (n % 2 == 0)
    {
        Interlocked.Increment(ref sum);
    }
});
return sum;
From the benchmark result, we can see that Interlocked.Increment is over twice as fast as lock.
Method | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
---|---|---|---|---|---|---|---|
NoSynchronize | 63.43 μs | 1.254 μs | 1.047 μs | 6.5918 | 0.2441 | – | 33 KB |
Lock | 155.51 μs | 3.080 μs | 7.199 μs | 8.7891 | 0.4883 | – | 45 KB |
Interlocked | 66.29 μs | 0.888 μs | 0.741 μs | 6.7139 | 0.2441 | – | 34 KB |
Check availability with Interlocked.Exchange
The Interlocked.Exchange method atomically sets a variable to a value and returns the original value.
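Before applying it to a flag, here is a tiny self-contained example of that behavior (the variable names are my own):

```csharp
using System;
using System.Threading;

class ExchangeDemo
{
    static int _flag = 1;

    static void Main()
    {
        // Atomically store 0 in _flag and get back the value it held before.
        int original = Interlocked.Exchange(ref _flag, 0);

        Console.WriteLine(original); // 1
        Console.WriteLine(_flag);    // 0
    }
}
```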
Let’s say in our program, there is a method that can’t be run on multiple threads at the same time. Below is how we can check a flag before running that method.
Parallel.For(0, load, (i, loop) =>
{
    lock (_lockObj)
    {
        // Other code if needed...
        if (!_isSafeBool)
        {
            return;
        }
        else
        {
            _isSafeBool = false;
        }
    }

    DummyDoWork();

    lock (_lockObj)
    {
        _isSafeBool = true;
    }
});
Or we can use an integer as a flag. If that integer is 1, we can call the method; if it is 0, a different thread is already calling it.
Parallel.For(0, load, (i, loop) =>
{
    // Other code if needed...
    // Try to set _isSafe to 0 and check the original value.
    // If the original value is 1 then we can safely call the method.
    if (Interlocked.Exchange(ref _isSafe, 0) == 1)
    {
        DummyDoWork();
        // Remember to set _isSafe back to 1 so that other threads can call the method.
        Interlocked.Exchange(ref _isSafe, 1);
    }
});
From the benchmark result, we can see that Interlocked.Exchange is seven times as fast as lock.
Method | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
---|---|---|---|---|---|---|---|
Lock | 185.48 μs | 8.815 μs | 25.574 μs | 0.9766 | – | – | 7 KB |
Interlocked | 26.70 μs | 0.076 μs | 0.067 μs | 0.6714 | – | – | 4 KB |
Why is Interlocked so fast?
Not all synchronization constructs are created equal. There are three types of synchronization constructs in .NET.
- User-mode constructs: use special CPU instructions to coordinate threads. If a thread cannot acquire a resource, it keeps spinning in user mode and waits until that resource becomes available. Because the coordination happens in hardware, user-mode constructs are very fast. Examples: volatile, Interlocked.
- Kernel-mode constructs: require coordination from the operating system. These constructs cause the calling thread to transition between managed code, native user-mode code, and native kernel-mode code. All that context switching can greatly affect performance. Examples: Semaphore, Mutex.
- Hybrid constructs: are as fast as user-mode constructs when there is no contention, and only switch to kernel mode when multiple threads try to access the same resource at the same time. Examples: Monitor, SemaphoreSlim, ReaderWriterLockSlim.
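To see the user-mode spinning in action, here is an illustrative sketch that rebuilds an atomic add on top of Interlocked.CompareExchange. This is not how Interlocked.Add is actually implemented (it typically maps to a single atomic CPU instruction); it only demonstrates the retry loop that user-mode constructs rely on:

```csharp
using System;
using System.Threading;

static class AtomicAdd
{
    // Spin in user mode: read the current value, compute the new one,
    // and retry if another thread changed "location" in between.
    public static long Add(ref long location, long value)
    {
        long current, updated;
        do
        {
            current = Volatile.Read(ref location);
            updated = current + value;
            // CompareExchange writes "updated" only if "location" still equals
            // "current"; it always returns the value it found in "location".
        } while (Interlocked.CompareExchange(ref location, updated, current) != current);
        return updated;
    }
}
```

No thread ever blocks in the kernel here; a thread that loses the race simply loops and tries again.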
Interlocked is a primitive user-mode construct. Because of that, it has all the speed advantages of a user-mode construct.
The lock keyword, however, uses the Monitor construct behind the scenes, as documented here.
lock (x)
{
// Your code...
}
When x is a reference type, the code above is equivalent to the following.
object __lockObj = x;
bool __lockWasTaken = false;
try
{
System.Threading.Monitor.Enter(__lockObj, ref __lockWasTaken);
// Your code...
}
finally
{
if (__lockWasTaken) System.Threading.Monitor.Exit(__lockObj);
}
The Monitor construct is a hybrid construct. So when there is contention, it switches to kernel mode and takes a performance hit.
Conclusion
When faced with a synchronization problem, we should always consider whether it is possible to use the Interlocked class. Just remember that it is not a magic button; there are cases where it is not a suitable solution.