Note: a Vietnamese version of this article is available at the link below.

https://duongnt.com/interlocked-synchronization-vie


Synchronization is a crucial part of concurrent programming. The .NET standard provides multiple constructs to synchronize threads. Perhaps acquiring a lock on an object is the most common solution. But when possible, the methods inside the Interlocked class can achieve superior performance compared to a lock.

You can download all sample code from the link below.

https://github.com/duongntbk/InterlockedDemo

A simple scenario that requires synchronization

Let’s say we have a sequence of the first 10,000 positive integers.

var src = Enumerable.Range(1, 10_000);

A naive solution to sum that list in parallel might look like this.

long sum = 0;
Parallel.ForEach(src, n =>
{
    // Other code if needed...

    sum += n;
});

// The result most certainly IS NOT 50,005,000

Of course, if we run the code above, the result will be way off. This is because every time a thread wants to update sum, it must first read the current value of sum. But another thread might update sum at the same time, and when the first thread writes its result back, it overwrites the second thread’s update.
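To make the race concrete, here is a sketch (my own decomposition, not actual compiler output) of the three separate steps hidden inside sum += n.

```csharp
using System;

// "sum += n" is not one operation; it is three separate steps.
long sum = 0;
long n = 5;

long temp = sum; // 1. read the current value of sum
temp += n;       // 2. add n to the private copy
sum = temp;      // 3. write the result back

// If another thread updates sum between steps 1 and 3,
// step 3 silently overwrites that update, and it is lost.
Console.WriteLine(sum); // prints 5
```

On a single thread the three steps are harmless, which is why the bug only shows up under Parallel.ForEach.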

Thread synchronization using lock

As mentioned in the opening section, we can synchronize access to the sum variable by using a lock.

private static readonly object _lockObj = new object();

long sum = 0;
Parallel.ForEach(src, n =>
{
    // Other code if needed...

    lock (_lockObj)
    {
        sum += n;
    }
});

// The result is 50,005,000

However, acquiring a lock every time we want to update sum can be costly, not to mention the risk of deadlock if we misuse _lockObj.

Atomically add two numbers with Interlocked.Add

As mentioned in the previous section, we need synchronization because a thread can change the value of sum while another thread is in the middle of updating it. Thus, if we can guarantee that the add itself is atomic, no extra synchronization is needed. This is exactly the purpose of the Interlocked.Add method. Specifically, we will use the signature below.

// Atomically adds "value" to "location1", stores the sum
// back in "location1", and returns the new value.
public static long Add(ref long location1, long value);

The code to sum our list using Interlocked.Add looks like this.

long sum = 0;
Parallel.ForEach(src, n =>
{
    // Other code if needed...

    Interlocked.Add(ref sum, n);
});

// The result IS 50,005,000

Benchmark result

Below is the comparison of lock and Interlocked.Add.

| Method              |      Mean |     Error |    StdDev |   Gen 0 |  Gen 1 | Gen 2 | Allocated |
|---------------------|----------:|----------:|----------:|--------:|-------:|------:|----------:|
| NoSynchronize       |  55.01 μs |  0.642 μs |  0.601 μs |  6.7749 | 0.2441 |     - |     34 KB |
| Lock                | 235.25 μs | 10.391 μs | 30.638 μs | 14.1602 | 1.4648 |     - |     74 KB |
| LockLocalVar        |  53.87 μs |  0.754 μs |  0.706 μs |  6.3477 | 0.3052 |     - |     32 KB |
| Interlocked         |  69.77 μs |  0.766 μs |  0.717 μs |  7.3242 | 0.2441 |     - |     37 KB |
| InterlockedLocalVar |  53.52 μs |  0.526 μs |  0.439 μs |  6.3477 | 0.2441 |     - |     33 KB |

We can see that without thread-local variables, Interlocked.Add is more than three times faster than lock. With thread-local variables, lock and Interlocked.Add have similar processing times. This makes sense: each thread accumulates into its own local variable, so we only need to synchronize once per thread.
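The LockLocalVar and InterlockedLocalVar benchmarks use the Parallel.ForEach overload that carries a per-thread accumulator. A minimal sketch of the Interlocked variant might look like this (the exact benchmark code may differ; see the sample repository).

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

var src = Enumerable.Range(1, 10_000);
long sum = 0;

Parallel.ForEach(
    src,
    () => 0L,                                  // localInit: one local sum per thread
    (n, _, local) => local + n,                // body: accumulate with no synchronization
    local => Interlocked.Add(ref sum, local)); // localFinally: merge each thread's total once

Console.WriteLine(sum); // prints 50005000
```

Because the hot loop touches only a thread-local value, the single Interlocked.Add (or lock) per thread becomes negligible.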

Other methods in Interlocked class

Interlocked.Add is not the only method in the Interlocked class. Below, we will look at some other synchronization scenarios that can be improved by using Interlocked.

Count elements in a collection with Interlocked.Increment

The Interlocked.Increment method atomically increments a number by one. We can use it to count the elements in a collection that satisfy a condition.

The code below counts the number of even numbers in a collection. It uses lock to synchronize threads.

long count = 0;
Parallel.ForEach(src, n =>
{
    if (n % 2 == 0)
    {
        lock (_lockObj)
        {
            count++;
        }
    }
});
return count;

And below is the same logic using Interlocked.Increment.

long count = 0;
Parallel.ForEach(src, n =>
{
    if (n % 2 == 0)
    {
        Interlocked.Increment(ref count);
    }
});
return count;

From the benchmark result, we can see that Interlocked.Increment is more than twice as fast as lock.

| Method        |      Mean |    Error |   StdDev |  Gen 0 |  Gen 1 | Gen 2 | Allocated |
|---------------|----------:|---------:|---------:|-------:|-------:|------:|----------:|
| NoSynchronize |  63.43 μs | 1.254 μs | 1.047 μs | 6.5918 | 0.2441 |     - |     33 KB |
| Lock          | 155.51 μs | 3.080 μs | 7.199 μs | 8.7891 | 0.4883 |     - |     45 KB |
| Interlocked   |  66.29 μs | 0.888 μs | 0.741 μs | 6.7139 | 0.2441 |     - |     34 KB |

Check availability with Interlocked.Exchange

The Interlocked.Exchange method atomically sets a variable to a given value and returns its original value.

Let’s say in our program, there is a method that can’t be run on multiple threads at the same time. Below is how we can check a flag before running that method.

Parallel.For(0, load, (i, loop) =>
{
    lock (_lockObj)
    {
        // Other code if needed...

        if (!_isSafeBool)
        {
            return;
        }
        else
        {
            _isSafeBool = false;
        }
    }
    DummyDoWork();

    lock (_lockObj)
    {
        _isSafeBool = true;
    }
});

Or we can use an integer as a flag. If that integer is 1 then we can call the method, but if it is 0 then a different thread is already calling it.

Parallel.For(0, load, (i, loop) =>
{
    // Other code if needed...

    // Try to set _isSafe to 0 and check the original value.
    // If the original value is 1 then we can safely call the method.
    if (Interlocked.Exchange(ref _isSafe, 0) == 1)
    {
        DummyDoWork();

        // Remember to set _isSafe back to 1 so that other threads can call the method.
        Interlocked.Exchange(ref _isSafe, 1);
    }
});

From the benchmark result, we can see that Interlocked.Exchange is about seven times as fast as lock.

| Method      |      Mean |    Error |    StdDev |  Gen 0 | Gen 1 | Gen 2 | Allocated |
|-------------|----------:|---------:|----------:|-------:|------:|------:|----------:|
| Lock        | 185.48 μs | 8.815 μs | 25.574 μs | 0.9766 |     - |     - |      7 KB |
| Interlocked |  26.70 μs | 0.076 μs |  0.067 μs | 0.6714 |     - |     - |      4 KB |

Why is Interlocked so fast?

Not all synchronization constructs are created equal. There are three types of synchronization constructs in .NET.

  • User-mode constructs: use special CPU instructions to coordinate threads. If a thread cannot acquire a resource, it keeps spinning in user mode, waiting until the resource becomes available. Because the coordination happens in hardware, user-mode constructs are very fast. Examples: volatile, Interlocked.
  • Kernel-mode constructs: require coordination from the operating system. These constructs cause the calling thread to transition between managed code, native user-mode code, and native kernel-mode code. All that context switching can greatly hurt performance. Examples: Semaphore, Mutex.
  • Hybrid constructs: are as fast as user-mode constructs when there is no contention, and only switch to kernel mode when multiple threads try to access the same resource at the same time. Examples: Monitor, SemaphoreSlim, ReaderWriterLockSlim.

Interlocked is a primitive user-mode construct. Because of that, it has all the speed advantages of a user-mode construct.
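To see what "spinning in user mode" means, here is a toy spin lock built on Interlocked.CompareExchange. This is an illustration only, not part of the sample project; real code should prefer the built-in SpinLock or lock.

```csharp
using System.Threading;

class ToySpinLock
{
    private int _taken; // 0 = free, 1 = taken

    public void Enter()
    {
        // Atomically set _taken to 1, but only if it is currently 0.
        // While another thread holds the lock, keep spinning in user
        // mode; no system call or kernel transition is involved.
        while (Interlocked.CompareExchange(ref _taken, 1, 0) != 0)
        {
            Thread.SpinWait(1); // hint to the CPU that we are busy-waiting
        }
    }

    public void Exit()
    {
        // Release the lock with the full fence that Interlocked provides.
        Interlocked.Exchange(ref _taken, 0);
    }
}
```

Spinning like this wins when the protected section is tiny and contention is short-lived; if the wait is long, burning CPU in a loop becomes worse than letting the kernel put the thread to sleep.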

The lock keyword, however, uses the Monitor construct behind the scenes, as documented in the C# language reference.

lock (x)
{
    // Your code...
}

When x is a reference type, the code above is equivalent to the following.

object __lockObj = x;
bool __lockWasTaken = false;
try
{
    System.Threading.Monitor.Enter(__lockObj, ref __lockWasTaken);
    // Your code...
}
finally
{
    if (__lockWasTaken) System.Threading.Monitor.Exit(__lockObj);
}

The Monitor construct is a hybrid construct, so under contention it switches to kernel mode and takes a performance hit.

Conclusion

When faced with a synchronization problem, we should always consider whether the Interlocked class can solve it. Just remember that it is not a silver bullet, and there are cases where it is not the suitable solution.

A software developer from Vietnam, currently living in Japan.
