Before .NET 4.0, if we needed to use the Dictionary class in a multithreaded environment, we had no choice but to implement the thread synchronization ourselves to make it thread-safe.
Many developers have implemented such a solution, either by creating an entirely new thread-safe dictionary type, or by simply encapsulating a Dictionary object in a class and adding a locking mechanism to all of its methods. I'll call the latter approach "Dictionary + Locks".
But now we have ConcurrentDictionary. The thread-safety section of the Dictionary class documentation on MSDN states that if you need a thread-safe implementation, you should use ConcurrentDictionary.
So, now that we have a thread-safe dictionary class, we don't need to implement one ourselves anymore. Great, isn't it?
Origin of the problem
In fact, I had only used ConcurrentDictionary once before, in a test of its responsiveness. Because it performed well there, I immediately replaced my own class with it, did some testing, and then something went wrong.
So, what went wrong? Didn't you say it was thread-safe?
After more testing, I found the root of the problem. For some reason, the .NET 4.0 documentation on MSDN says nothing about this behavior of the GetOrAdd overload that takes a delegate parameter. Looking at the 4.5 documentation, I found this note:
"If you call GetOrAdd simultaneously on different threads, addValueFactory may be called multiple times, but its key/value pair might not be added to the dictionary for every call."
That's exactly the problem I ran into. Because it wasn't described in the earlier documentation, I had to do more testing to confirm it. Of course, the problem is related to my particular usage: in general, I use a dictionary type to cache data with these characteristics:
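A minimal repro of the behavior that note describes (this is a sketch; the Thread.Sleep only widens the race window, so on any given run the factory may still execute just once):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public static class GetOrAddRace
{
    // Returns how many times the value factory ran for a single key
    // when two threads call GetOrAdd at the same time.
    public static int Run()
    {
        var dict = new ConcurrentDictionary<int, string>();
        int factoryCalls = 0;
        var start = new ManualResetEventSlim();

        Task[] tasks = new Task[2];
        for (int i = 0; i < tasks.Length; i++)
        {
            tasks[i] = Task.Run(() =>
            {
                start.Wait(); // line both threads up on the same key
                dict.GetOrAdd(42, key =>
                {
                    Interlocked.Increment(ref factoryCalls);
                    Thread.Sleep(100); // widen the race window
                    return "value-" + key;
                });
            });
        }
        start.Set();
        Task.WaitAll(tasks);

        // Whatever happened, only ONE value was stored for key 42,
        // but factoryCalls is frequently 2.
        Console.WriteLine("factory calls: " + factoryCalls);
        Console.WriteLine("stored value:  " + dict[42]);
        return factoryCalls;
    }
}
```

Both threads may run the factory, yet GetOrAdd keeps only one result; the other return value is silently discarded.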
- The data is very slow to create;
- The data can only be created once, because a second creation will throw an exception, or multiple creations may lead to a resource leak, and so on.
My problem was with the second condition. If two threads each find that a piece of data does not exist, each will create it once, but only one result will be successfully stored. What happens to the other one?
If the creation process throws an exception, you can use try/catch (not elegant, but it solves that problem). But what about a resource that is created and never released?
You might say that a created object will be garbage collected once it is no longer referenced. However, consider what happens in the situations described below:
- Generating code dynamically with Emit. I used this approach in a Remoting framework and put all the generated implementations into an assembly that could not be unloaded. If a type is created twice, the second copy exists forever, even if it is never used.
- Creating a thread, directly or indirectly. For example, suppose we build a component that uses a dedicated thread to process asynchronous messages in the order they are received. When the component is instantiated, a thread is created; when the instance is disposed, the thread is terminated. But if we drop the reference to the object without disposing the component, the thread does not end, and it still holds a reference to the object. As long as the thread does not die, the object will never be collected either.
- Performing a P/Invoke operation that requires the received handle to be closed exactly as many times as it was opened.
There are certainly many similar situations. For example, a dictionary entry may hold a connection to a service on a remote server that may only be requested once; if it is requested a second time, the remote service will assume that some kind of error has occurred and write it to its log. (At a company I worked for, that condition carried legal penalties.) So, it's easy to see that Dictionary + Locks can't be hastily replaced with ConcurrentDictionary, even if the documentation says it's thread-safe.
Analyze the problem
Still don't understand?
It is true that this issue does not arise under the Dictionary + Locks approach, although that depends on the specific implementation. Let's look at this simple example:
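The original listing was not preserved in this copy of the article, so here is a sketch consistent with the description that follows (class and member names are mine):

```csharp
using System;
using System.Collections.Generic;

// A minimal "Dictionary + Locks" wrapper: every operation takes a full
// lock on the inner dictionary, so the value factory for a given key
// can run at most once.
public class LockedDictionary<TKey, TValue>
{
    private readonly Dictionary<TKey, TValue> _dict = new Dictionary<TKey, TValue>();

    public TValue GetOrAdd(TKey key, Func<TKey, TValue> valueFactory)
    {
        lock (_dict)
        {
            TValue value;
            if (!_dict.TryGetValue(key, out value))
            {
                value = valueFactory(key); // runs while the lock is held
                _dict.Add(key, value);
            }
            return value;
        }
    }
}
```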
In the code above, we take the lock on the dictionary before querying the key. If the key does not exist, the value is created right away; since we already hold the lock, we can add the key/value pair directly, then release the lock and return the result. If two threads query the same key at the same time, the first thread to acquire the lock creates the object, and the other thread waits for that creation to finish and receives the created value once it acquires the lock.
That's good, isn't it?
It really isn't! Creating objects in parallel like this, where only one of them is ultimately used, is exactly the problem I described.
The problem I'm trying to elaborate may not always be reproducible; in a parallel environment we may simply create two objects and discard one. So, how exactly do Dictionary + Locks and ConcurrentDictionary compare?
The answer is: it depends on the lock usage strategy and how the dictionary is used.
Game 1: Creating the same object in parallel
First, let's assume an object may safely be created twice. What happens if two threads create this object at the same time?
Second, how long does such a creation take?
We can easily build an example where instantiating an object takes 10 seconds. Five seconds after the first thread starts creating the object, a second thread calls GetOrAdd to get it, and since the object still isn't in the dictionary, it starts creating the object too.
Under these conditions, we have two CPUs working in parallel for 5 seconds, and when the first thread finishes, the second thread still needs another 5 seconds to complete its construction. When the second thread finishes, it discovers that an object already exists, so it uses the existing object and discards the newly created one.
If the second thread had simply waited, the second CPU could have done other work (running other threads or applications, or saving some power), and the thread would have gotten the desired object after 5 seconds instead of 10.
So, under these conditions, Dictionary + Locks wins a small game.
Game 2: Accessing different objects in parallel
No, the situation you described isn't realistic at all!
Well, the example above is a bit peculiar, but it does illustrate the problem; the usage is just extreme. So, consider what happens if the first thread is creating an object while a second thread needs to access a different key whose value already exists.
In ConcurrentDictionary, the lock-free design makes reads very fast, because no lock is taken on a read. With Dictionary + Locks, every read must acquire the mutual-exclusion lock, even for a completely different key, which obviously slows reads down.
With that, ConcurrentDictionary pulls one game back.
Note: here I assume you understand concepts such as Bucket, Node, and Entry in the dictionary classes. If not, I recommend Ofir Makmal's article "Understanding Generic Dictionary in-depth", which explains them well.
Game 3: Many readers, a single writer
What happens if Dictionary + Locks uses a multiple-readers/single-writer lock instead of a full lock on the dictionary?
If the thread creating an object holds an upgradeable read lock until the object is built, and only then upgrades to a write lock to insert it, read operations can proceed in parallel.
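A sketch of that variant using ReaderWriterLockSlim (names are mine; note that only one thread at a time can hold the upgradeable lock, so creations still serialize against each other, but plain reads keep running):

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// "Multiple readers / single writer" variant: readers run in parallel;
// a creator takes an upgradeable read lock while building the value and
// upgrades to a write lock only for the actual insert.
public class RwLockedDictionary<TKey, TValue>
{
    private readonly Dictionary<TKey, TValue> _dict = new Dictionary<TKey, TValue>();
    private readonly ReaderWriterLockSlim _lock = new ReaderWriterLockSlim();

    public TValue GetOrAdd(TKey key, Func<TKey, TValue> valueFactory)
    {
        _lock.EnterReadLock();
        try
        {
            TValue value;
            if (_dict.TryGetValue(key, out value))
                return value; // fast path: parallel with other readers
        }
        finally { _lock.ExitReadLock(); }

        _lock.EnterUpgradeableReadLock();
        try
        {
            TValue value;
            if (_dict.TryGetValue(key, out value))
                return value; // another creator beat us to it

            value = valueFactory(key); // readers still run during this
            _lock.EnterWriteLock();
            try { _dict.Add(key, value); }
            finally { _lock.ExitWriteLock(); }
            return value;
        }
        finally { _lock.ExitUpgradeableReadLock(); }
    }
}
```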
This also avoids leaving a read operation stalled for 10 seconds while a value is created. But if reads far outnumber writes, we will find that ConcurrentDictionary is still faster, because its reads are lock-free.
In fact, using ReaderWriterLockSlim around a Dictionary makes reads worse; for dictionaries it is generally recommended to use a full lock rather than ReaderWriterLockSlim.
So, under these conditions, ConcurrentDictionary won another game.
Note: I have covered the YieldReaderWriterLock and YieldReaderWriterLockSlim classes in previous articles. Using such a read-write lock improves speed considerably (it has since evolved into SpinReaderWriterLockSlim) and allows multiple reads to execute in parallel with almost no impact. I still use this approach, but the lock-free reads of ConcurrentDictionary are obviously faster still.
Game 4: Add multiple key-value pairs
The showdown is not over yet.
What if we have many keys to add, none of which collide, so they land in different buckets?
I was curious about this, so I wrote a test, admittedly an unrealistic one. I used a dictionary of type <int, int>, and the value factory simply returned the negation of the key.
I expected ConcurrentDictionary to be the fastest, but it turned out to be the slowest; Dictionary + Locks performed faster. Why is that?
This is because ConcurrentDictionary allocates a node object for each entry and places it in a bucket, a design optimized for lock-free reads. When adding items, however, creating all of those nodes becomes expensive.
Even under parallel conditions, allocating nodes under ConcurrentDictionary's striped locks still takes more time than simply adding entries under a single full lock.
So, Dictionary + Locks wins this game.
Game 5: Reads are far more frequent
Frankly, if we had a delegate that could instantiate objects quickly, we wouldn't need a dictionary at all; we could just call the delegate to get the object, right?
In fact, the answer is also that it depends on the situation.
Imagine that the key type is string and holds the path of each page on a web server, and the corresponding value is an object recording the users currently on the page and the total number of visits since the server started.
Creating such an object is almost instantaneous, and after creation you never replace it; you only update the values stored inside it. So it is acceptable for the object to be created twice, as long as only one instance ends up being used. However, because ConcurrentDictionary allocates its nodes more slowly, Dictionary + Locks creates the entries faster.
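A sketch of such a cache (the type and member names are hypothetical, not from the original article): the statistics object is trivial to build, so a discarded duplicate costs nothing, and all later traffic only mutates the stored instance.

```csharp
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical per-page statistics: cheap to construct, mutated afterwards.
public class PageStats
{
    private long _totalHits;
    public long Hits { get { return Interlocked.Read(ref _totalHits); } }
    public void RecordHit() { Interlocked.Increment(ref _totalHits); }
}

public static class PageCounter
{
    private static readonly ConcurrentDictionary<string, PageStats> Stats =
        new ConcurrentDictionary<string, PageStats>();

    public static void Hit(string path)
    {
        // If two threads race here, one PageStats instance is discarded --
        // harmless, because constructing it is nearly free.
        Stats.GetOrAdd(path, _ => new PageStats()).RecordHit();
    }

    public static long TotalHits(string path)
    {
        PageStats stats;
        return Stats.TryGetValue(path, out stats) ? stats.Hits : 0;
    }
}
```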
So even in this rather special example, we see that Dictionary + Locks performs better, taking less time.
Although node allocation in ConcurrentDictionary is slower, I didn't try pushing 100 million items into it to measure the time, because that would obviously take a while.
In most cases, once a data item is created, it is only ever read from then on (how its contents change is another matter). So a few extra milliseconds to create an item hardly matter, because reads, although each is only a few milliseconds faster, happen far more frequently.
So, ConcurrentDictionary won the game.
Game 6: Creating objects that take different amounts of time
What happens if the time it takes to create different data items varies?
Creating many data items that take different amounts of time and adding them to the dictionary in parallel: this is ConcurrentDictionary's strongest point.
ConcurrentDictionary uses several independent locks so that items can be added concurrently, but the supporting logic (deciding which lock to use, acquiring locks to resize the buckets, and so on) isn't what helps; placing an item into a bucket is nearly instantaneous either way. What really makes ConcurrentDictionary win here is its ability to create the objects in parallel.
However, we can do the same thing ourselves. If we don't mind sometimes creating objects in parallel and discarding some of them, we can take the lock just to check whether the item already exists, release the lock, create the item, re-acquire the lock, check again whether the item exists, and add it if it doesn't. The code might look something like this:
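The original listing did not survive in this copy, so here is a reconstruction of the pattern just described (using an int-keyed dictionary as in the original tests; names are mine):

```csharp
using System;
using System.Collections.Generic;

// Check under the lock, create OUTSIDE the lock, then check again before
// adding: creations run in parallel, at the cost of possibly discarding
// a duplicate -- the same trade-off ConcurrentDictionary.GetOrAdd makes.
public class ParallelCreateDictionary
{
    private readonly Dictionary<int, int> _dict = new Dictionary<int, int>();

    public int GetOrAdd(int key, Func<int, int> valueFactory)
    {
        int value;

        // First check: is the value already there?
        lock (_dict)
        {
            if (_dict.TryGetValue(key, out value))
                return value;
        }

        // Create with no lock held, so creations can run in parallel.
        int created = valueFactory(key);

        // Second check: another thread may have won the race.
        lock (_dict)
        {
            if (_dict.TryGetValue(key, out value))
                return value; // 'created' is discarded

            _dict.Add(key, created);
            return created;
        }
    }
}
```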
* Note that I use a dictionary of type <int, int>.
With this simple structure, Dictionary + Locks performs almost as well as ConcurrentDictionary when creating and adding items in parallel. But it shares the same problem: some values may be created and never used.
Conclusion
So, is there a conclusion?
At this point, there are a few:
- All of these dictionary classes are very fast. Even when I created millions of items, they remained fast. Normally we create only a small number of items, with some interval between reads, so we generally don't notice the time spent reading.
- If the same object must not be created twice, do not use ConcurrentDictionary.
- If you're really concerned about performance, Dictionary + Locks may still be a good solution; an important factor is how many items are added and removed. But if reads dominate, it is slower than ConcurrentDictionary.
- Although I didn't show it, the Dictionary + Locks scheme actually gives you more freedom. For example, you can take the lock once, add several items, remove several items, or run several queries, and then release the lock.
- In general, avoid ReaderWriterLockSlim when reads far outnumber writes; the dictionary itself is already much faster than merely acquiring a read lock on a read-write lock. Of course, this also depends on how long object creation inside the lock takes.
So, I think the examples given are a bit extreme, but they show that ConcurrentDictionary is not always the best solution.
Feel the difference
I wrote this article with the intention of seeking a better solution.
In writing it, I tried to gain a deeper understanding of how each dictionary class works (and now I feel I understand them well).
Arguably, the Bucket and Node structures in ConcurrentDictionary are very simple; I did something similar when I once tried to create a dictionary class myself. The regular Dictionary class may look simpler, but it is in fact more complex.
In ConcurrentDictionary, each Node is a full class. In the Dictionary class, nodes are implemented as value types, all of them kept in one huge array, and the buckets are indexes into that array. An index is also used in place of a node's reference to its next node (after all, a struct Node cannot contain a member of its own type).
When adding and removing entries, the Dictionary class cannot simply create a new node; it must check whether there is an index marking a deleted node and reuse it, or else use the count to find the position of the new node in the array. And when the array is full, the Dictionary class is forced to resize.
In ConcurrentDictionary, a Node is just a new object. Removing a node simply removes the references to it; adding a node simply creates a new instance. Resizing happens only to reduce collisions, and it is never mandatory.
So, if the Dictionary class deliberately uses these more complex algorithms for speed, how can ConcurrentDictionary perform better in a multithreaded environment?
The truth is that putting all the nodes in one array is the fastest way to allocate and to read, even though another array is needed to keep track of where to find each item. It may look like the same number of buckets uses more memory, but new items need no fresh allocation, no new objects to track, and no extra garbage collection, because everything is already in place.
However, replacing the contents of a struct node is not an atomic operation, and that is one of the reasons Dictionary is not thread-safe. In ConcurrentDictionary, because nodes are objects, a node is created in full first, and only then is a single reference updated to point to it (that update is atomic). So reader threads can read the dictionary without a lock: a read always yields either the old value or the new one, never a partially written value.
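A simplified sketch of that publication pattern (the class and field names are illustrative, not ConcurrentDictionary's real internals): the node is fully built before one reference swap makes it visible, so a lock-free reader sees either the old node or the new one.

```csharp
using System.Threading;

// Illustrative only: an immutable node published via one atomic
// reference swap, the way lock-free readers stay safe.
public class Node
{
    public readonly int Key;
    public readonly int Value;
    public Node(int key, int value) { Key = key; Value = value; }
}

public class Bucket
{
    private Node _head; // readers follow this reference without a lock

    public void Publish(int key, int value)
    {
        var node = new Node(key, value); // fully built before anyone sees it
        Volatile.Write(ref _head, node); // atomic reference swap
    }

    public Node Read()
    {
        // No lock: either the old node or the new one, never a torn value.
        return Volatile.Read(ref _head);
    }
}
```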
And so the other truth is: if you don't need a lock, the Dictionary class is faster on reads, because it is the lock that slows reads down.
This article is translated from Paulo Zemek's article "Dictionary + Locking versus ConcurrentDictionary" on CodeProject; some statements have been adapted for readability.