What data structures the .NET collections use
I’ve compiled some information about time complexity and underlying data structures of .NET simple collections and dictionaries. It was difficult to find some of this information on official sources like MSDN and non-official sources seemed to differ, so I used reflector and actually had a look at the .NET framework code to confirm these cases.
Simple collections
Type | Data structure | Notes |
---|---|---|
List<T> |
Array | A regular list using a dynamic array |
SortedSet<T> |
Red-black tree | A list stored using a red-black tree |
Complexity
Type | Get ([i] ) |
Find | Add | Insert | Remove |
---|---|---|---|---|---|
List |
O(1) | O(n) | O(1)* | O(n) | O(n) |
SortedSet |
N/A | O(\log n) | O(\log n) | O(\log n) | O(\log n) |
* List.Add
is O(n) when adding beyond the array’s capacity
Dictionaries
Dictionaries or hash tables are ideal when you either need to access the data via an arbitrary key or you need fast deletion and insertion. Note that this section doesn’t the Lookup
class which stores a collection of items against a key.
Type | Data structure | Notes |
---|---|---|
HashSet<T> |
Hash table | A hash table where the key is the object itself |
Dictionary<TKey, TValue> |
Hash table | A hash table using a key not necessarily on the object being stored |
SortedList<TKey, TValue> |
Array | The same as Dictionary only items and their keys are stored sorted arrays |
SortedDictionary<TKey, TValue> |
Red-black tree | The same as Dictionary only items and their keys are stored in a red-black tree. Uses SortedSet behind the scenes |
Complexity
Type | Find by key | Remove | Add |
---|---|---|---|
HashSet |
O(1)* | O(1)* | O(1)** |
Dictionary |
O(1)* | O(1)* | O(1)** |
SortedList |
O(\log n) | O(n) | O(n) |
SortedDictionary |
O(\log n) | O(\log n) | O(\log n) |
* O(n) with collision
** O(n) with collision or when adding beyond the array’s capacity
SortedList
vs SortedDictionary
SortedList
and SortedDictionary
are best used when you need order to the items that you’re storing. Here is a pretty detailed comparison from Microsoft:
The
SortedList<TKey, TValue>
generic class is an array of key/value pairs with O(\log n) retrieval, where n is the number of elements in the dictionary. In this, it is similar to theSortedDictionary<TKey, TValue>
generic class. The two classes have similar object models, and both have O(\log n) retrieval. Where the two classes differ is in memory use and speed of insertion and removal:
SortedList<TKey, TValue>
uses less memory thanSortedDictionary<TKey, TValue>
.SortedDictionary<TKey, TValue>
has faster insertion and removal operations for unsorted data, O(\log n) as opposed to O(n) forSortedList<TKey, TValue>
.- If the list is populated all at once from sorted data,
SortedList<TKey, TValue>
is faster thanSortedDictionary<TKey, TValue>
.Another difference between the
SortedDictionary<TKey, TValue>
andSortedList<TKey, TValue>
classes is thatSortedList<TKey, TValue>
supports efficient indexed retrieval of keys and values through the collections returned by the Keys and Values properties. It is not necessary to regenerate the lists when the properties are accessed, because the lists are just wrappers for the internal arrays of keys and values.
References
- MSDN -
List<T>
- MSDN -
SortedSet<T>
- MSDN -
HashSet<T>
- MSDN -
Dictionary<TKey, TValue>
- MSDN -
SortedList<TKey, TValue>
- MSDN -
SortedDictionary<TKey, TValue>