Growing with the Web

What data structures the .NET collections use

Published
Tags:

I’ve compiled some information about time complexity and underlying data structures of .NET simple collections and dictionaries. It was difficult to find some of this information on official sources like MSDN and non-official sources seemed to differ, so I used reflector and actually had a look at the .NET framework code to confirm these cases.

Simple collections

Type Data structure Notes
List<T> Array A regular list using a dynamic array
SortedSet<T> Red-black tree A list stored using a red-black tree

Complexity

Type Get ([i]) Find Add Insert Remove
List O(1) O(n) O(1)* O(n) O(n)
SortedSet N/A O(\log n) O(\log n) O(\log n) O(\log n)

* List.Add is O(n) when adding beyond the array’s capacity

Dictionaries

Dictionaries or hash tables are ideal when you either need to access the data via an arbitrary key or you need fast deletion and insertion. Note that this section doesn’t the Lookup class which stores a collection of items against a key.

Type Data structure Notes
HashSet<T> Hash table A hash table where the key is the object itself
Dictionary<TKey, TValue> Hash table A hash table using a key not necessarily on the object being stored
SortedList<TKey, TValue> Array The same as Dictionary only items and their keys are stored sorted arrays
SortedDictionary<TKey, TValue> Red-black tree The same as Dictionary only items and their keys are stored in a red-black tree. Uses SortedSet behind the scenes

Complexity

Type Find by key Remove Add
HashSet O(1)* O(1)* O(1)**
Dictionary O(1)* O(1)* O(1)**
SortedList O(\log n) O(n) O(n)
SortedDictionary O(\log n) O(\log n) O(\log n)

* O(n) with collision
** O(n) with collision or when adding beyond the array’s capacity

SortedList vs SortedDictionary

SortedList and SortedDictionary are best used when you need order to the items that you’re storing. Here is a pretty detailed comparison from Microsoft:

The SortedList<TKey, TValue> generic class is an array of key/value pairs with O(\log n) retrieval, where n is the number of elements in the dictionary. In this, it is similar to the SortedDictionary<TKey, TValue> generic class. The two classes have similar object models, and both have O(\log n) retrieval. Where the two classes differ is in memory use and speed of insertion and removal:

  • SortedList<TKey, TValue> uses less memory than SortedDictionary<TKey, TValue>.
  • SortedDictionary<TKey, TValue> has faster insertion and removal operations for unsorted data, O(\log n) as opposed to O(n) for SortedList<TKey, TValue>.
  • If the list is populated all at once from sorted data, SortedList<TKey, TValue> is faster than SortedDictionary<TKey, TValue>.

Another difference between the SortedDictionary<TKey, TValue> and SortedList<TKey, TValue> classes is that SortedList<TKey, TValue> supports efficient indexed retrieval of keys and values through the collections returned by the Keys and Values properties. It is not necessary to regenerate the lists when the properties are accessed, because the lists are just wrappers for the internal arrays of keys and values.

References

Like this article?
Subscribe for more!