Counting sort
Counting sort is a distribution sort that achieves linear time complexity given some trade-offs and provided some requirements are met.
Counting sort works by creating an auxiliary array the size of the range of values, the unsorted values are then placed into the new array using the value as the index. The auxiliary array is now in sorted order and can be iterated over to construct the sorted array.
Counting sort can be exceptionally fast because of the way that elements are sorted using their values as array keys. This means that more memory is required for the extra array at the cost of running time. It runs in O(n + k) time where n is the number of elements to be sorted and k is the number of possible values in the range.
Complexity
Time | Space | ||
---|---|---|---|
Worst case | Best case | Average case | Worst case |
O(n + k) | O(n + k) | O(n + k) | O(k) auxiliary |
Primitives, objects and duplicates
The basic algorithm can be augmented depending on the situation. For example when sorting primitives, you likely don’t care about retaining the original reference or stability of duplicates so the auxiliary array can be used to count instances of the value which can be reconstructed after.
However, sorting objects where there can be duplicates will require a more sophisticated method of storing values in the auxiliary array, such as a linked list or dynamic array.
Non-zero minimum
Most versions of counting sort use a minimum value of either 0 or 1, this can easily be adjusted to suit any minimum value though by shifting the indexes back and forth by the minimum value.
Say the list is known to have a minimum possible value of 200, the algorithm can be modified so that values are added onto auxiliary array at index (value - 200) and added back on to the sorted array with the value (index + 200), improving both memory usage and performance.
Ideal usage scenario
Counting sort is an ideal choice when:
- The list is made up of integers or can be mapped to integers
- The range of elements is known
- Most of the elements in the range are present
- The additional memory usage is not an issue
If the list is known to be partially sorted then another option such as insertion sort may end up being better, since counting sort does not take advantage of that.
Code
public class CountingSort : IIntegerSortingAlgorithm {
public void Sort(int[] array) {
if (array.Length == 0) {
return;
}
// Determine minimum and maximum values
int minValue = array[0];
int maxValue = array[0];
for (int i = 1; i < array.Length; i++) {
if (array[i] < minValue) {
minValue = array[i];
} else if (array[i] > maxValue) {
maxValue = array[i];
}
}
Sort(array, minValue, maxValue);
}
public void Sort(int[] array, int minValue, int maxValue) {
int[] buckets = new int[maxValue - minValue + 1];
for (int i = 0; i < array.Length; i++) {
buckets[array[i] - minValue]++;
}
int sortedIndex = 0;
for (int i = 0; i < buckets.Length; i++) {
while (buckets[i] > 0) {
array[sortedIndex++] = i + minValue;
buckets[i]--;
}
}
}
}
public class CountingSort {
public static void sort(Integer[] array) {
if (array.length == 0) {
return;
}
// Determine minimum and maximum values
Integer minValue = array[0];
Integer maxValue = array[0];
for (int i = 1; i < array.length; i++) {
if (array[i] < minValue) {
minValue = array[i];
} else if (array[i] > maxValue) {
maxValue = array[i];
}
}
sort(array, minValue, maxValue);
}
public static void sort(Integer[] array, int minValue, int maxValue) {
int[] buckets = new int[maxValue - minValue + 1];
for (int i = 0; i < array.length; i++) {
buckets[array[i] - minValue]++;
}
int sortedIndex = 0;
for (int i = 0; i < buckets.length; i++) {
while (buckets[i] > 0) {
array[sortedIndex++] = i + minValue;
buckets[i]--;
}
}
}
}
/**
* Sorts an array using counting sort.
* @param {Array} array The array to sort.
* @param {number} maxValue The max value for the counting sort.
*/
function sort(array, maxValue) {
var buckets = new Array(maxValue + 1);
var sortedIndex = 0;
var i;
for (i = 0; i < array.length; i++) {
if (!buckets[array[i]]) {
buckets[array[i]] = 0;
}
buckets[array[i]]++;
}
for (i = 0; i < buckets.length; i++) {
while (buckets[i] > 0) {
array[sortedIndex++] = i;
buckets[i]--;
}
}
return array;
}
def sort(array, maxValue=None):
if maxValue is None:
maxValue = 0
for i in range(0, len(array)):
if array[i] > maxValue:
maxValue = array[i]
buckets = [0] * (maxValue + 1)
sortedIndex = 0
for i in range(0, len(array)):
buckets[array[i]] += 1
for i in range(0, len(buckets)):
while (buckets[i] > 0):
array[sortedIndex] = i
sortedIndex += 1
buckets[i] -= 1
return array
module CountingSort
# Sorts an array using counting sort.
def self.sort(array, max_value)
buckets = Array.new(max_value + 1)
sorted_index = 0
(0..array.length - 1).each do |i|
if buckets[array[i]].nil?
buckets[array[i]] = 0
end
buckets[array[i]] += 1
end
(0..buckets.length - 1).each do |i|
while not buckets[i].nil? and buckets[i] > 0
array[sorted_index] = i
sorted_index += 1
buckets[i] -= 1
end
end
end
end
References
- T.H. Cormen, C.E. Leiserson, R.L. Rivest, C. Stein, “Sorting in Linear Time” in Introduction to Algorithms, 2nd ed., Cambridge, MA: The MIT Press, 2001, ch. 8, pp.168-170
Textbooks
Here are two CS textbooks I personally recommend; the Algorithm Design Manual (Steven S. Skiena) is a fantastic introduction to data structures and algorithms without getting to deep into the maths side of things, and Introduction to Algorithms (CLRS) which provides a much deeper, math heavy look.