Bucket sort
Bucket sort, also known as bin sort, is a distribution sort that works by arranging elements into several ‘buckets’ which are then sorted using another sort, typically insertion sort, and merged into a sorted list.
Bucket sort can be exceptionally fast because of the way elements are assigned to buckets, typically using an array where the index is the value. This means that more auxiliary memory is required for the buckets at the cost of running time than more comparison sorts. It runs in O(n+k) time in the average case where n is the number of elements to be sorted and k is the number of buckets.
While bucket sort is a distribution sort, it typically uses a comparison sort to sort the buckets after they have been allocated.
Complexity
Time | Space | ||
---|---|---|---|
Worst case | Best case | Average case | Worst case |
O(n^2) | O(n + k) | O(n + k) | O(n + k) auxiliary |
When it’s fast
Bucket sort’s best case occurs when the data being sorted can be distributed between the buckets perfectly. If the values are sparsely allocated over the possible value range, a larger bucket size is better since the buckets will likely be more evenly distributed. An example of this is [2303, 33, 1044]
, if buckets can only contain 5 different values then for this example 461 buckets would need to be initialised. A bucket size of 200-1000 would be much more reasonable.
The inverse of this is also true; a densely allocated array like [103, 99, 119, 112, 111]
performs best when buckets are as small as possible.
Bucket sort is an ideal algorithm choice when:
- The additional O(n + k) memory usage is not an issue
- Elements are expected to be fairly evenly distributed
When it’s slow
Bucket sort performs at its worst, O(n^2), when all elements at allocated to the same bucket. Since individual buckets are sorted using another algorithm, if only a single bucket needs to be sorted, bucket sort will take on the complexity of the inner sorting algorithm.
This depends on the individual implementation though and can be mitigated. For example a bucket sort algorithm could be made to work with large bucket sizes by using insertion sort on small buckets (due to its low overhead), and merge sort or quicksort on larger buckets.
Bucket sort vs counting sort vs radix sort
There are two main variants of bucket sort; one where there is a bucket for each value, and where buckets hold several values. Bucket sort is often seen as a generalisation of counting sort because bucket sort with a bucket size of 1 is essentially counting sort, just with a more complex implementation. This can be spun the other way as well by saying that bucket sort is counting sort that uses more sophisticated buckets.
Another similar sort, radix sort, also distributes items into buckets. However, they are distributed based on the individual digits within the values themselves.
In summary:
- Counting sort: buckets hold only a single value
- Bucket sort: buckets hold a range of values
- Radix sort: buckets hold values based on digits within their values
Code
The most common implementation of bucket sort works by splitting the array of size n into k buckets, each of which house a value range of n/k. The buckets are then sorted using a simple sorting algorithm that works well on the expected data, such as insertion sort. Buckets are typically implemented using either linked lists or dynamic arrays.
public class BucketSort : IIntegerSortingAlgorithm {
private const int DefaultBucketSize = 5;
public void Sort(int[] array) {
Sort(array, DefaultBucketSize);
}
public void Sort(int[] array, int bucketSize) {
if (array.Length == 0) {
return;
}
// Determine minimum and maximum values
int minValue = array[0];
int maxValue = array[0];
for (int i = 1; i < array.Length; i++) {
if (array[i] < minValue) {
minValue = array[i];
} else if (array[i] > maxValue) {
maxValue = array[i];
}
}
// Initialise buckets
int bucketCount = (maxValue - minValue) / bucketSize + 1;
IList<List<int>> buckets = new List<List<int>>(bucketCount);
for (int i = 0; i < bucketCount; i++) {
buckets.Add(new List<int>());
}
// Distribute input array values into buckets
for (int i = 0; i < array.Length; i++) {
buckets[(array[i] - minValue) / bucketSize].Add(array[i]);
}
// Sort buckets and place back into input array
int currentIndex = 0;
var insertionSort = new InsertionSort<int>();
for (int i = 0; i < buckets.Count; i++) {
int[] bucketArray = buckets[i].ToArray();
insertionSort.Sort(bucketArray);
for (int j = 0; j < bucketArray.Length; j++) {
array[currentIndex++] = bucketArray[j];
}
}
}
}
public class BucketSort {
private static final int DEFAULT_BUCKET_SIZE = 5;
public static void sort(Integer[] array) {
sort(array, DEFAULT_BUCKET_SIZE);
}
public static void sort(Integer[] array, int bucketSize) {
if (array.length == 0) {
return;
}
// Determine minimum and maximum values
Integer minValue = array[0];
Integer maxValue = array[0];
for (int i = 1; i < array.length; i++) {
if (array[i] < minValue) {
minValue = array[i];
} else if (array[i] > maxValue) {
maxValue = array[i];
}
}
// Initialise buckets
int bucketCount = (maxValue - minValue) / bucketSize + 1;
List<List<Integer>> buckets = new ArrayList<List<Integer>>(bucketCount);
for (int i = 0; i < bucketCount; i++) {
buckets.add(new ArrayList<Integer>());
}
// Distribute input array values into buckets
for (int i = 0; i < array.length; i++) {
buckets.get((array[i] - minValue) / bucketSize).add(array[i]);
}
// Sort buckets and place back into input array
int currentIndex = 0;
for (int i = 0; i < buckets.size(); i++) {
Integer[] bucketArray = new Integer[buckets.get(i).size()];
bucketArray = buckets.get(i).toArray(bucketArray);
InsertionSort.sort(bucketArray);
for (int j = 0; j < bucketArray.length; j++) {
array[currentIndex++] = bucketArray[j];
}
}
}
}
/**
* Sorts an array using bucket sort.
* @param {number[]} array The array to sort.
* @param {number} [bucketSize=5] The number of values a bucket can hold.
* @returns The sorted array.
*/
function sort(array, bucketSize) {
if (array.length === 0) {
return array;
}
// Determine minimum and maximum values
var i;
var minValue = array[0];
var maxValue = array[0];
for (i = 1; i < array.length; i++) {
if (array[i] < minValue) {
minValue = array[i];
} else if (array[i] > maxValue) {
maxValue = array[i];
}
}
// Initialise buckets
var DEFAULT_BUCKET_SIZE = 5;
bucketSize = bucketSize || DEFAULT_BUCKET_SIZE;
var bucketCount = Math.floor((maxValue - minValue) / bucketSize) + 1;
var buckets = new Array(bucketCount);
for (i = 0; i < buckets.length; i++) {
buckets[i] = [];
}
// Distribute input array values into buckets
for (i = 0; i < array.length; i++) {
buckets[Math.floor((array[i] - minValue) / bucketSize)].push(array[i]);
}
// Sort buckets and place back into input array
array.length = 0;
for (i = 0; i < buckets.length; i++) {
insertionSort(buckets[i]);
for (var j = 0; j < buckets[i].length; j++) {
array.push(buckets[i][j]);
}
}
return array;
}
def sort(array, bucketSize=DEFAULT_BUCKET_SIZE):
if len(array) == 0:
return array
# Determine minimum and maximum values
minValue = array[0]
maxValue = array[0]
for i in range(1, len(array)):
if array[i] < minValue:
minValue = array[i]
elif array[i] > maxValue:
maxValue = array[i]
# Initialize buckets
bucketCount = math.floor((maxValue - minValue) / bucketSize) + 1
buckets = []
for i in range(0, bucketCount):
buckets.append([])
# Distribute input array values into buckets
for i in range(0, len(array)):
buckets[math.floor((array[i] - minValue) / bucketSize)].append(array[i])
# Sort buckets and place back into input array
array = []
for i in range(0, len(buckets)):
insertion_sort.sort(buckets[i])
for j in range(0, len(buckets[i])):
array.append(buckets[i][j])
return array
module BucketSort
include InsertionSort
# Sorts an array using bucket sort.
def self.sort(array, bucket_size = 5)
if array.empty?
return
end
# Determine minimum and maximum values
min_value = array.min
max_value = array.max
# Initialise buckets
bucket_count = ((max_value - min_value) / bucket_size).floor + 1
buckets = Array.new(bucket_count)
(0..buckets.length - 1).each do |i|
buckets[i] = []
end
# Distribute input array values into buckets
(0..array.length - 1).each do |i|
buckets[((array[i] - min_value) / bucket_size).floor].push(array[i])
end
# Sort buckets and place back into input array
array.clear
(0..buckets.length - 1).each do |i|
InsertionSort.sort buckets[i]
buckets[i].each do |value|
array.push(value)
end
end
end
end
Textbooks
Here are two CS textbooks I personally recommend; the Algorithm Design Manual (Steven S. Skiena) is a fantastic introduction to data structures and algorithms without getting to deep into the maths side of things, and Introduction to Algorithms (CLRS) which provides a much deeper, math heavy look.