Oh no, sorting! It is known to everyone! It is from CompSci basics, there are tons of books around and every sane programming language has sorting implementation. Why bother about it? Well, because actually nobody reads about sorting, except when trapped in a corner, e.g. when one needs to write a code that sorts, say, an 8 GB file. Having even a brief knowledge of sorting algorithms principles can give you a serious advance over your mates. I will not bore you to death with endless pictures of moving rectangles—just the bare minimum of words to keep you informed.

Insertion sort

You actually know this algorithm because you use it on a daily basis to sort money in your wallet. You choose a banknote and scan your wallet to find an appropriate place among other notes you have. Then you proceed to next note and so on. This algorithm has O(n2) time complexity, so it is not appropriate for large datasets (it means that you won’t use it if you have 106 or more banknotes in your wallet ;).

If Insertion Sort is so bad, why bother remembering it? First, it can be implemented in about 5 lines of code. Second, on small datasets (about 50 items) it is fast enough to compete with more sophisticated algorithms, and it even can beat them because it doesn’t use recursion and additional storage. Third, it is stable. It means that if you have several items with the same value, their relative positions to each other will remain unchanged after sorting. This is useful when your items are actually n-field tuples and you need to sort them by several fields.

Merge Sort

A simple idea: a single element is already sorted, right? Thus, we can take two adjacent elements of our array and merge them into sorted sequence of two elements. Then we take those two sequences, merge them and get a four elements sorted sequence, and so on, until our subsequence grows up to the whole input array. Because we can merge two sorted sequences in a linear time, the total complexity of Merge Sort is O(n log n) (it’s the least possible time complexity for comparison sorts), but it uses up to O(n) additional space to hold the result of merging. Merge Sort is stable too.

Quick Sort

You must have heard of it. This is a divide-and-conquer algorithm invented by C.A.R. Hoare. On each iteration a dividing item is chosen, and after each iteration a sub-array will contain firstly all lesser items (unsorted), then an item itself and then all items that greater than the dividing item. Initially we take the whole array and then recursively repeat iteration for sub-arrays to the left and to the right from the dividing item.

Worst-case time for Quick Sort is O(n2), but if a dividing item is chosen randomly, in average we will have a O(n log n) complexity. Quick Sort is often used in real-life sorting implementations. To avoid extensive recursion, a cutoff value is defined for the minimum sub-array size, then sorting is finished using Insertion Sort or Heap Sort. For example, a sort algorithm from C++ STL library uses this approach.

Heap Sort

This algorithm exploits an interesting data structure: heap. Heap is a binary tree with two defining properties: first, in every sub-tree the greatest item is in the root, second, leaf items are only allowed at the last two levels, and the last level is filled from left to right. The latter property allows heap to be stored in an array level-by-level. Heap Sort consists of two phases: first, a heap is built from an input array by adding each item to the heap (this is done in place, only constant space is needed), then we take a root item of heap (remember, it’s the greatest item!) and swap it with the last item of array, swapped item is then sifted down to its place in heap. Having heap restored, we repeat swapping root, now with next to last item of array, and so on.

Heap Sort has a guarantied O(n log n) time complexity but in general implementations it loses in absolute performance to Quick Sort. David MacKay has an explanation for it.

Counting Sort

The last one. It can astonish you, but this algorithm has a linear time complexity. I mean O(n). And earlier I’ve said that the best we can have for comparison sorting is O(n log n). The trick is that Counting Sort doesn’t compare items, thus it’s not a comparison sort. Counting Sort is only applicable to numbers. It uses the fact that a number can be used as an index in another array. So, we create an additional array to count the number of occurrences of each number value from an array being sorted. Then we can just rebuild our input array and voilà! Counting sort can even be made stable, see here.

Advertisements