With an unstable sort this might happen (pink shows the shuffling) |
Sorted by name
|
User sorts by number
|
User sorts by name again
| ||||||||||||||||||||||||||||||
With a stable sort the user will always see this |
Sorted by name
|
User sorts by number
|
User sorts by name again
|
|
<-- | Switching to Insertion | <-- | ||
---|---|---|---|---|---|
The chart at left shows how different values of the M parameter affect sort performance (for
an array of 5000 distinct long integers). These are NOT running averages, I ran the test for
each M only once - but the trend lines are clear. The different colours indicate how much (and
what sort of order) is found in the input data:
|
|
<-- | Middle Element | <-- | ||
---|---|---|---|---|---|
The chart at left shows timings for a conventional Quicksort algorithm (VB implementation,
sorting long integers via an index array, running under the VB IDE on a PII/350),
which always chooses the middle element of a range as the partition element,
when supplied with
|
|
<-- | Random Element | <-- | ||
---|---|---|---|---|---|
The chart at left shows timings for the same Quicksort algorithm, modified to choose
a random partition element (these aren't averages, I only ran each test once. That's
why there are spikes here and there). There's not much difference. Arrays that are already in
ascending order tend to sort faster than ones in reverse order because less swaps are
necessary. Arrays mostly in reverse order (usually) sort faster than ones mostly in random order
because early partitions will swap them - once -and then they'll be in order and won't need
swapping again. For N below 1000 the extra cost of calling Rnd() seems to outweigh the
advantages. |
|
<-- | Avoiding Recursion | <-- | ||
---|---|---|---|---|---|
I'll give you the numbers because the differences aren't that clear from the chart.
When we compare for 20480 records, we get:
|
Const minSubFile = 9 Sub ISortAlgorithm_Sort(d As Variant, ByVal l As Long, ByVal r As Long, a As Variant) If l + minSubFile <= r Then Quicksort d, l, r, a 'Don't bother quicksorting small arrays End If InsertionSort d, l, r, a 'Straight insertion sort End Sub Sub Quicksort(d As Variant, ByVal l As Long, ByVal r As Long, a As Variant) 'Description: Two-sided recursion verison (may blow stack limit!) Dim p As Long 'index of index of partition element Dim v As Variant 'partition value Dim rr As Long 'original right boundary Dim temp As Long 'for swapping indices rr = r 'remember the index of end of partition p = Int(l + r) / 2 'pick partition element v = d(a(p)) 'remember partitioning value temp = a(p): a(p) = a(l): a(l) = temp 'swap index of partition element to bottom p = l 'remember where we put it Do Do For l = l + 1 To r - 1 If v <= d(a(l)) Then Exit For Next For r = r To l + 1 Step -1 If d(a(r)) <= v Then Exit Do Next If rr < l Then 'if we tacked an infinity on the right we could do without this l = l - 1 ElseIf v < d(a(l)) Then l = l - 1 End If temp = a(p): a(p) = a(l): a(l) = temp If p <= l - minSubFile Then Quicksort d, p, l - 1, a If rr - l >= minSubFile Then Quicksort d, l + 1, rr, a Exit Sub Loop temp = a(l): a(l) = a(r): a(r) = temp r = r - 1 Loop End Sub Private Sub InsertionSort(d As Variant, ByVal l As Long, ByVal r As Long, a As Variant) Dim j As Long 'index when searching for inversions Dim i As Long 'index when correcting inversions Dim v As Variant 'value of element on the move Dim oldI As Long 'index of element being moved For j = l + 1 To r If d(a(j)) < d(a(j - 1)) Then i = j - 1 oldI = a(j) v = d(oldI) Do a(i + 1) = a(i) i = i - 1 If i < l Then Exit Do 'because we don't have a magic low value at the bottom Loop Until d(a(i)) <= v a(i + 1) = oldI End If Next End Sub
sub randomizeArrayOrder(byref a As Variant) dim i As Long 'index of element to swap dim j As Long 'index of element to swap it with dim m As Long 'number of elements remaining that might be swapped dim temp As Long 'for swapping elements (declare as Variant if a isn't an array of longs) m = ubound(a) - lbound(a) + 1 for i=lbound(a) to ubound(a)-1 j = i + int(m*rnd()) temp=a(i) a(i)=a(j) a(j)=temp m = m - 1 next end subWhen you are randomizing order by exchanging, it takes three copies to exchange two elements. If you've got more room to work in you can go a bit quicker.
sub randomizeArrayOrder(byref aIn as Variant, byref aOut as Variant) 'aIn gets trashed dim i As Long 'index of element to replace dim j AS Long 'index of element to replace it with dim m As Long 'number of elements remaining that might be swapped m = ubound(aIn) - lbound(aIn) + 1 for i=lbound(aIn) to ubound(aIn) j = i + int(m*rnd()) aOut(i)=aIn(j) aIn(j)=aIn(i) '<--two copies, not three, but aIn is trashed m=m-1 next end sub 'Note that for this to work aIn must not be the same array as aOut! 'You can trick VB into getting that right by calling it like this: randomizeArrayOrder (a),a 'The brackets force VB to make a temporary copy of A. VB can copy an entire array 'in one hit, saving a lot of time.
|
<-- | Duplicate Values... | <-- | ||
---|---|---|---|---|---|
It's the red and pink lines that are really scary! Though the purple line doesn't look too
good either. What this graph shows is that using Quicksort to sort on a field which only has
a few possible values (or where most of the values of the records will be the same, say for
a comment field that isn't used much, or the third line of an address field, or the
"reasons you should cut my pay" field...)... is not a good idea. Or at least,
not the usual version of Quicksort.
|
while l<r and element(l)<=pivot l = l + 1 wend while l<r and pivot<=element(r) 'technically we don't need to check l<r here r = r - 1 'in most implementations, as the pivot element itself wend 'will have been swapped with the leftmost element in the 'current partition. 'if we get here we may have two records to swapIt shouldn't. What we should be doing, if we want better performance when there are lots of duplicated values, is swapping more agressively. In other words, whenever we have an element equal to the pivot or belonging on the other side, we swap it. Yeah, we're deliberately swapping when we don't have to. This idea is from R.M.Sedgewick, and I'm glad he wrote it in to Donald Knuth! It's counterintuitive! It turns out that the extra swaps are less important than the fact that swapping aggressively keeps the left and right subpartitions about the same size whenever most of the values we find are the same as the pivot. With that change in place, running the same tests gets us these results:
|
<-- | Duplicate Values Part 2 | <-- | ||
---|---|---|---|---|---|
Aa you can see, this version works much better for arrays with many duplicates. I also put
in two tests against ascending and randomly ordered arrays (to show it works okay for
data without all those duplicates, too).
|
sub rekeySortOrder(byref d as Variant, byref a as Variant) ' a=array of indices resulting from unstable sort pass ' containing one copy of each of the values between lbound(a) ' and ubound(a). ' d=the column input array supplied to the sorting pass. ' dim class 'indicates the equivalence class for each element dim classCount 'count of elements in each equivalence class dim thisClass 'the current equivalence class number dim i 'index into array of indices dim j 'index into array of equivalence counts redim class(lbound(a) to ubound(a)) redim classCount(0 to ubound(a)-lbound(a)) class(lbound(a)) = 0 classCount(0) = 1 thisClass = 0 for i=lbound(a)+1 to ubound(a) if d(a(i-1)) < d(a(i)) then thisClass = thisClass + 1 end if class(a(i)) = thisClass classCount(thisClass) = classCount(thisClass) + 1 next ' 'We now know which equivalence class each element belongs in 'And we know how many elements there were in each class. 'So now we just need to know where the rows in each equivalence 'class will start in the output array, and we can re-order. ' dim offset dim classOffset redim classOffset(0 to thisClass) offset = lbound(a) for j=0 to thisClass classOffset(j) = offset offset = offset + classCount(j) next ' 'At this point we know (a) where each equivalence class should 'start in the output array (classOffset gives us this), 'and (b) which class each element is in (class gives us this). 'So, we can interate through the class array, writing into the 'part of the output order array for the appropriate equivalence 'class. ' for i = lbound(a) to ubound(a) j = class(i) a(classoffset(j)) = i classOffset(j) = classOffset(j) + 1 next end subSo far as I know, nobody has published an algorithm for making an unstable sort stable which will execute in time proportional to the number of records (as this one does) - although list insertion and (particularly) distribution sorts hint at this technique. With this little beast in your arsenal, you can use unstable sorts where stable sorts are required, and then run the above code against the sort order.
Const minSubFile As Long= 9 Private partition2 As Variant 'Indices from the right that belong on the left (in order found) Private partition3 As Variant 'Indices from the left that belong in the centre (ascending) Sub ISortAlgorithm_Sort(d As Variant, ByVal l As Long, ByVal r As Long, a As Variant) If Not initialized Then ReDim partition2(l To r) As Long ReDim partition3(l To r) As Long initialized = True End If CheckBoundsEnough 0, r - l, partition2 CheckBoundsEnough 0, r - l, partition3 If l + minSubFile <= r Then Quicksort d, l, r, a 'Don't bother quicksorting small arrays InsertionSort d, l, r, a 'Straight insertion sort End Sub Private Sub InsertionSort(d As Variant, ByVal l As Long, ByVal r As Long, a As Variant) Dim j As Long 'index when searching for inversions Dim i As Long 'index when correcting inversions Dim v As Variant 'value of element on the move Dim oldI As Long 'index of element being moved For j = l + 1 To r If d(a(j)) < d(a(j - 1)) Then i = j - 1 oldI = a(j) v = d(oldI) Do a(i + 1) = a(i) i = i - 1 If i < l Then Exit Do 'because we don't have a magic low value at the bottom Loop Until d(a(i)) <= v a(i + 1) = oldI End If Next End Sub Private Sub Quicksort(d As Variant, ByVal l As Long, ByVal r As Long, a As Variant) Dim n As Long 'Number of elements in area to sort Dim m As Long 'Used to calculate stack size required Dim p As Long 'index of index of partition element Dim v As Variant 'partition value Dim i As Long 'Read point in initial index array Dim x As Long 'Original index at that point Dim vv As Variant 'Value of array element being compared to partition Dim p2Count As Long 'Number of elements from left belong in centre Dim p3Count As Long 'Number of elements from left belonging at right Dim w As Long 'Write point (when writing back new order for the partition) Dim lStack() As Long 'Stack of left boundaries Dim rStack() As Long 'Stack of right boundaries Dim stackSize As Long 'Initially, needed stack size. Later, current stack size. Dim cl As Long 'Destination of first element matching current partition value Dim cr As Long 'Destination of last element matching current partition value 'cl and cr are used to determine the next partitions to sort. n = r - l + 1 m = 1 While m < n * 2 m = m * 2 stackSize = stackSize + 1 Wend ReDim lStack(stackSize) 'We use a stack so we can avoid the overhead for ReDim rStack(stackSize) 'recursive procedure calls. We only need to stack l and r! stackSize = -1 'Nothing in the stack Do p = Int(l + r) / 2 'pick partition element (could be random, doesn't really matter) v = d(a(p)) 'remember partitioning value p2Count = 0 p3Count = 0 w = l For i = l To r x = a(i) vv = d(x) If vv < v Then a(w) = x w = w + 1 ElseIf v < vv Then partition3(p3Count) = x p3Count = p3Count + 1 Else partition2(p2Count) = x p2Count = p2Count + 1 End If Next cl = w 'first point in centre For i = 0 To p2Count - 1 a(w) = partition2(i) w = w + 1 Next cr = w - 1 'last point in centre For i = 0 To p3Count - 1 a(w) = partition3(i) w = w + 1 Next If cl - l < r - cr Then If minSubFile < r - cr Then stackSize = stackSize + 1 lStack(stackSize) = cr + 1 rStack(stackSize) = r End If r = cl - 1 Else If minSubFile < cl - l Then stackSize = stackSize + 1 lStack(stackSize) = l rStack(stackSize) = cl - 1 End If l = cr + 1 End If If r - l < minSubFile Then If stackSize = -1 Then Exit Do l = lStack(stackSize) r = rStack(stackSize) stackSize = stackSize - 1 End If Loop End Sub
|
<-- | Stable 3-Way Quicksort | <-- | ||
---|---|---|---|---|---|
This algorithm performs very well when there aren't many different values in the column
of the array being sorted. The fewer there are the better it does. However, the
modified version from the last graph works with ordered unique values a lot better,
and unique random values a little better, so this one is probably worth using only when
you can be certain there will be a lot of duplicates in the area. If you're sure there
won't be, use the standard one.
|