You are visitor number ????? since 29 February 2004

Go to Howard Kaikow's home page

Copyright © 2003-2007 by Howard Kaikow. All rights reserved.

Date: 16 July 2007

Author: Howard Kaikow <kaikow@standards.com>

Subject: Sort Performance Comparison program - General description


Purpose of the program

The program is intended to facilitate the comparison of several well-known sorting methods so one can choose among the methods and, for Microsoft Office VBA programmers, choose an alternative to Microsoft Word's WordBasic.SortArray, Microsoft Word's document sort, and Microsoft Excel's worksheet sort.

Please note that by making this program available, I am not indicating any intent to recommend algorithms or to discuss the details of any algorithms.

Overview

Running the program

Terminating the program

Selecting sort methods

Number of data items

Sorting algorithms

Data generation

Data types

Source of the unsorted data

Target for the sorted data

Program outputs


Overview

I have often wondered why Microsoft did not include a VBA method to sort an array, e.g. why was there not a Word VBA method included to replace WordBasic's SortArray? In newsgroups and web forums, many people have asked questions such as: "How do I sort an array in VBA?". Often the response has been one of the following:

For Microsoft Word ("Word"):

For Microsoft Excel ("Excel"):

In the case of Word, it seemed obvious that using WordBasic.SortArray would be best. After all, the WordBasic.SortArray method is compiled into Word, and surely, Microsoft must have used a reasonable sorting algorithm, so how could one expect to write a sorting procedure in VB/VBA that would outperform WordBasic.SortArray?

Up until a few years ago, I too believed that there was no point in trying to do better than using WordBasic.SortArray, but then I got curious and did some experimentation. I was shocked to learn how badly WordBasic.SortArray performed. It is rather trivial to outdo the performance of WordBasic.SortArray. Along the way, I also did some experimentation with Excel.

I have developed a Microsoft Visual Basic 6 ("VB 6") program to facilitate comparing the performance of various sorting methods.

Download the Sort Performance Comparison Program

Click here to view more information for this program.


Running the program

Note: It is very important to read the documentation for this program before running the program.

There is no installation necessary, just directly run the program. You should first read the remainder of this document before running the program, especially, Terminating the program and Selecting sort methods.

In order to run this program, you will need to have the following installed:

When you run the program, a Visual Basic 6 Form will be displayed that looks something like the following.

The ListBox near the top of the Form lists the current action being performed by the program and provides certain other information.

The Form displays the following controls (not all the controls are displayed for each data type):

Select Sorting Algorithm combo box

Choosing Algorithm Comparison from the combo box presents a list of several sorting algorithms. Choosing any of the other options in the combo box presents a list of one, or more, implementations of the selected sorting algorithm.

Note that the methods listed with Algorithm Comparison are not necessarily the most efficient for each algorithm.

List of sorting methods

Use the check boxes to select, one, or more, sorting methods .

Number of data items

Enter the number of data items, the sample size, to be sorted. The program will generate the data.

Order of source data

Choose how to order the data to be sorted.

Use Integer data

If checked, the program uses only integer data.

Maximum integer value Choose the maximum integer value to be used when you have elected to use integer data.

Choose data type

Choose the data type.

Case insensitive sort If checked, the program will perform a case insensitive sort for the String data type.
Use non-numeric strings If not checked, strings will be numbers. If checked, strings will be data randomly chosen from an internal word list.

Display data

If checked, both the unsorted and sorted data will be displayed in ListBoxes for the last selected sort method that was run successfully.

Reuse data

If checked, the data used for the previous set of selected sort methods will be reused, but only if the data type, the number of data items, whether non-numeric, and the order of source data have not been modified.

Number of samples

The number of times the selected sort methods should be run. If Reuse data is checked, the same data will be used for each run.

Select Source

Select the source of the unsorted data .

Select Target

Select where to put the sorted data.
Set randomize seed If checked, each time you load the program, the same seed will be used to generate the same random number sequence when you first select the Start button
Verify sort If the target is an array, the correctness of the sort will be verified.

Clear

Unselects all selected sort methods.

Select All

Selects all listed sort methods.

Exit

Terminate the program.

Start

Run using the selected options.

Terminating the program

In order to allow for more accurate timing estimates, the program cannot include code to facilitate your interrupting program execution. The program needs to run to completion and self-terminate. If you have a need to terminate the program, you may do so by using the Windows Task Manager (CTRL + ALT + DEL) to terminate the program.

If the program is allowed to run to completion, the program should clean up after itself. If you interrupt program execution, and you have selected any of the options that utilize Excel or Word, you should also terminate the WinWord and Excel tasks created by the program. The tasks will also vanish upon rebooting.

Selecting sort methods

Some of the sorting algorithms are very efficient, but a few, such as Bubble Sort, are very inefficient.

Click here to view some examples of performance.

Number of data items

The sample size that may be used is limited by:

Sorting algorithms

The program includes implementations of the following sorting algorithms:

The program provides a means to compare the performance of the algorithms with each other, with various implementations of the same algorithm, and with the following:

Verification of sort

If the target of the sort is an array, the program will verify whether the sort was done correctly.

The program will not attempt to verify the correctness of the following as I have yet to see documentation of how those methods sort:

Data generation

The program will generate the data used and allows you to specify the order of the data that is to be sorted.

Random

The VB Rnd function is used to generate random data. The data will be non-numeric strings if Use non-numeric strings is checked. If Use non-numeric strings is not checked, the data will be numbers. The data will be all integers if Use integer data is checked.

Partially sorted data

The data will be consecutive integers starting with 0. The number of integers will be equal to the sample size. The first data item will be swapped with a randomly (using the VB Rnd function) chosen data item.

Shuffle

The data will be consecutive integers starting with 0. The number of integers will be equal to the sample size. The data will be shuffled.

Forward

The data will be consecutive integers, in ascending order, starting with 0. The number of integers will be equal to the sample size.

Reverse

The data will be consecutive integers, in descending order, starting with one less than the sample size. The number of integers will be equal to the sample size.

Constant

The data will be a constant value.

Maximum integer value

You may choose the maximum integer value to be used when Use integer data is checked.

Data types

I have chosen to test the sorting of only the String, Double and Long (32-bit signed integer) data types.

Source of the unsorted data

The program assumes that the goal is to sort a vector (one-dimensional array) of data and that the source for the data may be any of the following:

The program simulates retrieval of the data from the selected data source by creating the data source and then retrieving the data from the data source.

The simulated source should be deleted when the program terminates normally.

Target for the sorted data

The sorted data is not of much use unless the data can be saved in the intended target. The target may be any of the following:

The program creates the data target and copies the sorted data to the desired target.

The simulated target should be deleted when the program terminates normally.


Program outputs

If you choose to run one sample, the result may look something like the following.

You can already see how badly WordBasic.SortArray performs compared with all other methods listed.

If you have elected to run more than one sample, the result may look something like the following.

When running more than one sample, the time (milliseconds) reported is the arithmetic average (truncated to an integer). In addition, the total time (truncated to an integer) and the number of samples run for each sort method are reported.

If you choose to display the data, the result may look something like the following.

The sorted data displayed is from the last sorting method that was run, in this case, quicksort. If the sorting algorithms are correct, the result should have been the same no matter which algorithms were used.

Log file

The program will create, or append to, a log file named SortPerformanceLog.txt. The file will be located in the same directory in which you installed the program.

The primary purpose of the log file is to record the options used and results for debugging the program.

If the Verify sort box is checked, upon a failure to verify correctly, a file will be recorded in the same directory as the log file. The file will have a name that reflects the date and time the file was created, e.g., SortData-38006.635775463.txt. In combination with the log file, the SortData file allows for debugging of, if any, sort failures.

You may delete these files after running the program, unless you need to report a problem, in which case, I will need both files sent to me in a manner that I will specify as needed.