Sunday, September 07, 2014

Teaching Sorting

For many years now, while teaching the introductory Algorithms and Data Structures, I have told students that we won't be starting with (or really covering) sorting and searching, because it's boring.  While that description is an exaggeration (the students see mergesort [an example of divide and conquer] and heapsort [heaps are important as an implementation of priority queues for Dijkstra's algorithm]), I've never thought that staring this class with a long unit on sorting/searching algorithms, which most textbooks like to start with, is the best use of time.

So it was a bit odd to find myself for our new "honors course", CS 125, spending the first day-and-a-half or so on sorting algorithms.  The key was that there was the right motivation for it.  We'll be tackling both algorithms/data structures and complexity, and the theme of how to model computation will play a big role throughout the course.  When we thought about what was the right way to introduce the class, and in particular this theme, sorting seemed like the right choice.

After very quickly refreshing the students on some basic comparison sorts (bubblesort and mergesort), we present the standard lower bound argument for comparison-based sorts.  But then we show how this lower bound can be broken, by assumptions about the data that allow one to go beyond comparisons to other operations.  Specifically, we raced through counting sort, bucket sort (for random data), and sorting via van Emde Boas trees.    (The students in particular seemed to be talking through the details of van Emde Boas trees at the class break, which I'd like to think because they were new and interesting, but possibly was due to the fact that it was the first time I'd ever presented them, and maybe my delivery for that topic needs some tuning.) 

Lo and behold, I find myself finding sorting interesting!  We didn't even get to more advanced interesting possibilities, like data-oblivious sorting or "bleeding edge" parallel sorting.  I feel I need to apologize to sorting, for mistakenly suggesting that it's a boring topic.  I still wouldn't want to start an algorithms/data structures class with multiple weeks of sorting and searching algorithms, but sorting is a natural way to introduce some key concepts in algorithms, and there's fun to be had in the large variety of approaches to sorting.


Anonymous said...

Student here. The students in my immediate vicinity were chatting about VEB trees because we'd never seen them before and they were interesting, not because we were hopelessly lost. (On the whole, I for one think your presentation of them was excellent.) The discussion, of course, turned up a few things we realized that we didn't understand, which we were able to clear up after the break.

If the structure of [ (1) present interesting novel topic (2) have short break (3) reconvene, and answer any questions that came to light over the break ] wasn't intentional, I would highly suggest trying it again. On Thursday, at least, it worked really well.

Anonymous said...

Sorting remains valuable for other interesting reasons: ask your students why Quicksort is often the preferred sorting method (e.g., in the C/C++ libraries) while it's only O(n log n) in the avg case -- over Heapsort, which is worst-case O(n log n) with small constants... there are some interesting lessons to learn there..

Jouni Sirén said...

I have started to think that sorting is the most useful topic covered in a basic algorithms and data structures course.

The traditional topics on such courses - dynamic data structures using pointers in the RAM model - work well when the data structures are not much bigger than the L3 cache. When the amount of data grows, the RAM model and the data structures and algorithms designed for it may no longer be that relevant. With larger datasets, using arrays of tuples as the fundamental data structure, with scanning, sorting, and joins as the basic operations, is often easier and more efficient.

Anonymous said...

After explaining about introspection sorting - you may want them to explore other hybrid sorts and whther they are useful/practical or not and explain why.