## Monday, August 13, 2007

### Teaching Heuristics in an Algorithms Class

In my undergraduate class, I devote one class (and one programming assignment) to heuristic search methods. I focus on hill climbing, since it is most clearly connected to other things covered previously (greedy algorithms, the simplex algorithm), but I also discuss other well-known methods, including the Metropolis algorithm, simulated annealing, tabu search, go with the winners, genetic algorithms, and, of course, BubbleSearch.

Generally, algorithms textbooks hardly mention heuristics. At best, heuristics enter the picture when discussing approximation algorithms, as though the fact that we can prove that the greedy local search algorithm for max-cut achieves a cut of at least 1/2 the edges is the important thing. I wish the standard textbooks would devote a solid chapter on heuristic techniques; in real life, most students are probably more likely to have to implement a heuristic for some NP-hard problem than to have to implement a minimum spanning tree algorithm.

The theory purists will undoubtedly suggest that heuristics don't belong in an algorithms class, because generally we can't prove formal statements about them. Obviously, I disagree. Heuristics are closely tied to other class concepts -- like greedy algorithms and approximations. Perhaps most importantly, a successful heuristic approach depends on sound modeling and building an understanding of a problem, which are exactly the important skills we should be teaching in undergraduate algorithms. Finally, we need to give students some basic, realistic algorithmic tools for handling NP-hard problems in practice. (2-approximations do not suffice.) So while I acknowledge that students may be exposed further to these and other heuristic approaches in for example an AI class, I think it's important to clearly connect it to what they're learning in algorithms.

Suresh Venkatasubramanian said...

What about K&T, Chapter 12: Local search

* landscape of an optimization problem
* Metropolis and Simulated Annealing
* An Application of Local Search to Hopfield Networks
* Max cut via local search
and more...

It's not perfect: no EM/k-means-type heuristic, but it's a start.

I'm glad you posted this. I want to throw in one such lecture in my algorithms class as well, but I might have to sacrifice something else..

Suresh Venkatasubramanian said...

p.s could you enable comment feeds ?

Michael Mitzenmacher said...

Suresh,

Chapter 12 of Kleinberg and Tardos is as good as it gets right now, but I think it could have been better. The problem is that the chapter is about local search (the title of the chapter), and not about heuristics. The first problem with this is that they don't discuss heuristic methods in either enough breadth or depth. No mention of several heuristic methods, and the bare-bones stuff they mention about landscape of an optimization problem, Metropolis/Simulated Annealing, etc. covers pretty much what I already had in my notes (which are now a decade old). I wanted something more. The second problem with this is that they've
collapsed together two aspects of local search that would be better separated: local search as a heuristic, and local search as an approach for approximation algorithms. Most of the chapter, it seems to me, is focused on the latter, eclipsing the idea of the utility of local search as a heuristic. They prove factor-of-2 competitiveness theorems, but never give an example of where in practice it gets you 98+% of the way to optimal. Once again, it leaves the impression that theorists just don't "get it"; we're so entranced by proving that we get within a factor of 2, we miss the higher level message that most of the time, you can get very-close-to-optimal solutions.

Suresh Venkatasubramanian said...

fair enough. I'm impressed that you can cover all those topics in one lecture though.

Anonymous said...

I'm not sure I understand what you mean when you say local search as an approx alg should be separated from local search as a heuristic. What is your definition of heuristics? For example, do you consider SVD-based algorithms (many of which have proven performance guarantees, but not in the framework of approximation algorithms) heuristics?

Perhaps the distinction should be between the techniques and proofs.

Michael Mitzenmacher said...

I don't know why there's no link that let's you get comments by RSS. It seems you can add it manually with this URL.

http://mybiasedcoin.blogspot.com/feeds/

This is the RSS URL for the post.

http://mybiasedcoin.blogspot.com/feeds/
posts/default

Anonymous said...

Hi,

I also think that a variety of optimization methods should be taught in algorithms class, even though one cannot prove (much) about them.

There is one caveat though: in many places, an algorithms course has a dual purpose: (i) to introduce algorithmic techniques and (ii) to introduce a rigorous/mathematical framework for understanding algorithms (and serve as a prereq for other theory courses). This makes sense: one can kill two birds with one stone. However, teaching heuristics is somewhat at odds with (ii), so ... here you go.

Piotr

Suresh Venkatasubramanian said...

I think Michael's point is part of his larger theme (espoused in an earlier post on programming) that algorithms folks should not make it seem like we live in an imaginary world divorced from reality.

In the "real world", practitioners use all kinds of heuristics, some more well founded than others. It is also true that many heuristics do quite well in practice (and one can often "explain" this theoretically). By discussing standard heuristics like hill climbing, metropolis etc, that do have good properties, we're both acknowledging that there are limits to worst-case analysis (which none of us will deny), and are also steering people away from total hack methods towards heuristics that have rigorous foundations behind them (if not provable guarantees)

incidentally, I see nothing heuristic about SVD-based methods: they have clearly stated running time bounds and approximation guarantees: (just in terms of different definitions of optimal). k-means on the other hand, has none of these.

Noel said...

What useful things can you say about heuristic methods, other than "they exist" and "here are some examples"?

I think the answer is "not much", but I'd loved to be proven wrong. I know there is work on "perfect sampling" for MCMC, which seems like something useful (and something I should know more about). There is also work analysing GAs, but here it seems that the frameworks amenable to analysis are so much simpler than those used in practice that they may be useless.

Michael Mitzenmacher said...

Noel (and to some extent Piotr),

If you'll forgive the obvious sarcasm, I'd ask:
'What useful things can you say about, say, algorithms for minimum spanning trees, other than "they exist" and "here are some examples"?'

One point here being that I'm not sure what you're getting at with your question. Another point is there's a lot you can say about heuristics -- there are whole books, conferences, and journals on the subject. Obviously, the high-order information is, as you suggest, that these methods exist. Moreover, I would add that for many problems they work amazingly well, and that many heuristics are related to algorithmic techniques we have explored in the class. And then I'd give some examples. That is about all I can do in one lecture. And of course, there's a tradeoff -- there's one less lecture I can give on something else, with more mathematical/theoretical content. But since I'd guess that 80+% of my students are more likely to someday use a heuristic algorithmic approach than to use whatever else I'd fill that lecture with, it seems like the right tradeoff to me.

Noel, you seem to have the mindset I think is too prevalent -- you seem to be equating "useful things you can say" with "things you can rigorously prove". The point is that heuristics are very useful, even if we can't always prove things about them. We should be exposing students to them for their utility -- and, perhaps, to show them the limits of what we know.

Anonymous said...

Mike,

I am not sure how your last comment relates to my post, but I'll take your word for it. Just for the sake of clarity: I am a fan of algorithms whose behavior is not fully understood theoretically, and I often teach them (EM, k-means, Gibbs sampling and whatnot), e.g., in my Computational Biology class.

I also think that teaching them in an undergrad algorithms class is a good idea in principle. All I am pointing out is that undergrad algorithms classes often have dual goals, and teaching heuristics is, to some extent, at odds with one of those goals. Among other things, I believe this difficulty partially explains the lack of heuristics in algorithms curricula. Of course, the "prevalent mindset" you outlined is also to blame.

Probably the best way to resolve this issue is to separate the goals by creating two classes: one focused on algorithmic techniques, and one on rigorous/theoretical analysis of algorithms. At least this is the approach we will soon introduce at MIT. Time will tell how well it works.

Piotr

PS. Noel, there are other things that one can teach about heuristics. E.g., for k-means, one can show a (simple) proof of local optimality (if the clusters are chosen correctly, the solution is optimal). Average case analysis and relatives also provides insights. Finally, there are variants of k-means that have some provable guarantees.

An absolute hit in my experience is a Java applet animation of an algorithm on various inputs - you can almost see people cheering for k-means to do the right thing.

Michael Mitzenmacher said...

Hey Piotr. I was responding to the point you made that people focus on the goal of having to provide a rigorous/mathematical framework, to the point of excluding heuristics -- which seems to be to be a bad balance.

But I'm very excited to hear about this new course structure at MIT, with a class for algorithmic techniques and a class for mathematical analysis of algorithms. I'd love to talk more about it, but perhaps it's something you'd rather discuss offline than in blog comments.

Anonymous said...

One place that a discussion of heuristics fits very naturally is in the 'coping with NP-completeness' part of a standard algorithms course.

While approximation algorithms with provable worst-case guarantees have been all the vogue in algorithms research and we now have precise quantitative results for both hardness and approximability, the 'coping with NP-completeness' issue doesn't end there and that is where heuristics without worst-case guarantees come in.

While this motivation doesn't cover the cases where the data set is so massive that one runs a linear time heuristic because an exact n^3 algorithm would be too costly, it does allow one to cover lots of ground with heuristics.

One class of heuristics for coping with NP-completeness that nobody has mentioned yet, but is very important in practice, is the class of refinements of backtracking searches. These algorithms, which include classic branch-and-bound, but also many search techniques typically associated with AI, are often easily shown to have exponential running time but careful work can keep the exponents low for many practical problems.
The wide-spread practical use of SAT solvers is a common example of the importance of these heuristics.

Suresh Venkatasubramanian said...

I think Paul's comment is exactly right. Coping with NP-Completeness is a good place to introduce heuristics, precisely because it's a place of last resort, where formal approximations can only go so far.

Apart from SAT solvers, the TSP genetic algorithms are another good example of heuristics that are the best implementations in practice.

Chandra said...

The TSP genetic algorithms are
the best in practice? Suresh, could
you tell us more?

Suresh Venkatasubramanian said...

David Johnson's monumental paper on TSP heuristics (http://www.research.att.com/%7Edsj/papers/TSPchapter.pdf)

demonstrates that among the best performing heuristics for TSP (as compared to various lower bounds like the Held-Karp bound) are algorithms that are GA-like: take an existing tour, and flip a piece in the "sequence". For example, the 2-opt operator takes a tour, picks two non adjacent edges u-v and a-b, and reroutes the tour to go

... u - b - a - v ..... if this new tour has improved cost.

Maybe I'm too generous to GAs by calling this a genetic algorithm, but at some level it is indeed a simple cross over operator.

I even wrote a paper with DSJ (and many others) where we used such heuristics to compute large TSPs in Hamming spaces.

Michael Mitzenmacher said...

Suresh--

I might be mistaken, but I believe 2-OPT and 3-OPT are just local search algorithms, not genetic algorithms. What I believe distinguishes genetic algorithms is that instead of repeatedly improving a single solution one maintains at all times a population of solutions, and new solutions are formed by "mating" two (or more?) existing solutions in the population. There are therefore mating operators (subject to mutations and other variations) and generally some operation to cull the population. 2-OPT and 3-OPT don't involve mating, and would just be local search algorithms.

One of the warnings I give my students, which made more sense about a decade ago, was that genetic algorithms were overhyped. These days, I think it's understood that generally standard local search algorithms, if designed with an intelligent neighborhood structure, usually perform as well or better.

My experience has led me to believe, however, that tabu search offers something over standard local search algorithms. The idea (once you state it) seems so obvious it just has to work -- to get the best exploration (of the state space)/exploitation (move toward a local minima) tradeoff, you should somehow encode in your objective function some "memory" of where you've recently been!

Suresh Venkatasubramanian said...

this is correct. However, take a look at the TSP survey. Two broad conclusions are drawn (I quote from the article, and add my own emphasis):

* Assuming one has enough time to run something more sophisticated than a simple tour construction heuristic, one’s first
choice would probably be an efficient implementation of one of the classic local optimization
algorithms 2-Opt, 3-Opt, and Lin-Kernighan....
The last comes within 1.5% of optimal for random Euclidean instances with as many as a
million cities, and it is capable of doing almost as well for the real-world instances in
TSPLIB.

* If shorter tours are wanted and significantly more time is available, both simulated annealing and genetic algorithms can for many instances find better tours
than could be found in the same time by performing multiple independent runs of Lin-
Kernighan.

Noel said...

Interesting discussion. I work in machine learning, and it seems I have a different definition of what makes a heuristic. To me EM (expectation-maximisation) isn't a heuristic as it's guaranteed to find converge to a local optimum. GAs, however, I consider a heuristic as there are no such guarantees, beyond those that are useless in practice (if you search the whole space you will find the optimum). Likewise I consider MCMC a heuristic.

I find algorithms blogs interesting as you seem to live a very different world to me (much more focus on determinism, and very different maths). So coming from my perspective I was genuinely interested in what you algorithm people would say about heuristics. I'm also now interested in what you'd say about EM etc.

Suresh Venkatasubramanian said...

I'll speak for myself :). To me, an efficient algorithm has a provable polynomial running time bound (this can be a probabilistic bound, but it's still quantifiable), and a provable guarantee of quality.

EM fails on both counts: it doesn't have a provably polynomial time bound in the worst-case (although there are now results showing that it converges in poly time on the "average"), and it gives no guarantee on quality. In fact, all that EM promises is that it terminates at a local optimum.

This is not to say I have a problem with EM: it's a great heuristic. Just that I (and most theory people I suspect) would not consider it to be an algorithm. The discussion you're seeing here reflects differences among us on how useful such a heuristic really is in terms of the pedagogy of algorithms.

As an aside, we definitely don't require that algorithms be deterministic.

Anonymous said...

Breathless press on "evolutionary algorithms" taking over the world in newscientist.com.

Anonymous said...

Very interesting and lively discussion.

In the data mining community we often think of the EM as a framework - "design an EM type solution for your particular problem." If EM is used as the approach for estimating the parameters of a mixture distribution then the K-means algorithm is a special-case of EM with spherical covariance matrix.
So EM is in some sense a more general concept than a heuristic.

Also, provided certain separation conditions hold, Sanjoy Dasgupta and also Arora and Kannan have a paper on approximation guarentees on the mixture estimation solution derived using EM.

Anonymous said...

What about biology inspired heuristics : ""Ant colony systems","Bee swarms" , "Spiders","Immune systems","Particle swarms".
Also heuristics are not only used by computers scientists, but are used in all engineering disciplines: electronics, process control, telecommunications, civilian engineering, etc...

Michael Mitzenmacher said...

Anonymous from August 16 -- you bring up a good point. I'm afraid my notes (and I) are showing my age -- I don't discuss the more new-fangled biological heuristics in my class. I should do some study on them and decide if, like the classics, they should be included. Any pointers to a good reference for them would be appreciated.

Scott said...

Michael, it's possible that the bias of theorists you put your finger on is actually just a side-effect of a different bias: namely, the bias for covering "standard material" (where "standard" = wrapped up with a ribbon by the mid-70's at latest) rather than messy, frontier areas where open problems abound.

To me it seems obvious that, the less we understand about an algorithm's behavior, the more interesting that algorithm should be for theorists. If we can't really explain why local search usually gets within 98% of optimal, then so much the better -- all the more work cut out for us in explaining it!

Dan Spielman said...

I, like Mike, devote one lecture
of my algorithms class to heuristics. My favorite being multi-level algorithms and SAT solvers. I think it is amazing that nowadays people reduce problems to SAT so that they can solve them with off-the-shelf SAT solvers! I think it would be irresponsible to teach that SAT is hard without mentioning that it is often easy.

I also try to mention heuristics throughout the course.

While I love rigorous mathematical analysis, I believe that our end goal is to make things that work. I think it is irresponsible for theorists to disparage heuristics merely because we haven't figured out how to analyze them yet. I also think it makes us look stupid.

--Dan Spielman