Friday, March 13, 2009

Thinking Big (Memory)

I gave one of my standard programming assignments this year -- finding minimum spanning trees on big random graphs. It seems for many students this is the first time they've experienced the possibility of not having enough memory using a straightforward implementation; they actually have to think about memory issues.

Year by year I've slowly increased the size of the graphs I require from students. This year, I think I left it too small; some students got away without having to think too hard about the memory issues. I'll have to up it for next year. Part of the problem is that I leave implementation details -- like what language they program in -- up to them. My framework is that I care about them being able to get the results -- I don't care about the details like which language they use. On the other hand, this looseness leads to large variations in how much students have to worry about various things. Java can handle memory issues for you (at the expense of speed); your laptop might exhibit different performance (including whether or not it runs out of memory) than a campus account (which is where the code should eventually be able to run).

One issue that became big this year was students wanting to use Python. I tried to tell them it was a bad idea, but many students like the high-level language and the sleekness of the resulting code. Unfortunately, standard constructs in Python are memory hogs, causing crashes or impossible slowness. Many of the students who started in Python had to eventually rewrite their code in another language -- or failed to deal with graph sizes that the assignment asked for.

I didn't design the programming assignments in my algorithms class to teach students about programming. But I find these exercises really stretch some of their programming skills, making them think about questions (what's the best language to use? what's the bottleneck here -- time or memory? how can I debug/test my code effectively?) that they need to learn to think about sometime, and perhaps aren't dealt with sufficiently in introductory classes any more.

16 comments:

Thore Husfeldt said...

Michael, how big is big? I’m polishing a set of programming exercises myself, and MST is one of them. What do we want the assignment to focus on? If the graph has only a hundred nodes or so, it’s about the greedy algorithm, and maybe about priority queues and connected components. When is the clever union–find data structure worth implementing, and do I want that to be the focus? — Thore Husfeldt

Pall Melsted said...
This comment has been removed by the author.
Michael Mitzenmacher said...

Pall (and Thore, who I've answered) -- if you have specific questions on the assignment, I'd ask you send them to me personally. I'll reuse this assignment or variations in the future -- don't go giving away all the secrets! :)

Pall Melsted said...

Just out of curiosity, how big is big these days? Second what random graph/weights did you use and did anyone use the fact that it was random to speed up their algorithm?

Anonymous said...

Can you post a link to your assignment so that we can see what it is?

Arvind Narayanan said...

Re. python: that's not true! Python is only a memory hog if you make rookie errors like using lists instead of arrays for storing numerical data. Furthermore, there's the numerical python package, which, with a little bit of experience, can execute at nearly the speed of C as well! I've done several complex data mining projects in python, including algorithms on large graphs; my most recent one which had memory-saving hacks like counters starting out as 8-bit and then increasing in size as their values got bigger. This turned out to be way easier to do than it would be in C. If you're telling them python is a bad idea you're really doing them a disservice.

Michael Mitzenmacher said...

Aravind--

I'm going to just disagree with you. I was quite clear with what I said -- standard constructs in Python are memory hogs. Arrays are -- at least from what I see of what students know -- actually already a "non-standard" Python structure, since most students think everything can be done with lists. Most students haven't seen or used numpy either.

I'm not teaching a class on Python. I don't tell students they can't use Python, and I'm aware the assignment could be done with Python. As I've stated, in my experience, many students like Python but don't really know it well enough to do this assignment properly. I don't think I'm doing any disservice by giving warning them the problems they'll probably face ahead of time.

(Those that were successful did do things like learn to use numpy -- at which point their code was almost the same as in C anyway...)

Anonymous said...

(Those that were successful did do things like learn to use numpy -- at which point their code was almost the same as in C anyway...)

You're just saying words now, aren't you?

Anonymous said...

I don't think I'm doing any disservice by giving warning them the problems they'll probably face ahead of time.

I think that the disservice is if you decline to follow up with: "there are two relatively simple solutions if you choose to use them: numpy and/or the array module."
Students have had success with those in the past.

If you choose to guide them away from Python but choose not to guide them in its proper use (really, just giving them one word: "numpy") then I think that you're letting your bias get in the way of proper teaching. One does not have to teach a Python class to mention the word "numpy".

maacl said...

"I tried to tell them it was a bad idea" - you still fail to explain why you would issue such a warning. Any student using any language the performance characteristics of which they do not know run the risk of failing such an assignment. This has nothing to do with Python, with which the problem type in question can be solved quite elegantly.

Arvind Narayanan said...

I don't know what you mean "non-standard." Arrays are in the standard library, and python documentation emphasizes the need to learn the standard library and even has a "batteries included" slogan to drive home the point. If students don't know about it, that's exactly like using C, not looking into the standard library and implementing printf() yourself. Sorry to carp on this, but it's a bit disturbing that your students think everything should be done with lists. One needs to learn higher level programming languages to be a good computer scientist; why not encourage them to learn Python instead of taking the path of least resistance?

Michael Mitzenmacher said...

Arvind --

I'm not teaching a programming class, nor am I pretending to. Python is not one of the languages covered in depth in their intro programming classes -- they've at best had exposure to it in those classes, and some of them have seen/used it elsewhere and liked it. But I'm aware most of them aren't proficient enough in the language to do the assignment in it, so I warn them (appropriately).

One of the things many people who use Python learn in this exercise is that they don't know enough about Python. That alone is a worthwhile learning experience.

But really, I do wish you'd stop criticizing me for not teaching something I make no claim to be teaching.

clin said...

Arvind makes an interesting point. Just because students find a programming language inadequate to solve a particular problem doesn't imply that the language is the problem. As he points out, it could simply be a lack knowledge about that programming language.

I think the reason he's debating this is because you argued, in your post, that Python was bad for doing the project, and he argued that, if properly used, it's not that bad.

Michael Mitzenmacher said...

Clin --

I think my statements were quite clear. I stated "...standard constructs in Python are memory hogs..." Arvind disagreed, and we found the point of our disagreement was whether arrays and numpy would be considered "standard" for python. In my experience, with students here, they're not. (Heck, see things like http://docs.python.org/tutorial/ , which is the top hit from Google when you search for "python tutorial", and tell me how many times you find the word "array" on that page, vs. how many times you find the word "list". Hint: "array" appears 0 times...) I also stated quite clearly, "...I'm aware the assignment could be done with Python...", so I think we've agreed on that point at that level, and I don't think we were debating that point further.

Aravind expressed dismay students didn't know how to use Python properly. I would agree -- but that's why I warn them and suggest they not use it. He ended by asking, "...why not encourage them to learn Python instead of taking the path of least resistance?" I found this to be inappropriately critical. The answer I thought was clear in my previous statement -- I'm not teaching a class on Python, I don't expect them to learn or have to learn Python, that's not the point of the assignment or the class.

Arvind Narayanan said...

Sorry, sometimes we python fanboys can get a little defensive. Interestingly, you spell my name incorrectly on alternate posts :-)

Michael Mitzenmacher said...

Apologies, Arvind -- I'll pay more attention.

And no offense taken (or meant) by your comments -- I was just defending myself!