The people at Microsoft Research New England are excited to announce that Boaz Barak will be joining them. I imagine the situation is similar to their hiring of Madhu Sudan. They continue to build up a remarkable collection of researchers.
Harvard's Allston complex is officially suspended.
JeffE doesn't post so much these days, so you might have missed this great post with his favorite useless theorem. (I hope this starts a series of "Favorite Useless Theorems" -- please send him, or me, if you have examples of your own.)
Nela Rybowicz, staff senior editor for IEEE (and who I've dealt with for pretty much all of my Transactions on Information Theory papers), informed me that she'll soon be retiring. She'll be missed.
Thursday, December 10, 2009
Wednesday, December 09, 2009
TSA Oops
You may have heard on the news that the TSA (temporarily) put up their Screening Management Standard Operating Procedure on the web. As pointed out on this blog (and elsewhere), they made a small error:
"So the decision to publish it on the Internet is probably a questionable one. On top of that, however, is where the real idiocy shines. They chose to publish a redacted version of the document, hiding all the super-important stuff from the public. But they apparently don’t understand how redaction works in the electronic document world. See, rather than actually removing the offending text from the document they just drew a black box on top of it. Turns out that PDF documents don’t really care about the black box like that and the actual content of the document is still in the file."
Oops. Apparently, somebody hasn't read Harry Lewis's book Blown to Bits; the start of Chapter 3 discusses several similar cases where somebody thought they had redacted something from a PDF file... but didn't.
"So the decision to publish it on the Internet is probably a questionable one. On top of that, however, is where the real idiocy shines. They chose to publish a redacted version of the document, hiding all the super-important stuff from the public. But they apparently don’t understand how redaction works in the electronic document world. See, rather than actually removing the offending text from the document they just drew a black box on top of it. Turns out that PDF documents don’t really care about the black box like that and the actual content of the document is still in the file."
Oops. Apparently, somebody hasn't read Harry Lewis's book Blown to Bits; the start of Chapter 3 discusses several similar cases where somebody thought they had redacted something from a PDF file... but didn't.
Sunday, December 06, 2009
Faculty Ratios
Perhaps the most basic breakdown one can make of a college faculty is into the categories Natural Sciences / Social Sciences / Humanities. (Yes, one could naturally think of "engineering" as a fourth category separate from "natural sciences"; feel free to do so.) So here are some basic questions:
1. What is the faculty breakdown in your institution into these three categories?
2. Do you think that breakdown is appropriate?
3. What do you think the breakdown will be (or should be) ten years from now? Twenty years from now?
4. Do you think your administration is thinking in these terms? Do you think they have a plan to get there? (And if so, are they open about it -- is the faculty involved in these decisions?)
[These questions were inspired by a comment of Harry Lewis over at Shots in the Dark.]
1. What is the faculty breakdown in your institution into these three categories?
2. Do you think that breakdown is appropriate?
3. What do you think the breakdown will be (or should be) ten years from now? Twenty years from now?
4. Do you think your administration is thinking in these terms? Do you think they have a plan to get there? (And if so, are they open about it -- is the faculty involved in these decisions?)
[These questions were inspired by a comment of Harry Lewis over at Shots in the Dark.]
Friday, December 04, 2009
Retirement
No, not for me. But Harvard has announced its plans to encourage faculty to retire. I won't call it an "early retirement" package, since it is for people over 65. Though I suppose that is early for academia.
Note that (according to the article) Harvard's Faculty of Arts and Sciences will have 127 offers to its 720 (junior and senior) faculty. So I conclude (as of next year) 1/6 of Harvard's faculty would be over 65. And the article states that the average age of tenured Harvard professors is 56. Can people from elsewhere tell me if this is unusual? Harvard has a reputation for having an older faculty; this seems to confirm it. Does his suggest a lack of planning somewhere along the line? Or is this a good thing?
I don't expect drastic changes arising from this plan; it will be interesting to see how many faculty take the offer. In general, however, is it a good idea for a university to "encourage" retirement for older faculty? And if so, what means should they use to do it?
Viewed as an optimization problem, one can ask what is the "best" distribution of faculty ages at a university, and what mechanisms could (and should) be used to maintain that distribution?
Thursday, December 03, 2009
Tight Thresholds for Cuckoo Hashing via XORSAT
As I promised sometime back, we now have a paper (up on the arxiv) giving the thresholds for cuckoo hashing, a problem that had been open and was part of my survey (pdf) on open problems in cuckoo hashing.
One interesting thing about our paper is that our (or at least "my") take is that, actually, the thresholds were right there, but people hadn't put all the pieces together. The argument is pretty easy to sketch without any actual equations.
In cuckoo hashing, we have n keys, m > n buckets, and each key is hashed to k possible buckets. The goal is to put each key into one of its k buckets, with each bucket holding at most one key. We can represent this as a random hypergraph, with each key being a hyperedge consisting of k buckets, represented by vertices; our goal is to "orient" each edge to one of its vertices. A natural first step is to repeatedly "peel off" any vertex that doesn't already have an edge oriented to it and has exactly one adjacent unoriented edge, and orient that edge toward it. At the end of this process, we're left with what is called the 2-core of the random hypergraph; every vertex has 2 adjacent edges. Can we orient the remaining edges of the 2-core? In particular, we're interested in the threshold behavior; as m,n grow large, what initial ratio m/n is required to have a suitable mapping of keys to buckets with high probability. (This corresponds to the memory overhead of cuckoo hashing.)
Now consider the random k-XORSAT problem, where we have m variables and n clauses, where each clause is randomly taken from the possible clauses
x_{a_1} + x_{a_2} + ... + x_{a_k} = b_a.
Here the x_{a_i} are distinct variables from the m variables, b_a is a random bit, and addition is modulo 2. The question is whether a random k-XORSAT problem has a solution. Again, we can put this problem in the language of random hypergraphs, with each clause being a hyperedge of k variables, represented by vertices. Again, we can start by oriented edges to vertices as we did with the cuckoo hashing representation; here, a clause has an associated orientation to a variable if that variable is "free" to take on the value to make sure that clause is satisfied. Is the remaining formula represented by the 2-core satisfiable?
The correspondence between the two problems is almost complete. To finish it off, we notice (well, Martin Dietzfelbinger and Rasmus Pagh noticed, in their ICALP paper last year) that if we have a solution to the k-XORSAT problem, there must be a permutation mapping keys to buckets in the corresponding cuckoo hashing problem. This is because if the k-XORSAT problem has a solution, the corresponding matrix for the graph has full rank, which means there must be a submatrix with non-zero determinant, and hence somewhere in the expansion of the determinant there's a non-zero term, which corresponds to an appropriate mapping.
And the threshold behavior of random k-XORSAT problems is known (using the second moment method -- see, for example, this paper by Dubois and Mandler.)
While in some sense it was disappointing that the result was actually right there all along, I was happy to nail down the threshold behavior. In order to add something more to the discussion, our paper does a few additional things.
First, we consider irregular cuckoo hashing (in the spirit of irregular low-density parity-check codes). Suppose that you allow that, on average, your keys have 3.5 buckets. How should you distribute buckets to keys, and what's the threshold behavior? We answer these questions.
Second, what if you have buckets that can hold more than one key? We have a conjecture about the appropriate threshold, and provide evidence for it, using a new fast algorithm for assigning keys to buckets (faster than a matching algorithm).
I should point out that since I originally "announced" we had this result two other papers have appeared that also have nailed down the cuckoo hashing threshold (Frieze/Melsted ; and Fountoulakis/Panagiotou). Each seems to have different takes on the problem.
One interesting thing about our paper is that our (or at least "my") take is that, actually, the thresholds were right there, but people hadn't put all the pieces together. The argument is pretty easy to sketch without any actual equations.
In cuckoo hashing, we have n keys, m > n buckets, and each key is hashed to k possible buckets. The goal is to put each key into one of its k buckets, with each bucket holding at most one key. We can represent this as a random hypergraph, with each key being a hyperedge consisting of k buckets, represented by vertices; our goal is to "orient" each edge to one of its vertices. A natural first step is to repeatedly "peel off" any vertex that doesn't already have an edge oriented to it and has exactly one adjacent unoriented edge, and orient that edge toward it. At the end of this process, we're left with what is called the 2-core of the random hypergraph; every vertex has 2 adjacent edges. Can we orient the remaining edges of the 2-core? In particular, we're interested in the threshold behavior; as m,n grow large, what initial ratio m/n is required to have a suitable mapping of keys to buckets with high probability. (This corresponds to the memory overhead of cuckoo hashing.)
Now consider the random k-XORSAT problem, where we have m variables and n clauses, where each clause is randomly taken from the possible clauses
x_{a_1} + x_{a_2} + ... + x_{a_k} = b_a.
Here the x_{a_i} are distinct variables from the m variables, b_a is a random bit, and addition is modulo 2. The question is whether a random k-XORSAT problem has a solution. Again, we can put this problem in the language of random hypergraphs, with each clause being a hyperedge of k variables, represented by vertices. Again, we can start by oriented edges to vertices as we did with the cuckoo hashing representation; here, a clause has an associated orientation to a variable if that variable is "free" to take on the value to make sure that clause is satisfied. Is the remaining formula represented by the 2-core satisfiable?
The correspondence between the two problems is almost complete. To finish it off, we notice (well, Martin Dietzfelbinger and Rasmus Pagh noticed, in their ICALP paper last year) that if we have a solution to the k-XORSAT problem, there must be a permutation mapping keys to buckets in the corresponding cuckoo hashing problem. This is because if the k-XORSAT problem has a solution, the corresponding matrix for the graph has full rank, which means there must be a submatrix with non-zero determinant, and hence somewhere in the expansion of the determinant there's a non-zero term, which corresponds to an appropriate mapping.
And the threshold behavior of random k-XORSAT problems is known (using the second moment method -- see, for example, this paper by Dubois and Mandler.)
While in some sense it was disappointing that the result was actually right there all along, I was happy to nail down the threshold behavior. In order to add something more to the discussion, our paper does a few additional things.
First, we consider irregular cuckoo hashing (in the spirit of irregular low-density parity-check codes). Suppose that you allow that, on average, your keys have 3.5 buckets. How should you distribute buckets to keys, and what's the threshold behavior? We answer these questions.
Second, what if you have buckets that can hold more than one key? We have a conjecture about the appropriate threshold, and provide evidence for it, using a new fast algorithm for assigning keys to buckets (faster than a matching algorithm).
I should point out that since I originally "announced" we had this result two other papers have appeared that also have nailed down the cuckoo hashing threshold (Frieze/Melsted ; and Fountoulakis/Panagiotou). Each seems to have different takes on the problem.
Tuesday, November 24, 2009
4-year Masters
Harvard, like many other places, has an option by which students (with "Advanced Standing" from AP classes) can obtain a Master's (in some programs) as well as their undergraduate degree in 4 years. The School of Engineering and Applied Sciences, and CS in particular, offers this option.
Every year some students I advise are interested in and want advice about the program. My first question back is always why do they want to do it -- what do they think they'll get out of it? Often, they don't have a good reason; it seems they just see it as an opportunity to get an additional degree, which I don't think is a particularly good reason in and of itself. (Harvard students are, after all, high-achievers and trained to respond to such incentives.) Some possible reasons that come up include:
1) They're going into industry, and they believe that the Master's will give them a higher starting salary (which may, over a lifetime, translate into a substantial benefit).
2) They're going into graduate school, and believe the degree will allow them to move faster through their graduate program, or at least allow them to take fewer classes.
3) They're coming from another area, like economics or chemistry, but have found late in the game that they like computer science, and would like to have a credential in that area in case that is where they move in terms of their career.
4) Their parents have figured out that they can technically graduate in three years and are unwilling to pay for four, but can be convinced to pay for it if there is a Masters degree involved.
Are there other reasons? And how good are these reasons? Is there still a "Master's premium" in starting salaries, even for a "4th year" class-based Master's program? Does starting with a Master's of this kind really get you through graduate school faster anywhere? (Personally, I already think graduate students don't take enough classes, so I'm biased against reason 2.) The third reason -- a student coming in from another field -- is reasonable, but Harvard does now have minors (called "secondary concentrations" here) instead. Pressuring parents is not a reason I can throw support behind, but I can certainly understand it. Maybe it's the best reason on my current list.
Of course, it's important to consider what, if any, are the downsides. Here's my starting list:
1) A loss of flexibility in choosing classes. Doing the two degrees in four years saddles a student with so many requirements they lose the chance to take that art or history class, learn a language, or do some other exploratory things in college. Isn't that part of what college is supposed to be about?
2) It often ends up taking the place of other college experiences, like doing a senior thesis or other research. For students who want to go to graduate school, I'd personally recommend doing a senior thesis over a 4th year master's degree, so that they can begin to get some insight into how research works -- and if they'll really like it.
3) It can be hard. Most graduate courses have large-scale final projects; trying to 6-8 such courses in two or three semesters can be a real challenge, and is certainly a time-commitment.
As always, my biggest goal in advising students on such matters is to make sure they're well-informed and have thought through the various implications of their choices. I'd appreciate any thoughts anyone has on the matter, either from their own personal experience or their experience with students who have done such programs.
Every year some students I advise are interested in and want advice about the program. My first question back is always why do they want to do it -- what do they think they'll get out of it? Often, they don't have a good reason; it seems they just see it as an opportunity to get an additional degree, which I don't think is a particularly good reason in and of itself. (Harvard students are, after all, high-achievers and trained to respond to such incentives.) Some possible reasons that come up include:
1) They're going into industry, and they believe that the Master's will give them a higher starting salary (which may, over a lifetime, translate into a substantial benefit).
2) They're going into graduate school, and believe the degree will allow them to move faster through their graduate program, or at least allow them to take fewer classes.
3) They're coming from another area, like economics or chemistry, but have found late in the game that they like computer science, and would like to have a credential in that area in case that is where they move in terms of their career.
4) Their parents have figured out that they can technically graduate in three years and are unwilling to pay for four, but can be convinced to pay for it if there is a Masters degree involved.
Are there other reasons? And how good are these reasons? Is there still a "Master's premium" in starting salaries, even for a "4th year" class-based Master's program? Does starting with a Master's of this kind really get you through graduate school faster anywhere? (Personally, I already think graduate students don't take enough classes, so I'm biased against reason 2.) The third reason -- a student coming in from another field -- is reasonable, but Harvard does now have minors (called "secondary concentrations" here) instead. Pressuring parents is not a reason I can throw support behind, but I can certainly understand it. Maybe it's the best reason on my current list.
Of course, it's important to consider what, if any, are the downsides. Here's my starting list:
1) A loss of flexibility in choosing classes. Doing the two degrees in four years saddles a student with so many requirements they lose the chance to take that art or history class, learn a language, or do some other exploratory things in college. Isn't that part of what college is supposed to be about?
2) It often ends up taking the place of other college experiences, like doing a senior thesis or other research. For students who want to go to graduate school, I'd personally recommend doing a senior thesis over a 4th year master's degree, so that they can begin to get some insight into how research works -- and if they'll really like it.
3) It can be hard. Most graduate courses have large-scale final projects; trying to 6-8 such courses in two or three semesters can be a real challenge, and is certainly a time-commitment.
As always, my biggest goal in advising students on such matters is to make sure they're well-informed and have thought through the various implications of their choices. I'd appreciate any thoughts anyone has on the matter, either from their own personal experience or their experience with students who have done such programs.
Friday, November 20, 2009
In Reverse
The Crimson is reporting that the Harvard Faculty of Arts and Sciences will plan to decrease the size of the faculty* in response to budget woes. The key point seems to be that "There are now 720 associate professors and professors in FAS, an increase of 20 percent since 2000." The reductions will occur in the standard way -- not filling some positions after retirement, and offering some sort of early retirement package for faculty. (Harvard has, comparatively, a much older faculty than most institutions. Here's a 2004 Crimson article on the subject that pointed out that 7% of the tenured faculty was over age 70.)
I admit, I'd like to see some concrete statements that there's to be a similar if not more extensive effort to decrease the size of administration, although to be fair some of that has also been occurring.
* The first comment, by one menckenlite, to the article seems so funny I have to quote it here:
"Disappointed to learn that Harvard is just reducing the number of faculty members. I thought they were going to get smaller professors so that they did not overload the sidewalks and streets with their over sized egos and girths."
I admit, I'd like to see some concrete statements that there's to be a similar if not more extensive effort to decrease the size of administration, although to be fair some of that has also been occurring.
* The first comment, by one menckenlite, to the article seems so funny I have to quote it here:
"Disappointed to learn that Harvard is just reducing the number of faculty members. I thought they were going to get smaller professors so that they did not overload the sidewalks and streets with their over sized egos and girths."
Wednesday, November 18, 2009
Job Advice?
Not for me, thank goodness.
A very talented graduating senior (who may or may not be Harvard...) has obtained a number of job offers, and asked me if I had any advice on what job would make the best place to start a career -- or look best on a resume. (Similar questions come up most every year.) I explained that besides having some potential conflicts of interest, I was removed enough from the job circuit these days to not have any useful advice. But I could ask others....
Let's consider a number of possible jobs a talented student might easily obtain:
Software Developer at Google
Program Manager at Microsoft
Developer at Facebook
Entry-level position at a tech-oriented "boutique" consulting firm
Something else you'd like to suggest
What advice would you give them on what to choose? Or how to choose, which is probably more useful?
A warning to students: free advice is often worth what you pay for it....
A very talented graduating senior (who may or may not be Harvard...) has obtained a number of job offers, and asked me if I had any advice on what job would make the best place to start a career -- or look best on a resume. (Similar questions come up most every year.) I explained that besides having some potential conflicts of interest, I was removed enough from the job circuit these days to not have any useful advice. But I could ask others....
Let's consider a number of possible jobs a talented student might easily obtain:
Software Developer at Google
Program Manager at Microsoft
Entry-level position at a tech-oriented "boutique" consulting firm
Something else you'd like to suggest
What advice would you give them on what to choose? Or how to choose, which is probably more useful?
A warning to students: free advice is often worth what you pay for it....
Monday, November 16, 2009
Conference Reviewing Update
A few weeks ago, I talked about the reviewing process for NSDI and LATIN. I suppose now is a reasonable time for an update.
LATIN is nearing the end of the reviewing process. I think it went well -- my top ranked papers seem to be being accepted, my low ranked papers are not. There's been some electronic discussion of papers where there was wide disagreement, but we're not having an on-site PC meeting, and overall there's been surprisingly little discussion on my papers. Because LATIN is a "2nd tier" conference, I had previously suggested that I expected there would be some wide deviations among review scores, "corresponding to different opinions about how interesting something is". There were in fact some wide scoring discrepancies, though this may not have been the primary reason. I was a reviewer on multiple papers where one reviewer really didn't seem to "get" the paper -- in most cases, ranking it high when I thought the ranking should be lower. (I imagine the scores will change before the reviews come back to the authors.) I've seen similar problems even in other, stronger theory conferences -- selecting 3 reviewers who are expert on the subject of paper in a broad theory conference is very difficult to consistently get right, especially when subreviewers come into play -- though I think it was more problematic here, where the papers are weaker on average in any case. Finally, I still don't like the easychair interface that much.
The NSDI reviews have been, for me, substantially more interesting, no doubt in part because the papers are more interesting (to me). The "first round" is nearing the end, and at least on my papers, the review scores are remarkably consistent. In cases where they aren't consistent, there's usually a clear reason for the discrepancy that comes out in the reviews, which tend to be longer and more detailed for systems conferences. While that's all very satisfying, at this point I'm hoping to be offered some dramatically more controversial papers to look at for Round 2, or I'll be finding the PC meeting pretty boring. (I should note I have a paper submitted to NSDI, so I reserve the right to either completely trash the reviewing system, or sing its praises ever-higher, depending on the eventual outcome.) Finally, I still like the hotcrp interface a lot.
I get asked to serve on a number of PCs, and usually, I make efforts to serve, because I believe such service is important to the community. But I must say, doing these two at roughly the same time has led me to think I'll be more circumspect in the future. The older I get, the more precious time seems to become, and perhaps I've just reached a point where I think I've done enough PC service that I can be choosier about what I agree to, aiming for things that are more enjoyable. At the same time, I wouldn't want everyone to start acting that way, since then I imagine it would be tough to get good PCs together for the many conferences we have.
LATIN is nearing the end of the reviewing process. I think it went well -- my top ranked papers seem to be being accepted, my low ranked papers are not. There's been some electronic discussion of papers where there was wide disagreement, but we're not having an on-site PC meeting, and overall there's been surprisingly little discussion on my papers. Because LATIN is a "2nd tier" conference, I had previously suggested that I expected there would be some wide deviations among review scores, "corresponding to different opinions about how interesting something is". There were in fact some wide scoring discrepancies, though this may not have been the primary reason. I was a reviewer on multiple papers where one reviewer really didn't seem to "get" the paper -- in most cases, ranking it high when I thought the ranking should be lower. (I imagine the scores will change before the reviews come back to the authors.) I've seen similar problems even in other, stronger theory conferences -- selecting 3 reviewers who are expert on the subject of paper in a broad theory conference is very difficult to consistently get right, especially when subreviewers come into play -- though I think it was more problematic here, where the papers are weaker on average in any case. Finally, I still don't like the easychair interface that much.
The NSDI reviews have been, for me, substantially more interesting, no doubt in part because the papers are more interesting (to me). The "first round" is nearing the end, and at least on my papers, the review scores are remarkably consistent. In cases where they aren't consistent, there's usually a clear reason for the discrepancy that comes out in the reviews, which tend to be longer and more detailed for systems conferences. While that's all very satisfying, at this point I'm hoping to be offered some dramatically more controversial papers to look at for Round 2, or I'll be finding the PC meeting pretty boring. (I should note I have a paper submitted to NSDI, so I reserve the right to either completely trash the reviewing system, or sing its praises ever-higher, depending on the eventual outcome.) Finally, I still like the hotcrp interface a lot.
I get asked to serve on a number of PCs, and usually, I make efforts to serve, because I believe such service is important to the community. But I must say, doing these two at roughly the same time has led me to think I'll be more circumspect in the future. The older I get, the more precious time seems to become, and perhaps I've just reached a point where I think I've done enough PC service that I can be choosier about what I agree to, aiming for things that are more enjoyable. At the same time, I wouldn't want everyone to start acting that way, since then I imagine it would be tough to get good PCs together for the many conferences we have.
Subscribe to:
Posts (Atom)