Monday, November 21, 2011

Round Two Reviewing: An Exercise in Conditional Probabilities

We're in "round 2" of reviews for NSDI, and it's brought up a problem I've noticed before.  I worry that, subconsciously, I'm inclined to give papers I read in the second round a higher score, since I'm swayed by the fact that they made it to the second round at all.

I wonder if anyone in the PC world has done any testing of this to see if it's a real phenomenon.  Are second round reviews on a set of papers statistically different from the first round of reviews?  I would bet yes, even controlling for the fact that the papers made it to the second round.  In particular, I'd suspect it's much harder for people to give a score of 1 (=reject) to a second round paper.

One could imagine attempting to test for this by sticking a few obvious rejects from the first round into the second round reviews.  Indeed, perhaps one should make this part of the process:  randomly select a few clear rejects to go into round two, and announce that you're doing this to the PC.  Then they might not feel so averse to assigning a score of 1 in the second round.
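To make the decoy idea a bit more concrete, here is a minimal Python sketch of one way such data could be analyzed; the function name and the choice of test are my own assumptions, not an established PC procedure. For each planted reject, pair its first-round score with its second-round score, then use a one-sided paired sign-flip permutation test to ask whether the second-round scores run higher than chance alone would explain.

```python
import random
from typing import Sequence


def sign_flip_p_value(round1: Sequence[float], round2: Sequence[float],
                      trials: int = 10_000, seed: int = 0) -> float:
    """One-sided paired sign-flip permutation test (illustrative sketch).

    Null hypothesis: for each paper, the round-one and round-two scores are
    exchangeable, so each difference (round2 - round1) is equally likely to
    have either sign. Returns an estimate of the probability, under that
    null, of a mean difference at least as large as the one observed.
    """
    if len(round1) != len(round2) or len(round1) == 0:
        raise ValueError("need two non-empty score lists of equal length")
    diffs = [b - a for a, b in zip(round1, round2)]
    observed = sum(diffs) / len(diffs)
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Under the null, each paired difference keeps or flips its sign
        # with probability 1/2, independently of the others.
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if sum(flipped) / len(flipped) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (trials + 1)
```

With only a handful of decoys per conference the test has little power, so it would likely take several conferences' worth of data to say much; the appeal of the design is that the decoys form a control group already judged to be clear rejects.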

One joy of second-round reviewing is that once you submit a review, you get to see the first-round reviews.  So far, I feel I've been calling them fairly;  when I haven't liked a second-round paper, the first-round reviews seem to confirm my opinion.  So perhaps (with some effort) I'm keeping my subconscious at bay successfully, and not conditioning on the fact that it's a round 2 review.

Wednesday, November 16, 2011

Public University Budgets

Another topic that arose in conversations during my visit to Wisconsin was the issue of budgets, and in particular the large-scale cuts that many of the best US public schools are having to deal with.  It's not hard to find information on this.  My first search on Google yielded this article about what's going on in Wisconsin (taking choice quotes, not the full article;  it's from June).

Wis. Gov. signs budget cutting education $1.85B

Democrats assailed the budget as an attack on middle class values since it cuts funding for public schools by $800 million, reduces funding to the UW system by $250 million and cuts tax credits for poor people.

It also reduces the amount schools can collect from property taxes and other revenue combined, which translates into another education cut of about $800 million. While schools are seeing deep cuts, Walker's budget extends tax breaks to manufacturers, multistate corporations and investors.

"As a state, we can choose to take the easy road and push off the tough decisions and pass the buck to future generations, or we can step up to the plate and make the tough decisions today," Walker said in prepared remarks. "Our budget chooses to fix our problems now, so that our children and our grandchildren don't face the same challenges we face today."

I'm sure the children and grandchildren he talks about, who will face the new challenge of increased global competition with an increasingly well-educated population outside Wisconsin, rather than the challenges being faced today, will be very appreciative.

Public universities generally have been faring quite badly in the current financial crisis.  I have a deep pro-education bias, unsurprisingly, so I find this depressing.  But also, in my mind, it's just not sound financial sense.  I believe these cuts today will yield a corresponding decline in Wisconsin's economy tomorrow, for some appropriate notion of tomorrow.  A dollar spent on education should be worth... well, I don't know how much it should be worth, but my guess is the multiplier on the dollar is pretty high.  I'd like more information to back that up.  If you know of any studies that demonstrate the payoff for education -- the sort of thing all of us in the education field should have on hand when discussions like this come up -- please leave them in the comments.  It would be nice to have a collection handy.

Tuesday, November 15, 2011

Yes, We Are Hiring (2012)

Harvard CS will be hiring this year.

One tenure-track position is geared toward systems, very broadly defined.

A second tenure-track position is in Applied Math, where we're aiming for a "discrete applied math" person.  The right CS theory person could fit just fine.  And if we see great CS theory people who really want to be in CS rather than AM, we should be able to find a way to make that work.

Tenure-Track Positions in Computer Science and Applied Mathematics

The Harvard School of Engineering and Applied Sciences (SEAS) seeks applicants for positions at the level of tenure-track assistant professor in the fields of Computer Science and Applied Math/Computer Science, with an expected start date of July 1, 2012.

Candidates are required to have a PhD or an equivalent terminal degree, or to be able to certify that they will receive the degree within one year of the expected start date.  In addition, we seek candidates who have an outstanding research record and a strong commitment to undergraduate teaching and graduate training.

Position 1:   Computer Science.  We welcome outstanding applicants in all areas of computer science. We are particularly interested in systems, broadly defined, including compilers, programming languages, distributed systems, databases, networking, and operating systems.  Applicants will apply online at http://academicpositions.harvard.edu/postings/3825.

Position 2:  Applied Math/Computer Science. We welcome outstanding applicants in all areas of applied mathematics or theoretical computer science. We are particularly interested in topics at the boundary or intersection of these fields, including optimization, applied probability, scientific computing, combinatorics and graph theory, approximation algorithms, and numerical analysis. Applicants will apply online at http://academicpositions.harvard.edu/postings/3824.

In terms of applications, areas of interest include computational science, engineering, or the social sciences. We encourage applications from candidates whose research examines computational issues raised by very large data sets or massively parallel processing.

The Computer Science and Applied Mathematics programs at Harvard University benefit from outstanding undergraduate and graduate students, an excellent location, significant industrial collaboration, and substantial support from the Harvard School of Engineering and Applied Sciences.  Information about Harvard's current faculty, research, and educational programs is available at http://www.seas.harvard.edu.

Required documents include a CV, a statement of research and teaching interests, up to three representative papers, and names and contact information for at least three references.

Applications will be reviewed as they are received. The review of applications will begin on December 15, 2011, and applicants are strongly encouraged to submit applications by that date; however, applications will continue to be accepted at least until January 15, 2012.

Harvard is an Equal Opportunity/Affirmative Action Employer.  Applications from women and minority candidates are strongly encouraged.

Monday, November 14, 2011

Public Salary Information

While I was visiting Wisconsin last week (enjoying very pleasant company and conversation), various issues came up.

For one, I was reminded (or recalled) that because the University of Wisconsin-Madison is a public university, its salaries are available online.  I can understand why salaries of elected public officials, and the people they hire, should be public information.  Transparency in politics is a valuable thing.

But I don't see why professors' salaries should be public.  Perhaps this is merely a personal bias;  I wouldn't want MY salary to be public information.**  I also don't use Facebook, so perhaps I'm just a 20th-century privacy-desiring relic.  Perhaps more reasonably, I don't see university faculty as political employees, and therefore think they -- as well as the university -- should enjoy the same privacy for salary information that other employers and employees enjoy.

Perhaps, however, I'm just wrong, and transparency of salary information is good for all.  I'm willing to entertain that thought.  Certainly I think the Taulbee survey, which aggregates salary information, is useful for both universities and faculty:  there's a shortage of accurate comparative salary information for faculty positions (compared with other jobs), and the Taulbee survey provides an important baseline.  Is it so far to go from there to individuals' salaries?

** Although perhaps in some sense it is.  I don't believe my NSF grant budgets are publicly accessible information, but at some point, I was informed by my university that a Freedom of Information Act request had been made for one of my funded proposals.  (I don't know why, though I have some suppositions.)  The university filed paperwork to hopefully make sure that personal information, including my salary, would be redacted.

Thursday, November 10, 2011

CAEC: First Cambridge Area Economics and Computation Day

Giorgos suggested I remind people about CAEC, which will be next week (November 18).

Wednesday, November 09, 2011

Programming for Non-Programming Exercises

One of the exercises I assigned last week proved interesting:

Consider n points on a circle, labeled clockwise from 0 to n-1.  Initially a wolf begins at 0 and there is a sheep at each of the remaining n-1 points.  The wolf takes a random walk on the circle;  at each step, it moves with probability 1/2 to one neighbor and with probability 1/2 to the other neighbor.  (0 and n-1 are neighbors.)  The first time the wolf visits any point it eats the sheep that is there.  (The wolf can return to points with no sheep.)  Which sheep is most likely to be the last eaten?

If you haven't seen it before, you might try it;  don't put the answer in the comments, though, since I'll use the problem again.

While grading the assignment, I found that a number of students had simulated the process, figured out the answer from the simulations, and then used that knowledge to prove the desired result.  The problem didn't ask them to do this; they did it on their own.

That was great (and I told them so).  That's how solving research problems often works for me.  I have to understand what's going on, and in many cases, that understanding comes about by simulating a process to figure out how things behave.  Then I go back and try to prove what I think I'm seeing in the simulations.
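For anyone who wants to take the students' route, here is a minimal Monte Carlo sketch in Python; the function names, the choice n = 10, and the trial count are arbitrary choices of mine. It tallies how often each sheep is eaten last and deliberately prints only the raw frequencies rather than a conclusion, so the problem stays unspoiled.

```python
import random
from collections import Counter


def last_sheep_eaten(n: int, rng: random.Random) -> int:
    """Run one wolf walk on an n-point circle; return the last sheep eaten."""
    if n < 2:
        raise ValueError("need at least one sheep (n >= 2)")
    position = 0
    visited = {0}  # the wolf starts at 0, where there is no sheep
    last = 0
    while len(visited) < n:
        # Step to either neighbor with probability 1/2; 0 and n-1 are adjacent.
        position = (position + rng.choice((-1, 1))) % n
        if position not in visited:
            visited.add(position)
            last = position  # first visit: the wolf eats the sheep here
    return last


def simulate(n: int, trials: int, seed: int = 1) -> Counter:
    """Count, over independent walks, how often each sheep is eaten last."""
    rng = random.Random(seed)
    return Counter(last_sheep_eaten(n, rng) for _ in range(trials))


if __name__ == "__main__":
    n, trials = 10, 20_000
    counts = simulate(n, trials)
    for sheep in range(1, n):
        print(f"sheep {sheep}: {counts[sheep] / trials:.4f}")
```

Once the frequencies suggest a pattern, the real work of proving it still remains, and the simulation then doubles as a check on the proof.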

My worry, though, is that the students who did it this way were primarily the "non-theorists" in the class, who did it because they knew they didn't know the answer and thought it was easier to write code to figure it out.  And that the "theorists" in the class correspondingly thought they knew the answer (rightly or wrongly) and went ahead with the calculations without doing a simulation.  That's not necessarily a bad thing, certainly not for this problem (which is easy enough), but I'd like the theorists to also get into the mindset of doing simulations in this sort of setting, both as a tool to gain insight before trying to prove things and as a check on their proofs.

I think they're probably getting the lesson from other, harder exercises I give.  Still, it was nice that a number of people in the class went that direction (and thought to write it down in their assignment).

Monday, November 07, 2011

A Tale of Talks

A bunch of talks today.

Carla Gomes gave a talk at CRCS (Harvard's Center for Research on Computation and Society) about her work on computational sustainability -- interdisciplinary research with "the overall goal of developing computational models, methods, and tools to help manage the balance between environmental, economic, and societal needs for sustainable development."  The idea is to use optimization, machine learning, and math and computation more generally to help with problems in "the real world", like designing paths for animal migration or designing control systems for energy-efficient buildings.  Fun stuff.

Then I had to take Harvard's M2 shuttle over to the Medical School Area for the Broad Institute's annual retreat.  Some students I have been working with on a project spanning systems biology, computer science, and statistics were giving a 15-minute presentation of their results.  (More on the work at some later date.)  The scale there is a bit larger than I'm used to;  I think over 1000 people were listening to the talk, which might well make it the most-seen presentation of my work (even if we sum over multiple presentations of the same talk).  Happily, the students really nailed it, both in the presentation and in the follow-up Q and A.

Then the shuttle back for Mark Zuckerberg's Q and A session at Harvard.  I don't think I've seen him speak before, and he's actually much more well-spoken than you might expect if you've seen The Social Network.  He was entertaining and captivating, and I'm sure he inspired many of our students.  It was a full room -- you needed a ticket to get in.  I understand recruiting sessions with students are taking place sometime after.  If there are good writeups I'll link to one here later.

I also have my own talks to work on.  I'll be giving two talks at U. of Wisconsin this week.  One "old" talk on cuckoo hashing, and one "new" talk on verification using streaming interactive proofs.  Come on by if you're in the area.  (Of course, I suspect if you're in the area, you're probably a student or faculty member of U. of Wisconsin.)

Friday, November 04, 2011

Funny E-mail of the Day

I've been having some issues getting straight answers over e-mail from an administrator in a Harvard office I'm dealing with.  This morning, I found the following e-mail in my inbox:

Dear Mmichael,
To be clear,

Well, now, this is entirely the problem, isn't it?

Wednesday, November 02, 2011

This Week, We Were Doing Security

If you look on Yelp's engineering blog (http://engineeringblog.yelp.com/2011/10/output-filtering-failure.html), you'll see Yelp's VP of Engineering, Michael Stoppelman, crediting our team (myself, John Byers, and Giorgos Zervas) for finding a privacy "leak" in their system that occurred on the mobile version of the Yelp site, m.yelp.com. Their post describes the issue from their point of view. We'd like to elaborate a bit further by presenting how things appeared from our end.

Before beginning, though, we should say that Yelp's team responded in what seems to us to be an exemplary fashion. After we contacted them, Michael Stoppelman and members of the engineering staff listened carefully to our presentation and description of the vulnerability and, as they describe in their blog post, took immediate action to correct the problem. While it would be fun to have a security horror story to tell (right around Halloween) of a big company not taking the leakage of user information, or us as researchers, seriously, that absolutely was not the case here. Indeed, when we suggested that the issue should be made public after the problem was fixed, both to transparently inform their users and to possibly help prevent a similar problem on other web sites, they agreed to write a blog post about it and let us read the copy in advance to make changes or offer suggestions -- and except for making sure Harvard, Yale, and Boston University were all credited, we didn't have any to add.

As people may know from our previous work, we have been studying sites such as Yelp, as they provide an interesting case study as a social network that provides economic information in the form of reviews. As part of our research and data collection, Giorgos was looking at their various interfaces, including the Yelp mobile web site. To be clear, he was not "hacking" the site in any way, just interacting with it via a standard browser and normal HTTP requests. He found that when he checked a restaurant for reviews and subsequently clicked on the button asking for more reviews, entire reviewer records were leaked in JSON format, in the manner described in Yelp's blog post. While this data was present in HTTP replies, and was visible to an HTTP logger such as Firebug for Firefox or the built-in logger in Chrome, ordinary users accessing the site from a device such as an iPhone would not see the sensitive information, as client-side JavaScript displayed only the non-sensitive fields (such as the review text, date, and the user's handle).  This example shows the importance of having multiple redundant layers of security when handling personally identifiable information;  in the Yelp post, they describe the redundancies they have added to prevent such leakage in the future.
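For readers who want to see what server-side output filtering can look like, here is a minimal Python sketch with two redundant layers: an allowlist of fields that may leave the server, plus a blocklist check as a backstop. The field names are hypothetical, and this is not Yelp's actual code; their blog post describes the fixes they made.

```python
import json
from typing import Any, Mapping

# Hypothetical field names; the real schema is not described in this post.
PUBLIC_REVIEW_FIELDS = ("review_text", "date", "user_handle")
SENSITIVE_FIELDS = frozenset({"email", "full_name", "birthdate"})


def to_public_json(record: Mapping[str, Any]) -> str:
    """Serialize only the allowlisted fields of a review record.

    Filtering happens on the server, before anything is sent, so sensitive
    fields never appear in the HTTP response, regardless of what the
    client-side code would or would not display.
    """
    # Layer 1: copy only fields that are explicitly allowed out.
    public = {k: record[k] for k in PUBLIC_REVIEW_FIELDS if k in record}
    # Layer 2: refuse to emit anything known to be sensitive, in case such a
    # field is ever added to the allowlist by mistake.
    leaked = SENSITIVE_FIELDS.intersection(public)
    if leaked:
        raise ValueError(f"refusing to serialize sensitive fields: {sorted(leaked)}")
    return json.dumps(public, sort_keys=True)
```

For example, given a record that also carries an `email` field, `to_public_json` emits only `date`, `review_text`, and `user_handle`; the e-mail address never leaves the server, so an HTTP logger has nothing extra to show.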

While there was no financial information involved, it seemed to us to be a severe hole, in that personally identifiable information was being sent in the clear in response to a normal and seemingly not infrequent user request. We spent some time verifying what we saw, checking that we were not mistaken and that the vulnerability could potentially leak information at scale. When we were fully convinced the problem was both real and significant, we contacted Yelp.

We did have concerns as we went; we have heard stories of some businesses blaming the messenger when approached with significant security issues. We were pleased that Yelp responded by thanking us rather than blaming us. In our minds, this was a very positive interaction between university researchers and an Internet business.

Giving credit where credit is due, Giorgos deserves the praise for finding the problem and thereby protecting a lot of user data.