Having just finished serving on the NSDI program committee, and being hard at work on the STOC committee, I've had consistency in scoring papers on my mind. Ideally, when submitting to a conference, it shouldn't matter WHO reads your paper; your score should be intrinsic to your work. Of course, we don't expect -- or even necessarily want -- the complete ideal; there will be differences of opinion that occur for natural reasons. Arguably, for example, "visionary" papers are hard to judge and tend to lead to differences of opinion. So you can take solace that if you get widely varying reviews and your paper is rejected, your paper was probably just too visionary. The more cynical among us might instead think that large variances in opinion have to do with how close the reviewer is to the subfield of the paper. On the theory side, my take is that referees within a subarea generally give higher marks to papers in that subarea. (At NSDI, I came away with a different impression; if anything, I thought referees working in a subarea tended to be a bit harsher towards papers in that subarea!)
My impression historically has been that there's more variance on the theory side of the world than on the systems side, particularly for the big conferences like FOCS/STOC/SODA. I was amazed, on the whole, at the consistency of the NSDI reviews for the papers I was assigned. STOC reviews are coming in, and I'm looking forward to seeing how consistent things are there, but my early impression is that there's not the same consistency, and it's interesting to hypothesize why. [Perhaps, after it's all over, I'll attempt to gather some real quantitative data on the issue, to see if my impressions match reality.]
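If I do get around to that, the measurement itself is simple. Here's a minimal sketch of what I have in mind (the scores, the 1-5 scale, and the conference labels below are all made up for illustration): for each paper, compute the spread of its reviewers' scores, then compare the average spread across the two venues.

# Hypothetical sketch: compare review-score consistency across two conferences.
# All scores below are invented; real data would come from the PC review system.
from statistics import pstdev, mean

reviews = {
    "systems_venue": [[4, 4, 5], [2, 3, 2], [3, 3, 4]],  # made-up scores, one list per paper
    "theory_venue":  [[1, 4, 5], [2, 5, 3], [3, 1, 4]],  # made-up scores, one list per paper
}

for venue, papers in reviews.items():
    # Per-paper spread: standard deviation of the scores that paper received.
    spreads = [pstdev(scores) for scores in papers]
    print(f"{venue}: mean per-paper score spread = {mean(spreads):.2f}")

A higher mean spread would (crudely) support the impression that reviews at that venue are less consistent, though of course different scoring scales and scoring cultures would have to be normalized away first.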
I'll throw out a straw man for people to discuss (and attack). On the theory side, all (reasonable) papers start with the same basic grounding -- you have proofs of some theorems. After that point, though, things get harder to judge. How "nice" are your theorems, in terms of technique, mathematical beauty, or novelty? How "big" is your improvement over previous work -- for example, is improving an O(log n) competitive ratio to O(log n/log log n) "important" enough? How practical is your solution (hard to say, when nobody actually implements the algorithms or gives performance results...)? A lot of this seems subjective, so naturally there are differing opinions.
In networking and other more applied fields, the same considerations -- technique/beauty/novelty, size of improvement, practicality -- all come into play. But there's much more focus on the bottom line: "Could this lead to a substantial improvement for an important real-world problem?" Perhaps this shared criterion for what makes a good paper leads to more consistency in reviews?