I'm just about finished reviewing for CoNEXT (Conference on Emerging Networking Experiments and Technologies), and am starting to review for ITCS (Innovations in Theoretical Computer Science). One notable difference in the process is the choice of score scale. For CoNEXT, the program chairs chose a 2-value scale: accept or reject. For ITCS, the program chair chose a 9-point scale. Scoring from 1-9 or 1-10 is not uncommon for theory conferences.
I dislike both approaches, but, in the end, believe that it makes minimal difference, so who am I to complain?
The accept-or-reject choice is a bit too stark. It hides whether you generously thought this paper should possibly get in if there's room, or whether you really are a champion for the paper. A not-too-unusual situation is that a paper gets (at least initially) a majority of accept votes, but nobody really likes the paper or has confronted its various flaws. (Or, of course, something similar the other way around, although I believe the first case is more common, as it feels better to accept a close call than to reject one.) Fortunately, I think the chairs have been doing an excellent job (at least on the papers I reviewed) encouraging discussion on such papers as needed to get us to the right place. (Apparently, the chairs aren't just looking at the scores, but reading the reviews!) As long as there's actual discussion, I think the problems of the 2-value scale can be mitigated.
The 9-point scale is a bit too diffuse. This is pretty clear. In the description of score semantics we were given, I see:
"1-3 : Strong rejects".
I'm not sure why we need 3 different numbers to represent a strong reject (strong reject, really strong reject, really really strong reject), but there you have it. The boundaries between "weak reject", "a borderline case" and "weak accept" (scores 4-6) also seem vague, and could easily lead to different people using different interpretations. Still, we'll see how it goes. As long as there's good discussion, I think it will all work out here as well.
I prefer the Goldilocks scale of 5 values. I further think "non-linear" scoring is more informative: something like top 5%, top 10%, top 25%, top 50%, bottom 50%, but even scores corresponding to strong accept/weak accept/neutral/weak reject/strong reject seem more useful when trying to make decisions.
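To make the "non-linear" idea concrete, here is a minimal sketch of how the five percentile bands could be written down. The band labels come from the list above; the function names, and the convention that a rank of 3 means "top 3% of submissions", are assumptions made only for illustration and not any conference's actual rubric.

```python
from bisect import bisect_left

# Upper edge of each band, in percent of submissions (smaller rank = better paper).
# Labels follow the post; everything else here is a hypothetical illustration.
BAND_EDGES = [5, 10, 25, 50]
BAND_LABELS = ["top 5%", "top 10%", "top 25%", "top 50%", "bottom 50%"]


def band_for(percentile_rank: float) -> str:
    """Map a reviewer's estimated rank (0 < rank <= 100) to one of the five bands."""
    if not 0 < percentile_rank <= 100:
        raise ValueError("percentile_rank must be in (0, 100]")
    # bisect_left returns the index of the first edge that is >= the rank,
    # so a rank exactly on an edge (e.g. 10) still falls in that band.
    return BAND_LABELS[bisect_left(BAND_EDGES, percentile_rank)]


def band_widths() -> list[int]:
    """Width of each band in percentage points, showing why the scale is non-linear."""
    edges = [0] + BAND_EDGES + [100]
    return [hi - lo for lo, hi in zip(edges, edges[1:])]


if __name__ == "__main__":
    for rank in (3, 8, 20, 40, 75):
        print(rank, "->", band_for(rank))
    print(band_widths())
```

The widths make the point: the top two bands each cover only 5 percentage points while the bottom band covers 50, so the scale spends its resolution near the accept threshold rather than on fine gradations of rejection.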
Finally, as I have to say whenever I'm reviewing, HotCRP is still the best conference management software (at least for me as a reviewer).