Friday, July 20, 2007

Citation Counts for Tenure

The discussion on citation counts reminded me of the related question of how much of a role this sort of data should play in tenure decisions.

In the world of extremes, one could imagine tenure decisions being based solely on letters without really looking at citation data. A motivation for this approach would be that letters give you a richer picture of how a person is viewed by their peers in the research community, what their work has been about, and what the potential impact of this work will be in the future. On the other hand, the system can be gamed, by making sure positively inclined people get chosen to write letters. (The person up for tenure might not exactly be able to game the system themselves, but certainly a friendly department chair could...) I have to admit, the letter-based approach feels rather "old-boy network" to me, which leaves me a bit uncomfortable.

As another extreme, one could imagine tenure decisions being based solely on numerical data gathered from Google Scholar or other sources. A motivation for this approach would be that the numbers supposedly give you an unbiased picture of the impact of a researcher's work, allowing comparisons with similar researchers. The data could also be used to gauge the derivative -- how one's work is changing and growing in impact over time. On the other hand, the system can be gamed, by developing groups who purposefully cite each other whenever possible or by working on projects that give better numbers without really giving high impact. I have to admit, I like the numbers, and in some respects I trust them more than letters, but I still don't entirely trust them, either.
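The "derivative" idea above could, in principle, be made concrete: fit a trend line to yearly citation counts and look at the slope. A minimal sketch (the function name and the sample data are invented for illustration, not taken from any real researcher's record):

```python
# Hypothetical sketch: estimating the "derivative" of a researcher's impact
# as the least-squares slope of citations received per year.

def citation_trend(counts_by_year):
    """Least-squares slope of citations vs. year; positive means growing impact."""
    years = sorted(counts_by_year)
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(counts_by_year[y] for y in years) / n
    cov = sum((y - mean_x) * (counts_by_year[y] - mean_y) for y in years)
    var = sum((y - mean_x) ** 2 for y in years)
    return cov / var

# Invented example: citations growing roughly linearly over five years.
counts = {2002: 10, 2003: 18, 2004: 25, 2005: 34, 2006: 41}
print(citation_trend(counts))  # slope is about 7.8 extra citations per year
```

Of course, this inherits every problem raised in the comments below -- the raw counts themselves may not mean much -- but it shows the kind of signal one might try to extract beyond a single total.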

My limited experience with promotion decisions is that it makes sense to gather both types of data and make sure they are consistent. When they are not consistent, then the departmental arguments can begin. When asked to write letters, I know I look at the citation data, and would include it in the letter if I felt it was appropriate.

5 comments:

Anonymous said...

The only extreme world is the second one, where the decision is based solely on the numbers.

The first world is quite realistic (and in my opinion desirable). I never look at citation numbers and have never heard of them being used in hiring or promotion in our department. If I see a letter using citation numbers, it makes me suspect that the writer had nothing better to say about the person, and he/she doesn't know the work so well.

The way I see it these numbers are completely meaningless, since by far the most important factor affecting citations is which sub-area the paper is in. Since no two people wrote all their papers in exactly the same areas, I don't see how we can compare people by citation numbers.

In fact numbers, when used improperly, are worse than useless because they give an aura of objectivity. I have yet to see proof that citation numbers can be used properly.

If a dean worries that the chair "stacked the deck" and only asked friendly people, he could surely find out who the recognized luminaries in that area are and ask them for additional letters.

Alan Fekete said...

I agree very much with the remark above, that citation counts vary enormously between subfields (AI is very high, bioinformatics is even higher).

Another important factor when using citation metrics is that you really need to see how the cited work is used (or not). For example, papers I wrote with Lynch, Merritt and Weihl on nested transactions have 41, 18, 15 and 13 cites in Google Scholar; but except for self-cites, almost all are of the form "a different approach to nested transactions is [Fekete et al]" or "we don't cover nested transactions [Fekete et al]"; that is, no one ever built on the work we did. We just became a necessary reference whenever anyone else worked in the area.

Finally, I should point out that these considerations matter at top schools; but most CS researchers, at most institutions, have never done anything that was cited much. My experience reading grant applications etc. is that the majority of researchers in Australia (even the senior faculty) have hardly any papers with more than 5 cites in Google Scholar. So other mechanisms are needed to make hiring/promotion decisions for these people.

zaumka said...

I definitely agree that pure citation counts are almost meaningless. However, I also share Michael's sentiment that relying purely on letters is also too one-sided and sometimes even unfair. This is due to several reasons.

1) somebody with inferior communication skills but strong research might get worse letters than somebody charming with more mediocre research. I know of a few examples, but don't want to bring them up here for obvious reasons.

2) the number of letters one gets is quite small in some institutions (say, 5-6), and it's pretty scary to have your career depend on such a small sample. What if you have a bad relationship with one or two people who will almost surely write the letters for you? Do you have to go against your principles and "make up" with those people just because they will write a letter for you?

3) one can often predict who the letter writers will be. And one can play the system to affect the opinion of those people.

4) Many senior people do not follow what is going on very actively. Thus, somebody's fate may be decided by somebody who is no longer active in the field and/or does not understand what the person being promoted has done. I know some "dead wood" who are routinely asked to write tenure letters. Quite scary.

5) Reputation gets formed early on and is very hard to change. If B did better than A early on (say, as a PhD student), but then A dramatically improved during the pre-tenure years, the chance that A would suddenly get rated higher than B is essentially nil. It might take at least 10 years for people to change the bias, and that's too late for the tenure case of person A.

Having said this, I still think that, on average, letters are still very important and give the best overall evaluation of the person. In particular, the above problems are usually not deadly (or beneficial :)), but they could be for some people. Thus, it would be great to add some other meaningful criteria to the system.

The best I can think of (still not perfect) is to assign weights to various conferences and compute a person's weighted total. Say, in cryptography, STOC/FOCS get 3, CRYPTO/Eurocrypt/TCC get 2, Asiacrypt/CCS/PKC/SODA get 1, and everything else does not count. Of course, the difficulty with this is that it's hard to agree on the numbers, especially across different areas.
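The scheme above is simple enough to sketch directly. The weights are the ones proposed in the comment for cryptography; the publication list and function name are invented for illustration:

```python
# Sketch of the weighted-venue idea: each publication venue carries a fixed
# weight, and a candidate's score is the sum of weights over their papers.
# Weights are the cryptography example from the comment; venues not listed
# count for nothing.

WEIGHTS = {
    "STOC": 3, "FOCS": 3,
    "CRYPTO": 2, "EUROCRYPT": 2, "TCC": 2,
    "ASIACRYPT": 1, "CCS": 1, "PKC": 1, "SODA": 1,
}

def weighted_score(venues):
    """Total venue weight over a list of publication venues."""
    return sum(WEIGHTS.get(v.upper(), 0) for v in venues)

# Invented record: one FOCS, two CRYPTO, one PKC, one unlisted workshop.
pubs = ["FOCS", "CRYPTO", "CRYPTO", "PKC", "WorkshopX"]
print(weighted_score(pubs))  # 3 + 2 + 2 + 1 + 0 = 8
```

One could just as easily divide by the number of papers to get a weighted average instead of a total; as the comment notes, the real difficulty is agreeing on the weights in the first place, not computing the score.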

So, pretty much, as bad as letter writing is, it will probably remain the best option for a while...

Michael Mitzenmacher said...

In response to comments thus far, I hope my post makes clear that I find either extreme for evaluating research undesirable. I do understand that there are potential problems with using citation counts, and I certainly never meant to suggest that just summing the numbers would be sufficient. But I find the argument that it's difficult to normalize for sub-area, or conference, or any other parameter you'd care to name quite uncompelling. We're computer scientists, and figuring out what to do with incomplete data to obtain accurate predictions is one of our jobs. My extreme view is that suggesting that we should ignore this data when it's so easily available and so obviously correlated (even if only loosely) with research quality arguably borders on malpractice.

Alan's argument that one also needs to see the context of how the papers were cited I find more compelling. I would hope that appropriate tools, or careful eyes reviewing cases, or letters from domain experts would be able to add texture to the citation context.

I think zaumka did an outstanding job bringing up many important drawbacks of relying on letters. I'm sure there are many others. Anonymous #1 says
"In fact numbers, when used improperly, are worse than useless because they give an aura of objectivity." The same could be said for letters... except that, really, the letter process has little semblance of objectivity. Both the writing of the letters and the interpretation of them by the readers are subjective. That's a frightening system to me. (Perhaps I'm overly sensitive; history tells me the Ivy League introduced such "subjective" measures for undergraduate admissions when they wanted to keep Jews out...)

Anonymous said...

Zaumka made good arguments, but I still think letters are the way to go, and citation numbers/counts should not be used other than perhaps as "sanity checks".

A good question is what the role of an evaluation process is. I see it as a process to assist people (departments, chairs, deans, etc.) who are genuinely interested in hiring/promoting the best researcher for the job. I believe that letters are the best resource for such people, if they ask for enough of them and ask the right people (people active and knowledgeable in the same area, not necessarily the most senior).

The fact that we're computer scientists doesn't mean that we have to look for numbers and apply algorithms in all aspects of life.

It seems that Michael is interested in a more "foolproof" process that is "cryptographically secure" against chairs/deans/etc. who want to hire unqualified people or discriminate based on irrelevant properties. It may be that numbers are better in that respect, though I doubt that any such foolproof process exists.