Some of my previous posts have led me to think about the following -- something I'm hoping to write a longer piece about in the near future.
In the past few weeks, at Harvard (and elsewhere) there have been reports about the "decline of the humanities". (Whether these reports have any significant bearing on reality is not necessarily important to this post.) But machine learning keeps getting better and better. While we may never be able to predict the exact outcome for an individual student, statistically speaking, as universities gather more data, they will get better at predicting, for example, what a student will major in. Potentially, with the right sort of tracking, universities may be able to predict reasonably well what jobs students will go into -- heck, they may get a statistically meaningful prediction of their future net worth.* In particular, if we wanted to choose students according to what they were going to major in, in order to keep the humanities supporters happy, we could; we can already kind of do that now (based on, for example, what students say they want to major in), and we'll just keep getting better at it.
This will lead to all sorts of questions. Or, perhaps better said, it will make questions that already exist to some extent more pronounced. First, getting to the humanities concern, how should we choose our students? Should we have quotas by future major? We could assign departments a permanent percentage (well, an "expected percentage") of the incoming class and accept students accordingly. From some faculty members' and administrators' point of view, perhaps this makes sense; we could guarantee a department's size, and a suitable faculty/student ratio per department. To me, it seems potentially disastrous, turning the university into a static entity, one that perhaps would not limit any individual student in what they want to study, but would create a less flexible global atmosphere. Again, in some sense, this question exists today; at least some people have responded to the "humanities crisis" by saying that the way students are accepted should be changed (to give preference to humanities-interested students). But the question becomes an even more significant challenge once you assume you have prediction methods strong enough to select students this way far more accurately than has been the historical norm.
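To make the "expected percentage" idea concrete, here is a toy sketch of what quota-aware admission might look like. Everything here is hypothetical: the fields, the target shares, and the assumption that we have per-applicant predicted probabilities of majoring in each field. The selection rule is a simple greedy one (admit whoever best fills the most under-target field), not anything a real admissions office uses.

```python
import random

random.seed(0)

# Toy applicant pool: each applicant gets a (made-up) predicted
# probability of ending up in each of three hypothetical fields.
FIELDS = ["humanities", "sciences", "engineering"]

def random_applicant(i):
    ps = [random.random() for _ in FIELDS]
    total = sum(ps)
    return {"id": i, "pred": {f: p / total for f, p in zip(FIELDS, ps)}}

applicants = [random_applicant(i) for i in range(200)]

# Hypothetical "expected percentages" assigned to each field.
targets = {"humanities": 0.40, "sciences": 0.35, "engineering": 0.25}
class_size = 50

# Greedy quota-aware selection: repeatedly find the field furthest
# below its target, then admit the remaining applicant most likely
# to major in it.
admitted, remaining = [], list(applicants)
expected = {f: 0.0 for f in FIELDS}  # running expected majors in the class
for _ in range(class_size):
    deficit = max(FIELDS, key=lambda f: targets[f] * class_size - expected[f])
    pick = max(remaining, key=lambda a: a["pred"][deficit])
    remaining.remove(pick)
    admitted.append(pick)
    for f in FIELDS:
        expected[f] += pick["pred"][f]

shares = {f: round(expected[f] / class_size, 2) for f in FIELDS}
print(shares)  # expected shares should land near the targets
```

Note that even in this toy version, the "guarantee" is only statistical: the class's *expected* composition tracks the targets, while any individual student remains free to major in whatever they like.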
Of course, going beyond the picayune issue of whether we should choose students according to what they might major in, there's the larger scale question of how we should choose students. Indeed, this question lies at the heart of many an affirmative action lawsuit, with the "reverse affirmative action" side claiming that people of what I will call "white" descent are not admitted in favor of less qualified "non-white" students. (The issue is obviously more complicated than this paragraph can do justice to; for example, the issue of Asian American discrimination arises.) In such discussions, one generally hears the term "merit" -- if only schools just took the top people according to merit and ignored race completely -- but what exactly is merit? Legislators or judges seem to want some sort of formula (usually based on grades and/or test scores -- except that, by studying their own big data, some at Google claim that "G.P.A.'s are worthless" for their hiring). Let's suppose our machine learning tools are good enough to estimate merit quite accurately if we define the merit objective function for them.** How should we define it? One particularly intriguing question is whether the "merit" of the class is simply the sum of the merits of the collected individuals -- in which case we should ignore things like what major they want to choose -- or whether the merit of the sum is different from the sum of the merits. I have some of my own not-completely-worked-out ideas, but again, this seems worth writing a longer essay about to work through the possibilities and implications.
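The sum-of-merits question can be illustrated with a deliberately tiny example. Suppose (hypothetically) we had trustworthy individual merit scores; the two objectives below differ only in that the second adds a bonus for each distinct field represented in the class, so the "merit of the sum" is no longer the sum of the merits. The applicants, scores, fields, and bonus are all invented for illustration.

```python
from itertools import combinations

# Hypothetical applicants: (name, individual merit score, intended field).
applicants = [
    ("A", 9.0, "math"), ("B", 8.5, "math"), ("C", 8.0, "math"),
    ("D", 7.0, "history"), ("E", 6.5, "literature"),
]

def additive_merit(group):
    # "Merit of the class = sum of individual merits."
    return sum(m for _, m, _ in group)

def nonadditive_merit(group):
    # Toy alternative: sum of merits plus a bonus per distinct field,
    # so the class's merit depends on its composition, not just its members.
    fields = {f for _, _, f in group}
    return sum(m for _, m, _ in group) + 2.5 * len(fields)

def best_class(objective, size=3):
    # Brute force over all size-3 classes; fine at this scale.
    return max(combinations(applicants, size), key=objective)

top_additive = best_class(additive_merit)
top_diverse = best_class(nonadditive_merit)
print([n for n, _, _ in top_additive])  # → ['A', 'B', 'C']
print([n for n, _, _ in top_diverse])   # → ['A', 'D', 'E']
```

The additive objective just takes the three highest scorers (all in one field); the non-additive one trades some raw score for breadth. Which behavior is "right" is exactly the question the paragraph above leaves open.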
A further interesting question that arises is what sort of information universities can and should gather about applicants in order to make these predictions. College applications already ask for a lot -- grades, lists of activities, essays, letters of recommendation, test scores, sometimes interviews. Suppose, though, that we could much more clearly predict your "merit" as a future student by parsing your Facebook account, or better yet, your e-mail from the last 3 years. Should we be able to ask for that? Perhaps we can guarantee that our algorithms will return only a score, and that your actual e-mail will not be examined by any human beings. Or perhaps, by the time our machine learning algorithms are ready for that data, privacy won't matter to anyone anyway, especially if providing access to the data is what it takes to get into your choice of school.
In some sense, none of these questions are inherently new. But they appear to become different in kind once you think about the power machine learning will give to systems that make decisions about things like who goes to what university. While the university setting is arguably small, the themes seem quite large, and perhaps the university is the place where some of the thinking behind the larger themes needs to be taking place. And taking place now, before the technology is here and being used without a lot of thought about how it really should be used.
* Obviously, there are countless other potentially more significant uses of machine learning technology. But I work at a university, so this is what has come to mind recently.
** As far as I know, the merit function for Harvard is not "how much will you or your family donate to Harvard in the future". But it could be. Even if we avoid the potential self-interest of universities, to what extent is net worth a suitable metric of merit? I was an undergraduate at Harvard and am now a professor there; Bill Gates was an undergraduate (who notoriously dropped out) and donated a large amount of money for the building I now work in, and apparently has had a few other successes. Extreme cases, to be sure, but how would the merit objective function judge these outcomes?