Tuesday, January 22, 2008

More From SODA, Andrei Broder's talk

Andrei Broder gave an interesting invited talk on the new area of "computational advertising". As Andrei was my mentor for several years, and historically speaking, I am always thrilled by his work and his insights, consider this a gushingly positive review.

The talk started with a wonderfully complimentary introduction by Allan Borodin, who described accurately how Andrei was one of the initial leaders in the area of Web search algorithms, and his early success is a large part of the reason that so many people from our area have become involved in the field.

Andrei described computational advertising is a new scientific sub-discipline, at the intersection of search, information retrieval, machine learning, statistical modeling, etc. The main problem: find the best match between a given user in a given context and a suitable advertisement.

Andrei started with a brief introduction to advertising (brand marketing vs. direct-marketing, the $$$ behind advertising), and then focused on issues related to sponsored search (keyword driven ads) and content-match (context driven ads, e.g. driven by a page content), although he mentioned the field is much bigger than that (auctions, ad exchanges, privacy/legal issues, etc.). He emphasized the point that ads are "information". Irrelevant ads annoy; relevant ads inform and generate interest. Perhaps the most challenging fundamental question in sponsored search is how to find the right ad. Current practice allows broad matches -- if you want to advertise a Seattle hotel, you can broad match to Seattle or Seattle hotels, but this creates lots of false positives and false negatives. (You match too many things you don't want, and not all that you do. As another example, see my post on what pops up when you do searches related to theory.) What you really want is to match queries related to Seattle as a travel destination. How can that be done? How can you price query matches of these types, since it's not based on bidding on specific keywords? Notice that a lot of this now depends on semantic vs. syntactic matching -- understanding the intent. Similar questions arise in content match advertising, where there is the additional trouble of having a publisher of the page content involved. A further aspect is understanding the context of the user, potentially based on user profiles or recent past history.

One way of understanding context, naturally, is to use a search engine. To classify a keyword phrase, one could look at the search engine results, and re-define or interpret the keyword according to these results (say by giving it a category, to match a collection of ads). This can be made fast by classifying pages offline, and then use pages returning from the search query to "vote" on a category for the query (rather than text-analyzing returned pages on the fly). Even a weak classifier of pages does well if you have a reasonable voting scheme! Similar ideas can be used in content ads, using a mix of category and semantic information to find ads to go with a page. Andrei also raised the issue of caching ads; when are queries "close enough" that you can use a cached from a query-ad pair that may live in a cache?

I left the talk with the feeling that there were a lot of challenging "algorithm engineering" problems here. It's often hard to prove things in real-world settings, but one often needs real algorithmic insight to design solid, well-founded approaches to these real-world problems. This is why Google and Yahoo and MSN are hiring a lot of theoretically minded people. For such companies, there is always the aspect that one may have to give up some intellectual purity when working in these areas. You can't prove everything, and you need to make things that work. But this is a tradeoff. Not just a tradeoff in terms of money (though that's certainly there), but in terms of having a large base of users who actually use and gain from your algorithmic ideas. It's a tradeoff typical when working at theory/practice boundary, but here there's a pretty clear economic/users motivation that's really pushing this area forward. I think the talk did an excellent job of introducing the basic problems and motivation to a general (theory) audience.

No comments: