The Berkeley Center for Human-Compatible AI (CHCAI) doesn’t seem to have a specific research agenda beyond Stuart Russell’s own.
Stuart Russell was the primary author of the FLI research priorities document, so I’d expect CHCAI’s work to focus on some of the problems sketched there. Based on CHCAI’s publication page, their focus areas will probably include value learning, human-robot cooperation, and theories of bounded rationality. Right now, Russell’s group is spending a lot of its time on cooperative inverse reinforcement learning and corrigibility.
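(A quick gloss for anyone unfamiliar with the term: cooperative inverse reinforcement learning is usually formalized as a two-player game of partial information between a human H and a robot R. Roughly, and in my own paraphrase rather than their exact notation, the setup is

\[
\mathcal{M} = \big\langle S,\ \{A^{H}, A^{R}\},\ T(s' \mid s, a^{H}, a^{R}),\ \Theta,\ R(s, a^{H}, a^{R}; \theta),\ P_0,\ \gamma \big\rangle
\]

where both players act to maximize the same reward \(R(\,\cdot\,;\theta)\), but only the human observes the reward parameter \(\theta \in \Theta\); the robot has to infer \(\theta\) from the human’s behavior, which is what makes the problem both “cooperative” and “inverse.”)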
This slide from a recent talk by Critch seems roughly right to me: https://intelligence.org/wp-content/uploads/2017/01/hotspot-slide.png
A prize fund is one of the main side projects MIRI has talked about wanting to run for the last few years, provided we could make it sufficiently large, more or less for the reasons you mention. Ideally, the AI safety community would offer a diversity of prizes representing different views about what kinds of progress we’d be most excited by.
If funds for this materialize at some point, the main challenge will be that the most important conceptual breakthroughs right now involve going from mostly informal ideas to crude initial formalisms. This introduces some subjectivity in deciding whether a formalism really captures the key original idea, and it also makes it harder for outside researchers to understand what kinds of work we’re looking for. (MIRI’s research team focuses exactly on the parts of the problem that are hardest to design a prize for.) It’s easier to come up with benchmarks in areas where there’s already been a decent amount of technical progress; prizes there would be quite valuable on their own, but they risk neglecting the most important things to work on.