I want to share my thoughts on how the LTA can have a large impact. I think the main plan, to understand agency and intelligence fully enough to construct a provably aligned AI (perhaps modulo a few reasonable assumptions about the real world), is a good plan. It's how a competent civilization would go about solving the alignment problem, and a non-negligible chunk of the LTA's expected impact comes from it working about as planned. But there are also plenty of other, less glamorous ways for it to make a big difference.
The LTA gives us tools to think about AI better. Even without the greater edifice of the LTA, and without a concentrated effort to complete it and build an aligned AI in accordance with it, it can yield insights that help other alignment researchers. The LTA can identify possible problems that need to be solved and currently unknown pitfalls that could make an AI unsafe (along the lines of the problem of privilege and acausal attack). It can also produce “tools” for solving particular aspects of the alignment problem that could be applied in an ad hoc manner (such as the individual components of PSI). While this is decidedly inferior to creating a provably aligned AI, it is also far more likely to happen.
As for PSI, I think it's a promising plan for creating an aligned AI in its own right: it doesn't appear to require greatly reduced capabilities, and it gives the AI an unhackable pointer to human values. But its main significance is as a proof of concept: the LTA has already delivered this alignment proposal, and the LTA isn't even close to being finished. My best guess is that, given enough time, some variant of PSI could be developed into a provably aligned AI (modulo assumptions about human psychology, etc.), but I also expect better ideas to emerge in the future. PSI demonstrates that grappling with the fundamental questions of agency can lead to novel and elegant solutions to the alignment problem. Before Vanessa came up with PSI, I thought the main value of her research lay in solving weird-sounding problems (like acausal attack) that originally sounded more like an AI being stupid than like it being misaligned. PSI shows that the LTA is much, much more than that.
(Disclosure: Vanessa is my wife.)