Thanks for doing this! I enjoy it every week, and I think it’s way more useful than it might seem at first glance. The service you’re providing here is letting people keep up on the public happenings without spending time on media themselves. That’s invaluable for me and very likely many others, since it frees up way more research hours for actually thinking about alignment.
And I get a bunch of sensible chuckles. Well done.
One recurrent theme is that lots of people expect alignment to be very hard, while a bunch of other people expect it to happen by default. This is weird, so it draws debate and attention.
I want to highlight one important difference in assumptions: one group thinks alignment means aligning AGI to all of human values. The other group assumes that AGI will follow instructions, just like LLMs and basically every AI system to date have done. The latter group tends to think this because they haven’t thought much about real sapient AGI, often because they intuitively don’t believe it’s possible any time soon.
But here’s the weird bit: that second, less-considered group is sort of right, by accident. Instruction-following AGI is easier and more likely than value-aligned AGI on my analysis. Value alignment for a sovereign AGI has to be so reliable that it’s stable for the lifetime of the universe, and through tons of learning and self-improvement. Keeping a human in the loop helps a ton with the early stages—which is where most of the problems come in.
Maybe intent alignment isn’t adequate. But that’s a separate question.
I think it’s debatable whether that sort of alignment gets us through. That second group is typically pretty naive about how safe it would be to have lots of different people in charge of AGIs, each capable of recursive self-improvement (RSI) and therefore god-knows-what world-conquering strategies and technologies. Again, this is probably because those people usually don’t really believe that’s possible any time soon.
Just my $.02 on the current public alignment discussion. Keep up the good work.