Seth Herd comments on Another argument against utility-centric alignment paradigms

Seth Herd Sep 23, 2024, 6:25 PM
2 points
This post is great; good job getting it out the door without further perfecting it.
It’s important because I think a lot of people are roughly in this camp.
And it’s important because it’s half right in an important way: we should not be focusing alignment discussions on utility maximizers. That’s not the sort of AGI we’re expecting anymore. This is a real reason to be more optimistic than Eliezer and to think about the problem differently than Superintelligence frames it.
But it’s also important because it’s half wrong in an important way: focusing on current AI systems is not the right way to think about alignment with fresh eyes.
Current systems are safe and also mostly useless. We will turn them into goal-directed agents because it’s easy and very, very useful.
We can think about aligning those agentic systems based on some of the properties of the deep networks they’ll use. But they’ll also have real, explicit goals. Aligning those goals with humanity’s is the subject of alignment.
For more, see my reply to Thane Ruthenis’ top comment.