I had the same thoughts after listening to the same talk. I think the advantage of utility functions, though, is that they are well-defined mathematical constructs we can reason about, and they expose corner cases that may pop up in other models but would be easier to miss there. AGI, just like all existing intelligences, may not be implemented with a utility function, but the utility function provides a powerful abstraction for reasoning about what we might more loosely call its “preference relation.” Because a general preference relation admits contradictions, working only at that looser level risks missing the cases where the contradictions do not exist and the preference relation reduces to a utility function.
The point being, for the purpose of alignment, studying utility functions makes more sense, because your control method can’t possibly work on a general preference relation if it can’t even work on the simpler utility function. That the real preference relations of existing intelligences contain features that prevent the challenges of aligning utility functions instead provides evidence of how the problem might be solved (at least for some bounded cases).
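To make the “contradictions” concrete, here is a minimal sketch (my own toy illustration, not from the talk; the helper name and the three options are made up): over a finite set of options, a complete preference relation with no strict-preference cycles can be scored by a utility function, while a cyclic one cannot.

```python
# Toy illustration (not from the talk): a finite preference relation is
# representable by a utility function exactly when it is complete and free of
# strict-preference cycles ("contradictions").

from itertools import combinations

def utility_from_preferences(options, prefers):
    """prefers(x, y) == True means x is weakly preferred to y.
    Returns a utility dict if the relation is representable, otherwise None."""
    # Completeness: every pair must be comparable in at least one direction.
    for x, y in combinations(options, 2):
        if not (prefers(x, y) or prefers(y, x)):
            return None
    # Score each option by how many options it is weakly preferred to.
    scores = {x: sum(prefers(x, y) for y in options) for x in options}
    # Verify the scores represent the relation; a strict cycle like
    # A > B > C > A makes this check fail.
    for x in options:
        for y in options:
            if prefers(x, y) != (scores[x] >= scores[y]):
                return None
    return scores

options = ["A", "B", "C"]

# Coherent preferences: A >= B >= C, no contradictions -> a utility function exists.
coherent = {("A","B"), ("B","C"), ("A","C"), ("A","A"), ("B","B"), ("C","C")}
print(utility_from_preferences(options, lambda x, y: (x, y) in coherent))

# Cyclic ("contradictory") preferences: A > B > C > A -> no utility function.
cyclic = {("A","B"), ("B","C"), ("C","A"), ("A","A"), ("B","B"), ("C","C")}
print(utility_from_preferences(options, lambda x, y: (x, y) in cyclic))
```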
That makes sense. But it isn’t what Eliezer says in that talk:
There’s a whole set of different ways we could look at agents, but as long as the agents are sufficiently advanced that we have pumped most of the qualitatively bad behavior out of them, they will behave as if they have coherent probability distributions and consistent utility functions.

Do you disagree with him on that?
Basically agree, and it’s nearly the same point I was trying to get at, though with less confidence that utility functions are definitely the right model. I’d leave open more possibility that we’re wrong about utility functions always being the best subclass of preference relations, but even if we’re wrong about that, our solutions must at least work for utility functions, since they are a smaller, simpler subset of all the possible ways something could decide.