Right—as I mentioned near the end of my post, it is clearly easy to specify formal utility functions that are about formal systems, like Go or databases. My question is: how do you specify a formal utility function that is about the real world? Almost any remotely interesting goal I can think of (such as “get me coffee”) seems impossible to formalize without relying on pre-theoretical notions of what it means for “me” to “have coffee”.
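To make the contrast concrete, here is a minimal sketch (Python, purely illustrative and deliberately simplified) of the kind of utility function that is easy to write down for a formal system, where the entire state is already a mathematical object:

```python
# Illustrative only: a utility function over a formal, fully specified state.
# For a finished Go position represented as a grid of 'B', 'W', and '.',
# a "utility" can be defined exactly, e.g. as Black's stone-count margin.
# (Real Go scoring also counts territory; this is deliberately simplified.)

def go_utility(board: list[list[str]]) -> int:
    """Black stones minus White stones on a finished board."""
    flat = [cell for row in board for cell in row]
    return flat.count("B") - flat.count("W")

# There is no analogous definition for "get me coffee": there is no agreed
# formal state space in which "me" and "having coffee" are defined terms.
# def coffee_utility(world_state) -> float:
#     ...  # what type is world_state, and what predicate picks out "coffee"?

example_board = [
    ["B", "W", "."],
    ["B", ".", "W"],
    ["B", "B", "W"],
]
print(go_utility(example_board))  # 4 - 3 = 1
```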
If I were just trying to build an AI, this question wouldn’t be terribly interesting. Obviously, you give the AI the utility function “maximize approval from the human trainer or concurrence with the training set” or whatever. The reason I’m posing the question is that the main research goal of AI safety appears to me to be “how do we close the gap between what humans value and what amoral maximizers do, and how do we prove that we’ve done it correctly?” One strand of research appears to be pursuing this goal through formal reasoning, and I just don’t understand where that can possibly lead, since you can’t formalize the stuff you care about in the first place.
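To spell out why the “just build it” answer sidesteps the question, here is another illustrative sketch (the names approval_scores and trainer_approval_objective are hypothetical): the trainer-approval objective is easy to formalize precisely because it is defined over the recorded training signal rather than over the world itself.

```python
# Illustrative only: a proxy objective defined over recorded approval scores.
# It is trivially formal, but only because the hard part (whether approval
# actually tracks what we value) is pushed outside the formalism.

from statistics import mean

def trainer_approval_objective(approval_scores: list[float]) -> float:
    """Proxy utility: average approval the human trainer assigned to outputs."""
    return mean(approval_scores)

print(trainer_approval_objective([0.9, 0.4, 1.0]))  # ~0.767
```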
Again, I feel like this is an extremely basic question that I have no doubt people doing the research have thought of, but I haven’t been able to find any previous discussion about it.
The closest thing I can think of is work on the capabilities side looking at how to create intension, although that problem has never been resolved (which is a big part of why GOFAI failed). You’re right that we mostly assume it will be figured out somehow, but safety research, at least, does not seem to be addressing this question much.