I think the crux lies elsewhere, as I was sloppy in my wording. It’s not that maximizing some utility function is an issue, since basically anything can be viewed as EU maximization for a sufficiently wild utility function. However, I don’t view that as a meaningful utility function. Rather, it’s the ones like utility functions over states that I think are meaningful, and those are the scary ones. That’s how I think you get classical paperclip maximizers.
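To spell out the triviality point (this is the standard construction, not anything specific to this thread): for any behaviour at all, you can define a utility function that assigns 1 to exactly that behaviour and 0 to everything else, and the system then counts as a perfect expected-utility maximizer for it. A minimal sketch in Python, with purely illustrative names:

```python
# Minimal sketch of the "anything is EU maximization" construction.
# The names here are illustrative, not taken from the discussion above.

def make_trivial_utility(observed_behaviour):
    """Return a utility function that the observed behaviour trivially maximizes:
    1 for exactly what the system did, 0 for anything else."""
    def utility(behaviour):
        return 1.0 if behaviour == observed_behaviour else 0.0
    return utility

# Whatever the system actually output...
observed = ("token_1", "token_2", "token_3")
u = make_trivial_utility(observed)

# ...is an argmax of this utility, so calling the system an
# "EU maximizer" in this sense carries no predictive content.
assert u(observed) == 1.0
assert u(("anything", "else")) == 0.0
```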
When I try to think up a meaningful utility function for GPT-4, I can’t find anything that’s plausible, which means I don’t think there’s a meaningful prediction-utility function that describes GPT-4’s behaviour. Perhaps that is a crux.
Re utility functions over states, it turns out that we can validly turn utility functions over plans/predictions into utility functions over world states/outcomes (usually with constraints on how large the domain is, though not always). One possible version of that construction is sketched after the link.
https://www.lesswrong.com/posts/k48vB92mjE9Z28C3s/?commentId=QciMJ9ehR9xbTexcc
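One way such a conversion can go (an illustrative toy construction under my own framing, not necessarily the argument in the linked comment) is to fold the prediction into the world state itself: the model’s output is part of the world, so a state that records both the prediction made and the outcome that occurred can be scored by an ordinary prediction-utility such as the log score. This is also where the domain-size caveat bites, since the state space has to be rich enough to record the prediction:

```python
# Illustrative sketch: recasting a utility over predictions as a utility over
# world states by treating the prediction as part of the state. This is a
# toy construction under stated assumptions, not the linked comment's argument.
import math

def log_score(prediction, outcome):
    """Prediction-utility: log of the probability the prediction gave to the outcome."""
    return math.log(prediction[outcome])

def state_utility(world_state):
    """Utility over world states, where each state records both the prediction
    that was made and the outcome that actually occurred."""
    return log_score(world_state["prediction"], world_state["outcome"])

# Example: a state in which the model predicted rain with probability 0.9 and it rained.
state = {"prediction": {"rain": 0.9, "sun": 0.1}, "outcome": "rain"}
print(state_utility(state))  # log(0.9), about -0.105
```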
And yeah, I think it’s a crux: at the very least, GPT-N systems, if they reach AGI/ASI, will probably look like maximizers for updating given input conditions like prompts.
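For concreteness, the objective a GPT-style system is actually trained on is the conditional next-token log-likelihood given the context, which reads naturally as a utility over predictions rather than over world states (stated loosely, with c the prompt/context and x_{1:T} the continuation):

```latex
% Standard autoregressive pretraining objective, stated loosely.
\theta^{*} \;=\; \arg\max_{\theta}\;
  \mathbb{E}_{(c,\,x_{1:T})}\!\left[\sum_{t=1}^{T} \log p_{\theta}\bigl(x_t \mid c,\, x_{<t}\bigr)\right]
```

Whether the trained system is well described as maximizing anything at deployment time is a separate question, but this is the sense in which “maximizer for updating given prompts” is at least well defined.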
My main point isn’t that the utility-function framing of GPT-4 or GPT-N is wrong, but rather that LWers inferred way too much about how a system would behave, even conditional on expected utility maximization being a coherent frame for AIs, because expected utility maximization doesn’t logically imply the properties they thought it did without more assumptions that need to be defended.