I mean sure, but I’d describe both as utility maximizers, because maximizing utility is in fact what they consistently do. Dragon God’s claim seems to be that we wouldn’t get an AI that would be particularly well predicted by utility maximization, and this seems straightforwardly false of agents 1 and 2.
Yes, but:
If you were trying to design something that acts like agent 2 and were stuck in a mindset of “it must be maximizing some utility function, let’s just think in utility function terms”, you might find it difficult.
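(A toy sketch of the kind of difficulty I mean, purely hypothetical and not a description of agent 2: an agent defined by a hard rule plus a local payoff is trivial to write down directly, whereas re-expressing the same behaviour as maximization of a single utility function over outcomes pushes you toward awkward devices like huge penalty terms or lexicographic preferences.)

```python
# Hypothetical toy agent (not the "agent 2" discussed above): it applies a hard
# rule first and only then picks the best-paying permitted action. Writing this
# directly is easy; recasting it as "maximize one utility function over
# outcomes" is where the awkwardness shows up.

def rule_following_agent(actions, payoff, violates_rule):
    """Return the highest-payoff action that does not violate the rule."""
    permitted = [a for a in actions if not violates_rule(a)]
    if not permitted:
        return None  # refuse to act rather than break the rule
    return max(permitted, key=payoff)

# Example: the rule screens off the highest-payoff action entirely.
actions = ["tell_truth", "lie", "stay_silent"]
payoff = {"tell_truth": 1.0, "lie": 10.0, "stay_silent": 0.0}.get
violates_rule = lambda a: a == "lie"

print(rule_following_agent(actions, payoff, violates_rule))  # -> tell_truth
```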
(Side point) I’m not sure how much the arguments in Eliezer’s linked post actually apply outside the consequentialist context, so I’m not convinced that coherence necessarily implies a possible utility function for non-consequentialist agents.
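(To be concrete about the consequentialist version of that claim, here is a minimal sketch of the standard VNM statement; the linked post argues in a similar coherence/money-pump spirit, but this is just the textbook form, not something taken from that post:

```latex
% Standard VNM sketch, consequentialist case only, stated for contrast.
% Assume a preference relation \succeq over lotteries p, q on an outcome set X
% satisfying completeness, transitivity, continuity and independence.
% Then there exists u : X \to \mathbb{R} such that
p \succeq q \iff \sum_{x \in X} p(x)\, u(x) \ \ge\ \sum_{x \in X} q(x)\, u(x)
```

The axioms quantify over preferences between lotteries on outcomes, which is exactly the piece that doesn’t obviously carry over to an agent whose choices aren’t naturally described as preferences over outcomes.)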
It might be that the closest thing to what we want that we can actually figure out how to build isn’t coherent. In which case we would face a choice between:
(a) making it and hoping that its likely self-modification towards coherence won’t ruin its alignment, or
(b) making something else that is coherent to start with but is less aligned.
While (a) is risky, (b) seems worse to me.