From Rohin’s post, a quote which I also endorse:

“You could argue that while [building AIs with really weird utility functions] is possible in principle, no one would ever build such an agent. I wholeheartedly agree, but note that this is now an argument based on particular empirical facts about humans (or perhaps agent-building processes more generally). And if you’re going to argue based on particular empirical facts about what goals we expect, then I don’t think that doing so via coherence arguments helps very much.”
I note that the first sentence of your post is “Rohin Shah has recently criticised Eliezer’s argument that ‘sufficiently optimised agents appear coherent’, on the grounds that any behaviour can be rationalised as maximisation of the expectation of some utility function”, so it seems worth pointing out that there’s a reasonable way to interpret “sufficiently optimised agents appear coherent” which isn’t subject to that criticism.
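To spell out the criticism being referenced, here is a minimal sketch of the standard rationalisation construction (the notation $\pi$, $H$, $H_\pi$, $u$ is mine, not from either post): take any deterministic policy $\pi$, let $H$ be the set of complete histories and $H_\pi \subseteq H$ the histories consistent with $\pi$, and define

$$
u(h) \;=\; \begin{cases} 1 & \text{if } h \in H_\pi \\ 0 & \text{otherwise.} \end{cases}
$$

Then $\mathbb{E}_{h \sim \pi}[u(h)] = 1 \ge \mathbb{E}_{h \sim \pi'}[u(h)]$ for every alternative policy $\pi'$, so $\pi$ maximises expected utility with respect to $u$; this is the sense in which any behaviour can be rationalised as EU maximisation over a suitable outcome space.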
Beyond that, as I mentioned, it’s not clear to me what Eliezer was arguing for. (It seems plausible that he considered “sufficiently optimised agents appear coherent”, or the immediate corollary that such agents can be viewed as approximate EU maximizers with utility functions over the O that I defined, interesting in itself as a possibly surprising prediction that we can make about such agents.) What larger conclusion do you think he was arguing for, and why (preferably with citations)? Once we settle that, maybe then we can discuss whether his argumentative strategy was a good one?