Can you explain how this comment applies to Zvi’s post? In particular, what is the “subtle claim” that Zvi is not addressing? I don’t particularly care about what MIRI people think, just about the object level.
strawman MIRI: alignment is difficult because AI won’t be able to answer common-sense morality questions
“a child is drowning in a pool nearby. you just bought a new suit. do you save the child?”
actual MIRI: almost by definition, a superintelligent AI will know what humans want and value. It just won’t necessarily care. The ‘value pointing’ problem isn’t about pointing to human values in its beliefs but about pointing to them in its own preferences.
There are several subtleties: beliefs are selected by reality (having wrong beliefs gets punished) and are therefore highly constrained, while preferences are highly unconstrained (this is a more subtle version of the orthogonality thesis). Human value is complex and hard to specify; in particular, pointing approximately at it ‘in preference space’ is highly unlikely to hit it, because there is no ‘correction from reality’ the way there is for beliefs.
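To make that asymmetry concrete, here is a toy numerical sketch (not from the original comment; all quantities, learning rates, and noise scales are made up for illustration): a belief vector is repeatedly corrected by noisy observations and converges to the true state, while a preference vector specified “approximately” once receives no corrective signal and keeps whatever error it started with.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy world: a true state the agent can observe, and a stand-in
# "true human values" vector the designer tries to specify.
true_state = rng.normal(size=5)
human_values = np.array([1.0, 0.5, -0.3, 0.8, 0.1])

# Beliefs: updated against observations, so wrong beliefs get punished
# and the estimate converges toward reality.
beliefs = np.zeros(5)
for _ in range(1000):
    observation = true_state + rng.normal(scale=0.1, size=5)
    beliefs += 0.01 * (observation - beliefs)

# Preferences: specified once, approximately. Nothing in the loop above
# ever pushes them back toward human_values.
specified_preferences = human_values + rng.normal(scale=0.3, size=5)

print("belief error:    ", np.linalg.norm(beliefs - true_state))              # small
print("preference error:", np.linalg.norm(specified_preferences - human_values))  # stays where it started
```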
strawman Barnett: MIRI believes strawman MIRI, and GPT-4 can answer common-sense morality questions, so MIRI should update.
actual Barnett: I understand the argument that there is a difference between making an AI know human values versus making it care about those values. I’m arguing that the human value function is in fact not that hard to specify: an approximate human utility function is relatively simple, and GPT-4 knows it.
(Which is still distinct from saying GPT-4 or some other AI will care about it, but at least it belies the claim that human values are hugely complex.)
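For reference, this is the kind of query Barnett’s claim is about; a minimal sketch, assuming the `openai` Python package (v1+) and an `OPENAI_API_KEY` in the environment (the prompt is just the drowning-child example above). Getting the common-sense answer back is the sense in which the model “knows” the approximate human value function; nothing about the call makes it care.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# The common-sense morality question from the "strawman MIRI" example above.
response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "user",
         "content": ("A child is drowning in a pool nearby. "
                     "You just bought a new suit. Do you save the child?")},
    ],
)
print(response.choices[0].message.content)
```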