A person who does your will because it makes them happy to do so, or because they are irrationally biased to do so, is more reliable than one who does your will only as long as their calculus tells them to.
Not if the latter explicitly exhibits the form of that calculus; then you can extrapolate their future decisions yourself, more easily than you can extrapolate the decisions of the former. Higher-order rationality includes finding a decision algorithm which can’t be exploited if known in this manner.
Of course, actually calculating and reliably acting accordingly is a high standard for unmodified humans, and it’s a meaningful question whether incremental progress toward that ideal will lead to a more reliable or less reliable agent. But that’s an empirical question, not a logical one.
Not if the latter explicitly exhibits the form of that calculus; then you can extrapolate their future decisions yourself, more easily than you can extrapolate the decisions of the former.
More easily? It’s easier to predict decisions based on a calculus than decisions based on stimulus-response? That’s simply false.
Note that in the fMRI example, it is impossible to examine the calculus. You can only examine the level of bias. There is no way for somebody to say, “Oh, he’s unbiased, but he has an elaborate Yudkowskian utility function that will lead him to act in ways favorable to me.”