I think that in order to understand intelligence, one can’t start by assuming that there’s an outer goal wrapper.
I think many of the arguments you’re referring to don’t depend on this assumption. For example, a mind that keeps shifting what it’s pursuing, with no coherent outer goal, will still pursue most convergent instrumental goals. It’s simpler to talk about agents with a fixed goal. In particular, doing so cuts off arguments like “well, but that’s just stupid; if the agent were smarter it wouldn’t make that mistake”, because one can formally show that there are logically possible minds that are arbitrarily capable while still exhibiting the behavior in question.
Regarding the argument from Yudkowsky about coherence and utility, a version I’d agree with is: to the extent that your actions have large consequences, they had to “add up” toward producing those consequences, which implies that they “point in the same direction”, in the sense implied by Dutch book arguments; so, quantitatively, your behavior is closer to being describable as optimizing for a utility function.
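To make the Dutch book point concrete, here is a minimal sketch (my own illustration, not anything from the exchange above; the function names and the tiny three-item setup are invented): an agent whose pairwise preferences cycle can be “money pumped” by a sequence of trades it individually accepts, while an agent whose choices are induced by a utility function cannot, so only the latter’s trades add up to pointing anywhere.

```python
# Toy money pump in the spirit of Dutch book arguments (illustrative only).
# An agent with cyclic preferences A < B < C < A pays a small fee for each
# "upgrade" and ends up back where it started, strictly poorer.

def run_money_pump(prefers, start, fee=1.0, rounds=3):
    """Repeatedly offer trades the agent accepts; return final holding and total fees paid."""
    cycle = ["A", "B", "C"]
    holding, paid = start, 0.0
    for _ in range(rounds * len(cycle)):
        # Offer the next item in the cycle, for a fee, and trade if the agent prefers it.
        offer = cycle[(cycle.index(holding) + 1) % len(cycle)]
        if prefers(offer, holding):
            holding, paid = offer, paid + fee
    return holding, paid

# Cyclic (incoherent) preferences: B over A, C over B, A over C.
cyclic = lambda x, y: (y, x) in [("A", "B"), ("B", "C"), ("C", "A")]

# Coherent preferences induced by a utility function: no cycle, so no pump.
utility = {"A": 0, "B": 1, "C": 2}
coherent = lambda x, y: utility[x] > utility[y]

print(run_money_pump(cyclic, "A"))    # ('A', 9.0): back where it started, 9 fees poorer
print(run_money_pump(coherent, "A"))  # ('C', 2.0): pays for two genuine upgrades, then stops
```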
The point about reflective stability is that if your behavior isn’t consistent with optimizing a goal function, then you aren’t reflectively stable. (This is very much not a theorem, and is hopefully false; cf. satisficers, which are at least reflectively consistent: https://arbital.com/p/reflective_stability/ .) Poetically, we could tell stories about global strategicness taking over a non-globally-strategic ecology of mind. In terms of analysis, we want to discuss reflectively stable minds because those have some hope of being analyzable; if a mind is not reflectively stable, so that superintelligent processes might rewrite its global dynamic, then all analytic bets are off (including bets on the property “has no global strategic goal”).
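The consistency-versus-stability distinction can be shown with a toy model (again my own sketch, not the Arbital formalization; the threshold, scores, and names are invented): a satisficer endorses any successor design whose expected outcome is “good enough”, so it endorses both remaining a satisficer and becoming a maximizer. It won’t veto its own design (reflectively consistent), but it also won’t necessarily preserve satisficing (not reflectively stable).

```python
# Toy sketch of reflective consistency vs. reflective stability (illustrative only).
from dataclasses import dataclass

@dataclass
class Successor:
    name: str
    expected_score: float  # hypothetical evaluation of how the successor would perform

def satisficer_approves(s: Successor, threshold: float = 10.0) -> bool:
    """A satisficer endorses any successor that is 'good enough'."""
    return s.expected_score >= threshold

def maximizer_approves(s: Successor, alternatives: list) -> bool:
    """A maximizer endorses a successor only if no alternative scores higher."""
    return all(s.expected_score >= a.expected_score for a in alternatives)

candidates = [
    Successor("remain a satisficer", expected_score=12.0),
    Successor("become a maximizer", expected_score=20.0),
]

for c in candidates:
    print(c.name,
          "| satisficer approves:", satisficer_approves(c),
          "| maximizer approves:", maximizer_approves(c, candidates))
# The satisficer approves both successors (consistent, but not self-preserving);
# the maximizer approves only the highest-scoring one.
```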