Even if transfer learning is a thing that could work, in any given domain that doesn’t have terrible feedback loops, would it not be more efficient to just apply the deliberate practice and metacognition to the domain itself? Like, if I’m trying to learn how to solve puzzle games, would it not be more efficient to just practice solving puzzle games than to do physics problems and hope the skill generalises? Or, if you think this sort of general rationality training only matters for ‘specialising in problems we don’t understand’-type stuff with bad feedback loops, how would you even figure out whether it’s working, given the bad feedback loops? Sure, maybe you measure how well people perform at some legibly measurable tasks after the rationality training and they do a bit better, but the goal in the first place was to use the rationality training’s good feedback loops to improve in domains with bad feedback loops, and those domains seem likely to be different enough that a lot of the rationality lessons just don’t generalise well.
It just feels to me like a world where transfer learning works well enough to be worth the investment would look very different from ours in terms of how specialised the people who are best at any given X are. I can’t, off the top of my head, think of anyone who became the best at their thing by learning very general skills first and then applying them to their domain, rather than by just focusing really hard on the thing itself.
This seems like it’s equivocating between planning in the sense of “the agent, who assigns some (possibly negative) value to following any given arrow, plans out which sequence of arrows to follow so as to accumulate the most value*” and planning in the sense of “the agent’s accumulated value is a state function”. The former lets you take the detour in the first planning example (in some cases) while still spiralling endlessly down the money-pump helix in the cyclical-preferences example; the point of money-pump arguments is to get the latter sort of planning from the former by ruling out (among other things) cyclic preferences. And they do so by pointing out exactly the contradiction you’re asking about: “the fact that we can’t meaningfully define planning over [cyclic preferences]” (with “planning” here meaning that accumulated value is a state function) is precisely this.
The error here, I think, is the anthropomorphisation of the agent as thinking in terms of states rather than just arrows (or rather, since it can plan, the subset of arrows available to it). Note that this means the agent might not end up at C in the first example if, e.g., the arrow from A to B is worse than the arrow from B to C is good. But C isn’t its “favourite state” like you claim; it doesn’t care about states! It just follows arrows. It seems like you’re sneaking in a notion of planning that basically just assumes something like a utility function, and then acting surprised when applying it to cycles yields a contradiction.
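To make the arrows-vs-states distinction concrete, here’s a toy sketch in Python. It’s entirely my own construction: the graphs, the arrow values, and the `follow_arrows`/`plan` helpers are made up for illustration, and it bakes in the assumption from the footnote below that arrows carry values for the agent.

```python
# Toy model (my own construction, not from the post): each arrow carries a
# value for the agent who takes it; the specific numbers are made up.

CYCLE = {"A": [("B", -1)], "B": [("C", -1)], "C": [("A", -1)]}   # pay 1 per trade, forever
DETOUR_GOOD = {"A": [("B", -1)], "B": [("C", 3)], "C": []}       # bad first arrow, but worth it overall
DETOUR_BAD = {"A": [("B", -5)], "B": [("C", 3)], "C": []}        # bad first arrow, not worth it

def follow_arrows(graph, start, max_steps=9):
    """Myopic agent: takes whatever arrow is in front of it, accumulating each arrow's value."""
    state, total = start, 0
    for _ in range(max_steps):
        if not graph[state]:
            break
        state, value = graph[state][0]
        total += value
    return state, total

def plan(graph, start, visited=frozenset()):
    """Planning agent: best achievable total value over sequences of available arrows.
    Never revisits a state, so the recursion terminates even on cyclic graphs."""
    best = 0  # staying put is always an option
    for nxt, value in graph[start]:
        if nxt not in visited:
            best = max(best, value + plan(graph, nxt, visited | {start}))
    return best

print(follow_arrows(CYCLE, "A"))   # ('A', -9): money-pumped round and round the loop
print(plan(DETOUR_GOOD, "A"))      # 2: takes A -> B -> C, because the detour pays off overall
print(plan(DETOUR_BAD, "A"))       # 0: best plan is to stay at A; C was never its "favourite state"
```

The point of the sketch is just that `plan` ranks sequences of available arrows, not states; it “ends up at” C only when the arrow values happen to make that sequence the best one.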
*I guess I’m already assuming that the arrows have values and not just directions, but the usual money-pump arguments kinda do this already (by positing that the agent can pay money to take an arrow); I’m not sure how you’d get any sort of planning without at least this.