There seem to be two major counter-claims to your project:
Feedback loops can’t work for nebulous domains, so this whole thing is misguided.
Transfer learning is impossible and you can’t get better at rationality by grinding LeetCode equivalents.
(There’s also a third major counter-claim, that this can’t work for alignment research, but I assume that’s irrelevant here since your main point seems to be about rationality training.)
My take is that these two claims stem from inappropriately applying an outcome-oriented mindset to a process-oriented problem. That is, the model seems to be: “we wanted to learn X and applied Feedback Loops™️ but it didn’t work, so there!” instead of “feedback-loopiness seems like an important property of a learning approach we can explicitly optimise for”.
In fact, we can probably factor out several senses of ‘feedback loops’ (henceforth just floops) that seem to be leading a lot of people to talk past each other in this thread (a rough sketch of this factoring follows the list):
Floops as reality pushing back against movement, e.g. the result of swinging a bat, the change in an animation when you change a slider in an Explorable Explanation
Floops where the feedback is quick but nebulous (e.g. persuasion, flirting)
Floops where the feedback is clear but slow (e.g. stock market)
Floops as reinforcement, i.e. the cycle Goal → Attempt → Result
Floops as OODA loops (less legible and more improvisational than the previous item)
Floops where you don’t necessarily choose the Goal (e.g. competitive multiplayer games, dealing with the death of a loved one)
Floops which are not actually loops, but a single Goal → Attempt → Result run (e.g., getting into your target uni)
Floops which are about getting your environment to support you doing one thing over and over again (e.g. writing habits, deliberate practice)
Floops which are cumulative (e.g. math)
Floops where it’s impossible to get sufficiently fine-grained feedback without the right paradigm (e.g. chemistry before Robert Boyle)
Floops where you don’t necessarily know the Goal going in (e.g. doing 1-on-1s at EA Global)
Floops where failure is ruinous and knocks you out of the game (e.g. high-rise parkour)
Anti-floops where the absence of an action is the thing that moves you towards the Goal
Floops that are too complex to update on a single result (e.g. planning, designing a system)
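To make this factoring a bit more concrete, here is a rough sketch (entirely my own toy model, with made-up field names, not anything canonical): treat each sense as a point in a small space of properties, so that disagreements about whether floops “work” for a domain reduce to disagreements about where that domain sits in the space.

```python
from dataclasses import dataclass

# Toy model of the taxonomy above. Every field name is an illustrative
# assumption on my part, not something taken from the list itself.
@dataclass
class Floop:
    feedback_latency: str     # "seconds" (swinging a bat) ... "years" (stock picking)
    feedback_clarity: str     # "crisp" (word counter) ... "nebulous" (flirting)
    goal_known_upfront: bool  # False for 1-on-1s at EA Global
    goal_chosen_by_you: bool  # False for grief, competitive matchmaking
    repeatable: bool          # False for one-shot runs like uni admissions
    failure_ruinous: bool     # True for high-rise parkour
    needs_paradigm: bool      # True for chemistry before Boyle

# Two of the senses above, expressed in this toy vocabulary:
batting_practice = Floop("seconds", "crisp", True, True, True, False, False)
stock_picking = Floop("years", "crisp", True, True, True, False, False)
```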
When someone says “you can’t possibly apply floops to research”, I imagine they’re coming from a place where they interpret goal-orientedness as an inherent requirement of floopiness. There are many bounded, closed-ended things that one can use floops for that clearly help the research process: stuff like grinding the prerequisites and becoming fluent with certain techniques (cf. Feynman’s toolbox approach to physics), writing papers quickly, developing one’s nose (e.g. by trying to forecast the number of citations of a new paper), etc.
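As a toy illustration of the ‘developing one’s nose’ example (my own sketch, not part of the original proposal): citation forecasts can be scored with something like absolute log error, which turns a nebulous skill into a crisp number you can watch improve over time.

```python
import math

def log_error(predicted: float, actual: float) -> float:
    """Absolute error in log10 space: being off by 10x scores 1.0 in either direction."""
    return abs(math.log10(max(predicted, 1)) - math.log10(max(actual, 1)))

# Hypothetical forecasts: (predicted citations after two years, actual citations)
forecasts = [(50, 120), (5, 3), (300, 250)]
scores = [log_error(p, a) for p, a in forecasts]
print(f"mean log error: {sum(scores) / len(scores):.2f}")  # lower is better
```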
This claim is independent of whether or not the person utilising floops is good enough to get better quickly. I think it’s uncontroversial to claim that you can never get a person with profound mental disabilities, and who is not a savant at technical subjects, to discover a new result in quantum field theory, but this is also irrelevant when talking about people who are baseline capable enough to worry about these things on LessWrong dot com in the first place.
On the other end of the spectrum, the reductio version of being against floops (that everyone was literally born with all the capabilities they would ever need in life, and Learning Is Actually a Myth) seems blatantly false too. Optimising for floopiness seems to me to be merely an attempt to find a happy medium between these two extremes.
On an unrelated note, I wrote about how to package and scalably transfer floops a while back: https://www.lesswrong.com/posts/3CsynkTxNEdHDexTT/how-i-learned-to-stop-worrying-and-love-skill-trees
All modern games have two floops built-in: a core game loop that gets completed in under a minute, and a larger game loop that makes you come back for more. Or in the context of my project, Blackbelt:
[Image: Blackbelt]
The idea is that you can design bespoke tests-of-skill to serve as your core game loop (e.g. a text box with a word counter underneath, the outputs of your Peloton bike, literally just checking a box like with a TODO list) and have the deliberately status-oriented admission to a private channel be the larger, overarching hook. I think this approach generalises well beyond alignment, because floops can be found in both calculating determinants and doing on-the-fly Fermi calculations for setting your base rates, and who wouldn’t want to be in the company of people who obsess endlessly about numbers between 0 and 1?
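For concreteness, here is a minimal sketch of how the two loops might be wired together. Everything in it (the class names, the streak threshold, the way admission is decided) is an assumption I’m making for illustration, not a description of how Blackbelt is actually implemented.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Callable, List, Tuple

# Core game loop: a bespoke test-of-skill that resolves to pass/fail in under a minute.
@dataclass
class SkillCheck:
    name: str
    passes: Callable[[float], bool]  # e.g. word count, Peloton output, or 1.0 for a checked box

# Larger game loop: enough passes on the core check unlocks the status-oriented reward.
@dataclass
class Track:
    check: SkillCheck
    admission_streak: int = 30  # assumed threshold; the real gate could be anything
    history: List[Tuple[date, bool]] = field(default_factory=list)

    def log(self, day: date, result: float) -> None:
        self.history.append((day, self.check.passes(result)))

    def admitted_to_private_channel(self) -> bool:
        recent = [ok for _, ok in self.history[-self.admission_streak:]]
        return len(recent) == self.admission_streak and all(recent)

# Example: the "text box with a word counter underneath" check from above.
writing = Track(SkillCheck("daily words", passes=lambda words: words >= 500))
writing.log(date(2023, 9, 1), 712)
print(writing.admitted_to_private_channel())  # False until the streak is complete
```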