Dalcy comments on Feedbackloop-first Rationality

Dalcy 8 Aug 2023 2:23 UTC
5 points
1
I am very interested in this, especially in the context of alignment research and solving not-yet-understood problems in general. Since I have no strong commitments this month (and was going to do something similar to this anyway), I will try this every day for the next two weeks and report back on how it goes (writing this comment as a commitment mechanism!).
Have a large group of people attempt to practice problems from each domain, randomizing the order that they each tackle the problems in. (The ideal version of this takes a few months)
...
As part of each problem, they do meta-reflection on “how to think better”, aiming specifically to extract general insights and intuitions. They check what processes seemed to actually lead to the answer, even when they switch to a new domain they haven’t studied before.
Within this upper-level feedback loop (at the scale of whole problems, taking hours or days), I’m guessing a lower-level loop would involve something like cognitive strategy tuning to get real-time feedback as you’re solving the problems?
- Raemon 8 Aug 2023 2:45 UTC
  4 points
  2
  Parent
  Yeah. I have another post brewing that a) sort of apologizes for the excessive number of feedback loops going on here, and b) explains in detail why they are necessary and how they fit together. But here is a rough draft of it.
  The most straightforward loops you have, before you get into Cognitive Tuning, are:
  1. Object level “did I succeed at this task?”
  2. Have I gotten better at this task as I’ve practiced it?
  3. As I try out other domains I’m unpracticed on, do I seem to be able to apply skills I learned from previous ones and at least subjectively feel like I’m transfer learning?
  4. Hypothetical expensive science experiment: if I do the exhaustive experiment described in this blogpost, do I verifiably get better at some kind of transfer learning?
  For connecting it to your real life, there’s an additional set of loops like:
  1. When I reflect on what my actual problems or skill-limitations are at my day job, what sort of exercises do I think would help? (these can be “more exercise-like” or “more like just adding a reflection step to my existing day-job”)
  2. When I do those exercises, does it seem like they improve my situation with my day-job or main project?
  3. Does that transfer seem to stick / remain relevant over time?
  Re: “tuning your cognitive algorithms”, these sort of slot inside the object level #1 exercise in each of the previous lists. Within an exercise (or real world task), you can notice “do I seem to be stuck? Does it feel like my current train of thought is useful? Do I have a creeping sense that I’m going down an unproductive rabbit hole and rationalizing it as progress?”
  But there is a danger to over-relying on these internal, subjective feedback loops. So there’s an additional upper-level loop of, after getting an exercise right (or wrong), asking “which of my metacognitive intuitions actually turned out to be right?”, and becoming calibrated on how trustworthy those are. (And hopefully making them more trustworthy.)
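
One possible way to make the "which of my metacognitive intuitions turned out to be right?" loop concrete is to log each intuition with a rough confidence during an exercise, then score the log once the exercise has been checked. The sketch below is only an illustration under assumptions: the `IntuitionRecord` fields, the bucket boundaries, and the choice of the Brier score are hypothetical and do not come from the thread.

```python
from dataclasses import dataclass


@dataclass
class IntuitionRecord:
    """One logged metacognitive intuition (hypothetical structure)."""
    label: str         # e.g. "this train of thought feels productive"
    confidence: float  # subjective probability in [0, 1] that the intuition is right
    was_right: bool    # judged after the exercise has been scored


def brier_score(records):
    """Mean squared gap between stated confidence and outcome (lower is better)."""
    if not records:
        raise ValueError("need at least one record")
    return sum((r.confidence - float(r.was_right)) ** 2 for r in records) / len(records)


def hit_rate_by_bucket(records, low_cutoff=0.5, high_cutoff=0.8):
    """Group records into low/medium/high confidence and report (count, fraction right).

    The cutoffs are arbitrary illustrative choices, not a recommendation.
    """
    buckets = {"low": [], "medium": [], "high": []}
    for r in records:
        if r.confidence < low_cutoff:
            buckets["low"].append(r)
        elif r.confidence < high_cutoff:
            buckets["medium"].append(r)
        else:
            buckets["high"].append(r)
    # Skip empty buckets so every reported rate is backed by at least one record.
    return {
        name: (len(items), sum(r.was_right for r in items) / len(items))
        for name, items in buckets.items()
        if items
    }


if __name__ == "__main__":
    log = [
        IntuitionRecord("I'm stuck and should switch approach", 0.9, True),
        IntuitionRecord("this rabbit hole is real progress", 0.9, False),
        IntuitionRecord("the simple model is enough", 0.2, False),
    ]
    print(brier_score(log))
    print(hit_rate_by_bucket(log))
```

If the "high" bucket turns out to be right much less often than its stated confidence, that would be one signal that the corresponding intuition deserves less trust in future exercises.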