Yeah, my impression is similarly that focus on feedback loops is closer to “the core thing that’s gone wrong so far with alignment research,” than to “the core thing that’s been missing.” I wouldn’t normally put it this way, since I think many types of feedback loops are great, and since obviously in the end alignment research is useless unless it helps us better engineer AI systems in the actual territory, etc.
(And also because some examples of focus on tight feedback loops, like Faraday’s research, strike me as exceedingly excellent, although I haven’t really figured out yet why his work seems so much closer to the spirit we need than e.g. Thinking Physics problems.)
Like, all else equal, it clearly seems better to have better empirical feedback; I think my objection is mostly that in practice, focus on this seems to lead people to premature formalization, or to otherwise constrain their lines of inquiry to those whose steps are easy to explain/justify along the way.
Another way to put this: most examples I’ve seen of people trying to practice attending to tight feedback have involved them focusing on trivial problems, like simple video games or toy already-solved science problems, and I think this isn’t a coincidence. So while I share your sense, Raemon, that transfer learning seems possible here, my guess is that this sort of practice mostly transfers within the domain of other trivial problems, where solutions (or at least methods for locating solutions) are already known, and hence where it’s easy to verify you’re making progress along the way.
“Another way to put this: most examples I’ve seen of people trying to practice attending to tight feedback have involved them focusing on trivial problems, like simple video games or toy already-solved science problems”
One thing is I just… haven’t actually seen instances of feedback loops on already-solved science problems being used? Maybe they are used and I haven’t run into them, but I’ve barely heard of anyone tackling exercises with the frame “get 95% accuracy on Thinking-Physics-esque problems, taking as long as you want to think, where the primary thing you’re grading yourself on is ‘did you invent better ways of thinking?’”. So it seemed like the obvious place to start.
“(And also because some examples of focus on tight feedback loops, like Faraday’s research, strike me as exceedingly excellent, although I haven’t really figured out yet why his work seems so much closer to the spirit we need than e.g. Thinking Physics problems.)”
Can you say more about what you mean here?
I just meant that Faraday’s research strikes me as counterevidence for the claim I was making—he had excellent feedback loops, yet also seems to me to have had excellent pre-paradigmatic research taste/next-question-generating skill of the sort my prior suggests generally trades off against strong focus on quickly-checkable claims. So maybe my prior is missing something!