Hmmm. It does seem like I should probably rewrite this post. But to clarify things in the meantime:
- it’s not obvious to me that this is a realistic target, and I’d be surprised if it took fewer than 10 person-years to achieve.
- I do think the knowledge should ‘cover’ all the athlete’s ingrained instincts in your example, but I think the propositions are allowed to look like “it’s a good idea to do x in case y”.
> it’s not obvious to me that this is a realistic target
Perhaps I should instead have said: it’d be good to explain to people why this might be a useful/realistic target. Because if you need propositions that cover all the instincts, then it seems like you’re basically asking for people to revive GOFAI.
(I’m being unusually critical of your post because it seems that a number of safety research agendas lately have become very reliant on highly optimistic expectations about progress on interpretability, so I want to make sure that people are forced to defend that assumption rather than starting an information cascade.)
OK, the parenthetical helped me understand where you’re coming from. I think a rewrite of this post should (in part) make clear that I think a massive heroic effort would be necessary to make this happen, but sometimes massive heroic efforts work, and I have no special private info that makes it seem more plausible than it looks a priori.
Actually, hmm. My thoughts are not really in equilibrium here.
(Also: such a rewrite would be a combination of ‘what I really meant’ and ‘what the comments made me realize I should have really meant’)