Perhaps the bot knows different things at different times and your job is to figure out (a) what it always knows and (b) a way to quickly find out everything it knows at a certain point in time.
I think at this point you’ve pushed the word “know” to a point where it’s not very well-defined; I’d encourage you to try to restate the original post while tabooing that word.
This seems particularly valuable because there are some versions of “know” for which the goal of knowing everything a complex model knows seems wildly unmanageable (for example, trying to convert a human athlete’s ingrained instincts into a set of propositions). So before people start trying to do what you suggested, it’d be good to explain why it’s actually a realistic target.
Hmmm. It does seem like I should probably rewrite this post. But to clarify things in the meantime:
- it’s not obvious to me that this is a realistic target, and I’d be surprised if it took fewer than 10 person-years to achieve.
- I do think the knowledge should ‘cover’ all the athlete’s ingrained instincts in your example, but I think the propositions are allowed to look like “it’s a good idea to do x in case y”.
“it’s not obvious to me that this is a realistic target”
Perhaps I should instead have said: it’d be good to explain to people why this might be a useful/realistic target. Because if you need propositions that cover all the instincts, then it seems like you’re basically asking for people to revive GOFAI.
(I’m being unusually critical of your post because it seems that a number of safety research agendas lately have become very reliant on highly optimistic expectations about progress on interpretability, so I want to make sure that people are forced to defend that assumption rather than starting an information cascade.)
OK, the parenthetical helped me understand where you’re coming from. I think a rewrite of this post should (in part) make clear that I think a massive heroic effort would be necessary to make this happen, but sometimes massive heroic efforts work, and I have no special private info that makes it seem more plausible than it looks a priori.
Actually, hmm. My thoughts are not really in equilibrium here.
(Also: such a rewrite would be a combination of ‘what I really meant’ and ‘what the comments made me realize I should have really meant’)