It’s important to have a model of whether AlphaGo is trustworthy before you trust it. Knowing either (a) that it beat all the grandmasters or (b) its architecture and the amount of compute it used is necessary for me to take on its policies. (This is sort of the point of Inadequate Equilibria—you need to make models of the trustworthiness of experts.)
Btw, in some cases it _may_ make sense to throw away part of your model and replace it just with the opinion of the expert. I’m not sure I can describe it clearly without drawing.
I think I’d say something like: it may become possible to download the policy network of AlphaGo—to learn what abstractions it’s using and what it pays attention to. And AlphaGo may not be able to tell you what experiences it had that led to the policy network (it’s hard to communicate tens of thousands of games’ worth of experience). Yet you should probably just replace your Go models with the policy network of AlphaGo if you’re offered the choice.
A thing I’m currently confused about here is how much one is able to further update after downloading the policy network of such an expert. How much evidence should persuade you to change your mind, as opposed to expecting that info to already be built into the policy network you have?
From a human perspective, such parts of our cognitive infrastructure are often not really accessible, giving their results in the form of “intuition”. What’s worse, other parts of our cognitive infrastructure are very good at making up fake stories explaining the intuitions, so if you are motivated enough your mind may generate plausible but in fact fake models for your intuition. In the worst case you discard the good intuition when someone shows you problems in the fake model.
You’re right, it does seem that a bunch of important stuff is in the subtle, hard-to-make-explicit reasoning that we do. I’m confused about how to deal with this, but I’ll say two things:
1. This is where the most important stuff is; try to focus your cognitive work on making this explicit and trading these models. Oliver likes to call this ‘communicating taste’.
2. Practice not dismissing the intuitions in yourself—use bucket protections if need be.
I’m reminded of the feeling when I’m processing a new insight and someone asks me to explain it immediately, and I try to quickly convey this simple insight, but the words I say end up making no sense (and this is not because I don’t understand it). If you can’t communicate something, one of the key skills here is acting on your models anyway, even if, were you to explain your reasoning, it might sound inconsistent or like you were making a bucket error.
And yeah, for the people who you think it’s really valuable to download the policy networks from, it’s worth spending the time building intuitions that match theirs. I feel like I often look at a situation through my inner Hansonian lens and predict what Robin will say (sometimes successfully), even while I can’t explicitly state the principles he’s using.
I don’t think the things I’ve just said constitute a clear solution to the problem you raised, and I think my original post is missing some key part that you correctly pointed to.
In retrospect, I think this comment of mine didn’t address Jan’s key point, which is that we often form intuitions/emotions by running a process analogous to aggregating data into a summary statistic and then throwing away the data. The evidence we saw is now quite incommunicable—we no longer have the evidence ourselves.
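To make that “aggregate and throw away the data” picture concrete, here’s a toy sketch (my own illustration, with made-up names and numbers, not anything from the post or a real system): an agent that folds each experience into a running summary and keeps nothing else, so the original evidence is gone even for the agent itself.

```python
# Toy illustration (hypothetical): an agent that keeps only a running summary
# of its experiences. After each update the raw observation is discarded, so
# the "evidence" can no longer be shared with anyone else.

class IntuitionLikeAggregator:
    def __init__(self):
        self.count = 0
        self.mean = 0.0  # the only thing retained, like a gut feeling

    def observe(self, outcome):
        """Fold one experience into the summary, then forget the experience."""
        self.count += 1
        self.mean += (outcome - self.mean) / self.count
        # `outcome` is not stored anywhere; only the aggregate survives.

agent = IntuitionLikeAggregator()
for outcome in [0.9, 0.2, 0.8, 0.7]:  # made-up experiences
    agent.observe(outcome)

print(agent.mean)  # ~0.65: the "intuition", with the underlying data gone
```

The point of the sketch is just that after training, the summary is all there is left to share; the experiences that produced it can’t be handed over.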
Ray Arnold gave me a good example the other day of two people—one an individualist libertarian, the other a communitarian Christian. In the example, these two people deeply disagree on how society should be set up, and this is entirely because they’re two identical RL systems trained on different training sets (one has repeatedly seen the costs of trying to trust others with your values, and the other has repeatedly seen it work out brilliantly). Their brains have compressed the data into a single emotion, which they feel in (say) groups trying to coordinate. They might be able to introspect enough to communicate the causes of their beliefs, but they might not—they might just be stuck this way (until we reach the glorious transhumanist future, that is). Scott might expect them to say they just have fundamental value differences.
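In the same toy spirit, here’s a sketch (again my own hypothetical illustration, not Ray’s actual example or any real RL system) of how two identical learners with the same update rule can end up with opposite “feelings” purely because of their training histories:

```python
# Toy illustration (hypothetical): two identical learners whose only difference
# is their history of trusting others. Each compresses that history into one
# scalar "feeling" and keeps nothing else.

def learned_feeling(outcomes, learning_rate=0.3):
    """Same update rule for both agents; only the training data differs."""
    feeling = 0.0  # neutral prior toward trusting others with your values
    for worked_out in outcomes:
        reward = 1.0 if worked_out else -1.0
        feeling += learning_rate * (reward - feeling)
    return feeling

# Made-up histories: trust mostly backfired vs. trust mostly paid off.
libertarian_history = [False, False, True, False, False]
communitarian_history = [True, True, False, True, True]

print(learned_feeling(libertarian_history))    # negative: wariness in groups
print(learned_feeling(communitarian_history))  # positive: warmth toward coordination
```

Neither agent retains the history that produced its number, which is why, from the inside, the disagreement can look like a difference in fundamental values rather than a difference in evidence.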
I agree that I have not in the OP given a full model of the different parts of the brain, how they do reasoning, and which parts are (or aren’t) in principle communicable or trustworthy. I at least claim that I’ve pointed to a vague mechanism that’s more true than the simple model where everyone just has the outputs of their beliefs. There are important gears that are hard-but-possible to communicate, and they’re generally worth focusing on over and above the credences they output. (Will write more on this in a future post about Aumann’s Agreement Theorem.)
(I found your comment really clear and helpful.)