So similarly, a human could try to understand Alice’s values in two ways. The first, equivalent to what you describe here for AI, is to just apply whatever learning algorithm their brain uses when observing Alice, and form an intuitive notion of “Alice’s values”. And the second is to apply explicit philosophical reasoning to this problem. So sure, you can possibly go a long way towards understanding Alice’s values by just doing the former, but is that enough to avoid disaster? (See Two Neglected Problems in Human-AI Safety for the kind of disaster I have in mind here.)
(I keep bringing up metaphilosophy, but I’m pretty much resigned to living in a part of the multiverse where civilization will just throw the dice and bet on AI safety not depending on solving it. What hope is there for our civilization to do what I think is the prudent thing, when no professional philosophers, even ones in EA who are concerned about AI safety, ever talk about it?)
I mostly agree with you here. I don’t think the chances of alignment by default are high. There are marginal gains to be had, but to get a high probability of alignment in the long term we will probably need actual understanding of the relevant philosophical problems.