I think the reason this approach has been avoided is that we don’t even know how to specify what a solution to alignment looks like.
So the failure case is that we accomplish all this, the public gets excited, and they build a giant titanium box to prevent the AI from escaping, completely missing the point. Even if they do understand for the most part, that doesn’t mean we will be able to incentivize good AI alignment solutions over bad ones institutionally. To do so, we would need to encode a solution to AI alignment into the institution itself.
There are three possible futures: 1) nobody ever cares and nothing happens until AI ruin, 2) the public is finally spooked by capabilities advancement and the government acts, but out of ignorance does something like building a literal box, and 3) the public and the government gain an appreciation of the reality of the situation and take actually useful actions. What I was trying to convey is that Future 3 surely has a higher probability in a universe where we decide to think about how to increase its probability, relative to a universe in which we don’t think about it and let the default outcome happen.
And however low our probability of reaching a good solution, surely it’s higher than the probability that the public and the government will reach a good solution on their own. If we don’t have enough information to take probability-increasing action, it seems like it would be useful to keep thinking until we either have enough information to take probability-increasing action, or have enough information to decide that the optimal strategy is not to act. What worries me is that our strategy doesn’t appear to have been thought about very much at all.