The problem is not that you can “just meditate and come to good conclusions”; the problem is that “technical knowledge about actual machine learning results” doesn’t seem like a good path either.
Like, we can get from a NN trained to do modular addition the fact that it performs a Fourier transform, because we know exactly what a Fourier transform is; but I don’t see any clear path to getting from a neural network the fact that its output is both useful and safe, because we don’t have any practical operationalization of what “useful and safe” means. If we had a solution to the MIRI problem of “which program, run on an infinitely large computer, produces an aligned outcome”, we could use the aforementioned technical knowledge to try to understand how good a NN is at approximating this program, and have substantial hope, for example.
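For concreteness, here is a toy sketch (my own illustration, not code from the interpretability papers) of the kind of Fourier-based algorithm that was reverse-engineered out of small networks trained on modular addition: the inputs are embedded as cosines and sines at a few frequencies, trig identities combine them, and the logit for each candidate answer c peaks exactly at c = (a + b) mod p. The specific frequencies below are made up; trained models pick their own.

```python
import numpy as np

p = 113           # modulus; 113 is the value used in the grokking work
ks = [7, 19, 42]  # a few frequencies (made up here for illustration)

def logits(a: int, b: int) -> np.ndarray:
    """Score every candidate answer c for (a + b) mod p."""
    c = np.arange(p)
    out = np.zeros(p)
    for k in ks:
        w = 2 * np.pi * k / p
        # cos(w*(a+b)) and sin(w*(a+b)) built from the embeddings of a and b
        # via product-to-sum identities -- the step the network's layers implement.
        cos_ab = np.cos(w * a) * np.cos(w * b) - np.sin(w * a) * np.sin(w * b)
        sin_ab = np.sin(w * a) * np.cos(w * b) + np.cos(w * a) * np.sin(w * b)
        # The contribution to the logit for c is cos(w*(a+b-c)),
        # which is maximal exactly when c == (a + b) mod p.
        out += cos_ab * np.cos(w * c) + sin_ab * np.sin(w * c)
    return out

a, b = 57, 90
assert int(np.argmax(logits(a, b))) == (a + b) % p
```

We can verify this account because the target algorithm is something we already understand; nothing analogous exists for “useful and safe”.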
I think the answer to the question of how well realistic NN-like systems with finite compute approximate the results of hypothetical utility maximizers with infinite compute is “not very well at all”.
So the MIRI train of thought, as I understand it, goes something like this:

1. You cannot predict the specific moves that a superhuman chess-playing AI will make, but you can predict that the final board state will be one in which the chess-playing AI has won.
2. The chess AI is able to do this because it can accurately predict the likely outcomes of its own actions, and so it can compute the utility of each of its possible actions and then effectively do an argmax over them to pick the best one, which results in the best outcome according to its utility function (see the sketch after this list).
3. Similarly, you will not be able to predict the specific actions that a “sufficiently powerful” utility maximizer will take, but you can predict that its utility function will be maximized.
4. For most utility functions about things in the real world, the configuration of matter that maximizes that utility function is not a configuration of matter that supports human life.
5. Actual future AI systems that show up in the real world in the next few decades will be “sufficiently powerful” utility maximizers, and so this is a useful and predictive model of what the near future will look like.
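To make point 2 concrete, here is a minimal sketch (my own illustration, not anything from MIRI) of the agent model being assumed: a world model predicts the outcome of each available action, a utility function scores those outcomes, and the agent argmaxes. The claim is that an outside observer can predict “the chosen outcome scores highly under the utility function” without being able to predict which action gets chosen.

```python
from typing import Callable, Iterable, TypeVar

State = TypeVar("State")
Action = TypeVar("Action")

def pick_action(
    state: State,
    actions: Iterable[Action],
    predict: Callable[[State, Action], State],  # world model: predicted next state
    utility: Callable[[State], float],          # utility function over outcomes
) -> Action:
    # Score each available action by the utility of its predicted outcome, then argmax.
    # Predicting that the *result* has high utility only requires trusting `predict`
    # and `utility`; it does not require knowing which action wins the argmax.
    return max(actions, key=lambda action: utility(predict(state, action)))
```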
I think the last few years in ML have made points 2 and 5 look particularly shaky here. For example, the actual architecture of the SOTA chess-playing systems doesn’t particularly resemble a cheaper version of the optimal-with-infinite-compute thing of “minmax over tree search”, but instead seems to be a different thing of “pile a bunch of situation-specific heuristics on top of each other, and then tweak the heuristics based on how well they do in practice”.
Which, for me at least, suggests that looking at what the optimal-with-infinite-compute thing would do might not be very informative about what the actual systems that show up in the next few decades will do.
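To illustrate the contrast (again a toy sketch of my own, not a description of any real engine): the idealized object is full minimax to the end of the game, while the practical object is a cheap evaluation built from situation-specific features whose weights get nudged toward whatever actually wins games.

```python
def minimax(state, to_move, terminal, successors, value):
    # The optimal-with-infinite-compute idealization: search every line to the end.
    if terminal(state):
        return value(state)
    scores = [minimax(s, -to_move, terminal, successors, value)
              for s in successors(state, to_move)]
    return max(scores) if to_move == 1 else min(scores)

def heuristic_eval(features, weights):
    # The practical regime: a pile of situation-specific heuristics, each weighted.
    return sum(w * f for w, f in zip(weights, features))

def tweak(weights, features, game_result, lr=0.01):
    # After a game, nudge each weight toward agreement with the observed result
    # (game_result in {+1, -1}): tune the heuristics by how well they do in practice.
    return [w + lr * game_result * f for w, f in zip(weights, features)]
```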