MIRI’s Undisclosed Work and Security
My understanding of MIRI’s undisclosed work is that they expect there is probably no findable way to stop sufficiently advanced machine learning systems from bringing in powerful optimisers that exploit your system, and that gradient descent is not a feasible way to create optimisation power that can be secured against this, so they’re working on an alternative basis for doing optimisation. It sounds like they’re making some progress, but I can’t know whether it’s fast enough or going to work out.
As to whether the work is likely to make AI systems more useful, MIRI’s write-up explains their thinking under the header “It is difficult to predict whether successful deconfusion work could spark capability advances”. They give two examples of foundational work that helps you reason about a system: interval arithmetic, which lets you place hard bounds on the possible error of your calculations but does nothing to speed a system up, and probability theory, which is central to understanding modern image classifiers and does enable capability advances. They’re not sure which kind of research they’ll end up producing, but I can see why the first kind would only be helpful for security, whereas the second is also helpful for capabilities research.
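To make the interval arithmetic example concrete, here is a minimal sketch (my own illustration, not taken from the MIRI write-up) of how tracking each value as a `[lo, hi]` interval gives you hard bounds on the result of a computation without making the computation itself any faster:

```python
# Minimal interval arithmetic: every quantity is carried as a [lo, hi]
# interval, and each operation widens the bounds so the true result is
# guaranteed to lie inside them.

from dataclasses import dataclass

@dataclass
class Interval:
    lo: float
    hi: float

    def __add__(self, other: "Interval") -> "Interval":
        # Bounds of a sum add directly.
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __mul__(self, other: "Interval") -> "Interval":
        # Product: the extreme values occur at corner products, and signs
        # can flip which corner is extreme, so take min/max over all four.
        corners = [self.lo * other.lo, self.lo * other.hi,
                   self.hi * other.lo, self.hi * other.hi]
        return Interval(min(corners), max(corners))

# Two measurements known only to within ±0.01:
x = Interval(0.99, 1.01)
y = Interval(1.99, 2.01)

# The true value of x*y + x is guaranteed to lie within z's bounds,
# however the rounding errors combine.
z = x * y + x
```

The point of the example is that the bounds are a verification tool: they let you certify what a numerical system *cannot* do (exceed a given error), which is useful for security but offers no route to making the underlying system more capable.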