Then we could try to predict which areas will see big gains from neural networks in the next few years, and which parts of Friendliness become easy or hard as a result. Is anyone at MIRI working on that?
If they did that, then what? Try to convince NN researchers to attack the parts of Friendliness that look hard? That seems difficult for MIRI to do given where they’ve invested in building their reputation (i.e., among decision theorists and mathematicians rather than in the ML community). (It would really depend on people trusting their experience and judgment, since it’s hard to see how much one could offer in the form of either mathematical proof or clearly relevant empirical evidence.) You’d have a better chance if the work were carried out by some other organization. But even if that organization got NN researchers to take its results seriously, what incentive would those researchers have to attack parts of Friendliness that seem especially hard, instead of doing what they’ve been doing, i.e., racing as fast as they can for the next milestone in capability?
Or is the idea to bet on the off chance that building an FAI with NN turns out to be easy enough that MIRI and like-minded researchers can solve the associated Friendliness problems themselves and then hand the solutions to whoever ends up leading the AGI race, who could just plug them in at little cost to their winning the race?
Or perhaps you’re suggesting aiming/hoping for some feasible combination of both. That seems pretty similar to what Paul Christiano is doing, except that he has “generic AI technology” in place of “NN” above. To me, the chance of success of this approach seems low enough that it’s not obviously superior to what MIRI is doing (namely, in my view, betting on the off chance that the contrarian AI approach they’re taking ends up being much easier/better than the mainstream approach, which is looking increasingly unlikely but still not impossible).