My feedback is that safety based on total understanding is a boondoggle. (This feedback is also only approximately aimed at you; take from it what you will.)
Blue sky research is fine. In a lot of academia, the way you get blue sky research funded is by promising the grantmaker that if you just understand the thing, that understanding will directly let you solve some real-world problem, but usually that's BS. And it's the kind of BS that's easy to fall into sorta-believing because it's convenient. Still, I think one can sometimes make a good case for blue sky research on transformers that doesn't promise that we're going to get safety by totally understanding a model.
In AI safety, blue sky research is less good than normal because we have to be doing differential technological development—advancing safe futures more than unsafe ones. So ideally, research agendas do have some argument about why they differentially advance safety. And if you find some argument about how to advance safety (other than the total understanding thing), ideally that should even inform what research you do.