“I don’t want to talk about (blah) aspect of how I think future AGI will be built, because all my opinions are either wrong or infohazards—the latter because (if correct) they might substantially speed the arrival of AGI, which gives us less time for safety / alignment research.”
Great post!
It seems to me that infohazards are the unstated controversy behind this post. The researchers you are debating don't believe in infohazards, or, more precisely, they believe that framing problems as infohazards makes progress impossible, since you can't solve an engineering problem you aren't allowed to discuss freely.
Presumably there will be no infohazards in the endgame, since all the important dangerous secrets will already be widely known, or it will be too late to keep secrets anyway. I think most researchers would prefer to work in an environment where they don't have to deal with censorship. Therefore, if we can work as if it were the endgame already, we might make more progress. That is the impetus behind getting to the endgame.
I feel like this comment combines my “Bad Argument 1” with “Bad Argument 2”! If it doesn’t, then what am I missing? Or if it does, then do you think one or both of my “Bad Arguments” are not actually bad arguments?
Let’s say I have Secret Knowledge X, and let’s assume (generously!) that this knowledge is correct rather than wrong. Let’s also say that if I shared Secret Knowledge X, it would enable you to figure out Alignment Idea Y. But also assume that Secret Knowledge X is a key ingredient for building AGI.
Your proposal is: I should share Secret Knowledge X so that you can get to work on Alignment Idea Y.
My counter-proposal is: Somebody is going to publish Secret Knowledge X on arXiv sooner or later. And when they do, you can go right ahead and figure out Alignment Idea Y. In the meantime, there are plenty of other productive alignment-related things you can do with your time. I listed some of them in the post.
(Alternatively, maybe nobody will ever publish Secret Knowledge X; instead it will be discovered at DeepMind and kept secret from competitors. In that case, someone on the DeepMind Safety Team can figure out Alignment Idea Y. And by the way, I’m super happy that in this scenario, DeepMind can go slower and spend more time on endgame safety, thanks to the fact that Secret Knowledge X has remained secret.)