Your counterarguments were pretty good—I agree with numbers 2-4. It seems like this is an expensive strategy that, in cases where we’re already fairly doomed, multiplies our chances of success from epsilon to 1.1 times epsilon.
For 3, I think your rebuttal doesn't make a strong case that building aligned AI is so unlikely that we can neglect the costs to that scenario.
For 4, your rebuttal seems to be an outside-view argument for why online discussion isn't that important (and therefore stopping it wouldn't be costly), which neglects the fact on the ground that online forums actually are important.
But maybe you can rescue an argument for a lower-cost implementation of this general idea: just try to reduce information leakage about auditing measures into the training dataset, rather than trying to make it as low as reasonably achievable.
Thanks so much for your helpful comment, Charlie! I really appreciate it.
Our cruxes are likely the following. I think that (1) we probably cannot predict the precise moment an AGI becomes agentic and/or dangerous, (2) we probably won't have strong credence that any specific alignment plan will succeed, and (3) AGI takeoff will be slow enough that secrecy can be a key difference-maker in whether or not we die.
So I expect we will have alignment plans 1, 2, 3, and so on. We will try alignment plan 1, but it will probably not succeed (hopefully we can see signs that it is failing early enough to shut it down and try alignment plan 2). If we can iterate empirically and safely, we will eventually find an alignment plan N that works.
This is risky, and we could very well die (although I think the probability is not unconditionally 100%). This is why I think not building AGI is by far the best strategy (corollary: I am comparatively optimistic about AI governance and coordination efforts). The above discussion is conditional on trying to build an aligned AGI.
I think that with extensive discussion, planning, and execution, we can achieve a Manhattan-Project-esque shift in research norms that maintains much of the ease of research for us AI safety researchers while achieving the secrecy that keeps AI safety plans valuable. If this can be done at a not-too-high resource cost, I think it is likely a good idea, and I think there is at least a small probability that it will “result in an x-risk reduction that is, on a per-dollar level, maximal among past and current EA projects on x-risk reduction.”