Thanks so much for your helpful comment, Charlie! I really appreciate it.
It is likely that our cruxes are the following. I think that (1) we probably cannot predict the precise moment the AGI becomes agentic and/or dangerous, (2) we probably won’t have a strong credence that a specific alignment plan will succeed, and (3) AGI takeoff will be slow enough that secrecy can be a key difference-maker in whether we die or not.
So, I expect we will have alignment plan numbers 1, 2, 3, and so on. We will try alignment plan 1, but it will probably not succeed (and hopefully we can see signs of it not succeeding early enough that we shut it down and try alignment plan 2). If we can safely empirically iterate, we will find an alignment plan N that works.
This is risky and we could very well die (although I think the probability is not unconditionally 100%). This is why I think not building AGI is by far the best strategy (Corollary: I place a lot of comparative optimism on AI governance and coordination efforts.). The above discussion is conditional on trying to build an aligned AGI.
I think with extensive discussion, planning, and execution, we can have a Manhattan-Project-esque shift in research norms that maintains much of the ease-of-research for us AI safety researchers, but achieves secrecy and thereby valuable AI safety plans. If this can be achieved with a not-too-high of a resource cost, I think it is likely a good idea: and I think there is at least a small probability that it will “result in an x-risk reduction that is, on a per-dollar level, maximal among past and current EA projects on x-risk reduction.”
Thanks so much for your helpful comment, Charlie! I really appreciate it.
It is likely that our cruxes are the following. I think that (1) we probably cannot predict the precise moment the AGI becomes agentic and/or dangerous, (2) we probably won’t have a strong credence that a specific alignment plan will succeed, and (3) AGI takeoff will be slow enough that secrecy can be a key difference-maker in whether we die or not.
So, I expect we will have alignment plan numbers 1, 2, 3, and so on. We will try alignment plan 1, but it will probably not succeed (and hopefully we can see signs of it not succeeding early enough that we shut it down and try alignment plan 2). If we can safely empirically iterate, we will find an alignment plan N that works.
This is risky and we could very well die (although I think the probability is not unconditionally 100%). This is why I think not building AGI is by far the best strategy (Corollary: I place a lot of comparative optimism on AI governance and coordination efforts.). The above discussion is conditional on trying to build an aligned AGI.
I think with extensive discussion, planning, and execution, we can have a Manhattan-Project-esque shift in research norms that maintains much of the ease-of-research for us AI safety researchers, but achieves secrecy and thereby valuable AI safety plans. If this can be achieved with a not-too-high of a resource cost, I think it is likely a good idea: and I think there is at least a small probability that it will “result in an x-risk reduction that is, on a per-dollar level, maximal among past and current EA projects on x-risk reduction.”