Disclaimer: writing quickly.
Consider the following path:
A. There is an AI warning shot.
B. Civilization allocates more resources to alignment and is more conservative about pushing capabilities.
C. This reallocation is sufficient to solve and deploy aligned AGI before the world is destroyed.
I think that a warning shot is unlikely (P(A) < 10%), but won’t get into that here.
I am guessing that P(B | A) is the biggest crux. The OP primarily considers the ability of governments to implement policy that moves our civilization further from AGI ruin, but I think that the ML community is both more important and probably significantly easier to shift than government. I basically agree with this post as it pertains to government updates based on warning shots.
I anticipate that a warning shot would get most capabilities researchers to a) independently think about alignment failures, including the ones their own models could cause, and b) take the EA/LessWrong/MIRI/Alignment sphere’s worries a lot more seriously. My impression is that OpenAI is currently much more worried about misuse risk than accident risk: if alignment is easy, then the composition of the lightcone is primarily determined by the values of the AGI designers, so misuse is the main thing left to worry about. Right now, there are ~100 capabilities researchers vs ~30 alignment researchers at OpenAI. I think a warning shot would dramatically update them towards worry about accident risk, and therefore I anticipate that OpenAI would drastically shift most of their resources to alignment research. I would guess P(B|A) ~= 80%.
P(C | A, B) primarily depends on alignment difficulty, about which I am pretty uncertain, and also on how large the reallocation in B is, which I anticipate will be pretty large. The bar for destroying the world gets lower every year, but this reallocation would buy a lot more time: I think we get several years of AGI-level capability before we deploy it. I’m estimating P(C | A, B) ~= 70%, but this is a very low-resilience estimate.
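Multiplying these estimates through, and treating the P(A) < 10% figure as roughly 10% to get an upper bound, the overall chance that this path saves us comes out at about 5–6%:

$$P(A \wedge B \wedge C) = P(A)\cdot P(B \mid A)\cdot P(C \mid A, B) \lesssim 0.10 \times 0.80 \times 0.70 \approx 0.056.$$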
Right now, there are ~100 capabilities researchers vs ~30 alignment researchers at OpenAI.

I don’t want to derail this thread, but I do really want to express my disbelief at this number before people keep quoting it. I definitely don’t know of 30 people at OpenAI who are working on making AI not kill everyone, and it seems kind of crazy to assert that there are (I think such assertions are the result of some pretty adversarial dynamics, which I am sad about).
I think a warning shot would dramatically update them towards worry about accident risk, and therefore I anticipate that OpenAI would drastically shift most of their resources to alignment research. I would guess P(B|A) ~= 80%.

I would like to take bets here, though we are likely to run into doomsday-market problems; there are ways around that.