This is a good concept. I built a similar Squiggle model a few weeks ago* (although it’s still a rough draft); I hadn’t realized you’d beaten me to it. So I guess you won the race to build an arms race model? :P
If I’m reading this right, it looks like the model assumes that if the US doesn’t race, then China gets TAI first with 100% probability. That seems wrong to me. Race dynamics mean that when you go faster, the other party also goes faster. If the US slows down, there’s a good chance China also slows down.
Also, regarding specific values, the model’s average P(doom) values are:
10% if race + US wins
20% if race + China wins
15% if no race + China wins
That doesn’t sound right to me. Racing is very bad for safety and right now the US leaders are not doing a good job, so I think P(doom | no race & China wins) is less than P(doom | race & US wins). Although I think this is pretty debatable.
*My model found that racing was bad and I had to really contort the parameter values to reverse that result. I haven’t thought much about the model construction so there could be unfair built-in assumptions.
This is a good concept. I built a similar Squiggle model a few weeks ago* (although it’s still a rough draft); I hadn’t realized you’d beaten me to it. So I guess you won the race to build an arms race model? :P
Good thing building these kinds of models doesn’t kill everyone ;-)
If I’m reading this right, it looks like the model assumes that if the US doesn’t race, then China gets TAI first with 100% probability. That seems wrong to me. Race dynamics mean that when you go faster, the other party also goes faster. If the US slows down, there’s a good chance China also slows down.
Hm, that’s a good point. I don’t know how to express that cleanly, but there are other intermediate options in which the US moves slower, but still fast enough that there’s a >50% chance of them getting TAI first, or they pull the brakes & alarms so that the PRC also slows down. I don’t know how to model this; maybe I would if I knew differential equations better?
Also, regarding specific values, the model’s average P(doom) values are: […] That doesn’t sound right to me.
For transparency, my current personal p(doom)≈60% (mostly race scenarios), and p(doom|no race)≈45%. My guess is that transparent CoTs, mild optimization, trying to automate mechinterp, control schemes &c “eat” some of the probability of extinction, but then you’re stuck in worlds where doing obvious things doesn’t actually help you align superintelligences and the problem is genuinely hard. So racing is pretty bad for safety. (I used different values in this post because I wanted to take the perspective of the median reader).
I saw your model on squigglehub, but didn’t dig into it too deeply. I encourage you to post it on here with or without an explanation :-)
Hm, that’s a good point. I don’t know how to express that cleanly, but there are other intermediate options in which the US moves slower, but still enough that there’s a >50% chance of them getting TAI first, or they pull the brakes & alarms so that the PRC also slows down.
You could model it as a binary P(US wins | US races) and P(US wins | US does not race). A continuum would be more accurate but I think a binary is basically fine.
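To make the binary version concrete, here is a minimal sketch in Python (not Squiggle, and not either of the models discussed here). The 10% / 20% / 15% figures are the model averages quoted upthread; both P(US wins | ...) values and P(doom | no race & US wins) are made-up placeholders, since the thread gives no numbers for them, and the `Policy` class is just a hypothetical name for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """One US policy choice (race / don't race) with its conditional probabilities."""
    p_us_wins: float          # P(US wins | policy)
    p_doom_us_wins: float     # P(doom | policy, US wins)
    p_doom_china_wins: float  # P(doom | policy, China wins)

    def p_doom(self) -> float:
        # Law of total probability over who gets TAI first.
        return (self.p_us_wins * self.p_doom_us_wins
                + (1.0 - self.p_us_wins) * self.p_doom_china_wins)


# Values marked ASSUMPTION are placeholders, not taken from any model.
race = Policy(
    p_us_wins=0.7,            # ASSUMPTION
    p_doom_us_wins=0.10,      # quoted upthread: race + US wins
    p_doom_china_wins=0.20,   # quoted upthread: race + China wins
)
no_race = Policy(
    p_us_wins=0.4,            # ASSUMPTION: no longer forced to 0
    p_doom_us_wins=0.05,      # ASSUMPTION: not given upthread
    p_doom_china_wins=0.15,   # quoted upthread: no race + China wins
)

if __name__ == "__main__":
    print(f"P(doom | race)    = {race.p_doom():.3f}")
    print(f"P(doom | no race) = {no_race.p_doom():.3f}")
```

Setting `p_us_wins=0.0` for `no_race` recovers the assumption that China gets TAI first with certainty if the US doesn’t race, so the comparison then depends only on the three quoted values.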
I saw your model on squigglehub, but didn’t dig into it too deeply. I encourage you to post it on here with or without an explanation :-)
Posting the model is on my to-do list but I am not very satisfied with it right now so I want to fix it up some more. I want to make a bigger model that looks at all the main effects of slowing down, not just race dynamics, although perhaps that’s too ambitious.