I don’t see how you get default failure without a model. In fact, I don’t see how you get there without the standard model, where an accident means you get a superintelligence with a random goal drawn from an unfriendly prior—but that’s precisely the model that is being contested!
I can kiiinda see default 50-50 as “model free”, though I’m not sure if I buy it.
It’s unclear to me what it would even mean to get a prediction without a “model”. I’m not sure if you meant to imply that, but to be clear, I’m not claiming it makes sense to view AI safety as default-failure in the absence of a model (i.e., in the absence of details and reasons to think AI risk is default-failure).
If I can make my point a bit more carefully: I don’t think this post successfully surfaces the bits of your model that hypothetical Bob doubts. The claim that “historical accidents are a good reference class for existential catastrophe” is the primary claim at issue. If they were a good reference class, very high risk would obviously be justified, in my view.
Given that your post misses this, I don’t think it succeeds as a defence of high P(doom).
I think a defence of high P(doom) that addresses the issue above would be quite valuable.
Also, for what it’s worth, I treat “I’ve gamed this out a lot and it seems likely to me” as very weak evidence except in domains where I have a track record of successful predictions or proving theorems that match my intuitions. Before I have learned to do either of these things, my intuitions are indeed pretty unreliable!
Yeah, I don’t think the arguments in this post on their own should convince you that P(doom) is high if you’re skeptical. There’s lots to say here that doesn’t fit into the post, e.g. an object-level argument for why AI alignment is “default-failure” / “disjunctive”.