Regarding ARCHES, as an author:

I disagree with Critch that we should expect single/single delegation(/alignment) to be solved “by default” because of economic incentives. I think economic incentives will not lead to it being solved well enough, soon enough (e.g. see: https://www.lesswrong.com/posts/DmLg3Q4ZywCj6jHBL/capybaralet-s-shortform?commentId=wBc2cZaDEBX2rb4GQ). I guess Critch might put this in the “multi/multi” camp, but I think it’s more general (e.g. I attribute a lot of the risk here to human irrationality/carelessness).

RE: “I find the argument less persuasive because we do have governance, regulations, national security etc. that would already be trying to mitigate issues that arise in multi-multi contexts, especially things that could plausibly cause extinction”:

1) These are all failing us when it comes to, e.g., climate change.

2) I don’t think we should expect our institutions to keep up with rapid technological progress (you might say they are already failing to...). My thought experiment from the paper is: “imagine if everyone woke up 1000000x smarter tomorrow.” Our current institutions would likely not survive the day, and might or might not be improved quickly enough to keep ahead of bad actors / out-of-control conflict spirals.
RE: “I disagree with Critch that we should expect single/single delegation(/alignment) to be solved ‘by default’ because of economic incentives. I think economic incentives will not lead to it being solved well enough, soon enough”:
Indeed, this is where my 10% comes from, and may be a significant part of the reason I focus on intent alignment whereas Critch would focus on multi/multi stuff.
RE: “My thought experiment from the paper is: ‘imagine if everyone woke up 1000000x smarter tomorrow.’”:
Basically all of my arguments for “we’ll be fine” rely on not having a huge discontinuity like that, so while I roughly agree with your prediction in that thought experiment, it’s not very persuasive.
(The arguments do not rely on technological progress remaining at its current pace.)
RE: “keep up with rapid technological progress (you might say they are already failing to...)”:
At least in the US, our institutions are succeeding at providing public infrastructure (roads, water, electricity...), not having nuclear war, ensuring children can read, and allowing me to generally trust the people around me despite not knowing them. Deepfakes and facial recognition are small potatoes compared to that.
RE: “These are all failing us when it comes to, e.g., climate change.”:
I agree this is overall a point against my position (though I probably don’t think it is as strong as you think it is).