“It would make sense to pay that cost if necessary” makes more sense than “we should expect to pay that cost”, thanks.
it sounds like you view it as a bad plan?
Basically, yes. I have a draft post outlining some of my objections to that sort of plan; hopefully it won’t sit in my drafts as long as the last similar post did.
(I could be off, but it sounds like either you expect solving AI philosophical competence to come pretty much hand in hand with solving intent alignment (because you see them as similar technical problems?), or you expect not solving AI philosophical competence (while having solved intent alignment) to lead to catastrophe (thus putting us outside the worlds in which x-risks are reliably ‘solved’ for), perhaps in the way Wei Dai has talked about?)
I expect whatever ends up taking over the lightcone to be philosophically competent. I haven’t thought very hard about the philosophical competence of whatever AI succeeds at takeover (conditional on that happening), or, separately, the philosophical competence of the stupidest possible AI that could succeed at takeover with non-trivial odds. I don’t think solving intent alignment necessarily requires that we have also figured out how to make AIs philosophically competent, or vice-versa; I also haven’t thought about how likely we are to experience either disjunction.
I think solving intent alignment without having made much more philosophical progress is almost certainly an improvement to our odds, but is not anywhere near sufficient to feel comfortable, since you still end up stuck in a position where you want to delegate “solve philosophy” to the AI, but you can’t because you can’t check its work very well. And that means you’re stuck at whatever level of capabilities you have, and are still approximately a sitting duck waiting for someone else to do something dumb with their own AIs (like point them at recursive self-improvement).
I expect whatever ends up taking over the lightcone to be philosophically competent.
I agree that, conditional on that happening, this is plausible, but it’s also likely that some of the answers from such a philosophically competent being will be unsatisfying to us.
One example is that such a philosophically competent AI might tell you that CEV either doesn’t exist or, if it does, is so path-dependent that it cannot resolve moral disagreements, which is actually pretty plausible under my model of moral philosophy.