I think the point under contention isn’t whether current egregores are (in some sense) “optimizing” for things that would score poorly according to human values (they are), but whether the things they’re optimizing for have some (clear, substantive) relation to the things a misaligned AGI will end up optimizing for, such that an intervention on the whole egregore situation would have a substantial probability of impacting the eventual AGI.
To this question I think the answer is a fairly clear “no”, though of course this doesn’t invalidate the possibility that investigating how to deal with egregores may result in some non-trivial insights for the alignment problem.
I agree with you.
I also don’t think it matters whether the AGI will optimize for something current egregores care about.
What matters is whether current egregores will in fact create AGI.
The fear around AI risk is that the answer is “inevitably yes”.
The current egregores are actually no better at making AGI egregore-aligned than humans are at making it human-aligned.
But they’re a hell of a lot better than individual humans at creating AGI accidentally, and probably at creating it at all.
So if we don’t sort out how to align egregores, we’re fucked — and so are the egregores.
I think I see what you mean. A new AI won’t be under the control of egregores. It will be misaligned with them as well. That makes sense.