Think of e.g. a charity which produces lots of internal discussion about reducing poverty, but frequently has effects entirely different from reducing poverty. The simulated society as a whole might be superintelligent, but its constituent simulated subagents are still pretty stupid (like humans), so their words decouple from effects (like humans’ words).
I think you’re implying (deliberately or not) something overly pessimistic (on this narrow point).
Your example is of an intention for something complex and a-priori-implausible to happen (an intervention to reduce poverty), but the intention doesn’t actualize. Then your second sentence suggests the reverse: something complex and a-priori-implausible but superficially non-random does happen without a related intention.
If something complex and a-priori-implausible but superficially non-random happens, then I think there must have been some kind of search or optimization process leading to it. It might be at learning time or it might be at inference time. It might be searching for that exact thing, or it might be searching for something downstream of it or correlated with it. But something. And thus there’s some hope of noticing whatever that process is. If it’s at learning time, then we can try to avoid the bad incentives. If it’s at inference time, then there would be an in-principle-recognizable “intention” somewhere in the system, contrary to what you wrote.
(It’s also true that dangerous things can happen that are not a-priori-implausible and thus don’t require any search or optimization process—like killing everyone by producing pollution. That still seems like a more tractable problem than if we’re up against adversarial planning, e.g. treacherous turns.)