Another way of thinking about this is: there are a bunch of possible civilizations represented by different nodes in a graph. Each node has weighted edges to other nodes, representing the probabilities that it passes control to each of those other nodes. It also has weighted edges to one or more “sink” states, representing cases where it does not hand control to another civilization (and instead goes extinct, builds an aligned AI, prevents unaligned AI in some other way, etc.). These nodes form a Markov chain with absorbing states, similar in spirit to the one underlying PageRank.
The relevant question is: as a node, how should we pass probability mass to other nodes given pretty limited information about them, such that (taking UDT considerations into account) lots of probability mass ends up in good sink states, in particular sink states similar to those that result from us aligning AI?
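To make this concrete, here's a minimal sketch of where probability mass ends up in such a chain, treating the sinks as absorbing states. All the civilizations, transition probabilities, and sink labels are made-up toy numbers, not anything claimed above:

```python
import numpy as np

# Three "civilization" nodes and three sinks: aligned AI, unaligned AI, extinct.
# Q[i][j] = probability that civilization i hands control to civilization j.
Q = np.array([
    [0.0, 0.3, 0.2],   # civ A
    [0.1, 0.0, 0.4],   # civ B
    [0.2, 0.2, 0.0],   # civ C
])

# R[i][k] = probability that civilization i ends in sink k instead.
# Columns: aligned AI, unaligned AI, extinct. Each row of [Q | R] sums to 1.
R = np.array([
    [0.2, 0.2, 0.1],   # civ A
    [0.3, 0.1, 0.1],   # civ B
    [0.1, 0.3, 0.2],   # civ C
])

# Standard absorbing-chain formula: B = (I - Q)^-1 R gives, for each
# starting civilization, the probability of eventually landing in each sink.
B = np.linalg.solve(np.eye(3) - Q, R)
print(B)  # row i = long-run sink distribution starting from civilization i
```

The question is then how to choose our own row of Q and R, under heavy uncertainty about everyone else's rows, so that the aligned-AI column of B gets as much mass as possible.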
One possible strategy here is a CliqueBot-type strategy, where we try to find a set of nodes such that (a) enough nodes in the set have some chance of actually solving alignment, (b) when they don't solve alignment, they mostly pass control to other nodes in the set, and (c) it's pretty easy to tell whether a node is in the set without simulating it in detail. This is unlikely to be the optimal strategy, though.
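As a toy illustration of why (b) matters, here's a sketch (again with invented numbers) comparing two clique members who route their handoffs within the clique versus uniformly, including to an outside node that never solves alignment:

```python
import numpy as np

def aligned_mass(Q, R):
    """Probability of eventually reaching the aligned sink (column 0), starting from node 0."""
    B = np.linalg.solve(np.eye(Q.shape[0]) - Q, R)
    return B[0, 0]

# Nodes 0 and 1 are clique members; node 2 always produces unaligned AI.
# Sink columns: aligned, unaligned.
R = np.array([
    [0.1, 0.2],   # clique member: 10% solves alignment, 20% fails outright
    [0.1, 0.2],
    [0.0, 1.0],   # outside node: never aligned
])

# Handoffs spread uniformly (including to the outside node) ...
Q_uniform = np.array([
    [0.0, 0.35, 0.35],
    [0.35, 0.0, 0.35],
    [0.0, 0.0, 0.0],
])
# ... versus handoffs kept inside the clique.
Q_clique = np.array([
    [0.0, 0.7, 0.0],
    [0.7, 0.0, 0.0],
    [0.0, 0.0, 0.0],
])

print(aligned_mass(Q_uniform, R))  # ~0.15
print(aligned_mass(Q_clique, R))   # ~0.33
```

In this toy setup, keeping handoffs inside the clique roughly doubles the mass landing in the aligned sink, even though each member's standalone chance of solving alignment is unchanged.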
This is a good question. I worry that OP isn't even considering that the simulated civilization might decide to build its own AI (aligned or not). Maybe the idea is to stop the simulation before the civilization reaches that level of technology. But then it might not have enough time to make any decisions useful to us.