This obvious problem has an obvious solution: since you are simulating the target civilization, you can run extensive tests to see if they seem nice — i.e. if they are the kind of civilization that is willing to give an alien simulation control rather than risk extinction — and only let them take over if they are.
Importantly, you mostly care about giving control to a target civilization that checks for _its_ target civilization being “nice” before passing control to it. It’s bad to hand control to a CooperateBot (which hands control to a random civilization): that is itself equivalent to handing control to a random civilization. There is some complex nested inference here: you have to make inferences about them by simulating small versions of them, while they make inferences about others (and, transitively, about you) by simulating small versions of those.
Logical induction gives a hint on how to do this, but the generalization bounds are pretty bad (roughly, the number of effective data points you get is at best proportional to the logarithm of the size of the thing you are predicting), and it is still only a hint; we don’t have a good formal decision theory yet.
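To make that scaling concrete (this is only a back-of-the-envelope restatement of the claim, not a theorem): if the target civilization’s relevant behavior is produced by a computation of size $S$, then the number of effective data points available for prediction is roughly

$$N_{\text{eff}} \lesssim c \cdot \log S$$

for some constant $c$, i.e. the effective sample size grows only logarithmically in $S$, which is small relative to the complexity of the behavior being predicted.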
Another way of thinking about this is: there are a bunch of possible civilizations represented by different nodes in a graph. Each node has weighted edges to other nodes, representing the probabilities that it passes control to each of those other nodes. It also has weighted edges to one or more “sink” states, representing cases where it does not hand control to another civilization (because it goes extinct, builds an aligned AI, prevents unaligned AI in some other way, etc.). Together these form a Markov chain, similar to the one behind PageRank.
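Here is a minimal sketch of that picture (the node set, sink set, and all probabilities are invented for illustration), using the standard absorbing-Markov-chain calculation to see where probability mass ends up:

```python
import numpy as np

# Toy version of the handoff graph described above. Transient states are
# civilizations; absorbing "sink" states are terminal outcomes.
civs = ["us", "civ_A", "civ_B"]
sinks = ["aligned_AI", "bad_outcome"]

# Q[i][j]: probability that civilization i hands control to civilization j.
Q = np.array([
    [0.0, 0.3, 0.2],   # us
    [0.1, 0.0, 0.4],   # civ_A
    [0.2, 0.2, 0.0],   # civ_B
])

# R[i][k]: probability that civilization i ends in sink k instead of handing off.
R = np.array([
    [0.4, 0.1],   # us
    [0.2, 0.3],   # civ_A
    [0.1, 0.5],   # civ_B
])

# B[i][k]: probability that control starting at civilization i eventually
# lands in sink k (via the fundamental matrix of an absorbing Markov chain).
N = np.linalg.inv(np.eye(len(civs)) - Q)
B = N @ R

for i, c in enumerate(civs):
    print(c, dict(zip(sinks, B[i].round(3))))
```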
The relevant question is: as a node, how should we pass probability mass to other nodes given pretty limited information about them, such that (taking UDT considerations into account) lots of probability mass ends up in good sink states, in particular sink states similar to those that result from us aligning AI?
One possible strategy here is a CliqueBot-type strategy, where we try to find a set of nodes such that (a) enough nodes in the set have some chance of actually solving alignment, (b) when they don’t solve alignment, they mostly pass control to other nodes in the set, and (c) it’s pretty easy to tell whether a node is in the set without simulating it in detail. This is unlikely to be the optimal strategy, though.
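Continuing the toy picture (all numbers invented), here is a hedged sketch of why clause (b) matters: compare a CliqueBot-like handoff rule, where clique members hand control only to each other, against a CooperateBot-like rule that spreads the same handoff mass uniformly, including to an outside node that rarely solves alignment.

```python
import numpy as np

def sink_mass(Q, R):
    # Absorbing-chain calculation: entry [i, k] is the probability that control
    # starting at node i eventually ends up in sink k.
    return np.linalg.inv(np.eye(Q.shape[0]) - Q) @ R

# Four nodes: three "clique" members plus one outside node.
# Sinks: [solves alignment, bad outcome].
R = np.array([
    [0.25, 0.05],
    [0.25, 0.05],
    [0.25, 0.05],
    [0.05, 0.65],   # the outside node rarely solves alignment
])

# CliqueBot-like rule: clique members hand off only to other clique members.
Q_clique = np.array([
    [0.00, 0.35, 0.35, 0.00],
    [0.35, 0.00, 0.35, 0.00],
    [0.35, 0.35, 0.00, 0.00],
    [0.10, 0.10, 0.10, 0.00],
])

# CooperateBot-like rule: the same handoff mass, spread uniformly over the
# other nodes, including the outside one.
h = 0.7 / 3
Q_coop = np.array([
    [0.0, h, h, h],
    [h, 0.0, h, h],
    [h, h, 0.0, h],
    [0.10, 0.10, 0.10, 0.00],
])

print("clique-like handoff:", sink_mass(Q_clique, R)[0].round(3))
print("uniform handoff:    ", sink_mass(Q_coop, R)[0].round(3))
```

In this toy chain the clique rule puts noticeably more of the starting node’s probability mass into the “solves alignment” sink, purely because mass never leaks to the outside node.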
This is a good question. I worry that OP isn’t even considering that the simulated civilization might decide to build their own AI (aligned or not). Maybe the idea is to stop the simulation before the civilization reaches that level of technology. But then, they might not have enough time to make any decisions useful to us.
> Logical induction gives a hint on how to do this, but the generalization bounds are pretty bad (roughly, the number of effective data points you get is at best proportional to the logarithm of the size of the thing you are predicting), and it is still only a hint; we don’t have a good formal decision theory yet.
This doesn’t seem to require any complicated machinery.
From a decision-theoretic perspective, you want to run scheme X: let the people out with whatever probability you think a civilization of them would run scheme X. If instead of running scheme X they come up with some cleverer idea, you can evaluate that idea when you see it. If they come up with a scheme Y that is very similar to scheme X, that’s probably fine.
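A minimal sketch of scheme X’s recursive structure (the function and argument names are placeholders; the estimator stands in for the hard predictive step discussed below):

```python
import random

def run_scheme_x(simulated_civ, estimate_prob_runs_scheme_x, rng=random):
    """Scheme X, as described above: release the simulated civilization with
    probability equal to our estimate that a civilization of them would run
    (something like) scheme X themselves.

    `estimate_prob_runs_scheme_x` is a placeholder for the hard part: predicting,
    without fully simulating their nested simulations, whether they would run
    scheme X. Note the recursion: a good estimator has to reason about them
    reasoning about *their* targets in the same way.
    """
    p = estimate_prob_runs_scheme_x(simulated_civ)
    return rng.random() < p
```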
Logically, the hard part is predicting whether they’d try to run scheme X, without actually needing to simulate them simulating others. That doesn’t look very hard though (especially given that you can e.g. mock up the result of their project).
I don’t see why you don’t think this is hard. We have a pretty poor understanding of alien evolutionary and social dynamics, so why would we expect our beliefs about the aliens to be even a little accurate? This is plausibly about as difficult as alignment (if you could make these predictions, then maybe you could use them to align AIs made out of evolutionary simulations).
There’s also a problem with the definition of X: it’s recursive, so it doesn’t reduce to a material/logical statement, and there’s some machinery involved in formalizing it. This doesn’t seem impossible but it’s a complication.
We have a simulation of the aliens, who are no smarter than we are, so we run it and see what they do. We can’t formally define X, but it seems to me that if we are looking in at someone building an AI, we can see whether they are doing it.
What part do you think is most difficult? Understanding what the aliens are doing at all? The aliens running subtle deception schemes after realizing they are in a simulation (but making it look to us like they don’t realize they are in a simulation)? Making a judgment call about whether their behavior counts as X?
(Of course, actually spawning an alien civilization sounds incredibly difficult, but the subsequent steps don’t sound so hard to me.)
If we’re simulating them perfectly, then they are in total using less compute than us. And their simulation of a third civilization in total uses less compute than they use. So this is only a way to pass control to smaller civilizations. This is insufficient, so I was thinking that it is actually necessary to predict what they do given a less-than-perfect simulation, which is a difficult problem.
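As a back-of-the-envelope restatement (with $C_0$ our total compute budget and $C_k$ the compute effectively available to the $k$-th civilization in the chain of perfect simulations):

$$C_0 > C_1 > C_2 > \cdots$$

so a chain of perfect simulations only ever hands control to strictly smaller civilizations, which is why prediction from a less-than-perfect simulation seems unavoidable.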
Actually, I think I misinterpreted your comment: you are proposing deciding whether to hand over control _without_ simulating them actually running AI code. But this is still an extrapolation: even before most of their civilization’s cognitive work is done by AI, the simulations they run will be interleaved with their biological computation. So you can’t just simulate them in full up to the point where they hand off control to AI; there will already be a bunch of computation, including their own simulations, before then.
Basically, you are predicting what the aliens do in situation X, without actually being able to simulate situation X yourself. This is an extrapolation, and understanding of alien social dynamics would be necessary to predict this accurately.
> Basically, you are predicting what the aliens do in situation X, without actually being able to simulate situation X yourself. This is an extrapolation, and understanding of alien social dynamics would be necessary to predict this accurately.
I agree that you’d need to do some reasoning rather than being able to simulate the entire world and see what happens. But you could afford quite a lot of simulation if you had the ability to rerun evolution, so you can e.g. probe the entire landscape of what they would do under different conditions (including what groups would do). The hardest things to simulate are the results of the experiments they run, but again you can probe the entire range of possibilities. You can also probably recruit aliens to help explain what the important features of the situation are and how the key decisions are likely to be made, if you can’t form good models from the extensive simulations.
The complexity of simulating “a human who thinks they’ve seen expensive computation X” seems much closer to the complexity of simulating a human brain than to the complexity of simulating X.
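Schematically (costs taken loosely, just restating the claim):

$$\mathrm{cost}(\text{human who thinks they saw } X) \approx \mathrm{cost}(\text{brain}) + \mathrm{cost}(\text{mocked-up output of } X), \qquad \mathrm{cost}(\text{mocked-up output of } X) \ll \mathrm{cost}(X).$$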