Logical induction gives a hint on how to do this, but the generalization bounds are pretty bad (roughly, the number of effective data points you get is at best proportional to the logarithm of the size of the thing you are predicting), and it is still only a hint; we don’t have a good formal decision theory yet.
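(Loosely formalized, just restating the scaling claimed above rather than any precise theorem: if the thing being predicted has size $N$, the effective sample size is something like $n_{\text{eff}} = O(\log N)$, which is why the bounds are so weak for anything large.)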
This doesn’t seem to require any complicated machinery.
From a decision-theoretic perspective, you want to run scheme X: let the people out with whatever probability you think that a civilization of them would run scheme X. If instead of running scheme X they come up with some more clever idea, you can evaluate that idea when you see it. If they come up with scheme Y that is very similar to scheme X, probably that’s fine.
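As a minimal sketch of that decision rule, assuming you already have some estimator of the relevant probability (the names here are invented, and the placeholder estimator is where all the real work would live):

```python
import random

def p_would_run_scheme_x(civilization) -> float:
    """Hypothetical estimator: our credence that a civilization of these
    simulated people would themselves run scheme X in our position.
    The flat 0.5 is a placeholder; this is where all the hard work lives."""
    return 0.5

def decide_release(civilization) -> bool:
    """Scheme X: let the simulated people out with probability equal to our
    credence that they would run scheme X."""
    return random.random() < p_would_run_scheme_x(civilization)
```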
Logically, the hard part is predicting whether they’d try to run scheme X, without actually needing to simulate them simulating others. That doesn’t look very hard though (especially given that you can e.g. mock up the result of their project).
I don’t see why you don’t think this is hard. We have a pretty poor understanding of alien evolutionary and social dynamics, so why would we expect our beliefs about the aliens to be even a little accurate? This is plausibly about as difficult as alignment (if you could make these predictions, then maybe you could use them to align AIs made out of evolutionary simulations).
There’s also a problem with the definition of X: it’s recursive, so it doesn’t reduce to a material/logical statement, and there’s some machinery involved in formalizing it. This doesn’t seem impossible but it’s a complication.
We have a simulation of the aliens, who are no smarter than we are, so we run it and see what they do. We can’t formally define X, but it seems to me that if we are looking in at someone building an AI, we can see whether they are doing it.
What part do you think is most difficult? Understanding what the aliens are doing at all? The aliens running subtle deception schemes after realizing they are in a simulation (but making it look to us like they don’t realize they are in a simulation)? Making a judgment call about whether their behavior counts as X?
(Of course, actually spawning an alien civilization sounds incredibly difficult, but the subsequent steps don’t sound so hard to me.)
If we’re simulating them perfectly, then they are in total using less compute than us. And their simulation of a third civilization in total uses less compute than they use. So this is only a way to pass control to smaller civilizations. This is insufficient, so I was thinking that it is actually necessary to predict what they do given a less-than-perfect simulation, which is a difficult problem.
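To make the compute accounting concrete (the numbers here are made up purely for illustration): each perfectly simulated civilization has to fit inside its simulator’s budget, so the chain of nested simulations can only shrink.

```python
def nested_budgets(our_budget: float, fraction: float = 0.5):
    """Toy model: each level can devote at most some fraction of its own
    compute to the civilization it simulates perfectly, so budgets strictly
    decrease down the chain of nested simulations."""
    budget = our_budget
    while budget > 1.0:  # arbitrary cutoff for "too small to be interesting"
        budget *= fraction
        yield budget

# Starting from a made-up budget of 100 units, the nested civilizations get
# at most 50, 25, 12.5, ... units: control only passes to smaller civilizations.
print(list(nested_budgets(100.0)))
```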
Actually I think I misinterpreted your comment: you are proposing deciding whether to hand over control _without_ simulating them actually running AI code. But this is still an extrapolation: even before most of the cognitive work of their civilization is done by AI, the simulations they run will be interleaved with their own biological computation. So you can’t just simulate them up to the point where they hand off control to AI, since there will be a bunch of computation before then too.
Basically, you are predicting what the aliens do in situation X, without actually being able to simulate situation X yourself. This is an extrapolation, and understanding of alien social dynamics would be necessary to predict this accurately.
I agree that you’d need to do some reasoning rather than being able to simulate the entire world and see what happens. But you could really afford quite a lot of simulating if you had the ability to rerun evolution, so you can e.g. probe the entire landscape of what they would do under different conditions (including what groups would do). The hardest things to simulate are the results of the experiments they run, but again you can probe the entire range of possibilities. You can also probably recruit aliens to help explain what the important features of the situation are and how the key decisions are likely to be made, if you can’t form good models by using the extensive simulations.
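As a schematic of what “probing the landscape” might look like if the rerun-evolution simulator were a callable (everything here, including `run_simulation`, the condition grid, and the outcome labels, is invented for illustration):

```python
import itertools

def run_simulation(seed: int, resources: str, contact: bool) -> str:
    """Placeholder for the (wildly expensive) evolutionary simulation; here it
    just returns a made-up label so the sketch runs end to end."""
    return "runs scheme X" if resources == "abundant" and not contact else "does something else"

def probe_landscape() -> dict:
    """Rerun the civilization under a grid of conditions and tally how often
    each qualitative outcome shows up across the landscape."""
    outcomes = {}
    for seed, resources, contact in itertools.product(
        range(10), ["scarce", "abundant"], [False, True]
    ):
        result = run_simulation(seed, resources, contact)
        outcomes[result] = outcomes.get(result, 0) + 1
    return outcomes

print(probe_landscape())
```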
The complexity of simulating “a human who thinks they’ve seen expensive computation X” seems much closer to the complexity of simulating a human brain than to the complexity of simulating X.