Sufficiently to hide its 49 years of running amok—from an intelligence greater than itself? It would have to move materials around to the places they would have been otherwise, reconstruct all the art and engineering that would have happened in those 49 years, create de novo humans and human societies with false memories of long term existence.
I agree that it would be very difficult to obscure 49 years of very different history, but there is a perverse incentive for the RIAI to effectively increase the duration and scope of its privacy box with respect to S by its choice of O; perhaps allowing the comet to fragment into a dense dust cloud around the Earth that effectively obscures Earth from the rest of the Universe for 49 years. If it only has to worry about Earth, it is conceivable that it could simulate S|¬X while the real history under X plays out behind the cloud, and then revert the state to match S|¬X by the end of s1.
To make this even less probable, we could allow S to be less noisy the further we move away from the Earth (in the current frame of reference). So the data S would be very fine-grained far away from us (in space), and noisier closer in. This would make it virtually impossible to run amok undetectably.
That might actually be detrimental if the factor were large enough to allow noticeable local deviations between S|X and S|¬X. If it’s okay for a star on the other side of the galaxy to have a 1 meter error, what does that imply for the local error in the position of a person?
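To put very rough numbers on that worry, here is a toy calculation; the inverse-distance scaling of the allowed error, the distances, and every figure below are assumptions for illustration only, not part of the proposal:

```python
# Rough numbers for the distance-dependent noise worry. Everything here is an
# assumption for illustration, in particular that the allowed positional error
# in S scales inversely with distance from Earth.

LIGHT_YEAR_M = 9.461e15                     # metres per light year
far_star_distance_m = 5e4 * LIGHT_YEAR_M    # a star roughly 50,000 ly away
far_star_error_m = 1.0                      # the "1 meter error" allowed out there

# error(d) = k / d  =>  k = error * distance
k = far_star_error_m * far_star_distance_m

person_distance_m = 1e3                     # a person one kilometre away
person_error_m = k / person_distance_m

print(f"Allowed local error: {person_error_m:.2e} m")   # ~4.73e+17 m
```

Under that (arbitrary) scaling the constraint says essentially nothing about anything nearby, which is exactly the concern; the scaling law would have to be chosen very carefully.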
Your s1 and s2 ideas are good, though we could also make s thicker: make it the whole space-time interval between s1 and s2.
Good idea. A thick s would prevent a “blip” of activity that the RIAI could try to sneak in after s1 and let dissipate by s2. My guess is that an RIAI would work by proving its non-effect on S, not by actually computing S, so I don’t think it would be intractable to thicken s to some degree. It would be nice if s could extend to t at infinity, but the uncertainty in S would make that virtually impossible. Perhaps the required certainty of P could decay with passing time, such that the ratio P(S(t)^a | X, O^a, B) / P(S(t)^a | ¬X, Õ^a, B) converges toward 1 for every a as t goes to infinity, while still requiring the ratio to be ~1 over all t in s.
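To make that slightly more concrete, here is a minimal sketch of one possible reading, in which the ratio must be essentially 1 everywhere inside the thick slab s and the allowed deviation relaxes after s2; the slab endpoints, tolerance schedule, and function names are all invented for illustration:

```python
import math

# Sketch of the constraint under one reading: the ratio
#   R(t, a) = P(S(t)^a | X, O^a, B) / P(S(t)^a | ¬X, Õ^a, B)
# must be essentially 1 for every a and every t in the thick slab s, and the
# allowed deviation relaxes gradually after s2. All numbers are made up.

S1, S2 = 1.0, 50.0        # the slab s = [S1, S2], in years (illustrative)
TOL_IN_S = 1e-9           # near-exact agreement demanded inside s
RELAX_PER_YEAR = 0.01     # how quickly the demand weakens after S2 (assumed)

def allowed_deviation(t: float) -> float:
    """Maximum allowed |R(t, a) - 1| at time t (for t >= S1)."""
    if t <= S2:
        return TOL_IN_S
    return TOL_IN_S + RELAX_PER_YEAR * (t - S2)

def reduced_impact_ok(ratio_at, times) -> bool:
    """ratio_at(t) should give the worst-case ratio over all a at time t."""
    return all(abs(ratio_at(t) - 1.0) <= allowed_deviation(t)
               for t in times if t >= S1)

# Example: a ratio that is exactly 1 inside s, bulges slightly afterwards,
# and converges back toward 1 at large t.
def example_ratio(t):
    return 1.0 if t <= S2 else 1.0 + 0.03 * math.exp(-((t - 80.0) / 20.0) ** 2)

print(reduced_impact_ok(example_ratio, [float(t) for t in range(0, 200, 5)]))   # True
```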
Setting the start of s to an hour after the RIAI starts (when it’s supposed to turn its privacy box into lots of entropy) would also limit the time it had to run amok.
Defining O more generally may indeed be the key to the problem. For this example, though, it seems sufficient to define O to be the output of a particular wire, and make sure the rest of the setup is such that we don’t get any useful information except via this wire’s output (so no peeking!).
Just don’t let it pulse-width-modulate (PWM) anything else into the control signal sent over the wire. That was my reasoning behind limiting O to the minimum necessary bandwidth: let all the output bits be useful for solving the problem, with no room for additional signals.
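As a toy illustration of the kind of side channel I have in mind (the pulse widths, jitter value, and function names are all made up), extra bits can ride along on an otherwise fully specified bit stream if the timing of the wire is left unconstrained:

```python
# Toy sketch of the side channel: the "useful" output is just a bit string,
# but if the wire's timing is unconstrained, one extra hidden bit can ride
# on each pulse. All names and numbers here are invented.

NOMINAL_PULSE_MS = 10.0   # what the spec expects for every output bit
JITTER_MS = 1.0           # deviation used to smuggle one hidden bit per pulse

def encode_with_side_channel(command_bits, hidden_bits):
    """Return (bit, pulse_width) pairs: the required command plus a hidden payload."""
    pulses = []
    for i, bit in enumerate(command_bits):
        hidden = hidden_bits[i % len(hidden_bits)]
        width = NOMINAL_PULSE_MS + (JITTER_MS if hidden else -JITTER_MS)
        pulses.append((bit, width))
    return pulses

def decode_side_channel(pulses):
    """Any observer with a fast enough clock recovers the hidden payload."""
    return [1 if width > NOMINAL_PULSE_MS else 0 for _, width in pulses]

pulses = encode_with_side_channel([1, 0, 1, 1], hidden_bits=[0, 1])
assert decode_side_channel(pulses) == [0, 1, 0, 1]
```

Pinning down the pulse shape along with the bit content is what I mean by limiting O to the minimum necessary bandwidth.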
What if the RIAI can observe O? Suppose it decides to output O^impossible if and only if it does not observe O^impossible. X starts the RIAI: it outputs O^impossible, observes O^impossible, assumes from its earlier decision that what it observes must be Õ^impossible, concludes ¬X (an impossible event is less likely than the RIAI being mistaken about its identity), and is now outside the scope of reduced impact. There may be other ways for the RIAI to exploit self-interaction with its output, perhaps by triggering a restart from the output (aim the laser at the part of the wire after X that turns it back on), which would allow it to conclude that it has started despite ¬X and is thus a different AI, free from the restrictions of reduced impact.
perhaps allowing the comet to fragment into a dense dust cloud around the Earth that effectively obscures Earth from the rest of the Universe for 49 years.
That’s an interesting thought, I’ll have to consider it more.
Your ideas on thickening s or decaying the certainty could be interesting.
Setting the start of s to an hour after the RIAI starts (when it’s supposed to turn its privacy box into lots of entropy) would also limit the time it had to run amok.
The idea with the 50-year delay is to be sure that no trace remains of the contents of the box. If we have a highly destructive ending to the privacy box, we can move s closer, but I wanted to move away from my previous “clouds of anti-matter plus bomb” scenario.
What if the RIAI can observe O? Suppose it...
That sounds like anthropic reasoning, which I banned the AI from using (or else counterfactuals in which it doesn’t exist make no sense—and the hypothetical super AI (which doesn’t exist) makes no sense either).
The idea with the 50-year delay is to be sure that no trace remains of the contents of the box. If we have a highly destructive ending to the privacy box, we can move s closer, but I wanted to move away from my previous “clouds of anti-matter plus bomb” scenario.
Instead of conditioning on the start of the AI, perhaps it’s possible to measure the effect of the AI directly by restricting the contents of the privacy box. Let U be the region containing everything but the privacy box, let PB be the region containing the privacy box, and let O contain the output state; ensure that the intersection of U and PB is exactly O (the only interface between them). Let U(t), PB(t), and O(t) be functions that yield the information content of these regions after time t (including the changes to the regions as things move around), and let U(0), PB(0), and O(0) be defined from the initial background information B. Let T be the time at which the AI should be done and have altered O to indicate its output. Let Õ(t) represent a particular random state of the output. Let U_B(t) be the information content of a universe U(t) + PB(0), i.e. the time-evolution of the universe while keeping PB constant.

So long as P(S | U(T), PB(T), O(T)) / P(S | U_B(T), Õ(T)) = 1, I think the AI’s impact is reduced to the region of the privacy box for the duration of T, plus O. The equation implies that U(T) + PB(T) = U_B(T), or in other words that the privacy box in its final state is indistinguishable from its state in the background information. So long as the AI is restartable with a new background B and time 0, it doesn’t matter if the AI is accidentally started or run more than once, as long as it can correctly identify PB. It becomes more difficult (hopefully impossible) to directly observe whether the AI started or not, but a portion of O could be set aside for signalling the status of the AI.
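As a very rough sketch of the shape of that comparison (a toy model only: the region names, dynamics, and the “AI action” are invented, and it does not model the further requirement that PB(T) be indistinguishable from PB(0)):

```python
import copy
import random

# Toy version of the comparison above: run the world with the AI acting
# inside PB, run the reference world U_B with PB frozen at PB(0) and a
# random output Õ, and compare the U regions at time T.

def step(world, rng, freeze_pb):
    """One tick: U evolves from itself plus ambient noise and never reads PB
    or O; PB churns internally (unless frozen, as in U_B); O is written from PB."""
    new = dict(world)
    new["U"] = (world["U"] * 31 + rng.randint(0, 5)) % 1000
    if not freeze_pb:
        new["PB"] = (world["PB"] * 17 + 3) % 1000
        new["O"] = new["PB"] % 2          # the single output wire
    return new

def run(world0, T, seed, ai_acts, freeze_pb=False):
    rng = random.Random(seed)
    world = copy.deepcopy(world0)
    if ai_acts:
        world["PB"] = 999                 # the AI rearranges the inside of its box
    for _ in range(T):
        world = step(world, rng, freeze_pb)
    return world

world0 = {"U": 42, "PB": 7, "O": 0}
T = 20
actual = run(world0, T, seed=0, ai_acts=True)                      # U(T), PB(T), O(T)
reference = run(world0, T, seed=0, ai_acts=False, freeze_pb=True)  # U_B(T)
reference["O"] = random.Random(1).randint(0, 1)                    # a random Õ(T)

# Because U never depends on PB or O in this toy, the check passes; a real
# check would compare predictive distributions over S, not raw states.
print(actual["U"] == reference["U"])   # True
```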
I am not completely confident that the above approach works or that it covers the loopholes that your approach did, so it might make sense to require the privacy-box condition in addition to the original one, so that the AI has to satisfy both.
I’ll think about this; for the moment, it doesn’t seem to add much over the X vs ¬X approach, but I may be wrong...