To me the part that stands out the most is the computation of P() by the AI.
This module comes in two versions: the module P, an idealised version which has almost unlimited storage space and time with which to answer the question
From this description, it seems that P is described as essentially omniscient. It knows the location and velocity of every particle in the universe, and it has unlimited computational power. Regardless of whether possessing and computing with such information is possible, the AI will model P as being literally omniscient. I see no reason that P could not hypothetically reverse the laws of physics and thus would always return 1 or 0 for any statement about reality.
Of course, you could add noise to the inputs to P, put a strict limit on P’s computational power, or model it as a hypothetical set of sensors which is very fine-grained but not omniscient. But this seems like another set of free variables in the model, in addition to lambda, any of which could completely undo the entire setup if set wrong, and there’s no natural choice for any of them.
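To make the worry concrete, here is a minimal toy sketch (entirely my own construction, with invented numbers): the AI’s action X leaves a small trace in the future data S, and P observes S through Gaussian noise. As the noise goes to zero, the expected log of P(S|X)/P(S|¬X) blows up, i.e. a noiseless P answers every question about whether the AI acted with effectively 0 or 1.

```python
# Minimal toy model (my own construction): the AI's action X leaves a small
# trace `delta` in the future data S, and P observes S with Gaussian noise of
# scale `sigma`.  The expected log of P(S|X)/P(S|not-X) is 0.5*(delta/sigma)^2,
# estimated here by Monte Carlo, and it blows up as sigma -> 0: a noiseless P
# effectively returns 0 or 1 for any question about whether the AI acted.
import numpy as np

rng = np.random.default_rng(0)

def mean_log_ratio(delta, sigma, n=100_000):
    """Average log-likelihood ratio when the data were really generated under X."""
    s = delta + sigma * rng.standard_normal(n)            # S = trace + sensor noise
    return np.mean((s * delta - 0.5 * delta ** 2) / sigma ** 2)

for sigma in (1.0, 0.1, 0.01):
    print(f"sigma={sigma}: E[log ratio] ~ {mean_log_ratio(0.05, sigma):.4f}")
# sigma=1.0 gives roughly 0.001 (P can barely tell X from not-X);
# sigma=0.01 gives roughly 12.5 (P is all but certain); with no noise it diverges.
```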
From this description, it seems that P is described as essentially omniscient. It knows the location and velocity of every particle in the universe, and it has unlimited computational power.
It has pretty much unlimited computational power, but it doesn’t know the locations and velocities of particles. When fed S, it only has noisy information about one slice of the universe.
I see no reason that P could not hypothetically reverse the laws of physics and thus would always return 1 or 0 for any statement about reality.
That’s not a problem: even if P is omniscient, P’ still has to estimate its expected output from its own limited perspective. As long as this estimate is reasonable, the omniscience of P doesn’t cause a problem (and remember that P is fed noisy data).
Of course, you could add noise to the inputs to P
Yes, the data S is noisy. The amount of noise needs to be decided upon, but as long as we don’t put in stupid amounts of noise, the default error is “P’ concludes P is too effective and can distinguish very well between X and ¬X, so the AI does nothing (i.e. its entire motivation reduces to minimising the penalty function as much as it can)”.
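A rough sketch of that default failure mode, with a penalty form I am making up purely for illustration (score(a) = U(a) − λ·(log r)², where r is the AI’s P’-based estimate of the expected ratio): if P’ concludes that P can tell X from ¬X for every non-trivial action, every such action carries a huge penalty and the optimum collapses to doing nothing.

```python
# Rough sketch of the default failure mode, with an invented penalty form:
# score(a) = U(a) - lambda * (log r)^2, where r is the AI's estimate of the
# expected ratio E'(P(S^a|X,O^a,B)/P(S^a|not-X,O~^a,B) | a).  If P' concludes
# that P distinguishes X from not-X very well for every non-trivial action,
# every such action carries a huge penalty and the optimum is to do nothing.
import math

candidates = {
    # action:        (utility U(a), estimated expected ratio r)
    "do_nothing":    (0.0,   1.0),   # leaves no trace, so r ~ 1
    "answer_in_box": (1.0,   3.0),   # small trace, but P' thinks P would still notice
    "take_over":     (100.0, 1e9),   # unmissable trace
}
lam = 10.0                           # the free lambda weight mentioned above

def score(utility, ratio, lam=lam):
    return utility - lam * math.log(ratio) ** 2

best = max(candidates, key=lambda a: score(*candidates[a]))
print(best)  # -> do_nothing: the motivation reduces to minimising the penalty
```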
even if P is omniscient, P’ still has to estimate its expected output from its own limited perspective. As long as this estimate is reasonable, the omniscience of P doesn’t cause a problem (and remember that P is fed noisy data).
Don’t you have to get the exact level of noise that will prevent the AI from hiding from P without letting P reconstruct the AI’s actions if it does allow itself to be destroyed? An error in either direction can be catastrophic. If the noise is too high, the AI takes over the world. If the noise is too low, E’(P(Sᵃ|X,Oᵃ,B)/P(Sᵃ|¬X,Õᵃ,B) | a) is going to be very far from 1 no matter what, so there is no reason to expect that optimizing it is still equivalent to reducing impact.
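In toy Gaussian terms (all numbers invented), the two failure regimes look like this: with far too much noise even a takeover is lost in the noise and the penalty never bites; with far too little noise even a benign in-box action is blatant to P and the penalty swamps everything.

```python
# The two failure regimes in a toy Gaussian model (all numbers invented): an
# in-box action leaves a trace of 1e-6 in S, a takeover leaves a trace of 1e3,
# and P sees S with noise sigma.  The expected log-ratio is 0.5*(trace/sigma)^2.
def expected_log_ratio(trace, sigma):
    return 0.5 * (trace / sigma) ** 2

# Noise far too high: even a takeover is lost in the noise (ratio ~ 1),
# so the penalty never bites.
print(expected_log_ratio(trace=1e3, sigma=1e6))    # ~ 5e-7

# Noise far too low: even the benign in-box action is blatant to P
# (ratio >> 1), so the penalty swamps everything and optimising it no longer
# tracks actual impact.
print(expected_log_ratio(trace=1e-6, sigma=1e-9))  # ~ 5e5
```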
Don’t you have to get the exact level of noise that will prevent the AI from hiding from P without letting P reconstruct the AI’s actions if it does allow itself to be destroyed?
It’s not so sensitive. The AI’s actions in the box are very hard to detect from the perspective of fifty years later, even with minimal noise. The AI expanding dangerously across the universe would be easy to detect, even with a lot of noise (if nothing else, because humans would have recorded this and broadcast messages about it).
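In the same invented toy terms as above: because an in-box trace and a universe-scale expansion differ by many orders of magnitude, any noise level across a very wide band keeps the former invisible and the latter unmissable, so the choice does not have to be knife-edge.

```python
# Same invented toy numbers as above: an in-box trace of 1e-6 versus an
# expansion trace of 1e3, with expected log-ratio 0.5*(trace/sigma)^2.
# Any sigma across several orders of magnitude keeps the box action invisible
# (log-ratio << 1) and the expansion obvious (log-ratio >> 1), so the noise
# level does not have to be tuned precisely.
box_trace, expansion_trace = 1e-6, 1e3

for sigma in (1e-3, 1e0, 1e2):
    box = 0.5 * (box_trace / sigma) ** 2
    expansion = 0.5 * (expansion_trace / sigma) ** 2
    print(f"sigma={sigma:g}: box={box:.3g}, expansion={expansion:.3g}")
# sigma from 0.001 up to 100 all give box ~ 0 and expansion astronomically
# large; only absurd choices at either extreme break the scheme.
```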