Well, here is my take on how AIXI would handle these sorts of situations:
First, let’s assume it lives in a universe which at any time t is in a state S(t) that is computable in terms of t. Now, AIXI finds this function S(t) interesting because it can be used to predict the input bits. More precisely, AIXI generates some function f which locates the machine running it and returns the input bits of that machine, and adopts the model in which its input at time t is f(S(t)). This function f is AIXI’s phenomenological bridge; it emerges naturally from the AIXI formalism. This does not take into account that in AIXI’s model its future inputs also depend on its current outputs, which would make the model more complicated.
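To make this slightly more concrete, here is a rough sketch of that hypothesis class. The notation (S, f, o_t, w) is my own gloss, it ignores actions as noted above, and it is only meant to indicate the Solomonoff-style mixture, not the exact AIXI definition:

    % Sketch only (my notation): hypotheses are pairs (S, f) -- a computable
    % world-state trajectory plus a phenomenological bridge function.
    \documentclass{article}
    \usepackage{amsmath}
    \begin{document}
    \begin{align*}
      o_t &= f(S(t)) && \text{predicted input bits at time } t,\\
      w(S, f) &= 2^{-\ell(S, f)} && \text{prior weight, } \ell = \text{shortest program length},\\
      P(o_{1:t}) &= \sum_{\substack{(S, f)\,:\, f(S(\tau)) = o_\tau \\ \text{for all } \tau \le t}} w(S, f) && \text{mixture over consistent hypotheses.}
    \end{align*}
    \end{document}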
Now suppose that AIXI considers an action with the result that at some time t’ the machine computing it no longer exists. Then AIXI would still be able to compute S(t’), but f(S(t’)) would no longer be well defined. What would AIXI do then? It would have to start using a different model for its inputs. Whether it performs such an action depends on its predictions of its reward signal under these alternative models. The exact result would be unpredictable, but one possible regularity is that if it receives a very low reward signal, then by regression to the mean it would expect to do better under these alternative models and would be in favor of actions which lead to its host machine’s destruction.
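Here is a toy numerical illustration of that regression-to-the-mean argument; the reward values, the horizon, and the two-way comparison are invented for illustration (real AIXI would compare expected returns over its entire mixture, conditional on each action):

    # Toy illustration only: if the current model predicts a very low reward
    # stream, an agent that expects to fall back on a broad mixture of
    # alternative models (whose mean reward is higher) can prefer the action
    # that invalidates the current model, e.g. destroying its host machine.
    # All numbers here are invented for illustration.

    current_model_reward = 0.05   # predicted per-step reward if the host machine survives
    prior_mean_reward = 0.5       # mean per-step reward over the alternative models
    horizon = 100                 # remaining steps the agent cares about

    expected_if_host_survives = horizon * current_model_reward
    expected_if_host_destroyed = horizon * prior_mean_reward  # falls back on the prior mixture

    print(f"expected reward if host survives:  {expected_if_host_survives:.1f}")
    print(f"expected reward if host destroyed: {expected_if_host_destroyed:.1f}")
    print("prefers destruction:", expected_if_host_destroyed > expected_if_host_survives)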
However, it gets more complicated than that. While in our intuitive models AIXI’s input is no longer well defined when its host machine is destroyed, in its internal model the function f would probably be defined everywhere. For example, if its inputs are stored in a string of capacitors, its function f may be “the electric fields at points x0, …, xn”, which is defined even when the capacitors are destroyed or displaced. A more interesting example would be if its inputs are generated from perfect-fidelity measurements of the physical world. Then the most favored hypothesis for f may be that f(s) is the measurement of those observables, and AIXI’s actions would optimize the physical parameter corresponding to its reward circuit regardless of what it predicts would happen to its reward circuit.
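A small cartoon of the capacitor example, just to show the sense in which such an f stays defined after the hardware is gone; the field representation, the points, and the 0.5 threshold are all made up for illustration:

    # Cartoon of a bridge function that stays defined after the hardware is gone.
    # The hypothesized "physical state" is just a mapping from spatial points to
    # field values; the bridge reads the field at fixed points x0..x2, so it
    # returns *something* whether or not those points still host working capacitors.
    # Representation and numbers are made up for illustration.

    def bridge_f(physical_state, points):
        """Read the electric field at fixed points and threshold to input bits."""
        return [1 if physical_state.get(p, 0.0) > 0.5 else 0 for p in points]

    points = ["x0", "x1", "x2"]

    working_machine = {"x0": 0.9, "x1": 0.1, "x2": 0.8}  # capacitors charged as inputs
    machine_destroyed = {"x1": 0.2}                      # capacitors gone or displaced

    print(bridge_f(working_machine, points))    # [1, 0, 1] -- tracks the real inputs
    print(bridge_f(machine_destroyed, points))  # [0, 0, 0] -- still defined, just no longer meaningful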
It gets even more interesting. Suppose such an AIXI predicts that its input stream will be tampered with. What would it do? Here, the part of its model in which its inputs depend on its own outputs, which I previously ignored, becomes crucial. It would be reasonable for it to think as follows: once the inputs of its machine no longer match the physical parameters those inputs are supposed to measure, AIXI’s predictions of its future inputs no longer match the inputs the machine actually receives. Therefore the machine’s actions will no longer match AIXI’s intentions, but AIXI’s reward signal will still be at the mercy of this machine. This would generally be assigned a suboptimal utility and be avoided. However, AIXI’s model for its output circuit may be that it influences the physical state even after its host machine no longer implements it. In that case, it would not be reluctant to tamper with its input circuit.
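A toy version of that last distinction, asking under which output-circuit hypothesis tampering would still look acceptable; the “control” numbers are invented placeholders for the agent’s expected influence over the physical parameter its reward circuit measures:

    # Toy comparison of the two output-circuit hypotheses in the tampering case.
    # "control" stands for how much the agent expects its chosen policy to keep
    # steering the physical parameter that its reward circuit measures.
    # All numbers are invented for illustration.

    control_without_tampering = 0.9

    hypotheses = {
        "A: outputs act only through the host machine": 0.1,  # tampering decouples policy from machine
        "B: outputs act on the physical state directly": 0.9,  # tampering changes almost nothing
    }

    for name, control_if_tampered in hypotheses.items():
        choice = ("not reluctant to tamper"
                  if control_if_tampered >= control_without_tampering
                  else "avoids tampering")
        print(f"{name}: control {control_if_tampered:.1f} vs {control_without_tampering:.1f} -> {choice}")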
Overall, AIXI’s actions eerily resemble the way humans behave.
These scenarios call for SMBC-like comic strip illustrations. Maybe ping Zach?

I affirm all of this response except the last sentence. I don’t think humans go wrong in quite the same way...
No?
Scenario 1:
would be in favor of actions which lead to its host machine’s destruction.
Soldiers do that when they volunteer to go on suicide missions.
Scenario 2:
actions would optimize the physical parameter corresponding to its reward circuit regardless of what it predicts would happen to its reward circuit.
That’s the reason people write wills.
Scenario 3:
AIXI’s model for its output circuit may be that it influences the physical state even after its host machine no longer implements it. In that case, it would not be reluctant to tamper with its input circuit.
That’s how addicts behave. Or even non-addicts when they choose to imbibe (and possibly drive afterward).
These are rather broad analogies. But analogies are not the best way to reason about something when we already know the important details, which happen to be different.
The specific details of human thinking and acting are different from the specific details of AIXI functioning. Sometimes an analogous thing happens; sometimes not. And the only way to know whether a given situation is analogous is to already know it.
I agree that these analogies might be superficial; I simply noted that they exist, in reply to Eliezer stating “I don’t think humans go wrong in quite the same way...”
The specific details of human thinking and acting are different from the specific details of AIXI functioning.
Do we really know the “specific details of human thinking and acting” to make this statement?
Do we really know the “specific details of human thinking and acting” to make this statement?
I believe we know quite enough to consider it pretty unlikely that the human brain stores an infinite number of binary descriptions of Turing machines along with their probabilities, initialized by Solomonoff induction at birth (or perhaps at conception) and later updated on evidence according to Bayes’ theorem.

Even if words like “infinity” or “incomputable” are not convincing enough (okay, perhaps the human brain runs the AIXI algorithm with some unimportant rounding), there are things like human-specific biases generated by evolutionary pressures, which is one of the main points of this whole website.

Seriously, the case is closed.
Even if words like “infinity” or “incomputable” are not convincing enough
Presumably any realizable version of AIXI, like AIXItl, would have to use a finite amount of computation, so no.
there are things like human-specific biases generated by evolutionary pressures
Right. However, some of those could be due to improper weighting of some of the models, poor priors, etc. I am not sure that the case is as closed as you seem to imply.