I believe AIXI is much more inspectable than you make it out to be. I think it is important to challenge your claim here because Holden appears to have trusted your expertise and thereby conceded an important part of the argument.
AIXI’s utility judgements are based on a Solomonoff prior, which is in turn based on the computer programs that reproduce the input data. Computer programs are not black boxes. A system implementing AIXI can easily also return a sample of typical expected future histories and the programs compressing those histories. By examining these programs, we can figure out what implicit model the AIXI system has of its world. These programs are optimized for shortness, so they are likely to be very obfuscated, but I don’t expect them to be incomprehensible (after all, they’re not optimized for incomprehensibility). Even just sampling expected histories without their compressions is likely to be very informative. In the case of AIXItl the situation is better in the sense that its output at any given time is guaranteed to be generated by just one subprogram of length < l, and this subprogram comes with a proof justifying its utility judgement. It’s also worse in that there is no way to sample its expected future histories. However, I expect the proof provided would implicitly contain such information. If either the programs or the proofs cannot be understood by humans, the programmers can just reject them and look at the next best candidates.
As for “What will be its effect on _?”, this can be answered as well. I already stated that with AIXI you can sample future histories. This is because AIXI implements a specific known prior over its future histories, namely Solomonoff induction. This ability may seem limited because it only shows future sensory data, but the sensory data can be whatever you feed AIXI as input. If you want it to have a realistic model of the world, this includes a lot of relevant information. For example, if you feed it the entire database of Wikipedia, it can give likely future versions of Wikipedia, which already provides a lot of detail on the effects of its actions.
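To make the inspectability point concrete, here is a toy sketch of my own (not Hutter’s formalism, and not computable AIXI): it replaces “all programs” with a tiny hand-picked hypothesis class and made-up description lengths, but it shows how such a system could expose both the surviving world-programs and samples of the futures they predict.

```python
import random

# Each "program" maps an observed bit-history to a deterministic next bit.
# name -> (description length in bits, predictor); lengths are made up.
HYPOTHESES = {
    "always_0":    (1, lambda h: 0),
    "always_1":    (1, lambda h: 1),
    "alternate":   (2, lambda h: 1 - h[-1] if h else 0),
    "repeat_last": (2, lambda h: h[-1] if h else 0),
}

def consistent(predictor, history):
    """A program survives only if it reproduces every bit we actually observed."""
    return all(predictor(history[:i]) == bit for i, bit in enumerate(history))

def posterior(history):
    """Weight surviving programs by 2^-length, a Solomonoff-style prior."""
    weights = {name: 2.0 ** -length
               for name, (length, pred) in HYPOTHESES.items()
               if consistent(pred, history)}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

def sample_future(history, steps=5):
    """Return one typical expected future history plus the program explaining it."""
    post = posterior(history)
    name = random.choices(list(post), weights=list(post.values()))[0]
    pred = HYPOTHESES[name][1]
    future = list(history)
    for _ in range(steps):
        future.append(pred(future))
    return name, future[len(history):]

observed = [0, 0, 0]
print(posterior(observed))      # which world-models survive, and how strongly
print(sample_future(observed))  # a typical expected future, with its explanation
```

The surviving programs here are trivially readable; the claim in the text is that even heavily obfuscated short programs would still be something engineers could examine, reject, or learn from.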
Can you be a bit more specific about your interpretation of AIXI here?
Here are my assumptions; let me know where yours differ:
Traditional-AIXI is assumed to exist in the same universe as the human who wants to use AIXI to solve some problem.
Traditional-AIXI has a fixed input channel (e.g. it’s connected to a webcam, and/or it receives keyboard signals from the human, etc.)
Traditional-AIXI has a fixed output channel (e.g. it’s connected to an LCD monitor, or it can control a robot servo arm, or whatever).
The human has somehow pre-provided Traditional-AIXI with some utility function.
Traditional-AIXI operates in discrete time steps.
In the first timestep that elapses after Traditional-AIXI is activated, Traditional-AIXI examines the input it receives. It considers all possible programs that take a pair (S, A) and emit an output P, where S is the prior state, A is an action to take, and P is the predicted result of taking action A in state S. Then it discards all programs that would not have produced the input it received, regardless of what S or A they were given. Then it weights the remaining programs according to their Kolmogorov complexity. This is basically the Solomonoff induction step.
Now Traditional-AIXI has to make a decision about an output to generate. It considers all possible outputs it could produce, and feeds each one to the programs under consideration to produce a predicted next timestep. Traditional-AIXI then calculates the expected utility of each output (using its pre-programmed utility function), picks the one with the highest utility, and emits that output. Note that at this point it has no idea how any of its outputs would affect the universe, so this is essentially a uniformly random choice.
In the next timestep, Traditional-AIXI reads its inputs again, but this time taking into account what output it generated in the previous step. It can now start to model correlation, and eventually causation, between its inputs and outputs. It has a previous state S and it knows what action A it took in its last step. It can further discard more programs and narrow down the possible models that describe the universe it finds itself in.
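Here is a minimal sketch of the loop I just described, under the simplifying assumptions that percepts and actions are single bits and that “all possible programs” is replaced by a small hand-written hypothesis class with made-up description lengths (real AIXI is uncomputable; the names and models are purely illustrative):

```python
# Candidate world-programs standing in for "all possible programs".
PROGRAMS = {
    # name: (description length in bits, predictor(history, action) -> percept)
    "echo":     (2, lambda hist, act: act),      # the world echoes our output back
    "always_0": (1, lambda hist, act: 0),        # the world ignores us entirely
    "flip":     (3, lambda hist, act: 1 - act),  # the world inverts our output
}
ACTIONS = [0, 1]

def utility(percept):
    return percept                               # pre-programmed: prefer seeing 1s

def update(beliefs, history, action, percept):
    """Discard programs that would not have produced the observed input; renormalise."""
    survivors = {name: w for name, w in beliefs.items()
                 if PROGRAMS[name][1](history, action) == percept}
    total = sum(survivors.values())
    return {name: w / total for name, w in survivors.items()}

def choose_action(beliefs, history):
    """Emit the output with the highest expected utility under the surviving programs."""
    def expected_utility(action):
        return sum(w * utility(PROGRAMS[name][1](history, action))
                   for name, w in beliefs.items())
    return max(ACTIONS, key=expected_utility)

# Prior: weight each program by 2^-length -- the Solomonoff induction step.
beliefs = {name: 2.0 ** -length for name, (length, _) in PROGRAMS.items()}
history = []

# Suppose the true environment happens to be "echo". On the first step the agent
# has almost no evidence about how its outputs affect the world; after one
# observation it can already discard the programs that contradict what it saw.
for _ in range(3):
    action = choose_action(beliefs, history)
    percept = action                             # the true environment: echo
    beliefs = update(beliefs, history, action, percept)
    history.append((action, percept))
    print(action, beliefs)
```

In this toy run the agent converges on the “echo” model after a single observation; the real system faces the same filter-weight-choose-update cycle over all programs rather than three.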
How does Tool-AIXI work in contrast to this? Holden seems to want to avoid having any utility function pre-defined at all. However, presumably Tool-AIXI still receives inputs and still produces outputs (probably Holden intends not to allow Tool-AIXI to control a robot servo arm, but he might intend for Tool-AIXI to be able to control an LCD monitor, or at the very least, produce some sort of text file as output).
Does Tool-AIXI proceed in discrete time steps, gathering input? Or do we prevent Tool-AIXI from running until a user is ready to submit a curated input to Tool-AIXI? If the latter, how quickly do we expect Tool-AIXI to be able to formulate a reasonable model of our universe?
How does Tool-AIXI choose what output to produce, if there’s no utility function?
If we type “Tool-AIXI, please give me a cure for cancer” on a keyboard attached to Tool-AIXI and submit that as an input, do we think that a model that encodes ASCII, the English language, bio-organisms, etc. has a lower Kolmogorov complexity than a model that says “we live in a universe where we receive exactly this hardcoded stream of bytes”?
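A back-of-the-envelope comparison makes my worry concrete; every number below is made up purely for illustration:

```python
prompt = "Tool-AIXI, please give me a cure for cancer"

# A program that simply hardcodes the observed byte stream: roughly the bytes
# themselves plus a little "output this literal" overhead.
hardcode_bits = (len(prompt.encode()) + 8) * 8

# A (very generous) guess at the shortest program that genuinely encodes ASCII,
# English, human users, biology, etc.
world_model_bits = 1_000_000

# Under a 2^-length prior, the ratio of prior weights is 2^(difference in bits).
print(f"hardcoded stream: ~{hardcode_bits} bits")
print(f"real world model: ~{world_model_bits} bits (assumed)")
print(f"prior odds favouring the hardcoded stream: 2^{world_model_bits - hardcode_bits} : 1")
```

If anything like those numbers holds, the hardcoded-stream “world model” dominates the posterior until the system has seen an enormous amount of data.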
Does Tool-AIXI model the output it produces (whether that be pixels on a screen, or bytes to a file) as an action, or does it somehow prevent itself from modelling its output as an action that has some effect on the universe it exists in? If the former, then isn’t this just an agenty Oracle AI? If the latter, then what kind of programs does it generate for its model (surely not programs that take (S, A) pairs as inputs, or else what would it use for A when evaluating its plans and predicting the future)?