To note something on making an AIXI-based tool: instead of calculating the reward sum over the whole future (something that is simultaneously impractical, computationally expensive, and would only impair performance on the task at hand), one could use a single-step reward: 1 if the button is pressed, 0 if it is not. It is still not entirely a tool, but it has a very bounded range of unintended behaviour (it is much harder to spin a terminator scenario out of it). In Hutter's paper he outlines several not-quite-intelligences before arriving at AIXI.
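A minimal sketch of the difference, assuming a toy expectimax-style planner over a small explicit environment model (the names `env_model`, `ACTIONS`, and the horizon convention are mine, purely illustrative, and not Hutter's formalism): the summed variant cares about everything that can affect reward over the whole horizon, while the single-step variant only asks which immediate action maximizes the chance the button is pressed on the next step.

```python
# Illustrative sketch only: a toy expectimax planner over an explicit,
# enumerable environment model -- not AIXI/AIXI-tl itself.
# `env_model(state, action)` is assumed to return a list of
# (probability, next_state, reward) triples, where the reward is 1 on the
# step the button gets pressed and 0 otherwise.

ACTIONS = ["press_button", "wait"]  # placeholder action set for the sketch

def expected_return(env_model, state, action, horizon):
    """Expected sum of rewards over `horizon` steps (the 'whole future' variant)."""
    if horizon == 0:
        return 0.0
    total = 0.0
    for prob, next_state, reward in env_model(state, action):
        # Assume the best follow-up action is taken at every later step.
        future = max(
            expected_return(env_model, next_state, a, horizon - 1) for a in ACTIONS
        )
        total += prob * (reward + future)
    return total

def best_action_summed(env_model, state, horizon):
    """Pick the action maximizing the reward sum over the whole (bounded) future."""
    return max(ACTIONS, key=lambda a: expected_return(env_model, state, a, horizon))

def best_action_single_step(env_model, state):
    """The more tool-like variant: only the immediate expected reward counts.
    Equivalent to best_action_summed with horizon = 1."""
    return max(
        ACTIONS,
        key=lambda a: sum(prob * reward for prob, _s, reward in env_model(state, a)),
    )
```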
[edit2: also, I do not believe that even with the large sum a really powerful AIXI-tl would be intelligently dangerous rather than simply clever at breaking the hardware that's computing it. All the valid models in AIXI-tl that affect the choice of actions have to magically insert the actions being probed into some kind of internal world model. The hardware that actually carries out those actions, complete with sensory apparatus, is incidental: a useless power drain, a needless fire hazard endangering the precious reward pathway.]
With regards to utility functions: utility functions in the AI sense are real-valued functions over the world model, not functions like "number of paperclips in the world". The latter kind of function, safe or unsafe, would be incredibly difficult or impossible to define using conventional methods. It would suffice for accelerating progress to have an algorithm that can take an arbitrary function and find its maximum; while it would indeed seem "very difficult" to use that to cure cancer, it could be plugged into existing models and very quickly be used to, e.g., design cellular machinery that keeps repairing DNA alterations.
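As a minimal sketch of what such an "arbitrary function in, maximum out" interface might look like (plain random search stands in for the far more capable optimizer the argument assumes; `score` and its peak are made-up placeholders for an existing predictive model):

```python
import random

def maximize(f, sample, iterations=10_000):
    """Generic black-box maximizer: random search over candidates drawn by
    `sample()`.  The interface (arbitrary function in, argmax out) is the
    point here, not the search strategy."""
    best_x, best_val = None, float("-inf")
    for _ in range(iterations):
        x = sample()
        val = f(x)
        if val > best_val:
            best_x, best_val = x, val
    return best_x, best_val

# Toy demonstration.  In the scenario from the text, `score` would instead be
# an existing domain model, e.g. predicted effectiveness of a candidate
# DNA-repair design, and `sample` would draw candidate designs.
def score(x):
    return -(x - 3.0) ** 2 + 7.0  # peak of 7.0 at x = 3.0

best_x, best_val = maximize(score, sample=lambda: random.uniform(-10, 10))
print(best_x, best_val)  # approximately 3.0 and 7.0
```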
Likewise, the speculative tool that can understand the phrase "how to cure cancer" as well as the phrase "what is the curing time of epoxy" would have to pick the most narrow, least objectionable interpretation of "cure cancer" merely to answer with something more useful than "cancer is not a type of epoxy or glue; it does not cure". Not seeing "kill everyone" as a valid interpretation seems to be a necessary consequence of being able to process language at all.
All the valid models in AIXI-tl that affect the choice of actions have to magically insert the actions being probed into some kind of internal world model. The hardware that actually carries out those actions, complete with sensory apparatus, is incidental: a useless power drain, a needless fire hazard endangering the precious reward pathway.
If the past sensory data include information about the internal workings, then there will be a striking correlation between the outputs that the workings would produce on their own (for physical reasons) and the AI’s outputs. That rules out (or drives down expected utility of acting upon) all but very crazy hypotheses about how the Cartesian interaction works. Wrecking the hardware would break that correlation, and it’s not clear what the crazy hypotheses would say about that, e.g. hypotheses that some simply specified intelligence is stage-managing the inputs, or that sometimes the AIXI-tl’s outputs matter, and other times only the physical hardware matters.
Well, you can't include the entire internal workings in the sensory data, and it can't model a significant portion of itself, as it has to try a huge number of hypotheses against the model at each step, so I would not expect the very crazy hypotheses to be very elaborate or to have high coverage of the internals.
If I closed my eyes and did not catch a ball, the explanation is that I did not see it coming and could not catch it, but this sentence is rife with self-references of the sort that is problematic for AIXI. The correlation between closed eyes and lack of reward can be coded into some sort of magical craziness, but if I close my eyes and not my ears, and hear where the ball lands after I miss it, there is a vastly simpler explanation for why I did not catch it: my hand was not in the right spot (and that explanation works in the total absence of a sensorium as well). I don't see how AIXI-tl (with very huge constants) can value its eyesight (it might assign it some value if there is some asymmetry in the long models, but it seems clear it would not assign an adequate, rational value to its eyesight). In my opinion there is no single unifying principle to intelligence (or none was ever found), and AIXI-tl (with very huge constants) falls far short of even a cat in many important ways.
edit: Another thought: I am not sure that Solomonoff induction's prior is compatible with expected utility maximization. If the expected utility imbalance between crazy models grows faster than 2^length (and I would expect it to grow faster than any computable function if the utility is unbounded), then the actions will be determined by imbalances between crazy, ultra-long models. I would not privilege the belief that it just works without some sort of formal proof or some other very good reason to think it works.
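A minimal formalization of that worry, with notation I am introducing here rather than anything from the original comment:

```latex
% Sketch of the divergence worry, under assumed notation:
%   h      : a hypothesis (environment model) of program length \ell(h)
%   U_a(h) : utility of taking action a if h is the true environment
% With a Solomonoff-style prior weighting h by 2^{-\ell(h)}, the chosen action is
\[
  a^* \;=\; \arg\max_a \sum_h 2^{-\ell(h)}\, U_a(h).
\]
% If utilities are unbounded and the imbalance |U_a(h) - U_{a'}(h)| can grow in
% \ell(h) faster than 2^{\ell(h)}, then for any length cutoff the tail terms
% dominate the sum: the argmax is decided by the "crazy, ultra-long" hypotheses
% rather than the short ones, and the sum need not even converge.
```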