Regarding the anvil problem: you have argued with great thoroughness that one can’t perfectly prevent an AIXI from dropping an anvil on its head. However, I can’t see the necessity. We would need to get the probability of a dangerously unfriendly SAI as close to zero as possible, because it poses an existential threat. However, a suicidally foolish AIXI is only a waste of money.
Humans have a negative reinforcement channel relating to bodily harm called pain. It isn’t perfect, but it’s good enough to train most humans to avoid doing suicidal stupid things. Why would an AIXI need anything better? You might want to answer that there is some danger related to an AIXI’s intelligence, but its clock speed, or whatever, could be throttled during training.
Also, any seriously intelligent AI made with the technology of today, or the near future, is going to require a huge farm of servers. The only way it could physically interact with the world is through a remote-controlled body... and if it drops an anvil on that, it actually will survive as a mind!
It isn’t perfect, but it’s good enough to train most humans to avoid doing suicidal stupid things. Why would an AIXI need anything better?
It’s good enough for some purposes, but even in the case of humans it doesn’t protect a lot of people from suicidally stupid behavior like ‘texting while driving’ or ‘drinking immoderately’ or ‘eating cookies’. To the extent we don’t rely on our naturalistic ability to reason abstractly about death, we’re dependent on the optimization power (and optimization targets) of evolution. A Cartesian AI would require a lot of ad-hoc supervision and punishment from a human, in the same way young or unreflective humans depend for their survival on an adult supervisor or on innate evolved intelligence. This would limit an AI’s ability to outperform humans in adaptive intelligence.
if it drops an anvil on that, it actually will survive as a mind!
Sure. In that scenario, the robot body functions like the robot arm I’ve used in my examples. Destroying the robot (arm) limits the AI’s optimization power without directly damaging its software. AIXI will be unusually bad at figuring out for itself not to destroy its motor or robot, and may make strange predictions about the subsequent effects of its output sequence. If AIXI can’t perceive most of its hardware, that exacerbates the problem.
I am aware that humans have a non-zero level of life-threatening behaviour. If we wanted it to be lower, we could make it lower, at the expense of various costs. We don’t, which seems to mean we are happy with the current cost-benefit ratio. Arguing, as you have, that the risk of AI self-harm can’t be reduced to zero doesn’t show that we can’t hit an actuarial optimum.
It is not clear to me why you think safety training would limit intelligence.
You’re cutting up behaviors into two categories, ‘safe conduct’ and ‘unsafe conduct’. I’m making a finer cut, one that identifies systematic reasons some kinds of safe or unsafe behavior occur.
If you aren’t seeing why it’s useful to distinguish ‘I dropped an anvil on my head because I’m a Cartesian’ from ‘I dropped an anvil on my head because I’m a newborn who’s never seen dangerous things happen to anyone’, consider the more general dichotomy: ‘Errors due to biases in prior probability distribution’ v. ‘Errors due to small or biased data sets’.
AIMU is the agent designed to make no mistakes of the latter kind; AIXI is not such an agent, and is only intended to avoid mistakes of the former kind. AIXI is supposed to be a universal standard for induction because it only gets things wrong to the extent its data fails it, not to the extent it started off with a priori wrong assumptions. My claim is that for a physically implemented AIXI-style agent, AIXI fails in its prior, not just in its lack of data-omniscience.
‘You aren’t omniscient about the data’ is a trivial critique, because we could never build something physically omniscient. (‘You aren’t drawing conclusions from the data in a computationally efficient manner’ is a more serious critique, but one I’m bracketing for present purposes, because AIXI isn’t intended to be computationally efficient.) Instead, my main critique is of AIXI’s Solomonoff prior. (A subsidiary critique of mine is directed at reinforcement learning, but I won’t write more about that in this epistemological setting.)
In sum: We should be interested in why the AI is making its mistakes, not just in its aggregate error rate. When we become interested in that, we notice that AIXI makes some mistakes because it’s biased, not just because it’s ignorant. That matters because (a) we could never fully solve the problem of ignorance, but we might be able to fully solve the problem of bias; and (b) if we build a sufficiently smart self-modifier it should be able to make headway on ignorance itself, whereas it won’t necessarily make headway on fixing its own biases. Problems with delusive hypothesis spaces and skewed priors are worrisome even when they only occasionally lead to mistakes, because they’re the sorts of problems that can be permanent: problems that agents suffering from them may not readily converge on solutions to.
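To make the ignorance/bias distinction concrete, here is a toy sketch. It is not AIXI and not a Solomonoff prior; it is plain Bayesian updating over three hypothetical coin biases, and every name and number in it (`posterior`, `map_estimate`, `OPEN_PRIOR`, `CLOSED_PRIOR`, the 0.7 "true" coin) is an illustrative assumption. Giving the true hypothesis zero prior weight stands in, loosely, for a hypothesis space that cannot represent the truth at all. The "open" agent errs only from ignorance, so more data fixes it; the "closed" agent errs from bias, so no amount of data does.

```python
# Toy illustration only: a three-hypothesis coin model, not AIXI or a
# Solomonoff prior. All names and numbers here are hypothetical.

def posterior(prior, data):
    """Bayesian update of a prior over coin biases on a 0/1 sequence.

    prior maps a hypothesised probability of heads to its prior weight.
    """
    post = dict(prior)
    for x in data:
        # Multiply each weight by the likelihood of this observation.
        for p in post:
            post[p] *= p if x == 1 else 1 - p
        # Renormalise every step so the weights do not underflow.
        total = sum(post.values())
        post = {p: w / total for p, w in post.items()}
    return post


def map_estimate(post):
    """Return the hypothesis with the highest posterior weight."""
    return max(post, key=post.get)


# True coin: 7 heads in every block of 10 observations.
BLOCK = [1, 1, 1, 0, 0, 1, 1, 0, 1, 1]

# "Ignorant" agent: the truth (0.7) is representable, just unlikely a priori.
OPEN_PRIOR = {0.3: 0.45, 0.5: 0.45, 0.7: 0.10}
# "Biased" agent: the truth gets zero prior weight, so no data can revive it.
CLOSED_PRIOR = {0.3: 0.5, 0.5: 0.5, 0.7: 0.0}

for n_blocks in (1, 50):
    data = BLOCK * n_blocks
    for name, prior in (("open", OPEN_PRIOR), ("closed", CLOSED_PRIOR)):
        post = posterior(prior, data)
        print(f"{len(data):>4} obs, {name:>6} prior: MAP = {map_estimate(post)}")
```

After 10 observations both agents guess 0.5, a small-data error. After 500, the open-prior agent settles on 0.7, while the closed-prior agent still says 0.5, and would keep saying it however long the sequence ran, because a zero weight stays zero under multiplication.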
It’s also a waste of time and intellectual resources. I raised this point with Adele last month.