He has always insisted AIXI would not drop an anvil on its head, because Cartesian dualism is not a problem for humans in the real world, who historically believed in a metaphysical soul and mostly got along fine anyway.
Regardless of what humans believed, they also felt pain when they damaged their bodies, so they had a strong instinct to avoid doing things that might damage their bodies. What is the equivalent for AIXI?
Also, I think early Christianity had to explicitly specify suicide as a sin, because it became an obvious “one weird trick” to get into Heaven (accept Jesus, get baptized, all sins removed, quickly kill yourself before you accumulate new ones). And some people still opted for “suicide by Roman cop”.
So, Cartesian dualism was not a problem for humans in the real world, because they had other mechanisms for preventing self-harm.
If one’s argument is that there must be some algorithm which solves the anvil problem without needing hacks like a hardwired reward function which inflicts ‘pain’ upon any kind of bodily interaction which threatens the Cartesian boundary, because humans solve it fine, then one had better have firmly established that humans have in fact solved it without pain.
But they haven’t. When humans don’t feel pain, they do do things equivalent to ‘drop an anvil on their head’, which result in blinding, amputation, death by misadventure, etc. It turns out that if you don’t feel pain, you may think it’s funny to poke yourself in the eye just to see everyone else’s reaction, and go blind; or jump off a roof to impress friends, and die; or simply walk around too long, wearing your feet into sores that suppurate and turn septic, until you lose your legs or your life. (This is leaving out Lesch–Nyhan syndrome.)
I don’t think that is either my argument or Marcus’s; he probably didn’t have painless humans in mind when he said that AIXI would avoid damaging itself like humans do. Including some kind of reward shaping like pain seems wise, and if it is not included engineers would have to take care that AIXI did not damage itself while it established enough background knowledge to protect its hardware. I do think that following the steps described in my post would ideally teach AIXI to protect itself, though it’s likely that a handful of other tricks and insights are needed in practice to deal with various other problems of embeddedness—and in that case the self-damaging behavior mentioned in your (interesting) write-up would not occur for a sufficiently smart (and single-mindedly goal-directed) agent even without pain sensors.
I also didn’t initially buy the argument that Marcus gave and I think some modifications and care are required to make AIXI work as an embedded agent—the off-policy version is a start. Still, I think there are reasonable responses to the objections you have made:
1: It would be standard to issue a negative reward (or decrease the positive reward) if AIXI is at risk of harming its body. This is the equivalent of pain.
2: AIXI does not believe in heaven. If its percept stream ends, this is treated as 0 reward forever (which is usually, though not always, taken to be the worst possible reward, depending on the author). It’s unclear whether AIXI would expect the destruction of its body to lead to the end of its percept stream, but I think it would under some conditions; the sketch below illustrates how that plays out.
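To make point 2 concrete, here is a toy sketch (plain Python, nothing to do with an actual AIXI implementation; the hypothesis weights, action names, and reward numbers are all invented): in an expectimax-style calculation where an ended percept stream is scored as 0 reward forever, any environment hypothesis that predicts destruction contributes nothing to an action’s expected future reward, so risky actions lose in proportion to the posterior weight on such hypotheses.

```python
# Toy illustration (not AIXI itself): expected return over a few invented
# environment hypotheses, where an ended percept stream is scored as
# 0 reward forever. Weights, actions, and rewards are all made up.

def expected_return(action, hypotheses):
    """Weighted average of future reward over environment hypotheses."""
    total = 0.0
    for h in hypotheses:
        if h["predicts_destruction"][action]:
            future = 0.0  # percept stream ends: 0 reward at every later step
        else:
            future = sum(h["future_rewards"][action])
        total += h["weight"] * future
    return total

hypotheses = [
    # "weight" plays the role of 2^-length(program) in the real Solomonoff mixture
    {"weight": 0.6,
     "predicts_destruction": {"drop_anvil": True,  "step_aside": False},
     "future_rewards":       {"drop_anvil": [],        "step_aside": [1, 1, 1]}},
    {"weight": 0.4,
     "predicts_destruction": {"drop_anvil": False, "step_aside": False},
     "future_rewards":       {"drop_anvil": [1, 1, 1], "step_aside": [1, 1, 1]}},
]

for action in ("drop_anvil", "step_aside"):
    print(action, expected_return(action, hypotheses))
# drop_anvil scores lower than step_aside as long as enough posterior weight
# sits on hypotheses where the anvil ends the percept stream.
```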
It could be difficult to explain to AIXI what “its body” is.
I think the entire point of AIXI was that it kinda considers all possible universes with all possible laws of physics, and then updates based on evidence. To specify “its body”, you would need to explain many things about our universe, which I think defeats the purpose of having AIXI.
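(For reference, Hutter’s expectimax definition of AIXI makes this “all possible universes” point explicit: the inner sum runs over every program q, i.e. every computable environment consistent with the history so far, weighted by its length.)

```latex
a_k := \arg\max_{a_k} \sum_{o_k r_k} \cdots \max_{a_m} \sum_{o_m r_m}
       \bigl[ r_k + \cdots + r_m \bigr]
       \sum_{q \,:\, U(q,\, a_1 \ldots a_m) \,=\, o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}
```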
Any time you attempt to implement AIXI (or any approximation) in the real world you must specify the reward mechanism. If AIXI is equipped with a robotic body you could choose for the sensors to provide “pain” signals. There is no need to provide a nebulous definition of what is or is not part of AIXI’s body in order to achieve this.
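A minimal sketch of this, assuming a hypothetical robot interface (the sensor names, thresholds, and weights are invented, not part of any real AIXI setup): the penalty is wired directly into the reward channel, so no definition of “AIXI’s body” is required.

```python
# Minimal sketch of wiring "pain" into the reward channel of a robot-mounted
# agent. Sensor names, thresholds, and weights are hypothetical; the point is
# only that hardwired penalties need no definition of what counts as "the body".

def shaped_reward(task_reward: float, sensors: dict) -> float:
    """Task reward minus a hardwired pain penalty computed from raw sensor data."""
    pain = 0.0
    if sensors.get("impact_g", 0.0) > 5.0:        # hard collision
        pain += 1.0
    if sensors.get("motor_temp_c", 0.0) > 90.0:   # overheating actuator
        pain += 0.5
    return task_reward - pain

# Example: a hard bump turns an otherwise positive step into a negative one,
# long before the agent has any model of what "its body" is.
print(shaped_reward(0.3, {"impact_g": 12.0, "motor_temp_c": 40.0}))  # -0.7
```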
Ah, that makes sense! AIXI can receive pain signals long before it knows what they “mean”, and as its model of the world improves, it learns to avoid pain.