If one’s argument is that there must be some algorithm which solves the anvil problem without needing hacks like a hardwired reward function that inflicts ‘pain’ on any bodily interaction threatening the Cartesian boundary, because humans solve it fine, then one had better have firmly established that humans have in fact solved it without pain.
But they haven’t. When humans don’t feel pain, they do in fact do things equivalent to ‘drop an anvil on their head’, which result in blinding, amputation, death by misadventure, etc. It turns out that if you don’t feel pain, you may think it’s funny to poke yourself in the eye just to see everyone else’s reaction, and go blind; or jump off a roof to impress friends, and die; or simply walk around too long, wearing your feet into sores that suppurate and turn septic, until you lose your legs to amputation or die. (This is leaving out Lesch–Nyhan syndrome.)
I don’t think that is either my argument or Marcus’s; he probably didn’t have painless humans in mind when he said that AIXI would avoid damaging itself the way humans do. Including some kind of reward shaping, like pain, seems wise, and if it is not included, engineers would have to take care that AIXI did not damage itself while it built up enough background knowledge to protect its hardware. I do think that following the steps described in my post would ideally teach AIXI to protect itself, though in practice a handful of other tricks and insights are likely needed to deal with various other problems of embeddedness. In that case, the self-damaging behavior mentioned in your (interesting) write-up would not occur for a sufficiently smart (and single-mindedly goal-directed) agent even without pain sensors.
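To make ‘reward shaping like pain’ a bit more concrete, here is a minimal, purely illustrative sketch of how such shaping might look for a toy embodied agent. None of this comes from Hutter’s AIXI formalism or from my post; the ToyBodyEnv environment, the PainShapingWrapper class, and the pain_scale parameter are hypothetical names chosen for the example. The idea is simply that any loss of bodily integrity immediately subtracts a large penalty from the reward, before the agent has learned anything about its own embeddedness.

```python
# Hypothetical sketch: "pain" as reward shaping for a toy embodied agent.
# A wrapper subtracts a large penalty whenever an action costs the agent
# some of its bodily integrity, so self-damage is discouraged even before
# the agent has any background knowledge about its own hardware.

import random


class ToyBodyEnv:
    """Toy environment: action 0 is safe, action 1 risks bodily damage."""

    def __init__(self):
        self.integrity = 1.0  # 1.0 = undamaged hardware

    def step(self, action):
        if action == 1 and random.random() < 0.5:
            self.integrity -= 0.3             # self-damaging outcome
        reward = 1.0 if action == 1 else 0.5  # the risky action pays more
        done = self.integrity <= 0.0          # "anvil dropped": agent destroyed
        return reward, done


class PainShapingWrapper:
    """Adds a 'pain' penalty proportional to any loss of bodily integrity."""

    def __init__(self, env, pain_scale=10.0):
        self.env = env
        self.pain_scale = pain_scale
        self._last_integrity = env.integrity

    def step(self, action):
        reward, done = self.env.step(action)
        damage = self._last_integrity - self.env.integrity
        self._last_integrity = self.env.integrity
        return reward - self.pain_scale * damage, done  # shaped reward


if __name__ == "__main__":
    env = PainShapingWrapper(ToyBodyEnv())
    total = 0.0
    for _ in range(10):
        r, done = env.step(random.choice([0, 1]))
        total += r
        if done:
            break
    print("shaped return:", total)
```

The point of the design, on this reading, is that the penalty encodes no model of the Cartesian boundary at all; it just makes self-damage cheap to learn to avoid, the way pain does for humans.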