Okay, I am convinced. I really, really appreciate you sticking with me through this and persistently finding different ways to phrase your side and then finding ways that other people have phrased it.
For reference, it was the link to the paper/book that did it. The parts of it that are immediately relevant here are chapter 3 and section 4.2.1.1 (and optionally section 5.3.5). In particular, chapter 3 explicitly describes an order of operations for goal and subgoal evaluation, and the two other sections show how wireheading is discounted as a failing strategy within a system with a well-defined order of operations. Whatever problems there may be with value stability, this has helped to clear out a whole category of mistakes that I might have made.
Again, I really appreciate the effort that you put in. Thanks a load.
And thank you for sticking with me! It’s really hard to stick it out when there’s no such thing as an honest disagreement and disagreement is inherently disrespectful!
ETA: See the ETA in this comment to understand how my reasoning was wrong but my conclusion was correct.