For your quiz, could you give an example of something that is grader-optimization but which is not wireheading?
Alignment with platonic grader-output isn’t wireheading. (I mentioned this variant in the second spoiler, for reference.)