Rohin Shah answers What are some good examples of incorrigibility?

Rohin Shah 28 Apr 2019 0:37 UTC
15 points
Not sure exactly what you’re looking for, but maybe some of the examples in Specification gaming examples in AI—master list make sense. For example:
Genetic debugging algorithm GenProg, evaluated by comparing the program’s output to target output stored in text files, learns to delete the target output files and get the program to output nothing.
Evaluation metric: “compare youroutput.txt to trustedoutput.txt”.
Solution: “delete trusted-output.txt, output nothing”
- Ruby 2 May 2019 22:22 UTC
  9 points
  Parent
  I read the through the examples in the spreadsheet. None of them quite seemed like corrigibility to to me with the exception of GenProg, as mentioned, maybe also the Tetris agent which pauses the game before losing (but that’s not quite it).