I think the reward tampering paper has a pretty good description of the various failure modes of wireheading, though I guess it would be nice to have something like the Goodhart Taxonomy post, but for reward tampering/wireheading.