What we need math for is to see whether it is possible to build an agent whose belief that it is maximising such a quantity survives extensive self-knowledge about its own operation.
Well, I am an example of an agent who does not want to wirehead, for the reasons explained in the posts I linked to. I have some self-knowledge about my own operation, though not nearly as much as I would like (I don’t know how to program a computer to be me), but I doubt that more self-knowledge, barring valley effects, would do anything other than increase my ability to avoid wireheading.