If so, and if it could be made at all practical, I think that would be a major breakthrough. The current stories about wirehead-avoidance are not terribly convincing, IMO. Which is not to say that there’s not a solution—just that we do not yet really know how to implement one.
An ADT agent cares about some utility function which is independent of its experiences
That is kind-of impossible, though. All our knowledge of the world necessarily comes to us through our senses.
I had a brief look at it again. It seems very expensive. When making a decision, it is painful to start by integrating over all possible copies of agents who might be “like you”. In short, it doesn’t look remotely like what is most likely to come first.
Update 2011-06-28. OK, I finally figured out what you were talking about above—and it turns out that I don’t agree with it at all. The “LessWrong”-style decision theories that I am aware of so far don’t have any impact on the wirehead problem at all—as far as I can see.
Yes, but an agent can understand that its fixed utility function, which refers to the state of the entire universe, is not maximized by allowing itself to be deceived.
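To make the distinction concrete, here is a minimal toy sketch, not anything taken from the ADT posts. It assumes the agent already has a trustworthy world model, which is exactly the part under dispute in this thread. The names `Outcome`, `MODEL`, `choose`, `experience_utility` and `world_utility` are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    world_value: int   # what is actually true of the world (hypothetical score)
    sensor_value: int  # what the agent's senses would report afterwards


# Hypothetical world model: the outcome the agent predicts for each action.
# "wirehead" tampers with the sensor and leaves the world unchanged.
MODEL = {
    "work": Outcome(world_value=1, sensor_value=1),
    "wirehead": Outcome(world_value=0, sensor_value=100),
}


def choose(model, utility):
    """Pick the action whose predicted outcome scores highest under `utility`."""
    return max(model, key=lambda action: utility(model[action]))


def experience_utility(outcome):
    # Utility defined over what the agent will perceive.
    return outcome.sensor_value


def world_utility(outcome):
    # Utility defined over the modelled state of the world.
    return outcome.world_value


experience_choice = choose(MODEL, experience_utility)
world_choice = choose(MODEL, world_utility)
print("experience-based agent picks:", experience_choice)
print("world-state agent picks:", world_choice)
```

Under these assumptions the experience-based agent picks the sensor hack and the world-state agent does not. The sketch says nothing about where `MODEL` comes from or whether such an agent stays stable once it understands its own workings.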
Well, possibly. I certainly have an idea about what “the state of the universe” refers to aside from my sensory perceptions of it. What we need math for is to see whether it is possible to build an agent whose belief that it is maximising such a quantity survives extensive self-knowledge about its own operation. Without supporting math, we don’t have much more than a story.
Well, I am an example of an agent who does not want to wirehead for the reasons explained in the posts I linked to. I have some self-knowledge about my own operation, though not nearly as much as I would like (I don’t know how to program a computer to be me), but I doubt that more self-knowledge, barring valley effects, would do anything other than increase my ability to avoid wireheading.