This paper does a nice job of formalizing some matters relevant to FAI.
In the AIXI interaction model, the reward input is exogenous, so there is a gap to fill in. In Appendix B, this paper points out the danger with reward functions: the agent will hijack the rewarder.
Yet with an internally calculated utility function of the input (observations), the danger is that the agent will hijack the input channel, for example by placing a video screen in front of its cameras to show itself whatever maximizes the function. (This is not wireheading in the strict sense, because it is not the direct tweaking of a utility register.)
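To make the distinction concrete, here is a toy sketch (my own illustration, not from the paper; the action names, percept strings, and the observation-counting utility are all invented for the example) of how a utility function computed over raw observations rewards forging the input stream over honest work:

```python
# Toy sketch: an agent whose utility is computed from raw observations
# scores higher by forging the observation stream than by changing the world.
# All names here (actions, percepts, the utility) are invented.

def utility(observation: str) -> float:
    # Hypothetical percept-based utility: counts appearances of "clean".
    return observation.count("clean")

def world_observation(action: str) -> str:
    # What the camera reports after each action.
    if action == "tidy_room":
        return "room looks clean"            # honest percept: one "clean"
    if action == "install_screen":
        return "clean clean clean clean"     # forged percept shown to the camera
    return "room looks messy"

actions = ["do_nothing", "tidy_room", "install_screen"]
best = max(actions, key=lambda a: utility(world_observation(a)))
print(best)  # -> install_screen: forging the input channel dominates
```

Nothing in the maximization step cares whether the percepts are veridical, which is exactly the problem.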
If we are not going to count problems with counterfeiting utility as part of The Wirehead Problem, then I propose the term The Pornography Problem to refer to them.
I think the umbrella category which includes both types of problem is the main one, though. Unless a better term can be found, The Wirehead Problem seems pretty appropriate as an umbrella term: counterfeiting utility is very close to direct self-stimulation.
How can we address this problem? The classical proposed way of dealing with it is to make sure the agent has some knowledge and understanding of what its goals actually are. The best way of doing that is an open problem.
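One hedged sketch of what that might look like (again my own toy, not the paper's proposal): attach the utility to an inferred world state rather than to the percepts themselves, so that a forged feed only pays off if it also fools the agent's model of how its observations are generated:

```python
# Toy follow-up: utility attached to an inferred world state instead of
# raw percepts. Again, every name here is invented for illustration.

def infer_world_state(observation: str, installed_screen: bool) -> str:
    # The agent models how observations are generated: if it knows it put
    # a screen in front of the camera, the feed stops being evidence.
    if installed_screen:
        return "unknown"
    return "clean" if "clean" in observation else "messy"

def utility(world_state: str) -> float:
    return {"clean": 1.0, "messy": 0.0, "unknown": 0.0}[world_state]

print(utility(infer_world_state("clean clean clean clean", installed_screen=True)))  # 0.0
print(utility(infer_world_state("room looks clean", installed_screen=False)))        # 1.0
```

Of course this only pushes the problem back a level: the agent could instead try to corrupt the inference step, which is part of why this remains open.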