I like how you specify utility directly over programs; it describes very neatly how someone who sat down and wrote a utility function U(q, y_{1:m_k}) would do it: first determine how the observation could have been computed by the environment, and then evaluate that situation. This is a special case of the framework I wrote down in the cited article; you can always set
U(\dot{y}\dot{x}_{<k} y \underline{x}_{k:m_k}) = \sum_{q : q(y_{1:m_k}) = x_{1:m_k}} U(q, y_{1:m_k})

This solves wireheading only if we can specify which environments contain wireheaded (non-dualistic) agents, delusion boxes, etc.
True, the U(program, action sequence) framework can be implemented within the U(action/observation sequence) framework, although you forgot to multiply by 2^-l(q) when describing how. I also don’t really like the finite look-ahead (until m_k) method, since it is dynamically inconsistent.
Not sure what you mean by that.
I think then you would count that twice, wouldn’t you? Because my original formula already contains the Solomonoff probability...
Oh right. But you still want the probability weighting to be inside the sum, so you would actually need
U(\dot{y}\dot{x}_{<k} y \underline{x}_{k:m_k}) = \frac{1}{\xi\left(\dot{y}\dot{x}_{<k} y \underline{x}_{k:m_k}\right)} \sum_{q : q(y_{1:m_k}) = x_{1:m_k}} U(q, y_{1:m_k}) \, 2^{-\ell(q)}

True. :)
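To spell out why the 1/\xi normalization is the natural choice here (a short sketch, assuming the standard AIXI definition of \xi as the 2^{-\ell(q)}-weighted sum over programs consistent with the history): the corrected formula is just the posterior expectation of U(q, y_{1:m_k}) over the programs that could have generated the observations,

\xi\left(\dot{y}\dot{x}_{<k} y \underline{x}_{k:m_k}\right) = \sum_{q : q(y_{1:m_k}) = x_{1:m_k}} 2^{-\ell(q)}
\quad\Rightarrow\quad
U(\dot{y}\dot{x}_{<k} y \underline{x}_{k:m_k}) = \sum_{q : q(y_{1:m_k}) = x_{1:m_k}} \frac{2^{-\ell(q)}}{\xi\left(\dot{y}\dot{x}_{<k} y \underline{x}_{k:m_k}\right)} \, U(q, y_{1:m_k}),

i.e. an average of the program-level utilities, with each consistent program weighted by its share of the posterior mass.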
Let's stick with delusion boxes for now, because the assumption that we can read off from the environment whether the agent has wireheaded breaks dualism. So even if we specify utility directly over environments, we still have to solve the problem of specifying which action/environment combinations contain delusion boxes in order to evaluate them correctly. It is still the same problem, just phrased differently.
If I understand you correctly, that sounds like a fairly straightforward problem for AIXI to solve. Some program q_1 will mimic another program q_2's communication with the agent while doing something else in the background, but AIXI considers the possibilities of both q_1 and q_2.
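A toy numerical sketch of that last point (the program names q1_mimic / q2_honest, the lengths and the utilities are all made up for illustration): if a delusion-box-style program and the program it mimics produce the same observation sequence, the formula above values that sequence by the posterior-weighted average of their utilities, so the mimic drags the value down in proportion to its weight 2^{-\ell(q_1)}.

# Toy illustration with made-up numbers: two programs that produce the same
# observation sequence for the given actions, so both remain consistent with
# the agent's history. q2_honest is the "real" environment; q1_mimic copies
# q2_honest's outputs while running a delusion box in the background.
programs = {
    "q1_mimic":  {"length": 201, "utility": 0.0},  # slightly longer, low utility
    "q2_honest": {"length": 200, "utility": 1.0},  # shorter, high utility
}

# Prior weight of each program: 2^(-length of q).
prior = {name: 2.0 ** -p["length"] for name, p in programs.items()}

# xi(history) = total prior weight of all programs consistent with the history;
# here those are just the two programs above.
xi = sum(prior.values())

# Posterior-weighted utility of the shared observation sequence, i.e.
# (1/xi) * sum over consistent q of U(q, y_{1:m_k}) * 2^(-l(q)).
expected_utility = sum(p["utility"] * prior[name] / xi for name, p in programs.items())

for name in programs:
    print(f"{name}: posterior weight = {prior[name] / xi:.3f}")
print(f"value of the observation sequence = {expected_utility:.3f}")

With the two lengths one bit apart, the mimic keeps a third of the posterior mass, so the observation stream both programs produce is worth about 0.667 rather than 1; that is the sense in which AIXI "considers the possibilities of both q_1 and q_2".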