The actions are inferred from the argmax, but they are also inputs to the prediction models. Thus AIXI is not constrained to avoid updating on its own actions, which allows it to entertain the correct world models for one-boxing, for example. If its world models have learned that Omega never lies and is always correct, those same world models will learn the predictive shortcut that the box contents are completely predictable from the action output channel, and thus it will correctly estimate that the one-box branch has the higher payout.
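A minimal sketch of that predictive shortcut (all names here are my own, purely illustrative): the world model predicts the opaque box's contents directly from the agent's own action channel, because a model that has learned "Omega is always right" treats the action as a perfect predictor of the hidden state. The argmax then feeds each candidate action back through the model.

```python
def world_model(action):
    """Return (box_contents, reward) predicted for a candidate action.

    The model conditions the opaque box on the action itself: the
    shortcut an AIXI-style predictor can learn when Omega never errs.
    """
    box = 1_000_000 if action == "one_box" else 0
    reward = box if action == "one_box" else box + 1_000
    return box, reward

# The argmax over actions feeds each candidate action into the model:
best = max(["one_box", "two_box"], key=lambda a: world_model(a)[1])
# best == "one_box", since the model predicts the full box only on that branch
```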
My understanding is that CDT explicitly disallows acausal predictions: it disallows models that update on the agent's own future actions, which is exactly the update needed for one-boxing.
| Action | Box Empty | Box Full |
| --- | --- | --- |
| one_box | disallowed | allowed |
| two_box | allowed | disallowed |
In EDT/AIXI the world model is allowed to update the hidden box state conditional on the action chosen, even though this is 'acausal'. It's equivalent to simply observing, correctly, that the agent gets higher reward in the subset of the multiverse where it decides to one-box.
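The contrast can be sketched with the same payoff table evaluated two ways (a toy illustration, with payoffs and names I've chosen for concreteness): EDT conditions the hidden box state on the action, the 'acausal' update above, while CDT holds the box state fixed under some prior and only intervenes on the action.

```python
# (action, box_full) -> reward, using the standard Newcomb payoffs
PAYOFF = {
    ("one_box", True): 1_000_000,
    ("one_box", False): 0,
    ("two_box", True): 1_001_000,
    ("two_box", False): 1_000,
}

def edt_value(action):
    # EDT: the box state updates on the action, since the model
    # trusts Omega's prediction of the agent.
    box_full = (action == "one_box")
    return PAYOFF[(action, box_full)]

def cdt_value(action, p_full):
    # CDT: the box state is causally independent of the action,
    # so average over a fixed prior p_full instead of conditioning.
    return p_full * PAYOFF[(action, True)] + (1 - p_full) * PAYOFF[(action, False)]

edt_choice = max(["one_box", "two_box"], key=edt_value)           # "one_box"
cdt_choice = max(["one_box", "two_box"],
                 key=lambda a: cdt_value(a, 0.5))                 # "two_box"
```

For any fixed `p_full`, two-boxing dominates under the CDT evaluation; the choices only diverge because EDT is allowed the action-conditional update on the hidden state.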
> The actions are inferred from the argmax, but they are also inputs to the prediction models. Thus AIXI is not constrained to avoid updating on its own actions, which allows it to entertain the correct world models for one-boxing, for example. If its world models have learned that Omega never lies and is always correct, those same world models will learn the predictive shortcut that the box contents are completely predictable from the action output channel, and thus it will correctly estimate that the one-box branch has the higher payout.
The actions in themselves being "inputs to the prediction models" does not distinguish CDT from EDT.
(To be continued, leaving now.)