Just read through *Robust agents learn causal world models* and man, it is really cool! It proves a couple of bona fide selection theorems: statements about the internal structure of agents selected under a certain criterion.
Tl;dr: agents selected to perform robustly across various local interventional distributions must internally represent something isomorphic to a causal model of the variables upstream of utility, in the sense that such an agent can answer all causal queries about those variables.
Thm 1: agents achieving the optimal (utility-maximizing) policy across various local interventions must be able to answer causal queries for all variables upstream of the utility node
Thm 2: a relaxation of the above to non-optimal policies, relating regret bounds to the accuracy of the reconstructed causal model
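To unpack what "answering causal queries" means here, a toy illustration (my gloss, not the paper's formalism): it amounts to computing interventional distributions like P(U | do(X=x)) for variables upstream of the utility node U.

```python
# Hypothetical chain SCM over binary variables: X -> Y -> U.
# (Illustrative numbers, not from the paper.)
P_Y_given_X = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}  # P(Y=y | X=x)
P_U_given_Y = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.1, 1: 0.9}}  # P(U=u | Y=y)

def p_u_do_x(u, x):
    """P(U=u | do(X=x)): setting X by intervention cuts its incoming edges
    (here it has none), then we marginalize over the mediator Y."""
    return sum(P_Y_given_X[x][y] * P_U_given_Y[y][u] for y in (0, 1))

print(p_u_do_x(1, 0))  # P(U=1 | do(X=0)) = 0.9*0.3 + 0.1*0.9 = 0.36
print(p_u_do_x(1, 1))  # P(U=1 | do(X=1)) = 0.2*0.3 + 0.8*0.9 = 0.78
```

An agent that "represents a causal model of the variables upstream of utility" can produce answers like these for every such variable and intervention.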
the proof is constructive: an algorithm that, given access to a regret-bounded policy oracle for an environment under some local intervention, queries it appropriately to construct a causal model
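A minimal sketch of the flavor of that construction, under my own simplifying assumption (this is not the paper's actual algorithm): if the utility is a proper scoring rule over a target variable, the oracle's optimal "bet" under each local intervention equals the true interventional probability, so querying it across interventions reads the model off directly.

```python
def true_env(x):
    """Hidden environment: P(Y=1 | do(X=x)). The extractor never reads this
    directly; it only sees the oracle's behavior below."""
    return {0: 0.36, 1: 0.78}[x]

def optimal_oracle(intervention_x):
    """Stand-in for a regret-bounded policy oracle. Under a proper scoring
    rule (e.g. Brier score) utility, the unique optimal report is the true
    probability, so the oracle's action leaks the environment's causal facts."""
    return true_env(intervention_x)

def extract_causal_model(oracle, interventions):
    """Query the oracle once per local intervention and record P(Y=1 | do(X=x))."""
    return {x: oracle(x) for x in interventions}

model = extract_causal_model(optimal_oracle, [0, 1])
print(model)  # recovers {0: 0.36, 1: 0.78}, the interventional distribution
```

The paper's real algorithm is more general (and handles merely regret-bounded, not exactly optimal, oracles, which is where the Thm 2 accuracy bounds come in), but the shape is the same: behavior under enough interventions pins down the causal model.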
one implication is an algorithm for causal inference that converts black-box agents into explicit causal models (because, y'know, agents like you and I are literally that aforementioned 'regret-bounded policy oracle')
These selection theorems can be seen as the converse of the well-known statement that, given access to a causal model, one can find an optimal policy. (This, and its relaxation to approximate causal models, is stated in Thm 3.)
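That converse direction is the easy one; a sketch with assumed toy quantities (illustrative only): with an explicit causal model in hand, the optimal policy is just an argmax over actions of expected utility under the corresponding interventional distribution.

```python
# Assumed toy causal model and utility, matching nothing in particular:
P_Y_do = {0: {0: 0.64, 1: 0.36}, 1: {0: 0.22, 1: 0.78}}  # P(Y=y | do(X=x))
utility = {0: 0.0, 1: 1.0}  # utility of each outcome y

def optimal_action(p_y_do, util):
    """Pick the intervention maximizing expected utility under the model."""
    def expected_util(x):
        return sum(p_y_do[x][y] * util[y] for y in p_y_do[x])
    return max(p_y_do, key=expected_util)

print(optimal_action(P_Y_do, utility))  # -> 1, since do(X=1) makes Y=1 likelier
```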
Thm 1 / 2 is like a 'causal good regulator' theorem.
The gooder regulator theorem, by contrast, is not structural: it gives conditions under which a model of the regulator must be isomorphic to the posterior of the system, but only as a black-box statement about input-output behavior.
The theorem is limited: it only applies when the decision node is not upstream of the environment nodes (e.g. classification; a negative example would be an MDP, where actions influence future states). But the authors say this restriction is mostly for simpler proofs, and they think it can be relaxed.
yes!! discovered this last week; seems very important. The quantitative regret bounds for approximations are especially promising.
I think you can drop this premise and modify the conclusion to “you can find a causal model for all variables upstream of the utility and not downstream of the decision.”