The way I think about instrumental goals is: You have have an MDP with a hierarchical structure (i.e. the states are the leaves of a rooted tree), s.t. transitions between states that differ on a higher level of the hierarchy (i.e. correspond to branches that split early) are slower than transitions between states that differ on lower levels of the hierarchy. Then quasi-stationary distributions on states resulting from different policies on the “inner MDP” of a particular “metastate” effectively function as actions w.r.t. to the higher levels. Under some assumptions it should be possible to efficiently control such an MDP in time complexity much lower than polynomial in the total number of states[1]. Hopefully it is also possible to efficiently learn this type of hypothesis.
I don’t think that anywhere there we will need a lemma saying that the algorithm picks “aligned” goals.
For example, if each vertex in the tree has the structure of one of some small set of MDPs, and you are given mappings from admissible distributions on “child” MDPs to actions of “parent” MDP that is compatible with the transition kernel.
The way I think about instrumental goals is: You have have an MDP with a hierarchical structure (i.e. the states are the leaves of a rooted tree), s.t. transitions between states that differ on a higher level of the hierarchy (i.e. correspond to branches that split early) are slower than transitions between states that differ on lower levels of the hierarchy. Then quasi-stationary distributions on states resulting from different policies on the “inner MDP” of a particular “metastate” effectively function as actions w.r.t. to the higher levels. Under some assumptions it should be possible to efficiently control such an MDP in time complexity much lower than polynomial in the total number of states[1]. Hopefully it is also possible to efficiently learn this type of hypothesis.
I don’t think that anywhere there we will need a lemma saying that the algorithm picks “aligned” goals.
For example, if each vertex in the tree has the structure of one of some small set of MDPs, and you are given mappings from admissible distributions on “child” MDPs to actions of “parent” MDP that is compatible with the transition kernel.