Maybe I am confused about what you mean by S. I thought it was the state space, but that isn’t consistent with r in your post, which was defined as A×O→ℚ?
I’m not entirely sure what you mean by the state space. S is a state space associated specifically with the utility function; it has nothing to do with the state space of the environment. The reward function in the OP is (A×O)∗→ℝ, not A×O→ℝ. I slightly abused notation by defining r:S→ℚ in the parent comment. Let’s say it’s r′:S→ℚ, and r is defined by using T to translate the history to the (last) state and then applying r′, i.e. r(h) = r′(T(h)).
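A minimal sketch of building r out of r′ and T, to make the composition concrete. It assumes, beyond what the comment states, that T is a step-wise transition function S×A×O→S folded over the history from a designated initial state s0; the names `make_reward`, `toy_T`, `toy_r_prime` and the toy machine itself are hypothetical. `Fraction` stands in for ℚ.

```python
from fractions import Fraction
from functools import reduce
from typing import Callable, Hashable, Sequence, Tuple

State = Hashable
History = Sequence[Tuple[Hashable, Hashable]]  # sequence of (action, observation) pairs


def make_reward(
    T: Callable[[State, Hashable, Hashable], State],
    r_prime: Callable[[State], Fraction],
    s0: State,
) -> Callable[[History], Fraction]:
    """Build r on histories as r′(last state reached by running T over h from s0)."""

    def r(history: History) -> Fraction:
        # Assumption: T is applied one (action, observation) step at a time.
        last_state = reduce(lambda s, ao: T(s, ao[0], ao[1]), history, s0)
        return r_prime(last_state)

    return r


# Hypothetical toy machine: the state records whether observation "x" was ever seen.
def toy_T(s: bool, a: str, o: str) -> bool:
    return s or o == "x"


def toy_r_prime(s: bool) -> Fraction:
    return Fraction(1) if s else Fraction(0)


r = make_reward(toy_T, toy_r_prime, s0=False)
print(r([("a", "y"), ("b", "x"), ("a", "y")]))  # 1
print(r([("a", "y")]))  # 0
```

In this sketch S only ever appears inside the reward computation; nothing about the environment's dynamics is encoded in it.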
One more question, this one about the priors: what exactly are they a prior over? …I ask because the term D_KL(ζ₀‖ζ) will be +∞ if ζ is zero for any value where ζ₀ is nonzero.
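The concern can be checked on a single step with finite distributions: D_KL(p‖q) = Σ p(o) log(p(o)/q(o)) diverges as soon as q(o) = 0 for some o with p(o) > 0. A minimal sketch, assuming distributions are given as dicts over a finite outcome set (an illustrative simplification; for environments the divergence is over whole histories):

```python
import math
from typing import Dict, Hashable


def kl_divergence(p: Dict[Hashable, float], q: Dict[Hashable, float]) -> float:
    """D_KL(p||q) in nats for finite distributions given as {outcome: probability}."""
    total = 0.0
    for o, po in p.items():
        if po == 0.0:
            continue  # 0 * log(0 / q) is taken to be 0 by convention
        qo = q.get(o, 0.0)
        if qo == 0.0:
            return math.inf  # p puts mass where q puts none
        total += po * math.log(po / qo)
    return total


zeta0 = {"x": 0.5, "y": 0.5}
zeta = {"x": 1.0, "y": 0.0}
print(kl_divergence(zeta0, zeta))  # inf
print(kl_divergence(zeta, zeta0))  # log 2, about 0.693
```

The asymmetry is the point: the divergence is finite in one direction and infinite in the other.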
The prior is just an environment, i.e. a partial mapping ζ:(A×O)∗→ΔO defined on every history to which it doesn’t itself assign probability 0. The expression D_KL(ξ‖ζ) means that we consider all possible ways to choose a Polish space X, probability distributions μ,ν∈ΔX and a mapping f:X×(A×O)∗→ΔO s.t. ζ=E_μ[f] and ξ=E_ν[f] (where the expected value is defined using the Bayes law and not pointwise; see also the definition of “instrumental states” here), and take the minimum over all of them of D_KL(ν‖μ).
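A minimal sketch of the Bayes-law expectation and of one term in that minimum. It assumes a finite X (indexed 0..n-1) in place of a general Polish space and one fixed kernel f; the names `posterior`, `mixture`, `kl`, and the `coin` kernel are hypothetical. It does not search over decompositions, so `kl(nu, mu)` is only an upper bound on D_KL(ξ‖ζ), coming from this particular choice of X, μ, ν and f.

```python
import math
from typing import Callable, Dict, Hashable, List, Optional, Sequence, Tuple

History = Sequence[Tuple[Hashable, Hashable]]  # sequence of (action, observation) pairs
Dist = Dict[Hashable, float]
Kernel = Callable[[int, History], Dist]  # hypothetical f: X × (A×O)* -> ΔO with X = {0..n-1}


def posterior(prior: Sequence[float], f: Kernel, history: History) -> Optional[List[float]]:
    """Bayes posterior over X after `history`; None if the history has probability 0."""
    weights = []
    for x, px in enumerate(prior):
        w = px
        for t, (_, o) in enumerate(history):
            w *= f(x, history[:t]).get(o, 0.0)  # likelihood of each observation
        weights.append(w)
    z = sum(weights)
    if z == 0.0:
        return None  # the mixture is a partial map: undefined off its support
    return [w / z for w in weights]


def mixture(prior: Sequence[float], f: Kernel, history: History) -> Optional[Dist]:
    """E_prior[f] at `history`, averaging with posterior weights rather than pointwise."""
    post = posterior(prior, f, history)
    if post is None:
        return None
    out: Dist = {}
    for x, wx in enumerate(post):
        for o, p in f(x, history).items():
            out[o] = out.get(o, 0.0) + wx * p
    return out


def kl(nu: Sequence[float], mu: Sequence[float]) -> float:
    """D_KL(nu||mu) over the finite index set X."""
    total = 0.0
    for a, b in zip(nu, mu):
        if a > 0.0:
            if b == 0.0:
                return math.inf
            total += a * math.log(a / b)
    return total


def coin(x: int, history: History) -> Dist:
    # Hypothetical kernel: i.i.d. coins, hypothesis 0 emits "x" w.p. 0.9, hypothesis 1 w.p. 0.1.
    return {"x": 0.9, "y": 0.1} if x == 0 else {"x": 0.1, "y": 0.9}


mu = [0.5, 0.5]  # weights defining ζ = E_μ[f]
nu = [0.9, 0.1]  # weights defining ξ = E_ν[f]
print(mixture(mu, coin, []))            # {'x': 0.5, 'y': 0.5}
print(mixture(mu, coin, [("a", "x")]))  # 'x' is about 0.82, not the pointwise 0.5
print(kl(nu, mu))  # upper bound on D_KL(ξ||ζ) from this one decomposition
```

After observing "x" the posterior moves to (0.9, 0.1), so the next-step prediction becomes 0.9·0.9 + 0.1·0.1 = 0.82; a pointwise average of f under μ would stay at 0.5 forever.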