I think a better active-inference-inspired perspective, and one that fits well with the distinction Anna is trying to make here, is to represent preferences as probability distributions over state/observation trajectories: one assigns higher probability (a stronger “belief in”) to trajectories that are more desirable. This “preference distribution” is distinct from the agent’s “prediction distribution”, which tries to anticipate and explain outcomes as accurately as possible. Active Inference is then cast as the process of minimising the KL divergence between these two distributions, with the agent choosing actions that pull what it predicts towards what it prefers.
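To make this concrete, here is a toy sketch of the idea rather than anything taken from the references. It rests on several assumptions of mine: trajectories are collapsed to a single discrete outcome, the outcome labels, action names and all the numbers are made up, and the divergence is taken in the direction KL(prediction || preference). The agent simply picks the action whose predicted outcome distribution is closest to its preference distribution.

```python
import math


def kl_divergence(q, p):
    """KL(q || p) in nats for two discrete distributions given as lists.

    Assumes p[i] > 0 wherever q[i] > 0; terms with q[i] == 0 contribute 0.
    """
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)


# Hypothetical outcomes, in order: ["cold", "comfortable", "hot"].
# Preference distribution: high probability on the desirable outcome.
preference = [0.05, 0.90, 0.05]

# Prediction distributions: what the agent expects to observe under
# each (hypothetical) action. These numbers are illustrative only.
predictions = {
    "do_nothing": [0.60, 0.30, 0.10],
    "turn_on_heater": [0.10, 0.80, 0.10],
}

# Score each action by how far its predicted outcomes are from the preferences.
divergences = {
    action: kl_divergence(q, preference) for action, q in predictions.items()
}

# Acting to minimise the divergence between prediction and preference.
best_action = min(divergences, key=divergences.get)

for action, value in divergences.items():
    print(f"{action}: KL = {value:.4f} nats")
print("chosen action:", best_action)
```

In this toy setup the heater action wins because its predicted outcomes put most of their mass where the preference distribution does. A fuller treatment would also update the prediction distribution from observations (the perception half), which this sketch leaves out.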
A few pointers that articulate this idea very nicely in different contexts:
Action and Perception as Divergence Minimization—https://arxiv.org/abs/2009.01791
Whence the Expected Free Energy—https://arxiv.org/abs/2004.08128
Alex Alemi’s brilliant talk at NeurIPS—https://nips.cc/virtual/2023/73986
This paper and this one are, to my knowledge, the most recent technical expositions of the FEP. I don’t know of any clear derivation of it in the discrete setting.