I’m not seeing how this lets the agent update itself. The formula requires knowledge of sigma, pi, and p. (BTW, could someone add instructions for embedding LaTeX to the comment help text?) pi is part of the agent, but sigma and p are not. You say:
“We know sigma as part of the dynamics of the system”
But all the agent knows, as you’ve described it so far, is the sequence of observations. In fact, it’s stretching it to say that we know sigma or p—we have just given these names to them. sigma is a complete description of how the world state determines what the agent senses, and p is a complete description of how the agent’s actions affect the world. As the designer of the agent, will you be explicitly providing it with that information in some future instalment?
Everything you say is essentially true.
Technically, we don’t need to provide the agent with p and sigma explicitly. We use these parameters when we build the agent’s memory update scheme, but the agent is not necessarily “aware” of the values of the parameters from inside the algorithm.
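To make this concrete, here is a minimal toy sketch of the idea (my own, not from the post, and assuming a standard discrete Bayesian belief update, which may differ in detail from the formula above): the tables for p and sigma are constants that the designer writes into the update function at design time, while the running code only ever consumes an action, an observation, and the agent’s previous memory.

    # Toy numbers, not from the post: two states, two actions, two observations.
    # P[s][a][s2]: probability of moving from state s to s2 under action a.
    P = [[[0.9, 0.1], [0.2, 0.8]],
         [[0.5, 0.5], [0.1, 0.9]]]
    # SIGMA[s][o]: probability of observing o when the world is in state s.
    SIGMA = [[0.8, 0.2],
             [0.3, 0.7]]

    def update_memory(belief, action, observation):
        # P and SIGMA are baked in here at design time; at run time the agent
        # only receives (action, observation) pairs and its own old memory.
        new_belief = []
        for s2 in range(len(belief)):
            predicted = sum(P[s][action][s2] * belief[s] for s in range(len(belief)))
            new_belief.append(SIGMA[s2][observation] * predicted)
        total = sum(new_belief)
        return [b / total for b in new_belief]

    belief = [0.5, 0.5]                                  # initial memory
    belief = update_memory(belief, action=0, observation=1)
    print(belief)                                        # [0.4, 0.6]

From the inside, the agent just maps (old memory, action, observation) to new memory; that this mapping happens to implement Bayes’ rule under p and sigma is something only the designer needs to know.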
Let’s take, for example, an autonomous rover on Mars. The gravity on Mars is known at the time of design, so the rover’s software, and even hardware, is built to operate under these dynamics. The wind velocity at the time and place of landing, on the other hand, is unknown. The rover may need to take measurements to determine this parameter, and encode it in its memory, before it can take it into account in choosing further actions.
But if we are thoroughly Bayesian, then something is known about the wind prior to experience. Is it likely to change every 5 minutes or can the rover wait longer before measuring again? What should be the operational range of the instruments? And so on. In this case we would include this prior in p, while the actual wind velocity is instead hidden in the world state (only to be observed occasionally and partially).
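A toy version of the rover example (the bins, numbers and sensor model are all made up): the design-time prior over wind goes into the rover’s initial belief, while the true wind stays hidden in the world state and is only pinned down by occasional, noisy measurements.

    # Made-up numbers: the rover's belief about wind speed, in coarse bins.
    wind_bins = [0.0, 5.0, 10.0, 15.0]      # m/s
    belief = [0.4, 0.3, 0.2, 0.1]           # design-time prior over the bins

    def likelihood(measured, true):
        # Toy sensor model: reports the true bin 70% of the time,
        # any other bin 10% of the time each.
        return 0.7 if measured == true else 0.1

    def measure_wind(belief, measured):
        # Bayes update of the wind belief; the true wind itself stays hidden
        # in the world state and is never stored directly.
        posterior = [likelihood(measured, w) * b for w, b in zip(wind_bins, belief)]
        total = sum(posterior)
        return [p / total for p in posterior]

    belief = measure_wind(belief, measured=5.0)
    print(belief)                            # roughly [0.14, 0.75, 0.07, 0.04]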
Ultimately, we could include all of physics in our belief—there’s always some Einstein to tell us that Newtonian physics is wrong. The problem is that a large belief space makes learning harder. This is why most humans struggle with intuitive understanding of relativity or quantum mechanics—our brains are not made to represent this part of the belief space.
This is also why reinforcement learning gives special treatment to the case where there are unknown but unchanging parameters of the world dynamics: the “unknown” part makes the belief space large enough to make special algorithms necessary, while the “unchanging” part makes these algorithms possible.
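And a toy illustration of why “unchanging” helps (again my own example, not from the post): if an action succeeds with a fixed but unknown probability, two counters are a sufficient memory for the agent’s belief about it, and the estimate only ever sharpens. If the parameter could also drift, the agent would have to decide how quickly to forget old evidence, and the belief space grows accordingly.

    # Made-up example: an action succeeds with a fixed but unknown probability.
    # Because the probability never changes, two counters (a Beta posterior)
    # are a sufficient memory for the agent's belief about it.
    successes, failures = 1, 1                  # Beta(1, 1) = uniform prior
    for outcome in [True, True, False, True]:   # made-up experience
        if outcome:
            successes += 1
        else:
            failures += 1
    posterior_mean = successes / (successes + failures)
    print(posterior_mean)                       # 4 / 6, roughly 0.67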
For LaTeX instructions, click “Show help” and then “More Help” (or go here).