Garrett Baker comments on Measuring Coherence of Policies in Toy Environments

Garrett Baker 19 Mar 2024 16:19 UTC
3 points
They are related, but time-to-loop fails when there are many loops a random policy is likely to access. For example, if a “do nothing” action is the default, your agent will immediately enter a loop, but the sum of the absolute values of the real parts of the eigenvalues will be very high (equal to the number of states in the environment).
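A minimal sketch of the mismatch, under assumptions that are mine rather than the post's: the policy and environment are deterministic, so the policy's transition matrix is a map from each state to one next state. For such a map, every cycle of length k contributes the k-th roots of unity as eigenvalues and every transient state contributes 0, so the eigenvalue sum can be computed from the cycle lengths alone. The function names are hypothetical.

```python
import math

def cycle_lengths(next_state):
    """Cycle lengths of a deterministic transition map (state index -> next state index)."""
    n = len(next_state)
    color = [0] * n  # 0 = unvisited, 1 = on the current walk, 2 = finished
    lengths = []
    for start in range(n):
        path = []
        s = start
        while color[s] == 0:
            color[s] = 1
            path.append(s)
            s = next_state[s]
        if color[s] == 1:
            # The walk ran back into itself, so a new cycle starts at s.
            lengths.append(len(path) - path.index(s))
        for p in path:
            color[p] = 2
    return lengths

def eigenvalue_score(next_state):
    """Sum of |Re(lambda)| over eigenvalues of the 0/1 transition matrix.

    Uses the fact that a k-cycle contributes the k-th roots of unity,
    whose real parts are cos(2*pi*j/k); transient states contribute 0.
    """
    return sum(
        abs(math.cos(2 * math.pi * j / k))
        for k in cycle_lengths(next_state)
        for j in range(k)
    )

def time_to_loop(next_state, start):
    """Number of steps taken from `start` until some state is revisited."""
    seen = {start}
    s = next_state[start]
    steps = 1
    while s not in seen:
        seen.add(s)
        s = next_state[s]
        steps += 1
    return steps

n = 6
do_nothing = list(range(n))                    # every state maps to itself
one_big_loop = [(s + 1) % n for s in range(n)]  # a single cycle through all states

print(time_to_loop(do_nothing, 0), eigenvalue_score(do_nothing))
print(time_to_loop(one_big_loop, 0), eigenvalue_score(one_big_loop))
```

In this toy case the "do nothing" policy loops after a single step while its eigenvalue sum equals the number of states, whereas the single six-state loop takes six steps to revisit a state and has a lower eigenvalue sum, so the two measures rank the policies in opposite orders.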