dx26 comments on Measuring Coherence of Policies in Toy Environments

dx26 19 Mar 2024 1:30 UTC
1 point
0
Right, I think this somewhat corresponds to the “how long it takes a policy to reach a stable loop” (the “distance to loop” metric), which we used in our experiments.

What did you use your coherence definition for?
- Garrett Baker 19 Mar 2024 1:42 UTC
  4 points
  0
  It's a long story, but I wanted to see what the functional landscape of coherence looked like for goal-misgeneralizing RL environments after doing essential dynamics. Results forthcoming.
- Garrett Baker 19 Mar 2024 16:19 UTC
  3 points
  0
  They are related, but time-to-loop fails when there are many loops a random policy is likely to access. For example, if a “do nothing” action is the default, your agent will immediately enter a loop, but the sum of the absolute values of the real parts of the eigenvalues will be very high (equal to the number of states in the environment).
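
To make the contrast in the last reply concrete, here is a minimal sketch comparing the two quantities on a toy deterministic environment. It rests on assumptions that are not stated in the thread: that the eigenvalues are those of the state-to-state transition matrix induced by the policy, and that a policy can be written as a `next_state` array. The function names (`transition_matrix`, `distance_to_loop`, `eig_real_abs_sum`) are made up for illustration and are not from the post's code.

```python
import numpy as np

def transition_matrix(next_state):
    """Row-stochastic matrix of a deterministic policy (assumed representation):
    P[s, next_state[s]] = 1."""
    n = len(next_state)
    P = np.zeros((n, n))
    P[np.arange(n), next_state] = 1.0
    return P

def distance_to_loop(next_state, start):
    """Number of steps taken from `start` before first reaching a state on a loop."""
    first_visit = {}
    s, t = start, 0
    while s not in first_visit:
        first_visit[s] = t
        s = next_state[s]
        t += 1
    # `s` is the first revisited state, i.e. the entry point of the loop.
    return first_visit[s]

def eig_real_abs_sum(next_state):
    """Sum of |Re(lambda)| over the eigenvalues of the transition matrix."""
    eigvals = np.linalg.eigvals(transition_matrix(next_state))
    return float(np.sum(np.abs(eigvals.real)))

n = 6
do_nothing = np.arange(n)                    # every state maps to itself
chain = np.array([1, 2, 3, 4, 5, 5])         # walk to state 5, then stay there

for name, policy in [("do_nothing", do_nothing), ("chain", chain)]:
    print(name, distance_to_loop(policy, 0), eig_real_abs_sum(policy))
```

Under these assumptions, the "do nothing" policy has an identity transition matrix, so its distance to loop is 0 while the eigenvalue sum equals the number of states (6 here). The chain policy takes 5 steps to reach its loop, and its only nonzero eigenvalue is 1. The two metrics therefore order these policies differently, which is the failure case described above.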