Scott Emmons comments on Understanding and controlling a maze-solving policy network

Scott Emmons 11 Mar 2023 23:52 UTC
Neat to see the follow-up from your introductory prediction post on this project!

In my prediction I was particularly interested in the following stats:
1. If you put the cheese in the top-left and bottom-right of the largest maze size, what fraction of the time does the out-of-the-box policy you trained go to the cheese?
2. If you try to edit the mouse’s activations to make it go to the top left or bottom right of the largest mazes (leaving the cheese wherever it spawned by default in the top right), what fraction of the time do you succeed in getting the mouse to go to the top left or bottom right? What percentage of network activations are you modifying when you do this?
Do you have these stats? I read some, but not all, of this post, and I didn’t see answers to these questions.
- TurnTrout 13 Mar 2023 15:31 UTC
  We definitely didn’t answer all the prediction questions in this post, and we don’t have answers to all of them. I put some in so it wouldn’t be obvious exactly what we had found.
  Re: 2. Off the cuff, I’d estimate a 50% success rate for locally retargeting to the top-left and about 14% for the bottom-right, modifying ~11 activations (out of 32,768). If we also use the cheese vector (which modifies all of the activations at the layer), the success rate might go up further. I haven’t run the stats; this is just my sense of how it would turn out.
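To put the numbers in this exchange in perspective, here is a minimal sketch that does two things. It converts "~11 activations out of 32,768" into a percentage (about 0.03% of the layer), and it gives one standard way to attach a confidence interval to a measured success rate once trials are actually run. The function names `percent_modified` and `wilson_interval` are hypothetical helpers written for illustration, not part of the authors' code. The Wilson score interval is one common choice for binomial proportions and is assumed here, not taken from the post.

```python
import math


def percent_modified(num_modified: int, layer_size: int) -> float:
    """Percentage of a layer's activations touched by an intervention."""
    return 100.0 * num_modified / layer_size


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial success rate.

    z = 1.96 corresponds to an approximately 95% interval.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials
    denom = 1 + z**2 / trials
    center = (p_hat + z**2 / (2 * trials)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / trials + z**2 / (4 * trials**2)) / denom
    # Clamp to [0, 1] to guard against floating-point drift at the edges.
    return max(0.0, center - half), min(1.0, center + half)


# ~11 activations out of 32,768, as estimated in the reply above.
print(f"{percent_modified(11, 32_768):.4f}%")  # prints 0.0336%
```

As a purely hypothetical example, 50 successful retargets in 100 mazes would give `wilson_interval(50, 100)` of roughly (0.40, 0.60). The interval width shows how many mazes would be needed before an estimate like "50% vs. 14%" becomes a firm comparison.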