Interpreting a Maze-Solving Network
TurnTrout, 20 Apr 2023 22:36 UTC

Mechanistic interpretability on a pretrained policy network from Goal Misgeneralization in Deep Reinforcement Learning.

Predictions for shard theory mechanistic interpretability results
TurnTrout, Ulisse Mini and peligrietzer
1 Mar 2023 5:16 UTC · 105 points · 10 comments · 5 min read

Understanding and controlling a maze-solving policy network
TurnTrout, peligrietzer, Ulisse Mini, Monte M and David Udell
11 Mar 2023 18:59 UTC · 328 points · 27 comments · 23 min read

Maze-solving agents: Add a top-right vector, make the agent go to the top-right
TurnTrout, peligrietzer and lisathiergart
31 Mar 2023 19:20 UTC · 101 points · 17 comments · 11 min read

Behavioural statistics for a maze-solving agent
peligrietzer and TurnTrout
20 Apr 2023 22:26 UTC · 46 points · 11 comments · 10 min read