Interpreting a Maze-Solving Network
TurnTrout
Apr 20, 2023, 10:36 PM

Mechanistic interpretability on a pretrained policy network from Goal Misgeneralization in Deep Reinforcement Learning.

Predictions for shard theory mechanistic interpretability results
TurnTrout, Ulisse Mini and peligrietzer
Mar 1, 2023, 5:16 AM
105 points, 10 comments, 5 min read

Understanding and controlling a maze-solving policy network
TurnTrout, peligrietzer, Ulisse Mini, Monte M and David Udell
Mar 11, 2023, 6:59 PM
332 points, 28 comments, 23 min read

Maze-solving agents: Add a top-right vector, make the agent go to the top-right
TurnTrout, peligrietzer and lisathiergart
Mar 31, 2023, 7:20 PM
101 points, 17 comments, 11 min read

Behavioural statistics for a maze-solving agent
peligrietzer and TurnTrout
Apr 20, 2023, 10:26 PM
46 points, 11 comments, 10 min read