I think I agree that there are significant quibbles you can raise with the picture Chalmers outlines, but in general I think he’s pointing at an important problem for interpretability: it’s not clear what the relationship is between a circuit-level algorithmic understanding and the kind of statements we would like to rule out (e.g. ‘this system is scheming against me’).
Agreed that there’s a problem there, but it’s not at all clear to me (as yet) that Chalmers’ view is a fruitful way to address that problem.
I do agree with that, although ‘step 1 is identify the problem’.