Johannes Treutlein comments on Intuitions about solving hard problems

Johannes Treutlein 22 Jun 2022 23:41 UTC
LW: 2 AF: 1
I find this particularly curious since, naively, one would assume that weight sharing implicitly implements a simplicity prior, so it should make optimization, and thus also deceptive behavior, more likely. Maybe the argument is that weight sharing leaves less wiggle room for obscuring one’s reasoning process, making a potential optimizer more interpretable? But couldn’t the hidden states and tied weights still encode deceptive reasoning in an uninterpretable way?