Yonadav Shavit comments on The Speed + Simplicity Prior is probably anti-deceptive

Yonadav Shavit 28 Apr 2022 15:21 UTC
3 points
Interesting! I think this might not actually enforce a prior, though, in the sense that the later stages of the network can just scale up their output magnitudes to compensate for the probability-based dampening.
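A toy sketch of the concern, under assumptions that are not from the original proposal: suppose the contribution of step t is dampened by a geometric factor `keep_prob ** t`, and suppose the network can read its own step index. The schedule and every name below are hypothetical and only illustrate the cancellation.

```python
# Toy illustration only. The geometric dampening schedule and all names
# here are assumptions, not the scheme from the post being discussed.

def dampened_contributions(raw_outputs, keep_prob):
    """Weight the raw output of step t by keep_prob ** t (assumed dampening)."""
    return [keep_prob ** t * y for t, y in enumerate(raw_outputs)]


def compensating_outputs(intended, keep_prob):
    """Scale the step-t output by keep_prob ** (-t), given a step index t."""
    return [y / keep_prob ** t for t, y in enumerate(intended)]


keep_prob = 0.5
intended = [1.0, 1.0, 1.0, 1.0]  # what each step "wants" to contribute

# Without compensation, later steps contribute less and less.
honest = dampened_contributions(intended, keep_prob)

# With compensation, the dampening cancels and every step contributes fully.
compensated = dampened_contributions(
    compensating_outputs(intended, keep_prob), keep_prob
)

print(honest)
print(compensated)
```

If a network can learn something like `compensating_outputs`, the dampening no longer penalizes long computations. Whether that is easy to learn with shared weights is a separate question.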
- shiney 28 Apr 2022 19:49 UTC
  1 point
  Parent
  Getting massively out of my depth here, but is that an easy thing to do given the later stages will have to share weights with early stages?
  - Yonadav Shavit 28 Apr 2022 21:16 UTC
    2 points
    Parent
    I’m not sure, but I could imagine that an activation representing a counter of “how many steps have I been thinking for” is a useful feature that many such networks encode.