It makes it easier, but consider this: the human brain also does this. When we conform to expectations, we model ourselves and thereby make ourselves more predictable. Yet that doesn't prevent deception; people still lie, and some of the deception simply gets pushed into the subconscious.
Sure, it doesn't prevent a deceptive model from being made, but if AI engineers built neural networks with this kind of self-awareness at all levels from the ground up, that wouldn't happen in their models. The encouraging thing, if it holds up, is that there is little to no "alignment tax" for making the models understandable: they are also better at their task.
Indeed, engineering readability at multiple levels may solve this.
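One concrete way to picture "self-awareness at all levels" is an auxiliary self-modeling objective applied at every hidden layer: the network is trained not only on its task but also to predict its own internal activations. The sketch below is a minimal, hypothetical PyTorch variant of that idea; the architecture, the per-layer prediction heads, and the `lam` weight are all my assumptions for illustration, not a specification from the discussion above. The key design choice is that the activation targets are not detached, so the gradient from the self-modeling loss also flows into the activations themselves, pressuring the network to keep its own states predictable (and, per the claim above, more readable) rather than merely training a passive observer.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfModelingMLP(nn.Module):
    """Classifier with a self-modeling head at every depth (hypothetical
    multi-level variant): each layer must predict the previous layer's
    activations, alongside the ordinary classification objective."""

    def __init__(self, in_dim=784, hidden=128, n_classes=10, depth=3):
        super().__init__()
        dims = [in_dim] + [hidden] * depth
        self.layers = nn.ModuleList(
            [nn.Linear(dims[i], dims[i + 1]) for i in range(depth)])
        # One self-prediction head per pair of adjacent hidden layers:
        # reconstructs layer k's activation from layer k+1's activation.
        self.self_heads = nn.ModuleList(
            [nn.Linear(hidden, hidden) for _ in range(depth - 1)])
        self.classifier = nn.Linear(hidden, n_classes)

    def forward(self, x):
        acts = []
        h = x
        for layer in self.layers:
            h = F.relu(layer(h))
            acts.append(h)
        logits = self.classifier(h)
        # Targets are deliberately NOT detached: the MSE gradient flows
        # into the predicted activations too, so the network is pushed
        # to make its own hidden states easy to predict.
        self_loss = sum(
            F.mse_loss(head(acts[k + 1]), acts[k])
            for k, head in enumerate(self.self_heads))
        return logits, self_loss

# Usage: combined objective = task loss + weighted self-modeling loss.
model = SelfModelingMLP()
x = torch.randn(32, 784)
y = torch.randint(0, 10, (32,))
logits, self_loss = model(x)
lam = 0.1  # self-modeling weight; a free hyperparameter, assumed here
loss = F.cross_entropy(logits, y) + lam * self_loss
loss.backward()
```

If the "no alignment tax" observation holds, `lam` would act less like a penalty traded off against accuracy and more like a regularizer, which is exactly what would make building readability in at every level cheap.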