On the flip side, I expect we’ll see more discussion about which potential alignment targets (like human values, corrigibility, Do What I Mean, etc.) are likely to be naturally expressible in the internal language of neural nets, and how to express them.
Nice update!
While I don’t think of these as alignment targets per se (as I understand the term to be used), I strongly support discussing the internal language of neural nets and moving away from convoluted inner/outer schemes.