michaelcohen comments on Formal Solution to the Inner Alignment Problem

michaelcohen 28 Feb 2021 11:30 UTC
LW: 1 AF: 1
It may be helpful to point to specific sections of such a long paper.
(Also, I agree that a neural network trained with that reward could produce a deceptive model that makes a well-timed error.)