Well, if we screw up that badly with deceptive misalignment, that corresponds to crashing on the launchpad.
It is reasonably likely that humans will have some technique intended to minimize deceptive misalignment, or that gradient descent will shape the AI's goals into something similar to what we want before the AI is smart enough to be deceptive.