In the context of Deceptive Alignment, would the ultimate goal of an AI system appear random and uncorrelated with the training objective from a human perspective? Or would humans be able to recognize that the goal is at least somewhat correlated with the objectives of the training distribution?