That is to say, a prepotent AI system whose prepotence was not recognized by its developers is highly likely to be misaligned as well.
I agree that, from an evidential perspective, it would be bad news if we misjudged the AI system's capabilities and perceived them to be lower than they actually are. However, misjudging capabilities and the degree of misalignment are, in principle, independent variables rather than correlated ones, so I wonder what's going on here.
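To make the independence claim precise (using my own notation, not the original's): let $C$ be the event that the developers underestimate the system's capabilities, and $A$ the event that the system is misaligned. If the two are independent, then

$$P(A \mid C) = P(A),$$

i.e., learning that capabilities were misjudged would not, by itself, change our credence in misalignment.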