Sorry for the dumb question a month after the post, but I’ve just found out about deceptive alignment. Do you think it’s plausible that a signflipped AGI could fake being an FAI in the training stage, just to take a treacherous turn at deployment?
Not really, because it takes time to train the cognitive skills necessary for deception.
You might expect this if your AGI was built with a “capabilities module” and a “goal module” and the capabilities were already present before putting in the goal, but it doesn’t seem like AGI is likely to be built this way.
Would that not be the case with *any* form of deceptive alignment, though? Surely deceptive alignment wouldn't pose a risk at all if that were the case? Sorry in advance for my stupidity.