Some high-profile failures I think we won’t get are those related to convergent instrumental goals, such as acquiring computing power, deceiving humans into not editing you, etc. We’ll probably get examples of this sort of thing in small-scale experiments that specialists might hear about, but if an AI that’s deceptive for instrumental reasons causes $1bn in damages, I think it will be rather too late to learn our lesson.