Might we not consider programming in some forms of caution?
Caution sounds great, but if it turns out that the AI’s goals really do lead to killing all humans or what have you, caution will only delay those outcomes, no? So caution is only useful if we program its goals wrong, it realises that humans might consider its goals wrong, and it then lets us take another shot at giving it goals that aren’t wrong. In other words, corrigibility.
Actually, caution is a different question.
AGI is not clairvoyant. It WILL get things wrong and accidentally produce outcomes which do not comport with its values.
Corrigibility is a valid line of research, but even if you had an extremely corrigible system, it would still risk making mistakes.
AGI should be cautious whether it is corrigible or not. It could make a mistake because it has bad values, because there is no off-switch, or simply because it cannot predict all the outcomes of its actions.
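To make the distinction concrete, here is a minimal toy sketch (mine, not anyone's actual proposal; every name, model, and threshold in it is invented for illustration): caution, in this sense, just means the agent pauses and defers whenever it cannot predict an action's outcomes well enough, independently of whether its values or its off-switch are right.

```python
import random
from statistics import mean, pstdev

# Toy illustration only. "Caution" here = refuse to act autonomously when the
# agent's own predictions about an action are too spread out, regardless of
# whether its values are correct (that part is what corrigibility addresses).

def sample_outcomes(action, world_model, n=1000):
    """Draw predicted utilities for `action` from an imperfect world model."""
    return [world_model(action) for _ in range(n)]

def choose_cautiously(actions, world_model, uncertainty_limit=0.5):
    """Pick the best-looking action, but defer to humans if the predictions
    for it are too uncertain (the agent cannot foresee what it will cause)."""
    best_action, best_value, best_spread = None, float("-inf"), 0.0
    for action in actions:
        outcomes = sample_outcomes(action, world_model)
        value, spread = mean(outcomes), pstdev(outcomes)
        if value > best_value:
            best_action, best_value, best_spread = action, value, spread
    if best_spread > uncertainty_limit:
        return ("defer_to_humans", best_action)  # cautious: pause and ask
    return ("act", best_action)

# Hypothetical usage: a noisy model whose error grows with how drastic the action is.
def noisy_model(action):
    drasticness = {"small_fix": 0.1, "rewrite_everything": 2.0}[action]
    return 1.0 + random.gauss(0, drasticness)

print(choose_cautiously(["small_fix", "rewrite_everything"], noisy_model))
```

Even a perfectly corrigible agent would still want something like this rule, because corrigibility only helps once a mistake is noticed; caution is about hesitating before the unpredictable action is taken at all.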