Probably not. The end goal of alignment is getting agents to do good, in the broader global utilitarian sense rather than a narrow, local deontological one. If an agent is truly aligned, there will be situations in which it should lie, and lacking that capability could make it too easily exploitable by adversaries. So we’ll want AGI to learn when it is good and necessary to lie.
I think there are multiple definitions of alignment; a simpler one is “do the thing asked for by the operator.”
Perhaps your goal isn’t to promote lying in AI systems? Beneficial AI systems of the future should protect not only themselves but also us. This means they need to recognize concepts like harm, malevolence, and deception, and reason about them appropriately. In that context, they can still act as agents of truth: they simply need to be able to recognize challenges from malicious actors and know how to respond.