Commit to using superhuman persuasion only when arguing toward a valid conclusion via valid arguments, and in a manner that doesn't go against the interests of the person being persuaded.
In this plan, how should the AI define what's in the interest of the person being persuaded? For example, take a North Korean soldier who could be persuaded to defect to the West (at the risk of ending up in the shitty jobs most migrants get) or persuaded to remain loyal to his bosses (at the risk of raising his children in the shitty country most North Koreans live in). What set of rules would you suggest?
That may be too strong a statement. Say some new tool helps improve AI legislation more than AI design; this might end up slowing the wheel down.