After a few years of closely observing behaviour online and in person, it has become ever clearer to me that a significant fraction (>1%) of the regular adult population would intentionally sabotage any future friendly AI. (I'm speaking of Canada and the US, though this probably applies to most if not all countries.) Some wouldn't even be particularly against friendly AI but would still behave destructively just for kicks, such as sadists and so on.
And that's not to mention career criminals, or the really deranged, who would probably be even less inhibited. I would bet that many teenagers would do so simply out of curiosity.
The phenomenon, I think, would closely mirror the behaviour of present-day internet trolls when given an opportunity and an anonymizing screen to shield them. A significant fraction would probably be the same people. As on most large online communities, including LW, I assume everyone reading this has already encountered such individuals, so I will spare the examples.
The most straightforward solution I can think of is for the 'friendly AI' to treat these people adversarially, as if they were genuinely trying to destroy or confound it, even if they might not actually intend to. (This excludes AI that are completely isolated from uncontrolled interaction.) But of course this introduces the problem of the supposedly 'friendly' AI needing to 'combat', 'doubt', 'interrogate', 'oppress', etc., some percentage of humans in order to carry out its normal functions.
—
Also, there seems to be a slippery slope: it looks highly unlikely that once any such 'friendly' AI regularly carries out such operations, it will be able to resist applying the same methods to merely annoying humans who are not quite as dangerous. For instance, those who erode trust by breaking written promises, confidences, rules, etc., through carelessness, temptation, and so on. Then onward to compulsive alcoholics, gamblers, drug addicts, cultists, and others who negatively affect their communities. Then perhaps even seemingly average people who nonetheless ruin the conversations they participate in, such as by invoking Godwin's law or other specious comparisons.
Admittedly, this slippery slope may take decades or centuries to slide down, as there are defensible Schelling points, and other factors along the way well discussed on LW, that would counteract this phenomenon.
With sufficient predictive capability, nobody needs to be oppressed, at least not in the usual sense. They will simply find themselves nudged down paths that let them be satisfied without doing much harm to others.
I'm probably more comfortable with this future than most. I think it's an interesting question how we should relate emotionally to being predicted and optimized for.
And of course, for AI that isn't able to model humans well enough to notice and ignore/correct bad behaviour, yes, obviously we shouldn't give a bunch of unstable, high-leverage power to lots of randos. But it could still be reasonable to give relatively stable, well-aggregated power to the masses.
So it seems that an incipient AI needs a protected environment in which to develop into one capable of reliably carrying out such activities, much like raising children in protected environments before adulthood.