Biased AI heuristics
Heuristics have a bad rep on Less Wrong, but some people are keen to point out how useful they can sometimes be. One major critique of the “Superintelligence” thesis is that it presents an abstract, Bayesian view of intelligence that ignores the practicalities of bounded rationality.
This trend of thought raises some other concerns, though. What if we could produce an AI of extremely high capability, but riven with huge numbers of heuristics? If these were human heuristics, then we might have a chance of understanding and addressing them, but what if they weren’t? What if the AI had an underconfidence bias, and tended to change its views too fast? Now, that one is probably quite easy to detect (unlike many that we would not have a clue about), but what if it wasn’t consistent across areas and types of new information?
In that case, our ability to predict or control what the AI does may be very limited. We can understand human biases and heuristics pretty well, and we can understand idealised agents, but differently biased agents might be a big problem.
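To make that worry a bit more concrete, here is a minimal sketch, purely illustrative and with all the specifics (the log-odds form, the domains, and the overweight factors) being my own assumptions: an agent that scales its Bayesian updates by a domain-dependent factor. A constant factor could be measured once and corrected for; a factor that varies with the type of evidence is much harder to pin down from outside.

```python
import math

def biased_posterior(prior: float, likelihood_ratio: float, overweight: float) -> float:
    """Bayesian update in log-odds form, with the evidence term scaled by 'overweight'.

    overweight == 1.0 is the ideal Bayesian update; values above 1.0 mean the
    agent changes its views too fast, values below 1.0 mean too slowly.
    """
    log_odds = math.log(prior / (1 - prior)) + overweight * math.log(likelihood_ratio)
    return 1 / (1 + math.exp(-log_odds))

# Hypothetical domain-dependent over-updating: calibrating the agent on one
# domain tells us little about how hard it will update on the others.
overweight_by_domain = {"physics": 1.0, "politics": 1.8, "self-assessment": 0.6}

for domain, w in overweight_by_domain.items():
    posterior = biased_posterior(prior=0.5, likelihood_ratio=3.0, overweight=w)
    print(f"{domain}: posterior {posterior:.3f} (ideal Bayesian answer is 0.750)")
```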
An idealized or fully correct agent’s behavior is too hard to predict (i.e. to implement) in a complex world. That’s why you introduce the heuristics: they are easier to calculate. Can’t that also be used to make them easier for a third party to predict?
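One way to see the intuition behind that question, as a toy sketch of my own rather than anything from the discussion (the knapsack setting and the greedy rule are just an illustrative stand-in for “heuristic”): a cheap decision rule is also cheap for an outside observer to simulate, whereas anticipating a fully optimizing agent means effectively redoing its optimization.

```python
from itertools import combinations

# (name, value, weight) for a toy knapsack problem
items = [("a", 60, 10), ("b", 100, 20), ("c", 120, 30)]
capacity = 50

def greedy_choice(items, capacity):
    """Cheap heuristic: take items by value density until the knapsack is full.
    Because the rule is short and cheap, an observer who knows it can predict
    the agent's choice at the same low cost."""
    chosen, used = [], 0
    for name, value, weight in sorted(items, key=lambda x: x[1] / x[2], reverse=True):
        if used + weight <= capacity:
            chosen.append(name)
            used += weight
    return chosen

def exact_choice(items, capacity):
    """Idealized agent: exhaustive search over every subset. Predicting its
    choice from outside means effectively redoing the whole search."""
    best, best_value = [], 0
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            if sum(w for _, _, w in combo) <= capacity:
                value = sum(v for _, v, _ in combo)
                if value > best_value:
                    best, best_value = [n for n, _, _ in combo], value
    return best

print(greedy_choice(items, capacity))  # ['a', 'b']
print(exact_choice(items, capacity))   # ['b', 'c']
```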
Separately from this, the agent might learn or self-modify to have new heuristics. But what does the word “heuristic” mean here? What’s special about it that doesn’t apply to all self-modifications and all learned models, if you can’t predict their behavior without actually running them?
Possibly. We need to be closer to the implementation for this.
Does it matter if we aren’t able to recognize its biases? Humans are able to function with biases.
We are also able to recognize and correct for our own biases. And we can’t even look at, let alone rewrite, our own source code.
I’m assuming that it can function at a high level despite/because of its biases. And the problem is not that it might not work effectively, but that our job of ensuring it behaves well just got harder, because we just got worse at predicting its decisions.
If we programmed it with human heuristics, wouldn’t we assume that it would have similar biases?
We may not have programmed these in at all; it could just be efficient machine learning. And even if it started with human heuristics, it might modify those away rapidly.