Human morals are specific and complex (in the formal, high-information sense of the word "complexity"). They also seem hard to define: a strict definition of human morality, or a good referent to it, would itself amount to a specification of morality. Could you have powerful and useful AI that didn't have this? It would have to rely on some kind of whitelisting or low-impact optimization, since a general optimization over all possible futures is a disaster without morality. Such AIs may be somewhat useful, but not nearly as useful as they would be with fewer constraints.
I would make a distinction between math-first AI, like logical induction and AIXI, where we understand the AI before it is built, and code-first AI, like anything produced by an evolutionary algorithm, anything "emergent", and most deep neural networks, where we build the AI and then see what it does. The former approach has a chance of working; a code-first ASI is almost certain doom.
I would question the phrase "becomes apparent that alignment is a serious problem"; I do not think this is going to happen. Before ASI, we will have the same abstract and technical arguments we have now for why alignment might be a problem. We will have a few more AlphaGo moments, but while some will say "wow, AGI is near", others will say "Go isn't that hard; we are a long way from this sci-fi AGI", or "superintelligence will be friendly by default". A few more people might switch sides, but we have already had one AlphaGo moment, and that didn't actually make much difference. There is no giant neon sign flashing "ALIGNMENT NOW!". See "There's No Fire Alarm for AGI".
Even if we do have a couple of approaches that seem likely to work, it is still difficult to turn a rough approach into a formal technical specification, and then into code. The code has to have a reasonable runtime. Then the first team to develop AGI has to be using a math-first approach, and has to implement alignment without serious errors. I admit that there are probably a few disjunctive possibilities I've missed, and these events aren't independent: conditional on friendly ASI, I would expect a large amount of talent and organizational competence to have been working on AI safety.