Imposing FAI
All the posts on FAI theory as of late have given me cause to think. There’s something in the conversations about it that has always bugged me, but I hadn’t found the words for it until now.
It is something like this:
Say that you manage to construct an algorithm for FAI...
Say that you can show that it isn’t going to be a dangerous mistake...
And say you do all of this, and popularize it, before AGI is created (or at least, before an AGI goes *FOOM*)...
...
How in the name of Sagan are you actually going to ENFORCE the idea that all AGIs are FAIs?
I mean, if it required some rare material (as with nuclear weapons) or large laboratories (as with biological WMDs) or some other resource that you could at least make artificially scarce, you could set up a body that ensures that any AGI created is an FAI.
But if all it takes is the right algorithms, the right code, and enough computing power… even if you develop a theory of FAI, how would you keep someone from building a UFAI anyway? Between people experimenting with the principles (once known), making mistakes, and the prospect of actively malicious *humans*… it seems that unless you somehow come up with an internal mechanism that makes FAI better and stronger than any UFAI could be, and the solution turns out to be so obviously superior that any idiot could see it, UFAI is going to exist at some point no matter what.
At that point, it seems like the question becomes not “How do we make FAI?” (although that might be a secondary question) but rather “How do we prevent the creation of, eliminate, or reduce potential damage from UFAI?” Now, it seems like FAI might be one thing that you do toward that goal, but if UFAI is a highly likely consequence of AGI even *with* an FAI theory, shouldn’t the focus be on how to contain a UFAI event?
Assume there is no strong first-mover advantage (no intelligence explosion), and even no strong advantage of AGIs over humanity. Even in this case, an FAI makes it possible to stop value drift, provided it is adequately competitive with whatever other agents it coexists with (including humanity, which will change its values over time, not being a cleanly designed agent with a fixed goal definition). If FAI survives, that guarantees that some nontrivial portion of the world’s resources will ultimately go to the production of human value, as opposed to the other things produced by drifted-away humanity (for example, Hanson’s efficiency-obsessed ems) and random AGIs.
(I expect there is a strong first-mover advantage, but this argument doesn’t depend on that assumption.)
I also expect a big first-mover advantage. But even assuming that, you aren’t answering the question of the post, which is: if someone invents FAI theory but not AGI theory, how can they best compel or convince the eventual first mover on AGI to use that FAI theory? (Suppose incorporating the FAI theory has negative side effects for the AGI builder, like longer development time or higher processing-power requirements because the FAI theory presupposes a certain architecture.)
Ahh, that makes a lot more sense.
I think the theory is that the only thing that is powerful enough to contain UFAI is FAI, so the first self-improving AI had damn well better be FAI.
Huh. Seeing this answer twice, I can’t help but think that the standard strategy for any UFAI then is to first convince you that it is an FAI, and then to convince you that there is another UFAI “almost ready” somewhere.
Heck, if it can do #2, it might be able to skip #1 entirely by arguing that it is the less dangerous of the two.
That’s probably why EY is so cautious about it and does not want any meaningful AGI research progress to happen until a “provably friendly AI” theory is developed. An admirable goal, though many remain skeptical of the odds of success of such an approach, or even the rationale behind it.
The standard answer is that there is such a strong “first-mover advantage” for self-improving AIs that all that matters is which comes first: if an FAI comes first, that would be enough to stop the creation of UFAIs (and vice versa). This is addressed at some length in Eliezer’s paper Artificial Intelligence as a Positive and Negative Factor in Global Risk.
I don’t find this answer totally satisfying. It seems like an awfully detailed prediction to make in absence of a technical theory of AGI.
Well the idea, I gather, is that the FAI will enforce that.
In practice, though, it has not even been explicitly listed as a requirement for powerful AGI (let alone argued for) that the AGI must, on the same hardware, outperform non-general-intelligent tools at tasks such as designing better computers. Those tools work by combining iteration with analytic methods and hill climbing (and perhaps better approaches borrowed from the AGI effort), and their problem and solution spaces are well defined. They can “think outside the box” in human terms, since they search larger spaces than humans do, but only within the permissible model and solution space: statements like “killing humans helps make a better microchip” are not even representable in a system designed for engineering materials at the microscale.