Assume there is no strong first-mover advantage (intelligence explosion), and even no strong advantage of AGIs over humanity. Even in this case, an FAI makes it possible to stop value drift, provided it’s adequately competitive with whatever other agents it coexists with (including humanity, which is going to change its values over time, not being a cleanly designed agent with a fixed goal definition). If the FAI survives, that guarantees that some nontrivial portion of the world’s resources will ultimately go toward producing human value, as opposed to the other things produced by a drifted-away humanity (for example, Hanson’s efficiency-obsessed ems) and random AGIs.
(I expect there is a strong first-mover advantage, but this argument doesn’t depend on that assumption.)
I also expect a big first-mover advantage. Assuming that, you aren’t answering the question of the post, which is: if someone invents FAI theory but not AGI theory, how can they best persuade the eventual first-mover on AGI to use that FAI theory? (Suppose including the FAI theory has some negative side effects for the AGI builder, like longer development time, or requiring more processing power because the FAI theory presupposes a certain architecture.)
Ahh, that makes a lot more sense.