Right, but it’s not clear that this is a natural flaw for other possible FAI designs in the way it seems to be for this one. Here, we start the AGI without an understanding of human values; only the output of the initial program, which will become available some time in the future, is expected to have that understanding, so there is nothing to morally guide the AGI in the meantime. By “solving FAI” I meant that we do get some technical understanding of human values by the time the thing is launched, which might be enough to avoid the carnage.
(This whole line of reasoning creates a motivation for thinking about Oracle AI boxing. Here we have AGIs that eventually become FAIs, but might initially be UFAI-level dangerous.)