creation of slow-thinking, poorly understood unFriendly AGIs is not any help in developing a FAI
If we use a model where building a uFAI requires only solving the AGI problem, and building FAI requires solving AGI + Friendliness—are you saying that it will not be of any help in developing Friendliness, or that it will not be of any help in developing AGI or Friendliness?
(The former claim would sound plausible though non-obvious, and the latter way too strong.)
No help in developing FAI theory (decision theory and a way of pointing to human values), probably of little help in developing FAI implementation, although there might be useful methods in common.
FAI requires solving AGI + Friendliness
I don’t believe it works like that. Making a poorly understood AGI doesn’t necessarily help with implementing a FAI (even if you have the theory figured out), because a FAI is not just parameterized by its values, but also defined by how correctly it interprets those values (its decision theory), which other AGI designs won’t have by default.
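To make that distinction concrete, here is a toy Python sketch of my own (the actions, outcomes, and numbers are made up for illustration, not drawn from any real AGI design): two agents are parameterized by the exact same utility function, but because they interpret those values through different decision procedures, they behave differently.

```python
# Toy illustration: two agents share the same "values" (utility function over
# outcomes) but differ in the decision procedure that interprets those values,
# so their behaviour diverges. This is the sense in which an agent is defined
# by more than the values it is parameterized with.

# Hypothetical toy world: each action leads to a distribution over outcomes.
ACTIONS = {
    "safe_bet":  {"small_win": 1.0},
    "long_shot": {"big_win": 0.1, "loss": 0.9},
}

def utility(outcome):
    """Shared 'values': both agents score outcomes identically."""
    return {"small_win": 1.0, "big_win": 20.0, "loss": -1.0}[outcome]

def expected_utility_agent(actions):
    """Decision procedure 1: maximise expected utility."""
    return max(actions, key=lambda a: sum(p * utility(o) for o, p in actions[a].items()))

def maximin_agent(actions):
    """Decision procedure 2: maximise worst-case utility."""
    return max(actions, key=lambda a: min(utility(o) for o in actions[a]))

print(expected_utility_agent(ACTIONS))  # -> "long_shot" (expected utility 1.1 vs 1.0)
print(maximin_agent(ACTIONS))           # -> "safe_bet"  (worst case 1.0 vs -1.0)
```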
Indeed. For example, on the Friendliness front, computational models of human ethical reasoning seem like something that could increase the safety of all kinds of AGI projects and also be useful for Friendliness theory in general, and some of them could conceivably be developed in the context of heuristic AGI. Likewise, on the AGI front, it seems like there should be all kinds of machine learning techniques and advances in probability theory that would be equally useful for pretty much any kind of AGI. After all, we already know that an understanding of e.g. Bayes’ theorem and expected utility will be necessary for pretty much any kind of AGI implementation, so why should we assume that all of the insights useful across many kinds of designs have already been developed?
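As a minimal (and deliberately trivial) illustration of the kind of value-agnostic machinery being referred to, here is a short Python sketch of a Bayesian update feeding into an expected-utility choice; the probabilities, action names, and payoffs are invented for the example.

```python
# Minimal sketch: a Bayesian update followed by an expected-utility choice.
# Nothing in this machinery depends on what the utility function rewards, so
# components like these would be useful for pretty much any kind of AGI,
# Friendly or otherwise.

def bayes_update(prior, p_e_given_h, p_e_given_not_h):
    """Return P(H | E) via Bayes' theorem."""
    p_e = p_e_given_h * prior + p_e_given_not_h * (1 - prior)
    return p_e_given_h * prior / p_e

def best_action(actions, p_h):
    """Pick the action with the highest expected utility given P(H) = p_h.

    `actions` maps an action name to (utility if H is true, utility if H is false).
    """
    return max(actions, key=lambda a: p_h * actions[a][0] + (1 - p_h) * actions[a][1])

posterior = bayes_update(prior=0.3, p_e_given_h=0.8, p_e_given_not_h=0.2)
print(round(posterior, 3))  # 0.632
print(best_action({"act_on_h": (10.0, -5.0), "hedge": (2.0, 2.0)}, posterior))  # "act_on_h"
```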
Making a poorly understood AGI doesn’t necessarily help with implementing a FAI (even if you have the theory figured out)
Right, by the above I meant to say “the right kind of AGI + Friendliness”; I certainly agree that there are many conceivable ways of building AGIs that would be impossible to ever make Friendly.