Both are huge difficulties, but most of the work in FAI is probably in the AI part, not the F part.
It is worth noting that taking the “F” seriously implies adding a rather significant amount of work to the “AI part”. It requires whole extra orders of formal rigor, plus all the additional complications of provable goal stability under self-improvement (regardless of what those goals happen to be). While this doesn’t matter for the purpose of answering smoofra’s question, it seems to me that it could be misleading to neglect the difference in workload between creating “F-compatible AI” and “AI” when talking about the workload imposed by ‘F’.
Note that I don’t think I’m saying something controversial here. I expect this is just wording that I am wary of rather than a fundamentally different understanding. But if I have actually misunderstood the MIRI position on the relative difficulty of Friendliness versus arbitrary AI, then I would appreciate being corrected. That would be significant new information for me to consider (and also extremely good news!)
The reason this is significant can of course be illustrated by considering the counterfactual world where a convergence thesis holds. Or, more relevantly, by considering the possibility of GAI researchers who believe that a convergence thesis holds but somehow manage to be competent researchers anyhow. Their task becomes (crudely speaking) that of creating any AI that can make something smarter than itself. My estimate is that this is an order of magnitude simpler than the FAI creators’ task of creating an AI, even completely neglecting the work that goes into creating the goal system.
If I find out that I am mistaken about the relative difficulties here, then I will get to drastically update my expectation of humanity surviving, and more generally update in the direction of awesome things happening.
I take it that by “convergence thesis”, you’re referring to the statement that all sufficiently intelligent agents will have approximately the same values.
Yes.
OK, forget about F for a second. Isn’t the huge difficulty finding the right deductions to make, not formalizing them and verifying them?