the practical FAI project would nonetheless choose the AI’s goal system in the old-fashioned way, by human deliberation and consensus
[...]
Our practical FAI project has “solved” FAI by simply coming to an agreement on what to wish for, and by studying with legalistic care how to avoid pitfalls and loopholes in the finer details of the wish
If we are making an AGI, then humans think too slowly by comparison to consider every possible aspect of a “wish”, so I don’t think legalistic care is strong enough, given the large negative utility of a mistake. A mathematical proof of Friendliness should be required, and that is what the formalisations of “hopelessly impractical models of cognition” (e.g. TDT) are a step towards.
If your goal is AGI, then you want a cognitive architecture that will exhibit these behaviors.
FTFY. If you are designing something to have behaviour x, then you want behaviour x to definitely occur, possibly built out of other behaviours, but not just “emerging” from them.
I think the problem with the proposal is the opposite of what I think you think it is.
Omohundro’s universal AI instrumental values are things that, if absent in the final product, mean that you have failed. Their presence means little because one could simply design for them.
It’s not that we want these behaviors to occur; if we don’t know how they do occur, then “emerging” or “arising in a way I do not understand” are fine phrases to use. If you don’t understand how they arise from the sub-units you’ve carefully built, you’re probably (but not certainly) in a lot of trouble. And if you try too hard to design the unit to exhibit these behaviors directly, you’re hacking together a solution and are almost certainly failing; “certainly”, less a “so you’re saying there’s a chance” sliver of probability.
That’s what I was trying to say, thanks :)