I think it would be significantly easier to make FAI than LukeFriendly AI: for the latter, you need to do most of the work involved in the former, but also work out how to get the AI to find you (and not accidentally be friendly to someone else).
If it turns out that there’s a lot of coherence in human values, FAI will resemble LukeFriendlyAI quite closely anyway.
Massively backwards! Creating an FAI (presumably ‘friendly to humanity’) requires an AI that can somehow harvest and aggregate preferences over humans in general, but a LukeFriendly AI just needs to scan one brain.
Scanning is unlikely to be the bottleneck for a GAI, and it seems most of the difficulty with CEV is from the Extrapolation part, not the Coherence.
It doesn’t matter how easy the parts may be: scanning, extrapolating, and cohering all of humanity is harder than scanning and extrapolating just Luke.
Not if Luke’s values contain pointers to all those other humans.
If FAI is HumanityFriendly rather than LukeFriendly, you have to work out how to get the AI to find humanity and not accidentally optimize for the extrapolated volition of some other group. It seems easier to me to establish parameters for “finding” Luke than for “finding” humanity.
Yes, it depends on whether you think Luke is more different from humanity than humanity is from StuffWeCareNotOf.
Of course an arbitrarily chosen human’s values are more similar to the aggregated values of humanity as a whole than humanity’s values are similar to an arbitrarily chosen point in value-space. Value-space is big (see the toy sketch after this comment).
I don’t see how my point depends on that, though. Your argument here claims that “FAI” is easier than “LukeFriendlyAI” because LFAI requires an additional step of defining the target, and FAI doesn’t require that step. I’m pointing out that FAI does require that step. In fact, target definition for “humanity” is a more difficult problem than target definition for “Luke”.
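A toy numerical sketch of the “value-space is big” claim above. None of this is from the thread: it crudely models value-space as a wide Euclidean box, humanity as a tight Gaussian cluster inside it, and “humanity’s values” as the cluster mean; the dimensionality, spread, and range are arbitrary assumptions chosen only to make the comparison visible.

```python
# Toy illustration only: every number here is an arbitrary assumption,
# not an empirical claim about actual human values.
import numpy as np

rng = np.random.default_rng(0)

DIM = 1_000        # dimensions of the stand-in value-space
N_HUMANS = 10_000  # simulated humans
SPREAD = 1.0       # how far individuals deviate from species-typical values
RANGE = 100.0      # half-width of the whole value-space along each axis

# Humans cluster around a shared species-typical point somewhere in the box.
species_typical = rng.uniform(-RANGE, RANGE, size=DIM)
humans = species_typical + rng.normal(0.0, SPREAD, size=(N_HUMANS, DIM))
aggregate = humans.mean(axis=0)   # crude stand-in for "humanity's values"

luke = humans[0]                                     # an arbitrarily chosen human
random_point = rng.uniform(-RANGE, RANGE, size=DIM)  # arbitrary point in value-space

print("human to aggregate:       ", np.linalg.norm(luke - aggregate))
print("random point to aggregate:", np.linalg.norm(random_point - aggregate))
```

With these made-up numbers the first distance comes out on the order of 30 and the second on the order of a few thousand: any individual human sits essentially on top of the aggregate compared to a random point in the box.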
I find it much more likely that it’s the other way around; making one for a single brain that already has a utility function seems much easier than finding a good compromise between billions. Especially if something of the form “upload me, then perform this specific type of enhancement to enable me to safely continue self-improving” turns out to be safe enough.