However, an FAI is not to be given the CEV as its goal; rather, a superintelligence is to use our CEV to determine what goals an FAI should be given. What this means, though, is that there will be a point at which a superintelligence exists that is not friendly.
There will be a superintelligence that wants to be friendly but isn't sure what friendliness means exactly. At that point it will already have some useful heuristics for friendliness (e.g. killing humans is unlikely to be friendly, actions that would cause reactions consistent with pain are unlikely to be friendly, doing what the programmers tell it to do is more likely to be friendly than the opposite, and so on). Explicitly designing a heuristic for the period between reaching superhuman level and understanding friendliness might be worthwhile, but is probably not worth spending resources on at this point.
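One crude way to picture such heuristics is as a veto filter over candidate actions. The sketch below is purely illustrative, not a proposal: the `Action` fields and predicates like `predicts_human_deaths` are made-up stand-ins for whatever signals such a system would actually evaluate.

```python
# Illustrative only: a toy veto filter over candidate actions, using
# stand-in predicates for the kinds of friendliness heuristics mentioned
# above. None of these names refer to a real system.

from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    description: str
    predicts_human_deaths: bool               # heuristic: killing humans is unlikely to be friendly
    matches_pain_reactions: bool              # heuristic: causing pain-like reactions is unlikely to be friendly
    contradicts_programmer_instruction: bool  # heuristic: defying the programmers is less likely to be friendly


def passes_friendliness_heuristics(action: Action) -> bool:
    """Return True only if no crude unfriendliness heuristic fires."""
    vetoes: List[Callable[[Action], bool]] = [
        lambda a: a.predicts_human_deaths,
        lambda a: a.matches_pain_reactions,
        lambda a: a.contradicts_programmer_instruction,
    ]
    return not any(veto(action) for veto in vetoes)


candidates = [
    Action("shut down a runaway subprocess", False, False, False),
    Action("disable operator oversight", False, False, True),
]
allowed = [a for a in candidates if passes_friendliness_heuristics(a)]
print([a.description for a in allowed])  # only the first action survives the filter
```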
ISTM that the main thing the AI needs to understand is that a large amount of optimization pressure has already been applied towards Friendliness-like goals; thus, random changes to the state of the world are likely to be bad.
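One way to make that "don't disturb what has already been optimized" intuition concrete is a status-quo penalty: score candidate actions not only by their estimated value but also by how far they would push the world away from its current state. The sketch below is only a toy illustration; the state representation, distance measure, and penalty weight are all assumptions, not anything proposed in the discussion above.

```python
# Illustrative only: a toy "impact penalty" that discourages large changes
# to a world state assumed to already embody a lot of optimization toward
# Friendliness-like goals. The state vectors and weight are made up.

import math
from typing import Dict


def impact(current: Dict[str, float], proposed: Dict[str, float]) -> float:
    """Euclidean distance between the current and proposed world states."""
    keys = set(current) | set(proposed)
    return math.sqrt(sum((current.get(k, 0.0) - proposed.get(k, 0.0)) ** 2 for k in keys))


def penalized_value(estimated_value: float,
                    current: Dict[str, float],
                    proposed: Dict[str, float],
                    penalty_weight: float = 10.0) -> float:
    """Estimated value minus a penalty proportional to how much the action changes the world."""
    return estimated_value - penalty_weight * impact(current, proposed)


current_state = {"humans_alive": 1.0, "ecosystems_intact": 1.0}
small_change = {"humans_alive": 1.0, "ecosystems_intact": 0.99}
big_change = {"humans_alive": 0.5, "ecosystems_intact": 0.2}

print(penalized_value(1.0, current_state, small_change))  # mild penalty, stays positive
print(penalized_value(5.0, current_state, big_change))    # heavily penalized despite higher raw value
```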