My best guess about what’s preferable to what is still this way, but I’m significantly less certain of its truth (there are analogies that make the answer come out differently, and the level of rigor in the above comment is not much better than that of these analogies). In any case, I don’t see how we can actually use these considerations. (I’m working in a direction that should ideally make questions like this clearer in the future.)
If you know how to build a uFAI (or a “probably somewhat reflective on its goal system but nowhere near provably Friendly” AI), build one and put it in an encrypted glass case. Ideally you would work out the AGI theory in your head, determine how long it would take to code the AGI after adjusting for the planning fallacy, and then be ready to start coding if doom is predictably going to occur. If doom isn’t predictable, then the safety tradeoffs are larger. This can easily go wrong, obviously.