One idea that I haven’t heard much discussion of: build a superintelligent AI, have it create a model of the world, build a tool for exploring that model of the world, figure out where “implement CEV” resides in that model of the world (the proverbial B002 predicate), and tell the AI to do that. This would be predicated on the ability to create a foolproof AI box or otherwise have the AI create a very detailed model of the world without being motivated to do anything with it. I have a feeling AI boxing may be easier than Friendliness, because the AI box problem’s structure is disjunctive (if any of the barriers to the AI screwing humanity up work, the box has worked) whereas the Friendliness problem is conjunctive (if we get any single element of Friendliness wrong, we fail at the entire thing).
I suppose if the post-takeoff AI understands human language the same way we do, in principle you could write a book-length natural language description of what you want it to do and hardcode that in to its goal structure, but it seems a bit dubious.
One idea that I haven’t heard much discussion of: build a superintelligent AI, have it create a model of the world, build a tool for exploring that model of the world, figure out where “implement CEV” resides in that model of the world (the proverbial B002 predicate), and tell the AI to do that. This would be predicated on the ability to create a foolproof AI box or otherwise have the AI create a very detailed model of the world without being motivated to do anything with it. I have a feeling AI boxing may be easier than Friendliness, because the AI box problem’s structure is disjunctive (if any of the barriers to the AI screwing humanity up work, the box has worked) whereas the Friendliness problem is conjunctive (if we get any single element of Friendliness wrong, we fail at the entire thing).
I suppose if the post-takeoff AI understands human language the same way we do, in principle you could write a book-length natural language description of what you want it to do and hardcode that in to its goal structure, but it seems a bit dubious.