When you say “we can check what would happen if the AI was given the ability to launch the world’s nuclear arsenals”, do you mean that the question you’re really asking is “would it be good (from the AI’s perspective) if the nuclear missiles were launched as a result of some fluke”? Because if you’re literally asking “would the AI launch nuclear missiles”, then you run into the Newcomb obstacle I described, since the AI that might launch nuclear missiles is the “wild type” AI without special control handicaps.
Also, there are questions for which you don’t want to know the detailed answer. For example, “the text of a speech of an important world leader” that the AI would create is something to which you don’t want to expose your own brain.