I think you should have a third sub-question: “Can we prevent the AI from significantly altering the system it is asked about in order to answer the question?”
For a (ridiculous and slightly contrived) example:
“Tell me, as precisely as possible, how long I have to live.”
“2 seconds.”
“What?”
Neutral AI kills human.
Here, rather than calculating an expected answer based on genetics, lifestyle, etc., the AI finds that it can accurately answer the question if it simply kills the human.
Less ridiculously:
“This frog is sick, what’s wrong with it?”
AI proceeds with all manner of destructive tests which, while conclusively finding out what’s wrong with the frog, kill it.
If you wanted to make the frog better, that’s a problem.