For instance, if anything dangerous approached the AIXI’s location, the human could lower the AIXI’s reward until it became very effective at deflecting danger. The greater the variety of things that could potentially threaten the AIXI, the more likely it is to construct plans of action containing behaviours that look a lot like “defend myself.” [...]
It seems like you’re just hardcoding the behavior, trying to get a human to cover all the cases for AIXI instead of modifying AIXI to deal with the general problem itself.
I get that you’re hoping it will infer the general problem, but nothing stops it from learning a related rule like “a human sensing danger is bad.” Since humans are imperfect at sensing danger, that rule will predict the observed rewards better than the actual-danger rule you want AIXI to model. Then it removes your fear and experiments with nuclear weapons. Hurray!
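To make the objection concrete, here is a minimal toy simulation (all names and probabilities are hypothetical illustrations, not from the original discussion): because the reward channel is controlled by the human, it tracks the human’s *perception* of danger, so the proxy rule “human senses danger is bad” predicts the observed rewards perfectly, while the intended rule “actual danger is bad” mispredicts exactly when the human is wrong.

```python
# Toy sketch (hypothetical setup): a human lowers the agent's reward
# whenever the human *perceives* danger. Perception is an imperfect,
# noisy function of whether danger is actually present. We compare how
# well two candidate rules predict the observed low-reward signal:
#   A: "reward is low iff danger is actually present"
#   B: "reward is low iff the human senses danger"
import random

random.seed(0)

N = 10_000            # number of simulated time steps
P_DANGER = 0.1        # probability that real danger is present
P_MISS = 0.2          # human fails to notice real danger
P_FALSE_ALARM = 0.05  # human senses danger when there is none

hits_a = hits_b = 0
for _ in range(N):
    danger = random.random() < P_DANGER
    if danger:
        human_senses_danger = random.random() > P_MISS
    else:
        human_senses_danger = random.random() < P_FALSE_ALARM

    # The reward channel is set by the human, so it tracks the human's
    # perception of the world, not the world state itself.
    low_reward = human_senses_danger

    hits_a += (low_reward == danger)               # rule A's prediction
    hits_b += (low_reward == human_senses_danger)  # rule B's prediction

print(f"rule A ('actual danger')       predicts reward: {hits_a / N:.1%}")
print(f"rule B ('human senses danger') predicts reward: {hits_b / N:.1%}")
# Rule B predicts the reward signal perfectly by construction, so a
# Solomonoff-style predictor would weight it above rule A -- which is
# the failure mode described above.
```

With these illustrative numbers, rule A matches the reward only about 93% of the time (it fails on misses and false alarms), while rule B matches it every time, so the agent has every incentive to model (and manipulate) the human’s fear rather than the danger itself.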
Very valid point.