Would changing how the reward function pays off work? Instead of rewarding the AI based on human judgments, pay out all rewards when the vault is actually checked (at a time unknown to the AI). The AI isn’t asked whether the diamond is present or absent. Instead, it is asked: “If the vault were checked now, would you want to be rewarded if the diamond is present, or if it is absent?”