1) that the problem can be stated as: the AI has latent knowledge of lots of variables, like the status of the cameras, doors, alarm system, etc and also whether the diamond is in the vault; but you can’t directly ask it whether the diamond is in the vault, because its training has taught it to answer “would a human observer think the diamond is in the vault?” instead (because there was no way at training time to give it feedback on whether it correctly predicted the diamond was in the vault, only feedback on whether it correctly predicted a human thought the diamond was in the vault)?
2) that you do have access to z, the large “vector of floats representing the generative model’s latent space”, but that you have no idea which part(s) of it represents the AI’s knowledge about whether the diamond is in the room?
Yes, that’s right. The key thing I’d add to 1) is that ARC believes most kinds of data augmentation (giving the human AI assistance, having the human think longer, giving them other kinds of advantages) are also unlikely to work, so you’d need to do something to “crack open the black box” and penalize ways the reporter is computing its answer. They could still be surprised by data augmentation techniques but they’d hold them to a higher standard.
Am I right in thinking:
1) that the problem can be stated as: the AI has latent knowledge of lots of variables, like the status of the cameras, doors, alarm system, etc and also whether the diamond is in the vault; but you can’t directly ask it whether the diamond is in the vault, because its training has taught it to answer “would a human observer think the diamond is in the vault?” instead (because there was no way at training time to give it feedback on whether it correctly predicted the diamond was in the vault, only feedback on whether it correctly predicted a human thought the diamond was in the vault)?
2) that you do have access to z, the large “vector of floats representing the generative model’s latent space”, but that you have no idea which part(s) of it represents the AI’s knowledge about whether the diamond is in the room?
Yes, that’s right. The key thing I’d add to 1) is that ARC believes most kinds of data augmentation (giving the human AI assistance, having the human think longer, giving them other kinds of advantages) are also unlikely to work, so you’d need to do something to “crack open the black box” and penalize ways the reporter is computing its answer. They could still be surprised by data augmentation techniques but they’d hold them to a higher standard.