I think this is a useful perspective. One way to reformulate (part of) it: an explanation is good when it looks like it was sampled from the distribution of explanations you would come up with yourself if you thought about the problem long enough. This resonates a lot with some of my (and other people’s) thinking about AI alignment, where we try to constrain the AI to “human-like” behavior or make it imitate human behavior (as in Delegative Reinforcement Learning, Quantilization and other approaches).
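As a rough illustration of the quantilization idea mentioned above, here is a minimal sketch under simplifying assumptions: the "human-like" base distribution is represented by a finite list of sampled actions, and the agent picks uniformly among the top `q` fraction of them by utility instead of maximizing outright. The names `quantilize`, `base_samples` and `utility` are hypothetical, chosen for this example only, and the sketch is not taken from any particular paper or library.

```python
import math
import random
from typing import Callable, Sequence, TypeVar

A = TypeVar("A")


def quantilize(
    base_samples: Sequence[A],
    utility: Callable[[A], float],
    q: float,
    rng: random.Random,
) -> A:
    """Pick uniformly among the top-q fraction of base samples by utility.

    base_samples stands in for draws from a "human-like" base distribution
    (an assumption of this sketch). q = 1.0 reduces to sampling from the
    base samples directly; smaller q optimizes harder.
    """
    if not 0.0 < q <= 1.0:
        raise ValueError("q must be in (0, 1]")
    if not base_samples:
        raise ValueError("base_samples must be non-empty")
    # Rank candidate actions from highest to lowest utility.
    ranked = sorted(base_samples, key=utility, reverse=True)
    # Keep at least one action so the choice below is always defined.
    k = max(1, math.ceil(q * len(ranked)))
    return rng.choice(ranked[:k])
```

With `q = 1.0` this is pure imitation of the base samples, and as `q` shrinks the choice approaches plain utility maximization, so `q` acts as a dial between "human-like" and "optimized" behavior.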