Maybe I don’t see a bright line between these things. Adding an “explaining module” to an existing AI and then doing more training is not so different from designing an AI that has an “explaining module” from the start. And training an AI with an “explaining module” isn’t so different from training an AI with a “making sure internal states are somewhat interpretable” module.
I’m probably advocating something close to “Ex-ante,” but with lots of learning, including learning that informs the AI what features of the world we want it to make interpretable to us.