This depends on the internal structure of the thing. The inner workings of any particular human mind are mostly a black box to us. The internal workings of software need not be. If your AI has data structures and control logic that we can understand, you could dump results out and review by hand. For instance, there might be a debug interface that lets you unambiguously access the AI’s internal probability estimate for some contingency.
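Such a debug interface might look like the following minimal sketch. Everything here is hypothetical and illustrative: the class name `ModelDebugInterface`, the method names, and the idea that the AI's beliefs live in a simple name-to-probability mapping are all assumptions, not a description of any real system.

```python
# Hypothetical sketch of a debug interface that exposes an AI system's
# internal probability estimate for a named contingency. All names
# (ModelDebugInterface, get_probability, dump) are illustrative.

class ModelDebugInterface:
    """Read-only view into a model's internal belief state."""

    def __init__(self, belief_state):
        # belief_state: mapping from contingency name -> probability.
        # In a real system this would be extracted from the model's
        # actual data structures, not passed in directly.
        self._beliefs = dict(belief_state)

    def get_probability(self, contingency: str) -> float:
        """Unambiguously read the internal estimate for one contingency."""
        p = self._beliefs[contingency]
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"corrupted belief state: {contingency} -> {p}")
        return p

    def dump(self):
        """Dump every internal estimate for review by hand."""
        return sorted(self._beliefs.items())


# Usage: read the estimate directly from the internals rather than
# inferring it from the system's outward behavior.
debug = ModelDebugInterface({"goal_achieved": 0.72, "operator_shutdown": 0.03})
print(debug.get_probability("operator_shutdown"))  # -> 0.03
print(debug.dump())
```

The point of the sketch is the access pattern: if the internal state is a legible data structure, a reviewer can query or dump it and check it by hand, instead of treating the whole system as a black box.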
Note that you need not have a perfect understanding of how the AI works in order to rule out the presence of a whole shadow AI inside the running program.