I don’t understand much of this, and I want to, so let me start by asking basic questions in a much simpler setting.
We are playing Conway’s Game of Life with some given initial state. A disciple AI is given a 5 by 5 region of the board and allowed to manipulate its entries arbitrarily; information leaves that region according to the usual rules of the game.
The master AI decides on some algorithm for the disciple AI to execute. Then it runs the simulation twice, once with the disciple AI and once without it. The two results can be compared directly by, for example, counting the number of squares where the two futures differ. That count could serve as a measure of the “impact” of the AI.
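
To make the setup concrete, here is a minimal sketch of the experiment in Python. Everything in it is my own assumption rather than an established method: the board is finite with dead cells beyond the edge, the disciple's algorithm is modelled as a hypothetical `policy(step, patch)` function that returns the new contents of its 5 by 5 patch before each update, and the names `life_step`, `run`, and `impact` are invented for illustration.

```python
from typing import Callable, List, Optional

Board = List[List[int]]
# Hypothetical policy type: maps (step, current patch) to the new patch.
Policy = Callable[[int, Board], Board]


def life_step(board: Board) -> Board:
    """Apply one step of Conway's rules; cells beyond the edge count as dead."""
    n_rows, n_cols = len(board), len(board[0])
    new = [[0] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        for c in range(n_cols):
            neighbours = sum(
                board[rr][cc]
                for rr in range(max(r - 1, 0), min(r + 2, n_rows))
                for cc in range(max(c - 1, 0), min(c + 2, n_cols))
                if (rr, cc) != (r, c)
            )
            alive = neighbours == 3 or (board[r][c] == 1 and neighbours == 2)
            new[r][c] = 1 if alive else 0
    return new


def run(board: Board, steps: int, policy: Optional[Policy] = None,
        top: int = 0, left: int = 0, size: int = 5) -> Board:
    """Simulate `steps` updates. If a policy is given, it overwrites its
    size-by-size patch (upper-left corner at top, left) before each update."""
    board = [row[:] for row in board]  # do not mutate the caller's board
    for t in range(steps):
        if policy is not None:
            patch = [row[left:left + size] for row in board[top:top + size]]
            new_patch = policy(t, patch)
            for i in range(size):
                board[top + i][left:left + size] = new_patch[i]
        board = life_step(board)
    return board


def impact(with_ai: Board, without_ai: Board) -> int:
    """Count the squares where the two futures differ."""
    return sum(a != b
               for row_a, row_b in zip(with_ai, without_ai)
               for a, b in zip(row_a, row_b))


def blinker_once(t: int, patch: Board) -> Board:
    """Toy disciple: draws a blinker at step 0, then leaves its patch alone."""
    if t == 0:
        patch = [row[:] for row in patch]
        patch[2][1] = patch[2][2] = patch[2][3] = 1
    return patch


empty = [[0] * 12 for _ in range(12)]
future_with = run(empty, 10, policy=blinker_once, top=3, left=3)
future_without = run(empty, 10)
print(impact(future_with, future_without))  # 3: the blinker's three live cells
```

On an empty board the toy disciple's blinker just oscillates, so the two futures differ in exactly three squares at every later step. With a busier initial state the same policy could change far more of the board, which is the kind of difference the count is meant to capture.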
What complexities am I missing? Is it mainly that Conway’s game of life is deterministic and we are designing an AI for a stochastic world?
Have you tried this? Does it work?