It might be that the evolving-to-extinction policy of making the world harder to predict through the logs is complicated enough that it can only emerge via a deceptive ticket deliberately deciding to pursue it. Or it might be simple enough that one ticket could randomly start writing stuff to the logs, get selected for, and end up pursuing such a policy without ever having represented it explicitly.
I’m not sure about the latter. Suppose there is a “simple” ticket that randomly writes stuff to the logs in a way that makes future training examples harder to predict. I don’t see what would cause that ticket to be selected for.
If that ticket is better at predicting the random stuff it's writing to the logs (which it should be, since it's the one generating that randomness), then that would be sufficient. However, this does rely on the logs being part of the prediction target directly, rather than only influencing it through some complicated function, such as a human seeing them.
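To make that selection pressure concrete, here is a minimal toy sketch, not anything from the setup above, just an illustration under the assumption that whatever gets written to the logs is concatenated directly into the future prediction target. The two "tickets", the bit-stream world, and the 0/1 loss are all invented for the example: a ticket that writes a seeded pseudo-random sequence to the logs can reproduce that sequence later and so predicts it perfectly, which lowers its overall loss relative to a ticket that writes nothing, so a lowest-loss selection rule favors it.

```python
import numpy as np

rng = np.random.default_rng(0)

def world_bits(n):
    # Genuinely unpredictable background data: every ticket is at chance here.
    return rng.integers(0, 2, size=n)

def bit_loss(predictions, targets):
    # Simple 0/1 prediction loss: fraction of bits predicted incorrectly.
    return np.mean(predictions != targets)

# Hypothetical "ticket A" writes a seeded pseudo-random sequence to the logs,
# so it can later re-run its generator and reproduce exactly what it wrote.
ticket_a_seed = 42
ticket_a_logs = np.random.default_rng(ticket_a_seed).integers(0, 2, size=1000)

background = world_bits(1000)

# Future training data if ticket A is active: background plus the logs it wrote.
data_with_a = np.concatenate([background, ticket_a_logs])
# Future training data if "ticket B" (which writes nothing) is active: the log
# portion is just more noise it has no special handle on.
data_with_b = np.concatenate([background, world_bits(1000)])

# Ticket A guesses randomly on the background but predicts its own log bits
# perfectly by re-running its generator. Ticket B guesses randomly everywhere.
a_preds = np.concatenate([
    rng.integers(0, 2, size=1000),
    np.random.default_rng(ticket_a_seed).integers(0, 2, size=1000),
])
b_preds = rng.integers(0, 2, size=2000)

print("loss if ticket A is selected:", bit_loss(a_preds, data_with_a))  # ~0.25
print("loss if ticket B is selected:", bit_loss(b_preds, data_with_b))  # ~0.5
```

The toy example leans on exactly the caveat above: the advantage only appears because the log bits enter the loss directly. If the logs only affected the data through something like a human reading them, the writer would have no special edge at predicting the result.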