I would be careful about using reinforcement learning to check for theoretically maximal use of training data, given that agents generally do not start out with zero bits of information about the environment. The shape of the input data/action space is still useful information.
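To make the "not zero bits" point concrete, here is a toy sketch of my own (the hypothesis family and interface shapes are assumed purely for illustration): if the true environment is drawn from a small family of possible interfaces, then merely declaring the observation and action spaces already conveys several bits, before the agent has taken a single action or seen a single reward.

```python
import math

# Toy setup (assumed for illustration): the agent is told only the *shape* of its
# interface -- observations are 8x8 binary grids and there are 4 discrete actions.
# If the true environment is drawn from a family varying over grid sizes and
# action counts, fixing the interface already rules most of that family out.

candidate_grid_sizes = [(4, 4), (8, 8), (16, 16), (32, 32)]  # assumed family
candidate_action_counts = [2, 3, 4, 5, 6]                    # assumed family

total_hypotheses = len(candidate_grid_sizes) * len(candidate_action_counts)
consistent_hypotheses = 1  # only (8x8 grid, 4 actions) matches the declared spaces

prior_bits = math.log2(total_hypotheses / consistent_hypotheses)
print(f"bits conveyed by the interface shape alone: {prior_bits:.2f}")
# ~4.32 bits of information about the environment, with no interaction at all
```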
Even in designing the agent itself, it seems to me that general knowledge of human-related systems could be introduced into the architecture.
Selecting the architecture that gives the highest upper bound on information utilization in a system is also, in some sense, inserting extra data.
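As a rough illustration of those last two points (again a sketch of my own, using a convolutional layer as a stand-in for "knowledge introduced into the architecture"): choosing a layer that hard-codes spatial locality and translation invariance is a claim about the environment made before any training example arrives.

```python
import torch.nn as nn

# Two layers that both map an 8x8 observation to an 8x8 feature map.
# The dense layer assumes nothing about the input's structure; the convolution
# hard-codes the assumption that the observation is a 2D grid with local,
# translation-invariant structure -- inserted data, in the sense above.

dense = nn.Linear(8 * 8, 8 * 8)                    # arbitrary map, no spatial prior
conv = nn.Conv2d(1, 1, kernel_size=3, padding=1)   # weights shared across positions

n_dense = sum(p.numel() for p in dense.parameters())
n_conv = sum(p.numel() for p in conv.parameters())
print(f"unconstrained layer: {n_dense} free parameters")        # 4160
print(f"spatially constrained layer: {n_conv} free parameters")  # 10
# The convolution covers the same input/output shape with ~400x fewer free
# parameters because the spatial regularity was supplied by the architecture
# choice rather than learned from data.
```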