What I was trying to say is that RL agents DO maximize the output of its critic network—but the critic network does not reflect states of the world directly. Therefore the total system isn’t directly a maximizer. The question I’m trying to pose is whether or not it acts like a maximizer, under given particular conditions of training and RL construction.
While you’re technically correct that an NN is a mathematical function, it seems fair to say that it’s not an explicit function in the sense that we can’t read or interpret it very well.
What I was trying to say is that RL agents DO maximize the output of its critic network—but the critic network does not reflect states of the world directly. Therefore the total system isn’t directly a maximizer. The question I’m trying to pose is whether or not it acts like a maximizer, under given particular conditions of training and RL construction.
While you’re technically correct that an NN is a mathematical function, it seems fair to say that it’s not an explicit function in the sense that we can’t read or interpret it very well.