Clippy isn’t a maximizer. And neither is any current RL agent. I did mention that, but I’ll edit to make that clear.
That is the opposite of what you said: Clippy, according to you, is maximizing the output of its critic network. And you can’t say “there’s not an explicit mathematical function”. Any neural network with a specific set of weights is by definition an explicit mathematical function, just usually not one with a compact representation.
What I was trying to say is that RL agents DO maximize the output of their critic networks, but the critic network does not reflect states of the world directly. Therefore the total system isn’t directly a maximizer of anything in the world. The question I’m trying to pose is whether or not it acts like one, under particular conditions of training and RL construction.
While you’re technically correct that an NN is a mathematical function, it seems fair to say that it’s not an explicit function in the sense that matters here: we can’t read or interpret it very well.
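To make the distinction concrete, here is a minimal Python sketch. Everything in it is hypothetical and chosen only for illustration: the weights, the three-action set, and the `true_world_value` function are not taken from any real agent. It shows a fixed-weight critic that is an explicit (if unreadable) function, and an agent that maximizes that critic without thereby maximizing the world quantity.

```python
import math

# Hypothetical fixed weights for a tiny critic:
# 2 inputs (observation, action) -> 2 tanh hidden units -> 1 output.
W1 = [[0.5, -1.0], [1.5, 0.25]]
B1 = [0.0, -0.5]
W2 = [1.0, -2.0]
B2 = 0.1

ACTIONS = (-1.0, 0.0, 1.0)  # assumed discrete action set


def critic(obs, action):
    """With the weights fixed, this is an explicit deterministic function."""
    x = [obs, action]
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(W1, B1)
    ]
    return sum(w * h for w, h in zip(W2, hidden)) + B2


def act(obs):
    """The agent maximizes the critic's output, not the world state."""
    return max(ACTIONS, key=lambda a: critic(obs, a))


def true_world_value(obs, action):
    """Hypothetical ground-truth quantity that the critic only approximates."""
    return -(obs + action) ** 2


def best_for_world(obs):
    """The action a direct maximizer of the world quantity would pick."""
    return max(ACTIONS, key=lambda a: true_world_value(obs, a))
```

With these made-up weights, `act(0.3)` and `best_for_world(0.3)` pick different actions. Whether a trained critic tracks the world closely enough for the two to coincide is exactly the open question in the comment above.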