I agree that you need more than just reinforcement learning.
I’m sympathetic to your broader point, but until somebody says exactly what the rewards (a.k.a. “reinforcement events”) are, I’m withholding judgment.
So in a sense this is what I’m getting at: “This resembles prior ideas which seem flawed; how do you intend to avoid those flaws?”