I’m tempted to agree and disagree with you at the same time… I agree that memory should be cleared between tasks in this case, and I agree that the robot should not be trying to guess the user’s intentions. Persistent memory and intent-guessing are both likely to make alignment harder while not helping much with the primary task of getting coffee.
But ideally a truly robust solution would not rely on keeping the robot ignorant of things. So, like you said, the problem is still hard enough that you can’t solve it in a few minutes.
But still, like you said… it certainly seems we have tools that are, in some sense, more steerable than pure reinforcement learning at least, which is really nice!
Yeah, I think it’s that one