I’m trying to point at “myopic RL”, which does, in fact, do things.
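To make the referent concrete, here is a minimal sketch of what I have in mind by "myopic RL" (my own illustration; the environment interface and names are hypothetical): ordinary Q-learning with the discount factor set to zero, so the agent optimizes only immediate reward yet still selects actions and "does things".

```python
import random
from collections import defaultdict

def myopic_q_learning(env, episodes=500, alpha=0.1, epsilon=0.1):
    """Q-learning with gamma = 0: the learning target is just the immediate reward."""
    Q = defaultdict(float)  # Q[(state, action)] estimates immediate reward only

    for _ in range(episodes):
        state, done = env.reset(), False  # hypothetical env interface
        while not done:
            actions = env.actions(state)
            if random.random() < epsilon:
                action = random.choice(actions)                      # explore
            else:
                action = max(actions, key=lambda a: Q[(state, a)])   # exploit
            next_state, reward, done = env.step(action)
            # With gamma = 0 the bootstrapped term gamma * max_a' Q(s', a') vanishes,
            # so the update tracks only immediate reward -- yet the agent still acts.
            Q[(state, action)] += alpha * (reward - Q[(state, action)])
            state = next_state
    return Q
```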
Ah, an off-by-one miscommunication. Sure, it’s both rational and competently goal-directed.
I do object, and still object, since I don’t think we can realistically include the current time in the state.
I mean, if you want to go down that route, then “win at least one medal” is also not state-dependent, because you can’t realistically include “whether Alice has won a medal” in the state: you can only include an impression of whether Alice has won a medal, based on past and current observations. So I still have the same objection.
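To spell out the parallel (a sketch with hypothetical names, not anything from the original exchange): the reward we would like to write down refers to a fact about the world, but the state an agent can realistically maintain only contains an estimate of that fact, inferred from observations.

```python
def ideal_reward(world):
    # Defined on the true world state -- which no realistic agent has direct access to.
    return 1.0 if world.alice_medal_count >= 1 else 0.0

def realizable_reward(agent_state):
    # Defined on what the agent can actually maintain: an impression of the fact,
    # inferred from past and current observations (e.g. having seen a podium photo).
    return agent_state.prob_alice_has_medal
```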
finite-horizon variants of AIXI have this “problem” of time-inconsistent preferences
Oh, I see. You probably mean AI systems that act as though they have goals that will only last for e.g. 5 seconds. Then, 2 seconds later, they act as though they have goals that will last for 5 more seconds, i.e. 7 seconds after the initial time. (I was thinking of agents that initially care about the next 5 seconds, and then after 2 seconds, they care about the next 3 seconds, and after 7 seconds, they don’t care about anything.)
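A small sketch of the difference (my own illustration, with one unit of reward per second purely to have numbers): the first reading keeps a rolling 5-second window, the second keeps a fixed endpoint 5 seconds after the initial time.

```python
# Rolling horizon: at wall-clock time t the agent values reward over [t, t+5), so at
# t=0 it ignores second 6 but at t=2 it cares about it -- its earlier and later selves
# disagree, which is the time-inconsistency at issue. Fixed endpoint: the agent values
# reward over [t, 5); the window shrinks, but it is always just the remainder of the
# original window, so there is no conflict between time steps.

def rolling_horizon_value(rewards, t, horizon=5):
    # Values the next `horizon` seconds starting from the current time t.
    return sum(rewards.get(t + k, 0.0) for k in range(horizon))

def fixed_endpoint_value(rewards, t, end=5):
    # Values only the seconds remaining before the fixed endpoint `end`.
    return sum(rewards.get(s, 0.0) for s in range(t, end))

rewards = {k: 1.0 for k in range(10)}        # one unit of reward per second
print(rolling_horizon_value(rewards, t=2))   # 5.0: still cares about seconds 2..6
print(fixed_endpoint_value(rewards, t=2))    # 3.0: only seconds 2..4 remain in scope
```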
I agree that the preferences you were talking about are time-inconsistent, and such agents seem both less rational and less competently goal-directed to me.