This is an instance of EDT having problems; I wrote about this problem and a possible solution here.
That post seems to be trying to solve a different problem (it still assumes that the agent knows its own source code, AFAICT). Can you please re-read what I wrote, and if that post really is addressing the same problem, explain how?
I am. Consider the tragedy of the commons, which is simpler. If many on-policy RL agents are playing the tragedy of the commons and are synchronized with each other (so they always take the same action, including exploration actions), then they can notice that they expect less utility when they defect than when they cooperate.
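Here's a minimal sketch of that setup (the payoff function and all constants are made up for illustration). Because a single coin flip decides the action of all agents at once, each agent's value estimate for "defect" reflects a world where everyone defects, so cooperation ends up looking better:

```python
# Minimal sketch (hypothetical setup): N synchronized on-policy RL agents
# playing a repeated tragedy of the commons. Synchronization means every
# exploration action is taken by all agents simultaneously.
import random

N = 10            # number of agents
EPSILON = 0.1     # exploration rate
ALPHA = 0.1       # learning rate

def payoff(action, num_defectors):
    # Defecting yields a small private bonus, but each defector
    # degrades the commons for everyone. (Payoffs are made up.)
    private = 1.0 if action == "defect" else 0.0
    commons = 2.0 * (N - num_defectors) / N
    return private + commons

# One shared value estimate suffices: the agents are identical and synchronized.
q = {"cooperate": 0.0, "defect": 0.0}

for step in range(10_000):
    # Synchronized epsilon-greedy: ONE coin flip decides the action of ALL agents.
    if random.random() < EPSILON:
        action = random.choice(["cooperate", "defect"])
    else:
        action = max(q, key=q.get)
    num_defectors = N if action == "defect" else 0
    r = payoff(action, num_defectors)
    q[action] += ALPHA * (r - q[action])

print(q)  # q["cooperate"] converges near 2.0 > q["defect"] near 1.0
```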
I see, but the synchronization seems rather contrived. To the extent that humans are RL agents, our learning algorithms are not synchronized (and defecting in the tragedy of the commons happens very often as a result), so why is synchronized RL relevant? I don't see how this is supposed to help convince a skeptic.
> Can you please re-read what I wrote, and if that post really is addressing the same problem, explain how?
You’re right, the post doesn’t address that issue. I agree that it is unclear how to apply EDT as a human. However, humans can still learn from abstract agents.
> I see, but the synchronization seems rather contrived.
Okay, here’s an attempt at stating the argument more clearly:
You’re a bureaucrat in a large company, keeping track of how much money the company has. You believe there were previous bureaucrats before you, who follow the same decision theory as you. Both you and the previous bureaucrats could have corrupted the company’s records to change how much money the company believes itself to have. If any past bureaucrat has corrupted the records, the records are wrong. You don’t know how long the company has been around or where in the chain you are; all you know is that there will be 100 bureaucrats in total.
You (and the other bureaucrats) somewhat want to corrupt the records, but want even more to know how much money the company has. Do you corrupt the records?
UDT says ‘no’, due to a symmetry argument: if you corrupt the records, then so do all past bureaucrats. So does COEDT. Both believe that, if you corrupt the records, you don’t have knowledge of how much money the company has.
(Model-free RL doesn’t have enough of a world model to get these symmetries without artificial synchronization.)
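To make the symmetry argument concrete, here's a minimal sketch with made-up utilities: under the assumption that all bureaucrats run the same decision theory and therefore make the same choice, the expected utility of a policy that corrupts is dominated by losing knowledge of the balance.

```python
# Minimal sketch (hypothetical utilities): the symmetry assumption is that
# every bureaucrat runs your decision theory, so whatever your policy
# outputs, every past bureaucrat's policy output the same thing.
U_CORRUPT = 1.0     # assumed small gain from corrupting the records
U_KNOWLEDGE = 10.0  # assumed larger gain from knowing the true balance

def expected_utility(policy_corrupts: bool) -> float:
    if policy_corrupts:
        # By symmetry, every past bureaucrat corrupted too, so the
        # records are wrong and the knowledge payoff is lost.
        return U_CORRUPT
    # By symmetry, nobody corrupted, so the records are accurate.
    return U_KNOWLEDGE

print(expected_utility(True), expected_utility(False))  # 1.0 10.0 -> don't corrupt
```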
The assumption “You don’t know how long the company has been around or where in the chain you are” seems unrealistic/contrived, much like the assumption of “synchronized RL” in your previous argument. Again, this seems like it’s not going to be very convincing to a skeptic, at least without, for example, a further argument for why the assumption actually makes sense on some deeper level.
Aside from that, here’s a counter-argument: among all fields of research, math is probably one of the hardest to corrupt, because publishing theorems requires proofs, which can be checked relatively easily. If frauds/errors (false theorems) creep into the literature anyway, eventually a contradiction will be derived, and the field will know something went wrong and backtrack to find the problem. If fear of acausally corrupting the current state of the field is the main reason for refraining from fraud, then math ought to have a higher rate of fraud relative to other fields, but actually the opposite is true (AFAICT).