If your world model represents random variables such as “the action I will take in 1 second” then condition on that random variable being some value.
I don’t think that works, especially for the kind of purpose you have in mind. For example, suppose I’m in a situation where I’m pretty sure the normative/correct action is A, but due to things like cosmic rays I have some chance of choosing B. Then if I condition on “the action I will take in 1 second is B” I will mostly be conditioning on choosing B due to things like cosmic rays, which would be very different from conditioning on “source code X outputs action B”.
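To make that concrete, here’s a minimal sketch with made-up numbers (both probabilities are illustrative assumptions, not anything from the actual setup):

```python
# Minimal sketch with assumed numbers: conditioning on "my action will be B"
# mostly selects worlds where a cosmic-ray-style error flipped the output,
# because deliberately choosing B is assumed to be much rarer than a flip.
p_deliberate_B = 0.001   # assumed: chance I deliberately choose B
p_flip = 0.01            # assumed: chance an error flips my chosen A into B

# P(action = B) = P(deliberate B) + P(deliberate A) * P(flip)
p_action_B = p_deliberate_B + (1 - p_deliberate_B) * p_flip

# Posterior probability that B was chosen deliberately, given the action is B
p_deliberate_given_B = p_deliberate_B / p_action_B
print(round(p_deliberate_given_B, 2))  # ~0.09: the conditional is dominated by error worlds
```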
It wouldn’t be hard to code up a reinforcement learning agent based on EDT (that’s essentially what on-policy learning is), which isn’t EDT proper due to not having a world model, but which strongly suggests that EDT is coherent.
Can you explain what the connection between on-policy learning and EDT is? (And you’re not suggesting that an on-policy learning algorithm would directly produce an agent that would refrain from mathematical fraud for the kind of reason you give, or something analogous to that, right?)
The relevant question is how “mathematical truth” ends up seeming like a terminal value to so many; it’s unlikely to be baked in, it’s likely to be some Schelling point reached through a combination of priors and cultural learning.
It seems like truth and beauty are directly baked in and maybe there’s some learning involved for picking out or settling on what kinds of truth and beauty to value as a culture. But I’m not seeing how this supports your position.
Then if I condition on “the action I will take in 1 second is B” I will mostly be conditioning on choosing B due to things like cosmic rays
This is an issue of EDT having problems, I wrote about this problem and a possible solution here.
Can you explain what the connection between on-policy learning and EDT is?
The Q-values in on-policy learning are expected values estimated from the policy’s own empirical history. That is very similar to E[utility | I take action A, my policy is π]; the two converge in the limit.
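For concreteness, here’s a minimal sketch of that correspondence (the two-armed bandit and its reward numbers are made up for illustration):

```python
import random

# Minimal sketch (assumed two-armed bandit): an on-policy value estimate is
# just the empirical average of reward conditioned on the agent's own action,
# i.e. a sample estimate of E[utility | I take action a, my policy is pi].
ACTIONS = ["A", "B"]

def reward(a):
    # assumed reward distribution: A is better than B on average
    return random.gauss(1.0 if a == "A" else 0.5, 0.1)

q = {a: 0.0 for a in ACTIONS}   # running value estimates
n = {a: 0 for a in ACTIONS}     # visit counts
eps = 0.1                       # epsilon-greedy exploration

for _ in range(10000):
    # act according to the current (epsilon-greedy) policy pi
    a = random.choice(ACTIONS) if random.random() < eps else max(q, key=q.get)
    r = reward(a)
    n[a] += 1
    q[a] += (r - q[a]) / n[a]   # empirical mean of reward given a, under pi

print(q)  # q[a] approximates E[reward | action = a, policy = pi]
```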
And you’re not suggesting that an on-policy learning algorithm would directly produce an agent that would refrain from mathematical fraud for the kind of reason you give, or something analogous to that, right?
I am. Consider the tragedy of the commons, which is simpler. If there are many on-policy RL agents playing the tragedy of the commons and they are synchronized with each other (so they always take the same action, including exploration actions), then they can notice that they expect less utility when they defect than when they cooperate.
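A minimal sketch of that setup (the payoff numbers are my own illustrative assumptions):

```python
import random

# Minimal sketch (assumed payoffs): N synchronized on-policy learners play a
# tragedy-of-the-commons round. Because they explore in lockstep, each agent's
# empirical value of "defect" is the value of *everyone* defecting, so
# cooperation looks better in their own data.
N = 10

def my_payoff(my_action, num_defectors):
    private_bonus = 1.0 if my_action == "defect" else 0.0
    return private_bonus - 0.2 * num_defectors   # shared cost of each defection

q = {"cooperate": 0.0, "defect": 0.0}
n = {"cooperate": 0, "defect": 0}

for _ in range(10000):
    # one shared (synchronized) action for all N agents, including exploration
    a = (random.choice(["cooperate", "defect"]) if random.random() < 0.1
         else max(q, key=q.get))
    num_defectors = N if a == "defect" else 0
    r = my_payoff(a, num_defectors)               # same payoff for every agent
    n[a] += 1
    q[a] += (r - q[a]) / n[a]

print(q)  # q["cooperate"] ~ 0.0 > q["defect"] ~ -1.0, so the agents cooperate
```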
But I’m not seeing how this supports your position.
My position is roughly “people are coordinating towards mathematical epistemology and such coordination involves accepting an ‘ought’ of not committing mathematical fraud”. Such coordination is highly functional, so we should expect good decision theories to achieve something at least as good. At the very least, learning a good decision theory shouldn’t result in failing at such coordination problems, relative to the innocent who don’t know good decision theory.
This is an issue of EDT having problems, I wrote about this problem and a possible solution here.
That post seems to be trying to solve a different problem (it still assumes that the agent knows its own source code, AFAICT). Can you please re-read what I wrote and if that post really is addressing the same problem, explain how?
I am. Consider the tragedy of the commons, which is simpler. If there are many on-policy RL agents playing the tragedy of the commons and they are synchronized with each other (so they always take the same action, including exploration actions), then they can notice that they expect less utility when they defect than when they cooperate.
I see, but the synchronization seems rather contrived. To the extent that humans are RL agents, our learning algorithms are not synchronized (and defecting in the tragedy of the commons happens very often as a result), so why is synchronized RL relevant? I don’t see how this is supposed to help convince a skeptic.
Can you please re-read what I wrote and if that post really is addressing the same problem, explain how?
You’re right, the post doesn’t address that issue. I agree that it is unclear how to apply EDT as a human. However, humans can still learn from abstract agents.
I see, but the synchronization seems rather contrived.
Okay, here’s an attempt at stating the argument more clearly:
You’re a bureaucrat in a large company. You’re keeping track of how much money the company has. You believe there were previous bureaucrats before you who follow the same decision theory as you. Both you and the previous bureaucrats could have corrupted the company’s records to change how much money the company believes itself to have. If any past bureaucrat has corrupted the records, the records are wrong. You don’t know how long the company has been around or where in the chain you are; all you know is that there will be 100 bureaucrats in total.
You (and the other bureaucrats) somewhat want to corrupt the records, but want even more to know how much money the company has. Do you corrupt the records?
UDT says ‘no’ due to a symmetry argument: if you corrupt the records, then so do all past bureaucrats. So does COEDT. Both believe that, if you corrupt the records, you don’t have knowledge of how much money the company has.
(Model-free RL doesn’t have enough of a world model to get these symmetries without artificial synchronization.)
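To make the symmetry argument concrete, here’s a minimal sketch (the utility numbers are assumptions for illustration only):

```python
# Minimal sketch (assumed utilities) of the UDT-style symmetry argument: your
# policy is also the policy of the other 99 bureaucrats, so "I corrupt"
# implies "everyone corrupts", and the records become worthless.
CORRUPTION_BONUS = 1.0    # assumed: small gain from corrupting the records
KNOWLEDGE_VALUE = 10.0    # assumed: larger value of having accurate records

def utility(policy_corrupts: bool) -> float:
    # if the shared policy corrupts, some past bureaucrat already did, so
    # the records are wrong and the knowledge value is lost
    records_accurate = not policy_corrupts
    return (CORRUPTION_BONUS if policy_corrupts else 0.0) + \
           (KNOWLEDGE_VALUE if records_accurate else 0.0)

print(utility(True), utility(False))   # 1.0 vs 10.0 -> the policy is: don't corrupt
```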
The assumption “You don’t know how long the company has been around or where in the chain you are” seems unrealistic/contrived, much like the assumption of “synchronized RL” in your previous argument. Again this seems like it’s not going to be very convincing to a skeptic, at least without, for example, a further argument for why the assumption actually makes sense on some deeper level.
Aside from that, here’s a counter-argument: among all fields of research, math is probably one of the hardest to corrupt, because publishing theorems requires proofs, which can be checked relatively easily, and if frauds/errors (false theorems) creep into the literature anyway, eventually a contradiction will be derived, the field will know something went wrong, and it will backtrack to find the problem. If fear of acausally corrupting the current state of the field is the main reason for refraining from fraud, then math ought to have more fraud than other fields, but actually the opposite is true (AFAICT).