I suppose she could reason that if she can’t prove that her formal system is consistent, then she can’t trust that the actions she can prove to be “good” are actually “good”. Or, even if she trusts her present self, she can’t trust her future selves...
Lucy works by trying to prove that a given action is “good” and doing it if she’s successful. She doesn’t need to prove that “there is no proof A is not good” in order to do A. However, it’s true that she can’t trust her future selves. Indeed, this can be a problem even without self-modification. For example, if someone drops an anvil on Lucy, Lucy cannot reason that “I should dodge the anvil because if I get destroyed I won’t do good things in the future”. She can still have a hard-coded aversion to being destroyed, but this seems like a hack.
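To make the mechanism concrete, here is a minimal sketch of the loop I have in mind (Python; `try_prove` is a hypothetical stand-in for proof search in Lucy’s formal system, not an existing library call):

```python
# Minimal sketch of the prove-then-act loop described above.
# `try_prove(sentence, budget)` is a hypothetical stand-in for proof search
# in Lucy's formal system: it returns True only if it finds a proof of the
# sentence within the given budget.

def lucy_step(candidate_actions, try_prove, budget):
    """Perform an action only if it is provably 'good'; otherwise do nothing."""
    for action in candidate_actions:
        # Note the asymmetry: Lucy needs a proof that the action is good;
        # she never needs a proof that "there is no proof the action is not good".
        if try_prove(f"Good({action})", budget):
            return action
    return None  # no provably good action found this step
```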
...Generalize this to arbitrary problems and you get UDT or something like it.
UDT assumes a system of assigning probabilities to mathematical sentences given a (possibly inconsistent) theory. For UDT to admit naturalistic trust, it needs that system to overcome the Loebian obstacle, like the system I propose here. Indeed the system I propose is intended as a component of UDT or UIM.
“I should dodge the anvil because if I get destroyed I won’t do good things in the future”. She can still have a hard-coded aversion to being destroyed, but this seems like a hack.
I think it’s worse than that. If your argument is correct, the type of AI you are describing can’t plan because it can’t trust its future selves to follow through with the plan, even if doing so wouldn’t require commitment. It would be limited to immediate utility maximization.
It seems that your argument proves too much. In fact, I think that Lucy can still prove useful properties about her successors even without proving the consistency of their, and therefore her own, proof system.
UDT assumes a system of assigning probabilities to mathematical sentences given a (possibly inconsistent) theory. For UDT to admit naturalistic trust, it needs that system to overcome the Loebian obstacle, like the system I propose here. Indeed the system I propose is intended as a component of UDT or UIM.
That’s not how UDT works in my understanding, though I admit that I’m not an expert on the subject. Do you have a reference?
I think it’s worse than that. If your argument is correct, the type of AI you are describing can’t plan because it can’t trust its future selves to follow through with the plan, even if doing so wouldn’t require commitment.
We can avoid this problem if Lucy performs an action only when it is the first step of a provably “good” sequence of actions. This would allow her to dodge the anvil if it interferes with her immediate plans, but not on the general grounds of “a universe with Lucy is a better universe since Lucy is doing good things”.
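Roughly, that modification to the loop looks like this (same hypothetical `try_prove` interface as in the earlier sketch; plans are finite sequences of actions):

```python
# Sketch of the modification: prove that a whole plan (a finite sequence of
# actions) is "good", then commit only to its first step. `try_prove` is the
# same hypothetical proof-search stand-in as before.

def lucy_step_with_plans(candidate_plans, try_prove, budget):
    """Execute the first action of some provably 'good' plan."""
    for plan in candidate_plans:
        if try_prove(f"Good({plan})", budget):
            return plan[0]  # e.g. dodge the anvil, if dodging is step one
    return None  # no provably good plan found this step
```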
That’s not how UDT works in my understanding, though I admit that I’m not an expert on the subject. Do you have a reference?
I don’t have a reference that discusses UDT and the Loebian obstacle together. You can find a description of the (AFAIK) “latest and greatest” UDT here. UDT considers proofs in a formal system, and if that system suffers from the Loebian obstacle, it will lead to the kind of problems I discuss here. In fact, I haven’t stated it explicitly, but I think of Lucy as a UDT agent: she considers possible actions as logical counterfactuals and computes expected utility based on that.
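For concreteness, here is a rough sketch of that picture; `prob` stands in for the probability-assignment system over mathematical sentences mentioned above, `utility` is Lucy’s utility function, and writing the counterfactual as a plain implication is a deliberate simplification:

```python
# Rough sketch of Lucy as a UDT agent: evaluate each action via the logical
# counterfactual "my code outputs this action", using `prob(sentence)` as a
# stand-in for probabilities over mathematical sentences (relative to a
# possibly inconsistent theory), and pick the action with the highest
# expected utility.

def udt_choose(actions, outcomes, prob, utility):
    def expected_utility(a):
        # "MyCode() = a -> World() = o" is a simplified stand-in for the
        # logical counterfactual linking Lucy's output to the outcome.
        return sum(prob(f"MyCode() = {a} -> World() = {o}") * utility(o)
                   for o in outcomes)
    return max(actions, key=expected_utility)
```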