The agent normally won’t even know the “explicit values” of the actual action and actual outcome. Knowing the actual value would break the illusion of consistent consequences: suppose the agent is consistent, knows that A=2, and isn’t out of time yet; then it can prove [A=1 ⇒ O=100000], even if in fact O=1000, use that moral argument to beat any other with a worse promised outcome, and decide A=1, contradiction.
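In symbols, the step doing the work is a vacuous implication (a sketch; T is the agent's theory):

\[
T \vdash A = 2 \;\Rightarrow\; T \vdash A \neq 1 \;\Rightarrow\; T \vdash (A = 1 \rightarrow O = 100000),
\]

so a false premise lets the agent prove an arbitrarily good promised outcome for the action it supposedly won't take, and the decision rule then prefers that action.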
Still commenting while reading:
This would only happen if the agent had a rule of inference that allowed it to infer from
- A=1 ⇒ O=100000
and
- all other promised outcomes are worse than 100000
that
- A = 1.
But why would the first-order theory use such a rule of inference? You seem to have just given an argument for why we shouldn’t put this rule of inference into the theory.
ETA: I guess that my point leads right to your conclusion, and explains it. The agent is built so that, upon deducing the first two bullet-points, the agent proceeds to do the action assigned to the constant symbol 1 by the interpretation. But the point is that the agent doesn’t bother to infer the third bullet-point; the agent just acts. As a result, it never deduces any formulas of the form [A=X], which is what you were saying.
The agent never proves A=1, but it does (by assumption) prove A=2, while in fact it turns out that the agent does action 1 (without proving it). So A=1 is true in the standard model, while the agent’s theory proves A=2, which means that the agent’s theory is inconsistent, contradicting the assumption that it’s consistent.
Okay. I was confused because you wrote that the agent “can . . . decide A=1”, rather than “will do action 1.”
Is there a rule of inference in the system such that the first two bullet points above entail the third within the system? I see that [A=1] is true in the standard model, but the agent’s theory isn’t complete in general. So should we expect that [A=1] is a theorem of the agent’s theory? (ETA1: Okay, I think that your comment here explains why [A=1] must be a theorem if the agent actually does 1. But I still need to think it through.)
(ETA2: Now that I better understand the axiom that defines A in the theory, I see why [A=1] must be a theorem if the agent actually does 1.)
Also, it seems that your proof only goes through if the agent “knows that A=2” when the agent will not in fact do action v(2) (which directly contradicts soundness). But if the agent knows [A=X], where v(X) is the actual action, then all we can conclude is that the agent declines to observe my first two bullet-points above in a way that would induce it to do v(1). (Here, v(X) is the interpretation of the constant symbol X under the standard model.)
Every recursive function is representable in Robinson’s arithmetic Q; that is, for any (say, unary) recursive function F, there is a formula w such that F(n)=m ⇒ Q ⊢ w(n,m) and F(n)≠m ⇒ Q ⊢ ¬w(n,m). Hence statements like this that hold in the standard model are also theorems, and so hold in any model.
That the agent doesn’t avoid looking for further moral arguments to prove is reflected in the “isn’t out of time” condition, which is not formalized and is listed as the first open problem. If in fact A=X, then it can’t prove ludicrous moral arguments with A=X as the premise, only the one promising the actual utility, but it clearly can prove arguments that beat the true one by using a false premise.
I think that I need to see this spelled out more. I take it that, in this case, your formula w is w(A,X) = [A=X]. What is your recursive function F?
We are representing the no-arguments agent program agent() using a formula w with one free variable, such that agent()=n ⇒ Q ⊢ w(n) and agent()≠n ⇒ Q ⊢ ¬w(n). The actual action is defined by the axiom w(A), where A is a new constant symbol.
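For concreteness, here is a minimal runnable sketch of the kind of no-arguments agent() under discussion (illustrative only: the table of provable moral arguments stands in for an actual proof search in the agent's theory, and all names here are made up, not from the post):

    # Sketch only: in the real setup, agent() enumerates proofs in its theory
    # within a time limit, looking for "moral arguments" of the form
    # [A = a  =>  O = u]. Here that proof search is replaced by a fixed table
    # so the example runs as written.
    PROVABLE_MORAL_ARGUMENTS = {
        1: 1000,    # stands for a proof of "A = 1  =>  O = 1000"
        2: 100000,  # stands for a proof of "A = 2  =>  O = 100000"
    }

    def agent():
        """Return the action whose best proven promised outcome is highest."""
        best_action, best_outcome = None, None
        for action, outcome in PROVABLE_MORAL_ARGUMENTS.items():
            if best_outcome is None or outcome > best_outcome:
                best_action, best_outcome = action, outcome
        return best_action

    print(agent())  # prints 2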
Okay, I’m convinced. In case it helps someone else, here is how I now understand your argument.
We have an agent given by some program. Because the agent is a computer program, and because our 1st-order logic can handle recursion, we can write down a wff [w(x)], with one free variable x, such that, for any action-constant X, [w(X)] is a theorem if and only if the agent does v(X).
[ETA: Vladimir points out that it isn’t handling recursion alone that suffices. Nonetheless, a theory like PA or ZFC is apparently powerful enough to do this. I don’t yet understand the details of how this works, but it certainly seems very plausible to me.]
In particular, if the agent goes through a sequence of operations that concludes with the agent doing v(X), then that sequence of operations can be converted systematically into a proof of [w(X)]. Conversely, if [w(X)] is a theorem, then it has a proof that can be reinterpreted as the sequence of operations that the agent will carry out, and which will conclude with the agent doing v(X).
The wff [w(x)] also has the property that, given two constants U and V, [w(U) & w(V)] entails [U = V].
Now, the agent’s axiom system includes [w(A)], where A is a constant symbol. Thus, if the agent does v(X), then [w(A)] and [w(X)] are both theorems, so we must have that [A=X] is a theorem.
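In symbols, the chain in the preceding paragraphs is roughly:

\[
\text{the agent does } v(X) \;\Rightarrow\; T \vdash w(X), \qquad
T \vdash w(A) \ \text{(axiom)}, \qquad
w(X) \wedge w(A) \ \text{entails} \ X = A,
\]

and therefore [A=X] is a theorem of T.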
This works (although “because our 1st-order logic can handle recursion” is not it, etc.). (Note that “[w(X)] is a theorem of T if and only if X is as we need” is weak T-representability, while I cited the strong kind, which also guarantees that [¬w(X)] is a theorem if X is not as we need.)
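For reference, stated for the no-arguments agent() and its representing formula w, the two notions are roughly:

\[
\text{weak:} \ \ \mathrm{agent}() = n \;\Leftrightarrow\; T \vdash w(n),
\qquad
\text{strong:} \ \ \mathrm{agent}() = n \Rightarrow T \vdash w(n) \ \text{ and } \ \mathrm{agent}() \neq n \Rightarrow T \vdash \neg w(n).
\]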
That was there because of this line from your post: “The theory should provide sufficient tools to define recursive functions and/or other necessary concepts.”
That doesn’t make it an explanatory sufficient condition for the conclusion you drew: I object to your use of “because”.
Okay, thanks. My understanding is definitely vaguest at the point where the agent’s program is converted into the wff [w(x)]. Still, the argument is at the point of seeming very plausible to me.
No worries. Your logic seems rusty though, so if you want to build something in this direction, you should probably reread a good textbook.
Not so much rusty as never pursued beyond the basics. The logic I know is mostly from popular books like Gödel, Escher, Bach, plus a philosophy course on modal logic, where I learned the basic concepts used to talk about interpretations of theories.