The setup can be translated into a purely logical framework. There would be constants A (Agent) and W (World), satisfying the axioms:
(Ax1) A=1 OR A=2
(Ax2) (A=1 AND W=1000) OR (A=2 AND W=1000000)
(Ax3) forall a1,a2,w1,w2 ((A=a1 ⇒ W=w1) AND (A=a2 ⇒ W=w2) AND (w1>w2)) ⇒ NOT (A=a2)
Then the Agent’s algorithm would be equivalent to a step-by-step deduction:
(1) Ax2 |- A=1 ⇒ W=1000
(2) Ax2 |- A=2 ⇒ W=1000000
(3) Ax3 + (1) + (2) |- NOT (A=1)
(4) Ax1 + (3) |- A=2
In this form it is clear that there are no logical contradictions at any time. The system never believes A=1; it only believes (A=1 ⇒ W=1000), which is true.
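As a sanity check, here is a minimal brute-force sketch of the deduction (my own, not part of the original setup), instantiating (Ax3) only at the two action/outcome pairs that appear in (Ax2); the later comments in this thread turn on what happens when it is instantiated more freely.

```python
# Brute-force check of the deduction over the finite possibilities.
# (Ax3) is instantiated only at the two (action, outcome) pairs named in (Ax2).

def implies(p, q):
    return (not p) or q

pairs = [(1, 1000), (2, 1000000)]  # (action, outcome) pairs from (Ax2)

def ax1(A):
    return A == 1 or A == 2

def ax2(A, W):
    return (A == 1 and W == 1000) or (A == 2 and W == 1000000)

def ax3_restricted(A, W):
    # forall (a1,w1),(a2,w2) in pairs:
    #   ((A=a1 => W=w1) AND (A=a2 => W=w2) AND w1 > w2) => NOT (A=a2)
    return all(
        implies(
            implies(A == a1, W == w1) and implies(A == a2, W == w2) and w1 > w2,
            A != a2,
        )
        for (a1, w1) in pairs
        for (a2, w2) in pairs
    )

models = [
    (A, W)
    for A in (1, 2)
    for W in (1000, 1000000)
    if ax1(A) and ax2(A, W) and ax3_restricted(A, W)
]
print(models)  # [(2, 1000000)]: the axioms force A=2, matching steps (1)-(4)
```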
In the purely logical framework, the word “could” is modeled by what Hofstadter calls the “Fantasy Rule” of propositional calculus.
The direction of setting the problem up in a logical framework is pursued further here and here.
The axioms as you wrote them don’t work, for several reasons. Conceptually, the problem is figuring out how decisions influence outcomes, so handing the agent something like (Ax2) defeats that purpose. Statements like that are supposed to be inferred by the agent from definitions of A and W that are more opaque to the purpose of deciding, and finding statements of suitable form constitutes most of the activity of making a decision.
If the agent can figure out what it decides, then by default it becomes able to come up with spurious arguments for any decision, which makes its inference system (axioms) either inconsistent or unpredictable (if we assume consistency and don’t let (Ax3) into it). The rules that say which arguments lead to which actions shouldn’t be part of the theory the agent uses to come up with those arguments. This excludes your (Ax3), which can only be assumed “externally” to the agent: a way of checking whether it’s doing OK, not something it should itself be able to see.
A contradiction can be inferred from your axioms in this way: (4) says A=2, which implies [A=1 ⇒ W=0], which by (Ax3) together with [A=2 ⇒ W=1000000] implies A=1, a contradiction, and hence a license to decide anything. (I think you’ve also mixed up A=1 and A=2 while going from (Ax2) to (1) and (2).)
Thanks! This site is rather large :) And the most interesting (for me) stuff is rarely linked from the sequences. Or so it seems… Is there a more efficient way to get up to date?
[I would SO like to have found this site 5 years ago!]
The axioms were not proposed seriously as a complete solution, of course. (Ax2) can be thought of as a cached consequence of a long chain of reasoning. The main point of the post was about the word “could”, after all.
(Ax3) is just an axiomatic restatement of the “utility maximization” goal. I don’t see why an agent can’t know about itself that it’s a utility maximizer.
A contradiction can be inferred from your axioms in this way: (4) says A=2, which implies [A=1 ⇒ W=0], which by (Ax3) together with [A=2 ⇒ W=1000000] implies A=1, a contradiction, and hence a license to decide anything. (I think you’ve also mixed up A=1 and A=2 while going from (Ax2) to (1) and (2).)
Yes, A=1 and A=2 were mixed up in (Ax2); fixed it now. But I don’t see your contradiction.
(Ax3) + [A=1 ⇒ W=0] + [A=2 ⇒ W=1000000] does not imply A=1. It implies NOT(A=1).
See the posts linked to from UDT, ADT, TDT, section “Decision theory” in this post.
I don’t see why an agent can’t know about itself that it’s a utility maximizer.
It can; it’s just a consideration it can’t use in figuring out how the outcome depends on its decision. While looking for what is influenced by a decision, the decision itself should remain unknown (given that the decision is determined by what this dependence turns out to be, this is not exactly self-handicapping).
Yes, A=1 and A=2 were mixed up in (Ax2); fixed it now. But I don’t see your contradiction. (Ax3) + [A=1 ⇒ W=0] + [A=2 ⇒ W=1000000] does not imply A=1. It implies NOT(A=1).
It seems the mixing up of A=1 and A=2 didn’t spare this argument, but it’s easy enough to fix. From A=2 we have NOT(A=1), so [A=1 ⇒ W=1,000,000,000] holds vacuously; together with [A=2 ⇒ W=1,000,000], (Ax3) then gives NOT(A=2), hence A=1 by (Ax1), contradicting A=2.
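Spelled out mechanically, here is a minimal Python check of just this one instance of (Ax3), under the deduced assignment A=2, W=1000000:

```python
# Evaluate the instance of (Ax3) with a1=1, w1=10**9, a2=2, w2=10**6
# under the assignment A=2, W=1000000 that steps (1)-(4) deduce.

def implies(p, q):
    return (not p) or q

A, W = 2, 1000000            # what steps (1)-(4) deduce
a1, w1 = 1, 10**9            # [A=1 => W=10**9] holds vacuously, since A != 1
a2, w2 = 2, 10**6

antecedent = implies(A == a1, W == w1) and implies(A == a2, W == w2) and w1 > w2
ax3_instance = implies(antecedent, A != a2)

print(antecedent)    # True: both implications hold and 10**9 > 10**6
print(ax3_instance)  # False: this instance of (Ax3) demands NOT(A=2), contradiction
```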
Yes, I see now, thanks. If I attempted to fix this, I would try to change (Ax3) to something like:
forall a1,a2,w1,w2 ((A=a1 ⇒ W=w1) AND (A=a2 ⇒ W=w2) AND (w1>w2)) ⇒ NOT (FinalA=a2)
where FinalA is the actual decision. But then, why should the world’s axiom (Ax2) be defined in terms of A and not FinalA? This seems conceptually wrong...
I thought about a different axiomatization, which would not have the same consistency problems. Not sure whether this is the right place, but:
Let X be a variable ranging over all possible agents.
(World Axioms)
forall X [Decides(X)=1 ⇒ Receives(X)=1000000]
forall X [Decides(X)=2 ⇒ Receives(X)=1000]
(Agent Axioms—utility maximization)
BestX = argmax_X Receives(X)
Decision = Decides(BestX)
Then Decision=1 is easily provable, regardless of whether any specific BestX is known.
Does not seem to lead to contradictions.
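Here is a minimal sketch of how this plays out, assuming an arbitrary finite pool of candidate agents standing in for “all possible agents” (the pool and its decisions are just an illustration, not part of the axioms):

```python
# Sketch of the alternative axiomatization over a finite pool of candidate agents.

candidates = {"x1": 2, "x2": 1, "x3": 2}   # Decides(X) for each candidate agent

def receives(x):
    # World axioms: Decides(X)=1 => Receives(X)=1000000, Decides(X)=2 => Receives(X)=1000
    return 1000000 if candidates[x] == 1 else 1000

# Agent axioms: BestX = argmax over X of Receives(X), Decision = Decides(BestX)
best_x = max(candidates, key=receives)
decision = candidates[best_x]

print(decision)  # 1, provided at least one candidate decides 1
```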
Ok, I’ll go read up on the posts :)