Still commenting while reading:

> The agent normally won’t even know “explicit values” of the actual action and the actual outcome. Knowing the actual value would break the illusion of consistent consequences: suppose the agent is consistent, knows that A=2, and isn’t out of time yet. Then it can prove [A=1 ⇒ O=100000], even if in fact O=1000, use that moral argument to beat any other with a worse promised outcome, and decide A=1, contradiction.
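To spell out the step being leaned on here (the notation is mine, not the post's): if the agent's theory $T$ proves $A=2$, it also proves $A \neq 1$, and an implication with a refutable antecedent is provable with any consequent whatsoever,

$$T \vdash A=2 \quad\Longrightarrow\quad T \vdash \neg(A=1) \quad\Longrightarrow\quad T \vdash \big(A=1 \Rightarrow O=100000\big).$$

So if the agent then treats this as its best moral argument and acts on it, it picks A=1, contradicting $T \vdash A=2$, provided $T$ is consistent and really does describe the agent's action.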
This would only happen if the agent had a rule of inference that allowed it to infer from

- A=1 ⇒ O=100000, and
- all other promised outcomes are worse than 100000,

that

- A = 1.
But why would the first-order theory use such a rule of inference? You seem to have just given an argument for why we shouldn’t put this rule of inference into the theory.
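Written as a proof rule (again my own rendering, not anything from the post), the inference in question would be roughly

$$\frac{T \vdash (A=1 \Rightarrow O=100000) \qquad T \vdash (A=a \Rightarrow O=u_a)\ \text{with}\ u_a < 100000\ \text{for every other action}\ a}{T \vdash A=1}$$

and it is exactly this last step, from “best promised outcome” to a theorem about the agent's own action, that the quoted argument tells us not to build into the theory.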
ETA: I guess that my point leads right to your conclusion, and explains it. The agent is built so that, upon deducing the first two bullet points, it proceeds to do the action assigned to the constant 1 by the interpretation. But the point is that the agent doesn’t bother to infer the third bullet point; it just acts. As a result, it never deduces any formulas of the form [A=X], which is what you were saying.
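To make that concrete, here is a minimal sketch of such an agent loop in Python. This is purely my own illustration (the post gives no code): `provable` stands in for whatever resource-bounded proof search the agent runs over its theory, and formulas are represented as plain strings.

```python
def choose_action(actions, outcomes, provable):
    """Pick the action with the best provable promise [A=a => O=u].

    `provable(formula)` is assumed to be a resource-bounded proof search
    over the agent's theory; formulas here are just placeholder strings.
    """
    best_action, best_promise = None, float("-inf")
    for a in actions:
        # Scan promised outcomes from best to worst and keep the first
        # (i.e. the best) one the theory actually proves for this action.
        for u in sorted(outcomes, reverse=True):
            if provable(f"A={a} -> O={u}"):
                if u > best_promise:
                    best_action, best_promise = a, u
                break
    # Crucially, nothing of the form "A=<best_action>" is ever asserted back
    # into the theory: the agent acts on the conclusion instead of deducing it.
    return best_action


# Toy example: a "theory" in which only these two moral arguments are provable.
toy_theorems = {"A=1 -> O=100000", "A=2 -> O=1000"}
print(choose_action(actions=[1, 2], outcomes=[1000, 100000],
                    provable=lambda f: f in toy_theorems))  # prints 1
```

The loop's only output is the action itself; since no formula of the form [A=X] is ever deduced, the diagonal argument in the quoted passage never gets off the ground.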