I appreciate that it’s a model, but it seems—perhaps wrongly, since as already mentioned I am an ignorant doofus—as if at least some of what you’re doing with the model depends essentially on the strictly-logic-based nature of the agent. (E.g., the Troll Bridge problem as stated here seems that way, down to applying Löb’s theorem to the agent as formal system.)
Formal logic is very brittle; ex falso quodlibet, and all that; it (ignorantly and doofusily) looks to me as if you might be looking at a certain class of models and then finding problems (e.g., Troll Bridge) that are only problems because of specific features of the models that couldn’t realistically apply to the real world.
(In terms of the “rocket alignment problem” metaphor: suppose you start thinking about orbital mechanics, come up with exact-conic-section orbits as an interesting class of things to study, and prove some theorem that says that some class of things isn’t achievable for exact-conic-section orbits for a reason that comes down to something like dividing by the sum of squares of all the higher-order terms that are exactly zero for a perfect conic section orbit. That would be an interesting theorem, and it’s not hard to imagine how some less-rigid generalization of it might apply to real trajectories (“if the sum of squares of those coefficients is small then the trajectory is unstable and hard to get right” or something) -- but as it stands it doesn’t really tell you anything about real problems faced by real rockets whose trajectories are not perfect conic sections. And logic is generally much more brittle than orbital mechanics, chaos theory notwithstanding; there isn’t generally anything that corresponds to the sum of squares of coefficients being small; a proof that contains only a few small errors is not a proof at all.)
But, hmm, you reckon one could make a viable proof-based agent that has a consistent set of axioms describing a potentially-inconsistent set of probabilities. That’s an intriguing idea but I’m having trouble seeing how it would work. E.g., suppose I’m searching for proofs that my expected utility if I do X is at least Y units; that expectation obviously involves a whole lot of probabilities, and my actual probability assignments are inconsistent in various ways. How do I prove anything about my expected utilities if the probabilities involved might be inconsistent?
This is all a bit off topic on this particular post; it’s not especially about your account of decision-theoretic counterfactuals as such, but about the whole project of understanding decision theory in terms of agents whose decision processes involve trying to prove things about their own behaviour.
I appreciate that it’s a model, but it seems—perhaps wrongly, since as already mentioned I am an ignorant doofus—as if at least some of what you’re doing with the model depends essentially on the strictly-logic-based nature of the agent. (E.g., the Troll Bridge problem as stated here seems that way, down to applying Löb’s theorem to the agent as formal system.)
Formal logic is very brittle; ex falso quodlibet, and all that; it (ignorantly and doofusily) looks to me as if you might be looking at a certain class of models and then finding problems (e.g., Troll Bridge) that are only problems because of specific features of the models that couldn’t realistically apply to the real world.
(In terms of the “rocket alignment problem” metaphor: suppose you start thinking about orbital mechanics, come up with exact-conic-section orbits as an interesting class of things to study, and prove some theorem that says that some class of things isn’t achievable for exact-conic-section orbits for a reason that comes down to something like dividing by the sum of squares of all the higher-order terms that are exactly zero for a perfect conic section orbit. That would be an interesting theorem, and it’s not hard to imagine how some less-rigid generalization of it might apply to real trajectories (“if the sum of squares of those coefficients is small then the trajectory is unstable and hard to get right” or something) -- but as it stands it doesn’t really tell you anything about real problems faced by real rockets whose trajectories are not perfect conic sections. And logic is generally much more brittle than orbital mechanics, chaos theory notwithstanding; there isn’t generally anything that corresponds to the sum of squares of coefficients being small; a proof that contains only a few small errors is not a proof at all.)
But, hmm, you reckon one could make a viable proof-based agent that has a consistent set of axioms describing a potentially-inconsistent set of probabilities. That’s an intriguing idea but I’m having trouble seeing how it would work. E.g., suppose I’m searching for proofs that my expected utility if I do X is at least Y units; that expectation obviously involves a whole lot of probabilities, and my actual probability assignments are inconsistent in various ways. How do I prove anything about my expected utilities if the probabilities involved might be inconsistent?
This is all a bit off topic on this particular post; it’s not especially about your account of decision-theoretic counterfactuals as such, but about the whole project of understanding decision theory in terms of agents whose decision processes involve trying to prove things about their own behaviour.