Because we have a “basic counterfactual” proposition for what would happen if we 1-box and what would happen if we 2-box, and both of those propositions stick around, LCH’s bets about what happens in either case both matter. This is unlike conditional bets, where if we 1-box, then bets conditional on 2-boxing disappear, refunded, as if they were never made in the first place.
I don’t understand this part. Your explanation of PCDT, at least, didn’t prepare me for it; it doesn’t mention betting. And why is the payoff for the counterfactual-2-boxing determined by the beliefs of the agent after 1-boxing?
Mostly independent of that confusion, I also don’t think things are as settled as you make them sound.
I’m more worried about the embedding problems with the trader in Dutch book arguments, so the one against CDT isn’t as decisive for me.
In the Troll Bridge hypothetical, we prove that [cross]->[U=-10]. This will make the conditional expectations poor. But this doesn’t have to change the counterfactuals.
But how is the counterfactual supposed to actually think? I don’t think just having the agent unrevisably believe that crossing is counterfactually +10 is a reasonable answer, even if it doesn’t have any instrumental problems in this case. I think it ought to be possible to get something like “whether to cross in troll bridge depends only on what you otherwise think about PA’s consistency” with some logical method. But even short of that, there needs to be some method to adjust your counterfactuals if they fail to really match your conditionals. And if we had an actual concrete model of counterfactual reasoning instead of a list of desiderata, it might be possible to make a troll based on the consistency of whatever is inside this model, as opposed to PA.
I also think there is a good chance the answer to the Cartesian boundary problem won’t be “here’s how to calculate where your boundary is”, but something else of which boundaries are an approximation, and then something similar would go for counterfactuals, and then there won’t be a counterfactual theory which respects embedding.
These latter two considerations suggest the leftover work isn’t just formalisation.
I don’t understand this part. Your explanation of PCDT, at least, didn’t prepare me for it; it doesn’t mention betting. And why is the payoff for the counterfactual-2-boxing determined by the beliefs of the agent after 1-boxing?
Not sure how to best answer. I’m thinking of all this in an LIDT setting, so all learning occurs through traders making bets. The payoff for 2-boxing is dependent on beliefs after 1-boxing because all share prices update every market day and the “payout” for a share is essentially what you can sell it for. Similarly, if a trader buys a share of an undecidable sentence (let’s say, the consistency of PA) then the only “payoff” is whatever you can sell it for later, based on future market prices, because the sentence will never get fully decided one way or the other.
But how is the counterfactual supposed to actually think? I don’t think just having the agent unrevisably believe that crossing is counterfactually +10 is a reasonable answer, even if it doesn’t have any instrumental problems in this case.
My claim is: eventually, if you observe enough cases of “crossing” in similar circumstances, your expectation for “cross” should be consistent with the empirical history (rather than, say, −10 even though you’ve never experienced −10 for crossing). To give a different example, I’m claiming it is irrational to persist in thinking 1-boxing gets you less money in expectation, if your empirical history continues to show that it is better on average.
And I claim that if there is a persistent disagreement between counterfactuals and evidential conditionals, then the agent will in fact experimentally try crossing infinitely often, due to the value-of-information of testing the disagreement (that is, this will be the limiting behavior of reduced temporal discounting, under the assumption that the agent isn’t worried about traps).
So the two will indeed converge (under those assumptions).
And if we had an actual concrete model of counterfactual reasoning instead of a list of desiderata, it might be possible to make a troll based on the consistency of whatever is inside this model, as opposed to PA.
The hope is that we can block the troll argument completely if proving B->A does not imply cf(A|B)=1, because no matter what predicate the troll uses, the inference from P to cf fails. So what we concretely need to do is give a version of counterfactual reasoning which lets cf(A|B) not equal 1 in some cases where B->A is proved.
Granted, there could be some other problematic argument. However, if my learning-theoretic ideas go through, this provides another safeguard: Troll Bridge is a case where the agent never learns the empirical distribution, due to refusing to observe a specific case. If we know this never happens (given the learnability conditions), then this blocks off a whole range of Troll-Bridge-like arguments.
I’m more worried about the embedding problems with the trader in Dutch book arguments, so the one against CDT isn’t as decisive for me.
[...]
I also think there is a good chance the answer to the Cartesian boundary problem won’t be “here’s how to calculate where your boundary is”, but something else of which boundaries are an approximation, and then something similar would go for counterfactuals, and then there won’t be a counterfactual theory which respects embedding.
This is a sensible position. I think this is similar to Scott G’s take on my direction.
My argument would not be that the Dutch book should be super compelling, but rather, that it appears we can do everything without questioning so many assumptions.
For example, Scott would argue that probability is for things spacelike separated from you, so we need a different concept for thinking about consequences of actions. My argument is not anything against Scott’s concrete reasons to cast doubt on the broad applicability of probabilistic thinking; rather, my argument is “look at all the things we can do with probabilistic reasoning” (at least, suitably generalized).
In particular, good learning-theoretic results can address concerns about decision-theoretic paradoxes; a convincing optimality result could and should systematically rule out a wide range of decision-theoretic paradoxes. So, if true, it could become difficult to motivate any additional worries about Cartesian frames etc.
The payoff for 2-boxing is dependent on beliefs after 1-boxing because all share prices update every market day and the “payout” for a share is essentially what you can sell it for.
If a sentence is undecidable, then you could have two traders who disagree on its value indefinitely: one would have a highest price to buy that’s below the other’s lowest price to sell. But then anything between those two prices could be the “market price”, in the classical supply and demand sense. If you say that the “payout” of a share is what you can sell it for… well, the “physical causation” trader is also buying shares on the counterfactual option that won’t happen. And if he had to sell those, he couldn’t sell them at a price close to where he bought them; he could only sell them at how much the “logical causation” trader values them, and so both would be losing “payout” on their trades with the unrealized option. That’s one interpretation of “sell”. If there’s a “market maker” in addition to both traders, it depends on what prices he makes, and as outlined above, there is a wide range of prices that would be consistent for him to offer as a market maker, including ones very close to the logical trader’s valuations, in which case the logical trader is gaining on the physical one.
Trying to communicate a vague intuition here: There is a set of methods which rely on there being a time when “everything is done”, to then look back from there and do credit assignment for everything that happened before. They characteristically use backwards induction to prove things. I think markets fall into this: the argument for why ideal markets don’t have bubbles is that eventually, the real value will be revealed, and so the bubble has to pop, and then someone holds the bag, and you don’t want to be that someone, and people predicting this and trying to avoid it will make the bubble pop earlier, in the idealised case instantly. I also think these methods aren’t going to work well with embedding. They essentially use “after the world” as a substitute for “outside the world”.
My claim is: eventually, if you observe enough cases of “crossing” in similar circumstances, your expectation for “cross” should be consistent with the empirical history
My question was more “how should this roughly work” rather than “what conditions should it fulfill”, because I think thinking about this illuminates my next point.
The hope is that we can block the troll argument completely if proving B->A does not imply cf(A|B)=1
This doesn’t help against what I’m imagining; I’m not touching indicative B->A. So, standard Troll Bridge:
Reasoning within PA (ie, the logic of the agent):
Suppose the agent crosses.
Further suppose that the agent proves that crossing implies U=-10.
Examining the source code of the agent, because we’re assuming the agent crosses, either PA proved that crossing implies U=+10, or it proved that crossing implies U=0.
So, either way, PA is inconsistent—by way of 0=-10 or +10=-10.
So the troll actually blows up the bridge, and really, U=-10.
Therefore (popping out of the second assumption), if the agent proves that crossing implies U=-10, then in fact crossing implies U=-10.
By Löb’s theorem, crossing really implies U=-10.
So (since we’re still under the assumption that the agent crosses), U=-10.
So (popping out of the assumption that the agent crosses), the agent crossing implies U=-10.
Since we proved all of this in PA, the agent proves it, and proves no better utility in addition (unless PA is truly inconsistent). On the other hand, it will prove that not crossing gives it a safe U=0. So it will in fact not cross.
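For reference, the Löb step used above can be written out explicitly. This is only a sketch of the general shape: the box notation is an assumption of this note (it abbreviates "PA proves"), and the list above nests its two suppositions slightly differently from the instantiation shown here.

```latex
% Löb's theorem, with \Box\varphi abbreviating "PA proves \varphi":
\[
\text{if } \mathrm{PA} \vdash \Box\varphi \rightarrow \varphi
\quad\text{then}\quad
\mathrm{PA} \vdash \varphi .
\]
% Instantiated with \varphi := (\mathrm{cross} \rightarrow U = -10):
\[
\mathrm{PA} \vdash \Box(\mathrm{cross} \rightarrow U = -10) \rightarrow (\mathrm{cross} \rightarrow U = -10)
\;\Longrightarrow\;
\mathrm{PA} \vdash \mathrm{cross} \rightarrow U = -10 .
\]
```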
But now, say the agent’s counterfactual reasoning comes not from PA, but from system X. Then the argument fails because “suppose the agent proves crossing->U=-10 in PA” doesn’t go any further, since examining the source code of the agent doesn’t say anything about PA anymore, and “suppose the agent proves crossing->U=-10 in X” doesn’t show that PA is inconsistent, so the bridge isn’t blown up. But let’s have a troll that blows up the bridge if X is inconsistent. Then we can make an argument like this:
Reasoning within X (ie, the logic of counterfactuals):
Suppose the agent crosses.
Further suppose that the agent proves in X that crossing implies U=-10.
Examining the source code of the agent, because we’re assuming the agent crosses, either X proved that crossing implies U=+10, or it proved that crossing implies U=0.
So, either way, X is inconsistent—by way of 0=-10 or +10=-10.
So the troll actually blows up the bridge, and really, U=-10.
Therefore (popping out of the second assumption), if the agent proves that crossing implies U=-10, then in fact crossing implies U=-10.
By Löb’s theorem, crossing really implies U=-10.
So (since we’re still under the assumption that the agent crosses), U=-10.
So (popping out of the assumption that the agent crosses), the agent crossing implies U=-10.
Since we proved all of this in X, the agent proves it, and proves no better utility in addition (unless X is truly inconsistent). On the other hand, it will prove that not crossing gives it a safe U=0. So it will in fact not cross.
Now, this argument relies on X and counterfactual reasoning having a lot of the properties of PA and normal reasoning. But even a system that doesn’t run on proofs per se could still end up implementing something a lot like logic, and then it would have a property that’s a lot like inconsistency, and then the troll could blow up the bridge conditionally on that. Basically, it still seems reasonable to me that counterfactual worlds should be closed under inference, up to our ability to infer. And I don’t see which of the rules for manipulating logical implications wouldn’t be valid for counterfactual implications in their own closed system, if you formally separate them. If you want your X to avoid this argument, then it needs to not-do something PA does. “Formal separation” between the systems isn’t enough, because the results of counterfactual reasoning still really do affect your actions, and if the counterfactual reasoning system can understand this, Troll Bridge returns. And if there was such a something, we could just use a logic that doesn’t do this in the first place, no need for the two-layer approach.
a convincing optimality result could
I’m also sceptical of optimality results. When you’re doing subjective probability, any method you come up with will be proven optimal relative to its own prior—the difference between different subjective methods is only in their ontology, and the optimality results don’t protect you against mistakes there. Also, when you’re doing subjectivism, and it turns out the methods required to reach some optimality condition aren’t subjectively optimal, you say “Don’t be a stupid frequentist and do the subjectively optimal thing instead”. So, your bottom line is written. If the optimality condition does come out in your favour, you can’t be more sure because of it—that holds even under the radical version of expected evidence conservation. I also suspect that as subjectivism gets more “radical”, there will be fewer optimality results besides the one relative to prior.
I’m also sceptical of optimality results. When you’re doing subjective probability, any method you come up with will be proven optimal relative to its own prior—the difference between different subjective methods is only in their ontology, and the optimality results don’t protect you against mistakes there. Also, when you’re doing subjectivism, and it turns out the methods required to reach some optimality condition aren’t subjectively optimal, you say “Don’t be a stupid frequentist and do the subjectively optimal thing instead”. So, your bottom line is written. If the optimality condition does come out in your favour, you can’t be more sure because of it—that holds even under the radical version of expected evidence conservation. I also suspect that as subjectivism gets more “radical”, there will be fewer optimality results besides the one relative to prior.
This sounds like doing optimality results poorly. Unfortunately, there is a lot of that (EG how the different optimality notions for CDT and EDT don’t help decide between them).
In particular, the “don’t be a stupid frequentist” move has blinded Bayesians (although frequentists have also been blinded in a different way).
Solomonoff induction has a relatively good optimality notion (that it doesn’t do too much worse than any computable prediction).
AIXI has a relatively poor one (you only guarantee that you take the subjectively best action according to Solomonoff induction; but this is hardly any guarantee at all in terms of reward gained, which is supposed to be the objective). (There are variants of AIXI which have other optimality guarantees, but none very compelling afaik.)
An example of a less trivial optimality notion is the infrabayes idea, where if the world fits within the constraints of one of your partial hypotheses, then you will eventually learn to do at least as well (reward-wise) as that hypothesis implies you can do.
If a sentence is undecidable, then you could have two traders who disagree on its value indefinitely: one would have a highest price to buy that’s below the other’s lowest price to sell. But then anything between those two prices could be the “market price”, in the classical supply and demand sense. If you say that the “payout” of a share is what you can sell it for… well, the “physical causation” trader is also buying shares on the counterfactual option that won’t happen. And if he had to sell those, he couldn’t sell them at a price close to where he bought them; he could only sell them at how much the “logical causation” trader values them, and so both would be losing “payout” on their trades with the unrealized option. That’s one interpretation of “sell”. If there’s a “market maker” in addition to both traders, it depends on what prices he makes, and as outlined above, there is a wide range of prices that would be consistent for him to offer as a market maker, including ones very close to the logical trader’s valuations, in which case the logical trader is gaining on the physical one.
Hmm. Well, I didn’t really try to prove that ‘physical causation’ would persist as a hypothesis. I just tried to show that it wouldn’t, and failed. If you’re right, that’d be great!
But here is what I am thinking:
Firstly, yes, there is a market maker. You can think of the market maker as setting the price exactly where buys and sells balance; both sides stand to win the same amount if they’re correct, because that amount is just the combined amount they’ve spent.
Causality is a little funky because of fixed point stuff, but rather than imagining the traders hold shares for a long time, we can instead imagine that today’s shares “pay out” overnight (at the next day’s prices), and then traders have to re-invest if they still want to hold a position. (But this is fine, because they got paid the next day’s prices, so they can afford to buy the same number of shares as they had.)
But if the two traders don’t reinvest, then tomorrow’s prices (and therefore their profits) are up to the whims of the rest of the market.
So I don’t see how we can be sure that PCH loses out overall. LCH has to exploit PCH—but if LCH tries it, then we’re seemingly in a situation where LCH has to sell for PCH’s prices, in which case it suffers the loss I described in the OP.
Thanks for raising the question, though! It would be very interesting if PCH actually could not maintain its position.
My question was more “how should this roughly work” rather than “what conditions should it fulfill”, because I think thinking about this illuminates my next point.
I have been thinking a bit more about this.
I think it should roughly work like this: you have a ‘conditional contract’, which is like a normal conditional bet, except that normally a conditional bet (a|b) is made up of a conjunction bet (a&b) and a hedge on the negation of the condition (not-b); the ‘conditional contract’ instead gives the trader an inseparable pair of contracts (the a&b bet bound together with the not-b bet).
Normally, the price of anything that’s proved goes to one quickly (and zero for anything refuted), because traders are getting $1 per share (and $0 per share for what’s been refuted). (We can also have the market maker just automatically set these prices to 1 and 0, which is probably more sensible.) That’s why the conditional probability for b|a goes to 1 when a->b is proved: a->b is not(a & not(b)), so the price of a & not(b) goes to 0, so the price of not(b)|a goes to zero.
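Spelled out as a short derivation for the ordinary (unbundled) case, writing P for market prices and assuming the price of a is nonzero so the ratio is defined:

```latex
\[
a \rightarrow b \;\equiv\; \neg(a \wedge \neg b)
\quad\text{so proving } a \rightarrow b \text{ refutes } a \wedge \neg b,
\text{ and } P(a \wedge \neg b) \to 0 .
\]
\[
P(\neg b \mid a) \;=\; \frac{P(a \wedge \neg b)}{P(a)} \;\to\; 0
\qquad\Longrightarrow\qquad
P(b \mid a) \;=\; 1 - P(\neg b \mid a) \;\to\; 1
\qquad (P(a) > 0).
\]
```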
But the special bundled contract doesn’t go to zero like this, because the conditional contract only really pays out when the condition is satisfied or refuted. If a trader tries to ‘correct’ the conditional-contract market by buying b|a when a->b, the trader will only exploit the market in the case that b actually occurs (which is not happening in Troll Bridge).
Granted, this sounds like a huge hack.
Reasoning within X (ie, the logic of counterfactuals):
As you note, this does not work if X is extremely weak (which is the plan outlined in the OP). This is in keeping with the spirit of the “subjective theory of counterfactuals”: there are very few constraints on logical counterfactuals, since after all, they may violate logic!
But even a system that doesn’t run on proofs per se could still end up implementing something a lot like logic, and then it would have a property that’s a lot like inconsistency, and then the troll could blow up the bridge conditionally on that.
I agree that this is a serious concern. For example, we can consider logical induction without any logic (eg, the universal induction formalism). It doesn’t apparently have troll bridge problems, because it lacks logic. But if it comes to believe any PA-like logic strongly, then it will be susceptible to Troll Bridge.
My proposal is essentially similar to that, except I am trying to respect logic in most of the system, simply reducing its impact on action selection. But within my proposed system, I think the wrong ‘prior’ (ie distribution of wealth for traders) can make it susceptible again.
I’m not blocking Troll Bridge problems, I’m making the definition of rational agent broad enough that crossing is permissible. But if I think the Troll Bridge proof is actively irrational, I should be able to actually rule it out. IE, specify an X which is inconsistent with PA.
So I don’t see how we can be sure that PCH loses out overall. LCH has to exploit PCH—but if LCH tries it, then we’re seemingly in a situation where LCH has to sell for PCH’s prices, in which case it suffers the loss I described in the OP.
So I’ve reread the logical induction paper for this, and I’m not sure I understand exploitation. Under 3.5, it says:
On each day, the reasoner receives 50¢ from T, but after day t, the reasoner must pay $1 every day thereafter.
So this sounds like before day t, T buys a share every day, and those shares never pay out—otherwise T would receive $t on day t in addition to everything mentioned here. Why?
In the version that I have in my head, there’s a market with PCH and LCH in it that assigns a constant price to the unactualised bet, so neither of them gains or loses anything with their trades on it, and LCH exploits PCH on the actualised ones.
But the special bundled contract doesn’t go to zero like this, because the conditional contract only really pays out when the condition is satisfied or refuted.
So if I’m understanding this correctly: The conditional contract on (a|b) pays if a&b is proved, if a&~b is proved, and if ~a&~b is proved.
Now I have another question: how does logical induction arbitrage against contradiction? The bet on a pays $1 if a is proved. The bet on ~a pays $1 if not-a is proved. But the bet on ~a isn’t “settled” when a is proved—why can’t the market just go on believing its .7? (Likely this is related to my confusion with the paper).
My proposal is essentially similar to that, except I am trying to respect logic in most of the system, simply reducing its impact on action selection. But within my proposed system, I think the wrong ‘prior’ (ie distribution of wealth for traders) can make it susceptible again.
I’m not blocking Troll Bridge problems, I’m making the definition of rational agent broad enough that crossing is permissible. But if I think the Troll Bridge proof is actively irrational, I should be able to actually rule it out. IE, specify an X which is inconsistent with PA.
What makes you think that there’s a “right” prior? You want a “good” learning mechanism for counterfactuals. To be good, such a mechanism would have to learn to make the inferences we consider good, at least with the “right” prior. But we can’t pinpoint any wrong inference in Troll Bridge. It doesn’t seem like what’s stopping us from pinpointing the mistake in Troll Bridge is a lack of empirical data. So, a good mechanism would have to learn to be susceptible to Troll Bridge, especially with the “right” prior. I just don’t see what would be a good reason for thinking there’s a “right” prior that avoids Troll Bridge (other than “there just has to be some way of avoiding it”), that wouldn’t also let us tell directly how to think about Troll Bridge, no learning needed.
Now I have another question: how does logical induction arbitrage against contradiction? The bet on a pays $1 if a is proved. The bet on ~a pays $1 if not-a is proved. But the bet on ~a isn’t “settled” when a is proved—why can’t the market just go on believing its .7? (Likely this is related to my confusion with the paper).
Again, my view may have drifted a bit from the LI paper, but the way I think about this is that the market maker looks at the minimum amount of money a trader has “in any world” (in the sense described in my other comment). This excludes worlds which the deductive process has ruled out, so for example if A∨B has been proved, all worlds will have either A or B. So if you had a bet which would pay $10 on A, and a bet which would pay $2 on B, you’re treated as if you have $2 to spend. It’s like a bookie allowing a gambler to make a bet without putting down the money because the bookie knows the gambler is “good for it” (the gambler will definitely be able to pay later, based on the bets the gambler already has, combined with the logical information we now know).
Of course, because logical bets don’t necessarily ever pay out, the market maker realistically shouldn’t expect that traders are necessarily “good for it”. But doing so allows traders to arbitrage logically contradictory beliefs, so, it’s nice for our purposes. (You could say this is a difference between an ideal prediction market and a mere betting market; a prediction market should allow arbitrage of inconsistency in this way.)
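The “minimum amount of money in any world” rule can be sketched in a few lines. This is a toy illustration of the $10-on-A, $2-on-B example, not the construction from the LI paper; the function name `min_worth` and the encoding of proved sentences as Python predicates are assumptions made for the sketch.

```python
from itertools import product

def min_worth(atoms, proved, payoffs):
    """Smallest total payout over all truth assignments consistent with `proved`.

    atoms:   names of the atomic sentences
    proved:  predicates over a world (dict atom -> bool) that the deductive
             process has established; worlds violating any are ruled out
    payoffs: dollars received if the given atom turns out true
    """
    worths = []
    for values in product([False, True], repeat=len(atoms)):
        world = dict(zip(atoms, values))
        if all(constraint(world) for constraint in proved):
            worths.append(sum(amount for atom, amount in payoffs.items() if world[atom]))
    return min(worths)

atoms = ["A", "B"]
proved = [lambda w: w["A"] or w["B"]]   # the deductive process has proved A or B
payoffs = {"A": 10, "B": 2}             # bets held by the trader

guaranteed = min_worth(atoms, proved, payoffs)
print(guaranteed)  # 2: the worst remaining world has only B true
```

Before A∨B is proved, the world where both are false is still allowed, so the same bets are worth nothing as collateral.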
This excludes worlds which the deductive process has ruled out, so for example if A∨B has been proved, all worlds will have either A or B. So if you had a bet which would pay $10 on A, and a bet which would pay $2 on B, you’re treated as if you have $2 to spend.
I agree you can arbitrage inconsistencies this way, but it seems very questionable. For one, it means the market maker needs to interpret the output of the deductive process semantically. And it makes him go bankrupt if that logic is inconsistent. And there could be a case where a proposition is undecidable, and a meta-proposition about it is undecidable, and a meta-meta-proposition about it is undecidable, all the way up, and then something bad happens, though I’m not sure what concretely.
On each day, the reasoner receives 50¢ from T, but after day t, the reasoner must pay $1 every day thereafter.
Hm. It’s a bit complicated and there are several possible ways to set things up. Reading that paragraph, I’m not sure about this sentence either.
In the version I was trying to explain, where traders are “forced to sell” every morning before the day of trading begins, the reasoner would receive 50¢ from the trader every day, but would return that money next morning. Also, in the version I was describing, the reasoner is forced to set the price to $1 rather than 50¢ as soon as the deductive process proves 1+1=2. So, that morning, the reasoner has to return $1 rather than 50¢. That’s where the reasoner loses money to the trader. After that, the price is $1 forever, so the trader would just be paying $1 every day and getting that $1 back the next morning.
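Here is a minimal sketch of that “forced to sell every morning” accounting, for a trader who buys one share each day. The helper `trader_profit` and the particular price path are illustrative assumptions, not anything taken from the LI paper.

```python
def trader_profit(prices):
    """Buy one share each day at that day's price; be paid the next morning's price.

    prices[i] is the market price on day i. The final day's purchase is
    left unsettled, since no next-day price is available for it.
    """
    profit = 0.0
    for today, tomorrow in zip(prices, prices[1:]):
        profit += tomorrow - today  # pay today's price, get tomorrow's back
    return profit

t = 5                                # day on which 1+1=2 gets proved (arbitrary)
prices = [0.5] * t + [1.0] * 5       # 50 cents before the proof, $1 afterwards
total = trader_profit(prices)
print(total)  # 0.5: the only gain is on the morning the price jumps
```

Every other day is a wash (pay 50¢, get 50¢ back, or pay $1, get $1 back), so per share held the trader takes exactly 50¢ from the reasoner.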
I would then define exploitation as “the trader’s total wealth (across different times) has no upper bound”. (It doesn’t necessarily escape to infinity—it might oscillate up and down, but with higher and higher peaks.)
Now, the LI paper uses a different definition of exploitation, which involves how much money a trader has within a world (which basically means we imagine the deductive process decides all the sentences, and we ask how much money the trader would have; and, we consider all the different ways the deductive process could do this). This is not equivalent to my definition of exploitation in general; according to the LI paper, a trader ‘exploits’ the market even if its wealth is unbounded only in some very specific world (eg, where a specific sequence of in-fact-undecidable sentences gets proved).
However, I do have an unpublished proof that the two definitions of exploitation are equivalent for the logical induction algorithm and for a larger class of “reasonable” logical inductors. This is a non-trivial result, but it justifies using my definition of exploitation (which I personally find a lot more intuitive). My basic intuition for the result is: if you don’t know the future, the only way to ensure you don’t lose unbounded money in reality is to ensure you don’t lose unbounded money in any world. (“If you don’t know the future” is a significant constraint on logical inductors.)
Also, when those definitions do differ, I’m personally not convinced that the definition in the logical induction paper is better… it is stronger, in the sense that it gives us a more stringent logical induction criterion, but the “irrational” behaviors which it helps rule out don’t seem particularly irrational to me. Simply put, I am only convinced that I should care about actually losing unbounded money, as opposed to losing unbounded money in some hypothetical world.
In the version that I have in my head, there’s a market with PCH and LCH in it that assigns a constant price to the unactualised bet, so neither of them gains or loses anything with their trades on it, and LCH exploits PCH on the actualised ones.
Why is the price of the un-actualized bet constant? My argument in the OP was to suppose that PCH is the dominant hypothesis, so, mostly controls market prices. PCH thinks it gains important information when it sees which action we actually took, so it updates the expectation for the un-actualized action. So the price moves. Similarly, if PCH and LCH had similar probability, we would expect the price to move.
Why is the price of the un-actualized bet constant? My argument in the OP was to suppose that PCH is the dominant hypothesis, so, mostly controls market prices.
Thinking about this in detail, it seems like what influence traders have on the market price depends on a lot more of their inner workings than just their beliefs. I was thinking in a way where each trader only had one price for the bet, below which they bought and above which they sold, no matter how many units they traded (this might contradict “continuous trading strategies” because of finite wealth), in which case there would be a range of prices that could be the “market” price, and it could stay constant even with one end of that range shifting. But there could also be an outcome like yours, if the agents demand better and better prices to trade one more unit of the bet.
I think it’s still possible to have a scenario like this. Let’s say each trader would buy or sell a certain amount when the price is below/above what they think it to be, but with the transition being very steep instead of instant. Then you could still have long price intervals where the amounts bought and sold remain constant, and then every point in there could be the market price.
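A toy version of this, to make the flat interval concrete. The clipped-linear demand curve, the steepness of 50, the one-unit cap, and the valuations 0.3 and 0.7 are all assumptions picked for illustration; nothing here is the trading-strategy formalism from the LI paper.

```python
def demand(valuation, price, steepness=50.0, max_units=1.0):
    """Units a trader wants to buy (positive) or sell (negative) at `price`.

    Continuous in price, but saturating at +/- max_units once the price is
    more than max_units / steepness away from the trader's valuation.
    """
    raw = steepness * (valuation - price)
    return max(-max_units, min(max_units, raw))

def net_demand(price, valuations=(0.3, 0.7)):
    """Total demand from one low-valuation and one high-valuation trader."""
    return sum(demand(v, price) for v in valuations)

# Scan a price grid for points where buys and sells balance exactly.
clearing = [p / 100 for p in range(101) if net_demand(p / 100) == 0.0]
print(len(clearing), min(clearing), max(clearing))
```

Between the two valuations (away from the steep edges) one trader sells a full unit and the other buys a full unit, so a whole stretch of prices clears the market, not just one point.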
I’m not sure if this is significant. I see no reason to set the traders up this way other than the result in the particular scenario that kicked this off, and adding traders who don’t follow this pattern breaks it. Still, it’s a bit worrying that trading strategies seem to matter in addition to beliefs, because what do they represent? A trader’s initial wealth is supposed to be our confidence in its heuristics, but if a trader is mathematical heuristics and trading strategy packaged together, then what does confidence in the trading strategy mean epistemically? Two things to think about:
Is it possible to consistently define the set of traders with the same beliefs as trader X?
It seems that logical induction is using a trick, where it avoids inconsistent discrete traders, but includes an infinite sequence of continuous traders with ever steeper transitions to get some of the effects. This could lead to unexpected differences between behaviour “at all finite steps” vs “at the limit”. What can we say about logical induction if trading strategies need to be Lipschitz-continuous with a shared upper limit on the Lipschitz constant?
What makes you think that there’s a “right” prior? You want a “good” learning mechanism for counterfactuals. To be good, such a mechanism would have to learn to make the inferences we consider good, at least with the “right” prior. But we can’t pinpoint any wrong inference in Troll Bridge. It doesn’t seem like what’s stopping us from pinpointing the mistake in Troll Bridge is a lack of empirical data. So, a good mechanism would have to learn to be susceptible to Troll Bridge, especially with the “right” prior. I just don’t see what would be a good reason for thinking there’s a “right” prior that avoids Troll Bridge (other than “there just has to be some way of avoiding it”), that wouldn’t also let us tell directly how to think about Troll Bridge, no learning needed.
Now I feel like you’re trying to have it both ways; earlier you raised the concern that a proposal which doesn’t overtly respect logic could nonetheless learn a sort of logic internally, which could then be susceptible to Troll Bridge. I took this as a call for an explicit method of avoiding Troll Bridge, rather than merely making it possible with the right prior.
But now, you seem to be complaining that a method that explicitly avoids Troll Bridge would be too restrictive?
To be good, such a mechanism would have to learn to make the inferences we consider good, at least with the “right” prior. But we can’t pinpoint any wrong inference in Troll Bridge.
I think there is a mistake somewhere in the chain of inference from cross→−10 to low expected value for crossing. Material implication is being conflated with counterfactual implication.
A strong candidate from my perspective is the inference from ¬(A∧B) to C(A|B)=0 where C represents probabilistic/counterfactual conditional (whatever we are using to generate expectations for actions).
So, a good mechanism would have to learn to be susceptible to Troll Bridge, especially with the “right” prior.
You seem to be arguing that being susceptible to Troll Bridge should be judged as a necessary/positive trait of a decision theory. But there are decision theories which don’t have this property, such as regular CDT, or TDT (depending on the logical-causality graph). Are you saying that those are all necessarily wrong, due to this?
I just don’t see what would be a good reason for thinking there’s a “right” prior that avoids Troll Bridge (other than “there just has to be some way of avoiding it”), that wouldn’t also let us tell directly how to think about Troll Bridge, no learning needed.
I’m not sure quite what you meant by this. For example, I could have a lot of prior mass on “crossing gives me +10, not crossing gives me 0”. Then my +10 hypothesis would only be confirmed by experience. I could reason using counterfactuals, so that the troll bridge argument doesn’t come in and ruin things. So, there is definitely a way. And being born with this prior doesn’t seem like some kind of misunderstanding/delusion about the world.
So it also seems natural to try and design agents which reliably learn this, if they have repeated experience with Troll Bridge.
But now, you seem to be complaining that a method that explicitly avoids Troll Bridge would be too restrictive?
No, I think finding such a no-learning-needed method would be great. It just means your learning-based approach wouldn’t be needed.
You seem to be arguing that being susceptible to Troll Bridge should be judged as a necessary/positive trait of a decision theory.
No. I’m saying if our “good” reasoning can’t tell us where in Troll Bridge the mistake is, then something that learns to make “good” inferences would have to fall for it.
But there are decision theories which don’t have this property, such as regular CDT, or TDT (depending on the logical-causality graph). Are you saying that those are all necessarily wrong, due to this?
A CDT is only worth as much as its method of generating counterfactuals. We generally consider regular CDT (which I interpret as “getting its counterfactuals from something-like-epsilon-exploration”) to miss important logical connections. “TDT” doesn’t have such a method. There is a (logical) causality graph that makes you do the intuitively right thing on Troll Bridge, but how to find it formally?
A strong candidate from my perspective is the inference from ¬(A∧B) to C(A|B)=0
Isn’t this just a rephrasing of your idea that the agent should act based on C(A|B) instead of B->A? I don’t see any occurrence of ~(A&B) in the Troll Bridge argument. Now, it is equivalent to B->~A, so perhaps you think one of the propositions that occur as implications in Troll Bridge should be parsed this way? My modified Troll Bridge parses them all as counterfactual implication.
For example, I could have a lot of prior mass on “crossing gives me +10, not crossing gives me 0”. Then my +10 hypothesis would only be confirmed by experience. I could reason using counterfactuals
I’ve said why I don’t think “using counterfactuals”, absent further specification, is a solution. For the simple “crossing is +10” belief… you’re right, it succeeds, and insofar as you just wanted to show that it’s rationally possible to cross, I suppose it does.
This… really didn’t fit into my intuitions about learning. Consider that there is also the alternative agent who believes that crossing is −10, and sticks to that. And the reason he sticks to that isn’t that he’s too afraid and VOI isn’t worth it: while it’s true that he never empirically confirms it, he is right, and the bridge would blow up if he were to cross it. That method works because it ignores the information in the problem description, and has us insert the relevant takeaway, without any of the confusing stuff, directly into its prior. Are you really willing to say: Yup, that’s basically the solution to counterfactuals, just a bit of formalism left to work out?
I don’t understand this part. Your explanation of PCDT, at least, didn’t prepare me for it; it doesn’t mention betting. And why is the payoff for the counterfactual-2-boxing determined by the beliefs of the agent after 1-boxing?
And what I think is mostly independent of that confusion: I don’t think things are as settled.
I’m more worried about the embedding problems with the trader in Dutch book arguments, so the one against CDT isn’t as decisive for me.
But how is the counterfactual supposed to actually think? I don’t think just having the agent unrevisably believe that crossing is counterfactually +10 is a reasonable answer, even if it doesn’t have any instrumental problems in this case. I think it ought to be possible to get something like “whether to cross in Troll Bridge depends only on what you otherwise think about PA’s consistency” with some logical method. But even short of that, there needs to be some method to adjust your counterfactuals if they fail to really match your conditionals. And if we had an actual concrete model of counterfactual reasoning instead of a list of desiderata, it might be possible to make a troll based on the consistency of whatever is inside this model, as opposed to PA.
I also think there is a good chance the answer to the cartesian boundary problem won’t be “here’s how to calculate where your boundary is”, but something else of which boundaries are an approximation, and then something similar would go for counterfactuals, and then there won’t be a counterfactual theory which respects embedding.
These latter two considerations suggest the leftover work isn’t just formalisation.
Not sure how to best answer. I’m thinking of all this in an LIDT setting, so all learning occurs through traders making bets. The payoff for 2-boxing is dependent on beliefs after 1-boxing because all share prices update every market day and the “payout” for a share is essentially what you can sell it for. Similarly, if a trader buys a share of an undecidable sentence (let’s say, the consistency of PA) then the only “payoff” is whatever you can sell it for later, based on future market prices, because the sentence will never get fully decided one way or the other.
My claim is: eventually, if you observe enough cases of “crossing” in similar circumstances, your expectation for “cross” should be consistent with the empirical history (rather than, say, −10 even though you’ve never experienced −10 for crossing). To give a different example, I’m claiming it is irrational to persist in thinking 1-boxing gets you less money in expectation, if your empirical history continues to show that it is better on average.
And I claim that if there is a persistent disagreement between counterfactuals and evidential conditionals, then the agent will in fact experimentally try crossing infinitely often, due to the value-of-information of testing the disagreement (that is, this will be the limiting behavior of reduced temporal discounting, under the assumption that the agent isn’t worried about traps).
So the two will indeed converge (under those assumptions).
The hope is that we can block the troll argument completely if proving B->A does not imply cf(A|B)=1, because no matter what predicate the troll uses, the inference from P to cf fails. So what we concretely need to do is give a version of counterfactual reasoning which lets cf(A|B) not equal 1 in some cases where B->A is proved.
Granted, there could be some other problematic argument. However, if my learning-theoretic ideas go through, this provides another safeguard: Troll Bridge is a case where the agent never learns the empirical distribution, due to refusing to observe a specific case. If we know this never happens (given the learnability conditions), then this blocks off a whole range of Troll-Bridge-like arguments.
This is a sensible position. I think this is similar to Scott G’s take on my direction.
My argument would not be that the dutch book should be super compelling, but rather, that it appears we can do everything without questioning so many assumptions.
For example, Scott would argue that probability is for things spacelike separated from you, so we need a different concept for thinking about consequences of actions. My argument is not anything against Scott’s concrete reasons to cast doubt on the broad applicability of probabilistic thinking; rather, my argument is “look at all the things we can do with probabilistic reasoning” (at least, suitably generalized).
In particular, good learning-theoretic results can address concerns about decision-theoretic paradoxes; a convincing optimality result could and should systematically rule out a wide range of decision-theoretic paradoxes. So, if true, it could become difficult to motivate any additional worries about cartesian frames etc.
If a sentence is undecidable, then you could have two traders who disagree on its value indefinitely: one would have a highest price to buy that’s below the other’s lowest price to sell. But then anything between those two prices could be the “market price”, in the classical supply and demand sense. If you say that the “payout” of a share is what you can sell it for… well, the “physical causation” trader is also buying shares on the counterfactual option that won’t happen. And if he had to sell those, he couldn’t sell them at a price close to where he bought them. He could only sell them at how much the “logical causation” trader values them, and so both would be losing “payout” on their trades with the unrealized option. That’s one interpretation of “sell”. If there’s a “market maker” in addition to both traders, it depends on what prices he makes. As outlined above, there is a wide range of prices that would be consistent for him to offer as a market maker, including ones which are very close to the logical trader’s valuations, in which case the logical trader is gaining on the physical one.
Trying to communicate a vague intuition here: There is a set of methods which rely on there being a time when “everything is done”, to then look back from there and do credit assignment for everything that happened before. They characteristically use backwards induction to prove things. I think markets fall into this: the argument for why ideal markets don’t have bubbles is that eventually, the real value will be revealed, and so the bubble has to pop, and then someone holds the bag, and you don’t want to be that someone, and people predicting this and trying to avoid it will make the bubble pop earlier, in the idealised case instantly. I also think these methods aren’t going to work well with embedding. They essentially use “after the world” as a substitute for “outside the world”.
My question was more “how should this roughly work” rather than “what conditions should it fulfill”, because I think thinking about this illuminates my next point.
This doesn’t help against what I’m imagining, I’m not touching indicative B->A. So, standard Troll Bridge:
Reasoning within PA (ie, the logic of the agent):
Suppose the agent crosses.
Further suppose that the agent proves that crossing implies U=-10.
Examining the source code of the agent, because we’re assuming the agent crosses, either PA proved that crossing implies U=+10, or it proved that crossing implies U=0.
So, either way, PA is inconsistent—by way of 0=-10 or +10=-10.
So the troll actually blows up the bridge, and really, U=-10.
Therefore (popping out of the second assumption), if the agent proves that crossing implies U=-10, then in fact crossing implies U=-10.
By Löb’s theorem, crossing really implies U=-10.
So (since we’re still under the assumption that the agent crosses), U=-10.
So (popping out of the assumption that the agent crosses), the agent crossing implies U=-10.
Since we proved all of this in PA, the agent proves it, and proves no better utility in addition (unless PA is truly inconsistent). On the other hand, it will prove that not crossing gives it a safe U=0. So it will in fact not cross.
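For reference, the Löb step in the argument above can be written out as follows. This is only a sketch in standard provability-logic notation; the symbol \Box (“PA proves”) and the abbreviation C are shorthand introduced here, not notation used elsewhere in this discussion.

```latex
% Assumed shorthand: \Box P means "PA proves P";
% C abbreviates the sentence (cross -> U = -10).
\begin{align*}
  &\text{Löb's theorem:}
    && \text{if } \mathrm{PA} \vdash \Box P \rightarrow P
       \text{, then } \mathrm{PA} \vdash P. \\
  &\text{Established by the steps before it:}
    && \mathrm{PA} \vdash \Box C \rightarrow C,
       \quad C := (\mathrm{cross} \rightarrow U = -10). \\
  &\text{Conclusion:}
    && \mathrm{PA} \vdash C.
\end{align*}
```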
But now, say the agent’s counterfactual reasoning comes not from PA, but from system X. Then the argument fails because “suppose the agent proves crossing->U=-10 in PA” doesn’t go any further, since examining the source code of the agent doesn’t say anything about PA anymore, and “suppose the agent proves crossing->U=-10 in X” doesn’t show that PA is inconsistent, so the bridge isn’t blown up. But let’s have a troll that blows up the bridge if X is inconsistent. Then we can make an argument like this:
Reasoning within X (ie, the logic of counterfactuals):
Suppose the agent crosses.
Further suppose that the agent proves in X that crossing implies U=-10.
Examining the source code of the agent, because we’re assuming the agent crosses, either X proved that crossing implies U=+10, or it proved that crossing implies U=0.
So, either way, X is inconsistent—by way of 0=-10 or +10=-10.
So the troll actually blows up the bridge, and really, U=-10.
Therefore (popping out of the second assumption), if the agent proves that crossing implies U=-10, then in fact crossing implies U=-10.
By Löb’s theorem, crossing really implies U=-10.
So (since we’re still under the assumption that the agent crosses), U=-10.
So (popping out of the assumption that the agent crosses), the agent crossing implies U=-10.
Since we proved all of this in X, the agent proves it, and proves no better utility in addition (unless X is truly inconsistent). On the other hand, it will prove that not crossing gives it a safe U=0. So it will in fact not cross.
Now, this argument relies on X and counterfactual reasoning having a lot of the properties of PA and normal reasoning. But even a system that doesn’t run on proofs per se could still end up implementing something a lot like logic, and then it would have a property that’s a lot like inconsistency, and then the troll could blow up the bridge conditionally on that. Basically, it still seems reasonable to me that counterfactual worlds should be closed under inference, up to our ability to infer. And I don’t see which of the rules for manipulating logical implications wouldn’t be valid for counterfactual implications in their own closed system, if you formally separate them. If you want your X to avoid this argument, then it needs to not-do something PA does. “Formal separation” between the systems isn’t enough, because the results of counterfactual reasoning still really do affect your actions, and if the counterfactual reasoning system can understand this, Troll Bridge returns. And if there was such a something, we could just use a logic that doesn’t do this in the first place, no need for the two-layer approach.
I’m also sceptical of optimality results. When you’re doing subjective probability, any method you come up with will be proven optimal relative to its own prior—the difference between different subjective methods is only in their ontology, and the optimality results don’t protect you against mistakes there. Also, when you’re doing subjectivism, and it turns out the methods required to reach some optimality condition aren’t subjectively optimal, you say “Don’t be a stupid frequentist and do the subjectively optimal thing instead”. So, your bottom line is written. If the optimality condition does come out in your favour, you can’t be more sure because of it—that holds even under the radical version of expected evidence conservation. I also suspect that as subjectivism gets more “radical”, there will be fewer optimality results besides the one relative to prior.
This sounds like doing optimality results poorly. Unfortunately, there is a lot of that (EG how the different optimality notions for CDT and EDT don’t help decide between them).
In particular, the “don’t be a stupid frequentist” move has blinded Bayesians (although frequentists have also been blinded in a different way).
Solomonoff induction has a relatively good optimality notion (that it doesn’t do too much worse than any computable prediction).
AIXI has a relatively poor one (you only guarantee that you take the subjectively best action according to Solomonoff induction; but this is hardly any guarantee at all in terms of reward gained, which is supposed to be the objective). (There are variants of AIXI which have other optimality guarantees, but none very compelling afaik.)
An example of a less trivial optimality notion is the infrabayes idea, where if the world fits within the constraints of one of your partial hypotheses, then you will eventually learn to do at least as well (reward-wise) as that hypothesis implies you can do.
Hmm. Well, I didn’t really try to prove that ‘physical causation’ would persist as a hypothesis. I just tried to show that it wouldn’t, and failed. If you’re right, that’d be great!
But here is what I am thinking:
Firstly, yes, there is a market maker. You can think of the market maker as setting the price exactly where buys and sells balance; both sides stand to win the same amount if they’re correct, because that amount is just the combined amount they’ve spent.
Causality is a little funky because of fixed point stuff, but rather than imagining the traders hold shares for a long time, we can instead imagine that today’s shares “pay out” overnight (at the next day’s prices), and then traders have to re-invest if they still want to hold a position. (But this is fine, because they got paid the next day’s prices, so they can afford to buy the same number of shares as they had.)
But if the two traders don’t reinvest, then tomorrow’s prices (and therefore their profits) are up to the whims of the rest of the market.
So I don’t see how we can be sure that PCH loses out overall. LCH has to exploit PCH—but if LCH tries it, then we’re seemingly in a situation where LCH has to sell for PCH’s prices, in which case it suffers the loss I described in the OP.
Thanks for raising the question, though! It would be very interesting if PCH actually could not maintain its position.
I have been thinking a bit more about this.
I think it should roughly work like this: you have a ‘conditional contract’, which is like normal conditional bets, except normally a conditional bet (a|b) is made up of a conjunction bet (a&b) and a hedge on the negation of the condition (not-b); the ‘conditional contract’ instead gives the trader an inseparable pair of contracts (the a&b bet bound together with the not-b bet).
Normally, the price of anything that’s proved goes to one quickly (and zero for anything refuted), because traders are getting $1 per share (and $0 per share for what’s been refuted). (We can also have the market maker just automatically set these prices to 1 and 0, which is probably more sensible.) That’s why the conditional probability for b|a goes to 1 when a->b is proved: a->b is not(a & not b), so the price of a&not(b) goes to 0, so the price of not(b)|a goes to zero.
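A small worked example of that last step may help. It assumes the usual ratio definition, where the price of (x|a) is the price of a&x divided by the price of a; the function name and the starting prices are made up for illustration.

```python
def conditional_price(price_conjunction, price_condition):
    """Price of (x | a) implied by the prices of (a & x) and a.

    Assumes the ordinary ratio definition of a conditional bet; this is an
    illustrative helper, not the market maker from the LI paper.
    """
    if price_condition == 0:
        raise ValueError("condition has price 0; the conditional is undefined")
    return price_conjunction / price_condition


# Hypothetical prices before anything is proved.
p_a = 0.5
p_a_and_not_b = 0.2
before = conditional_price(p_a_and_not_b, p_a)   # price of not(b)|a

# Once a->b is proved, a & not(b) is refuted, so its price is set to 0.
after = conditional_price(0.0, p_a)              # price of not(b)|a
b_given_a = 1 - after                            # price of b|a

print(before, after, b_given_a)
```

The price of a itself never needs to move for this to happen, which is why an ordinary conditional bet is forced to 1 by a proof of a->b even when a never occurs. The bundled contract discussed next is meant to avoid exactly this.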
But the special bundled contract doesn’t go to zero like this, because the conditional contract only really pays out when the condition is satisfied or refuted. If a trader tries to ‘correct’ the conditional-contract market by buying b|a when a->b, the trader will only exploit the market in the case that b actually occurs (which is not happening in Troll Bridge).
Granted, this sounds like a huge hack.
As you note, this does not work if X is extremely weak (which is the plan outlined in the OP). This is in keeping with the spirit of the “subjective theory of counterfactuals”: there are very few constraints on logical counterfactuals, since after all, they may violate logic!
I agree that this is a serious concern. For example, we can consider logical induction without any logic (eg, the universal induction formalism). It doesn’t apparently have troll bridge problems, because it lacks logic. But if it comes to believe any PA-like logic strongly, then it will be susceptible to Troll Bridge.
My proposal is essentially similar to that, except I am trying to respect logic in most of the system, simply reducing its impact on action selection. But within my proposed system, I think the wrong ‘prior’ (ie distribution of wealth for traders) can make it susceptible again.
I’m not blocking Troll Bridge problems, I’m making the definition of rational agent broad enough that crossing is permissible. But if I think the Troll Bridge proof is actively irrational, I should be able to actually rule it out. IE, specify an X which is inconsistent with PA.
I don’t have any proposal for that.
So I’ve reread the logical induction paper for this, and I’m not sure I understand exploitation. Under 3.5, it says:
So this sounds like before day t, T buys a share every day, and those shares never pay out—otherwise T would receive $t on day t in addition to everything mentioned here. Why?
In the version that I have in my head, there’s a market with PCH and LCH in it that assigns a constant price to the unactualised bet, so neither of them gains or loses anything with their trades on it, and LCH exploits PCH on the actualised ones.
So if I’m understanding this correctly: The conditional contract on (a|b) pays if a&b is proved, if a&~b is proved, and if ~a&~b is proved.
Now I have another question: how does logical induction arbitrage against contradiction? The bet on a pays $1 if a is proved. The bet on ~a pays $1 if not-a is proved. But the bet on ~a isn’t “settled” when a is proved, so why can’t the market just go on believing it’s .7? (Likely this is related to my confusion with the paper).
Again, my view may have drifted a bit from the LI paper, but the way I think about this is that the market maker looks at the minimum amount of money a trader has “in any world” (in the sense described in my other comment). This excludes worlds which the deductive process has ruled out, so for example if A∨B has been proved, all worlds will have either A or B. So if you had a bet which would pay $10 on A, and a bet which would pay $2 on B, you’re treated as if you have $2 to spend. It’s like a bookie allowing a gambler to make a bet without putting down the money because the bookie knows the gambler is “good for it” (the gambler will definitely be able to pay later, based on the bets the gambler already has, combined with the logical information we now know).
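The $10-on-A, $2-on-B example can be checked by brute force. The sketch below enumerates truth assignments and takes the minimum payout over those the deductive process still allows; `spendable` and its arguments are hypothetical names, and treating A and B as independent atoms is a simplifying assumption.

```python
from itertools import product


def spendable(bets, atoms, allowed):
    """Minimum total payout over every world not yet ruled out.

    bets    -- list of (atom, payout) pairs; a bet pays if its atom is true
    atoms   -- names of the atomic sentences
    allowed -- predicate on a world (dict atom -> bool); stands in for
               what the deductive process has proved so far
    """
    worlds = (dict(zip(atoms, values))
              for values in product([False, True], repeat=len(atoms)))
    payouts = [sum(pay for atom, pay in bets if world[atom])
               for world in worlds if allowed(world)]
    return min(payouts)


bets = [("A", 10), ("B", 2)]

# Before anything is proved, the world where both fail pays nothing.
print(spendable(bets, ["A", "B"], lambda w: True))

# After A-or-B is proved, the worst remaining world pays the $2 bet on B.
budget = spendable(bets, ["A", "B"], lambda w: w["A"] or w["B"])
print(budget)
```

If the predicate rules out every world, `min` is called on an empty list and raises an error, which is a crude stand-in for the case where the deductive process is inconsistent.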
Of course, because logical bets don’t necessarily ever pay out, the market maker realistically shouldn’t expect that traders are necessarily “good for it”. But doing so allows traders to arbitrage logically contradictory beliefs, so, it’s nice for our purposes. (You could say this is a difference between an ideal prediction market and a mere betting market; a prediction market should allow arbitrage of inconsistency in this way.)
I agree you can arbitrage inconsistencies this way, but it seems very questionable. For one, it means the market maker needs to interpret the output of the deductive process semantically. And it makes him go bankrupt if that logic is inconsistent. And there could be a case where a proposition is undecidable, and a meta-proposition about it is undecidable, and a meta-meta-proposition about it is undecidable, all the way up, and then something bad happens, though I’m not sure what concretely.
Hm. It’s a bit complicated and there are several possible ways to set things up. Reading that paragraph, I’m not sure about this sentence either.
In the version I was trying to explain, where traders are “forced to sell” every morning before the day of trading begins, the reasoner would receive 50¢ from the trader every day, but would return that money next morning. Also, in the version I was describing, the reasoner is forced to set the price to $1 rather than 50¢ as soon as the deductive process proves 1+1=2. So, that morning, the reasoner has to return $1 rather than 50¢. That’s where the reasoner loses money to the trader. After that, the price is $1 forever, so the trader would just be paying $1 every day and getting that $1 back the next morning.
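Here is a toy simulation of that “forced to sell every morning” accounting. It assumes a trader who buys exactly one share each day and a price path that sits at 50¢ until the proof arrives; both are illustrative choices, and `trader_profit` is a hypothetical helper.

```python
def trader_profit(prices):
    """Net profit for a trader who buys one share every day.

    Each day's share is bought at that day's price and sold back the next
    morning at the next day's price, so only price changes produce profit.
    """
    return sum(tomorrow - today
               for today, tomorrow in zip(prices, prices[1:]))


# Assumed price path: 50 cents until the deductive process proves the
# sentence, then $1 forever.
prices = [0.5, 0.5, 0.5, 1.0, 1.0, 1.0]
profit = trader_profit(prices)
print(profit)
```

On every day except the one where the price jumps, the trader simply gets back what it paid. The whole gain comes from the single morning on which the reasoner has to return $1 for a share it sold at 50¢.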
I would then define exploitation as “the trader’s total wealth (across different times) has no upper bound”. (It doesn’t necessarily escape to infinity—it might oscillate up and down, but with higher and higher peaks.)
Now, the LI paper uses a different definition of exploitation, which involves how much money a trader has within a world (which basically means we imagine the deductive process decides all the sentences, and we ask how much money the trader would have; and, we consider all the different ways the deductive process could do this). This is not equivalent to my definition of exploitation in general; according to the LI paper, a trader ‘exploits’ the market even if its wealth is unbounded only in some very specific world (eg, where a specific sequence of in-fact-undecidable sentences gets proved).
However, I do have an unpublished proof that the two definitions of exploitation are equivalent for the logical induction algorithm and for a larger class of “reasonable” logical inductors. This is a non-trivial result, but, justifies using my definition of exploitation (which I personally find a lot more intuitive). My basic intuition for the result is: if you don’t know the future, the only way to ensure you don’t lose unbounded money in reality is to ensure you don’t lose unbounded money in any world. (“If you don’t know the future” is a significant constraint on logical inductors.)
Also, when those definitions do differ, I’m personally not convinced that the definition in the logical induction paper is better… it is stronger, in the sense that it gives us a more stringent logical induction criterion, but the “irrational” behaviors which it helps rule out don’t seem particularly irrational to me. Simply put, I am only convinced that I should care about actually losing unbounded money, as opposed to losing unbounded money in some hypothetical world.
Why is the price of the un-actualized bet constant? My argument in the OP was to suppose that PCH is the dominant hypothesis, so, mostly controls market prices. PCH thinks it gains important information when it sees which action we actually took, so it updates the expectation for the un-actualized action. So the price moves. Similarly, if PCH and LCH had similar probability, we would expect the price to move.
Thinking about this in detail, it seems like what influence traders have on the market price depends on a lot more of their inner workings than just their beliefs. I was thinking in a way where each trader only had one price for the bet, below which they bought and above which they sold, no matter how many units they traded (this might contradict “continuous trading strategies” because of finite wealth), in which case there would be a range of prices that could be the “market” price, and it could stay constant even with one end of that range shifting. But there could also be an outcome like yours, if the agents demand better and better prices to trade one more unit of the bet.
The continuity property is really important.
I think its still possible to have a scenario like this. Lets say each trader would buy or sell a certain amount when the price is below/above what they think it to be, but the transition being very steep instead of instant. Then you could still have long price intervalls where the amounts bought and sold remain constant, and then every point in there could be the market price.
I’m not sure if this is significant. I see no reason to set the traders up this way other than the result in the particular scenario that kicked this off, and adding traders who don’t follow this pattern breaks it. Still, its a bit worrying that trading strategies seem to matter in addition to beliefs, because what do they represent? A traders initial wealth is supposed to be our confidence in its heuristics—but if a trader is mathematical heuristics and trading strategy packaged, then what does confidence in the trading strategy mean epistemically? Two things to think about:
Is it possible to consistently define the set of traders with the same beliefs as trader X?
It seems that logical induction is using a trick, where it avoids inconsistent discrete traders, but includes an infinite sequence of continuous traders with ever steeper transitions to get some of the effects. This could lead to unexpected differences between behaviour “at all finite steps” vs “at the limit”. What can we say about logical induction if trading strategies need to be lipschitz-continuous with a shared upper limit on the lipschitz constant?
Now I feel like you’re trying to have it both ways; earlier you raised the concern that a proposal which doesn’t overtly respect logic could nonetheless learn a sort of logic internally, which could then be susceptible to Troll Bridge. I took this as a call for an explicit method of avoiding Troll Bridge, rather than merely making it possible with the right prior.
But now, you seem to be complaining that a method that explicitly avoids Troll Bridge would be too restrictive?
I think there is a mistake somewhere in the chain of inference from cross→−10 to low expected value for crossing. Material implication is being conflated with counterfactual implication.
A strong candidate from my perspective is the inference from ¬(A∧B) to C(A|B)=0 where C represents probabilistic/counterfactual conditional (whatever we are using to generate expectations for actions).
You seem to be arguing that being susceptible to Troll Bridge should be judged as a necessary/positive trait of a decision theory. But there are decision theories which don’t have this property, such as regular CDT, or TDT (depending on the logical-causality graph). Are you saying that those are all necessarily wrong, due to this?
I’m not sure quite what you meant by this. For example, I could have a lot of prior mass on “crossing gives me +10, not crossing gives me 0”. Then my +10 hypothesis would only be confirmed by experience. I could reason using counterfactuals, so that the troll bridge argument doesn’t come in and ruin things. So, there is definitely a way. And being born with this prior doesn’t seem like some kind of misunderstanding/delusion about the world.
So it also seems natural to try and design agents which reliably learn this, if they have repeated experience with Troll Bridge.
No, I think finding such a no-learning-needed method would be great. It just means your learning-based approach wouldn’t be needed.
No. I’m saying if our “good” reasoning can’t tell us where in Troll Bridge the mistake is, then something that learns to make “good” inferences would have to fall for it.
A CDT is only worth as much as its method of generating counterfactuals. We generally consider regular CDT (which I interpret as “getting its counterfactuals from something-like-epsilon-exploration”) to miss important logical connections. “TDT” doesn’t have such a method. There is a (logical) causality graph that makes you do the intuitively right thing on Troll Bridge, but how to find it formally?
Isn’t this just a rephrasing of your idea that the agent should act based on C(A|B) instead of B->A? I don’t see any occurrence of ~(A&B) in the Troll Bridge argument. Now, it is equivalent to B->~A, so perhaps you think one of the propositions that occur as implications in Troll Bridge should be parsed this way? My modified Troll Bridge parses them all as counterfactual implication.
I’ve said why I don’t think “using counterfactuals”, absent further specification, is a solution. For the simple “crossing is +10” belief… you’re right, it succeeds, and insofar as you just wanted to show that it’s rationally possible to cross, I suppose it does.
This… really didn’t fit into my intuitions about learning. Consider that there is also the alternative agent who believes that crossing is −10, and sticks to that. And the reason he sticks to that isn’t that he’s too afraid and the value of information (VOI) isn’t worth it: while it’s true that he never empirically confirms it, he is right, and the bridge would blow up if he were to cross it. That method works because it ignores the information in the problem description, and has us insert the relevant takeaway, without any of the confusing stuff, directly into its prior. Are you really willing to say: Yup, that’s basically the solution to counterfactuals, just a bit of formalism left to work out?