There is at least one major step I'm missing between the things I think I understand and a market that has currency and traders.
I understand how a market of traders can result in a consensus evaluation of probability, because there is a *correct* evaluation of the probability of a proposition. How does a market of traders result in a consensus evaluation of the utility of an event? If two traders disagree about whether to pull the lever, how is it determined which one gets the currency?
The mechanism is the same in both cases:
Shares in the event are bought and sold on the market. The share will pay out $1 if the event is true. The share can also be shorted, in which case the shorter gets $1 if the event turns out false. The overall price equilibrates to a probability for the event.
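To make the equilibration step concrete, here is a minimal Python sketch. It relies on assumptions that are not in the text above: traders are risk-neutral, a share is bought at the current price, and a short costs one minus the price (so one share plus one short always nets exactly $1). The helper names are hypothetical. Under these assumptions a trader expects to profit from buying exactly when their probability is above the price, and from shorting when it is below; when the price equals their probability, there is nothing left for them to exploit.

```python
def long_payout(event_true: bool) -> float:
    # One share of the event pays $1 if the event is true, $0 otherwise.
    return 1.0 if event_true else 0.0


def short_payout(event_true: bool) -> float:
    # A short position pays $1 if the event turns out false.
    return 1.0 - long_payout(event_true)


def expected_profit(belief: float, price: float, short: bool = False) -> float:
    """Expected profit of one unit for a risk-neutral trader who assigns
    probability `belief` to the event. Assumption: a long costs `price`
    and a short costs `1 - price`."""
    payout = short_payout if short else long_payout
    cost = 1.0 - price if short else price
    return belief * payout(True) + (1.0 - belief) * payout(False) - cost


def preferred_trade(belief: float, price: float, eps: float = 1e-12) -> str:
    # Buy if the share looks underpriced, short if overpriced, else hold.
    if expected_profit(belief, price) > eps:
        return "buy"
    if expected_profit(belief, price, short=True) > eps:
        return "short"
    return "hold"


print(preferred_trade(0.7, 0.5))  # buy
print(preferred_trade(0.3, 0.5))  # short
print(preferred_trade(0.5, 0.5))  # hold
```

With many traders, the price keeps getting pushed by whoever still sees a profitable trade, which is the sense in which it settles on a probability.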
There are several ways to handle utility. One way is to make bets about whether the utility will fall in particular ranges. Another way is for the market to directly contain shares of utility which can be purchased (and shorted). These pay out $U, whatever the utility actually turns out to be—traders give it an actual price by speculating on what the eventual value will be. In either case, we would then assign expected utility to events via conditional betting.
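As a rough illustration of direct U-shares combined with conditional betting, here is a Python sketch. It assumes (the text above does not specify this) that a conditional trade is called off and refunded when its condition fails, and that a trader's beliefs can be written as a short list of possible worlds. `ConditionalUtilityShare` and `fair_conditional_price` are hypothetical names. Under those assumptions, a risk-neutral trader's break-even price for a U-share conditional on A is E[U | A].

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical belief format: (probability, whether A happens, utility U).
World = Tuple[float, bool, float]


@dataclass
class ConditionalUtilityShare:
    """One share of U bought conditional on event A (hypothetical contract)."""
    price_paid: float

    def settle(self, a_happened: bool, utility: float) -> float:
        # Condition failed: the trade is reverted and the price refunded (net 0).
        if not a_happened:
            return 0.0
        # Condition held: the share pays out $U, minus what was paid for it.
        return utility - self.price_paid


def fair_conditional_price(worlds: List[World], a: bool) -> float:
    """E[U | A = a] under the trader's beliefs: the price at which a
    risk-neutral trader expects zero profit from the conditional share."""
    mass = sum(p for p, a_w, _ in worlds if a_w == a)
    return sum(p * u for p, a_w, u in worlds if a_w == a) / mass


beliefs = [(0.25, True, 10.0), (0.25, True, 2.0), (0.5, False, 4.0)]
print(fair_conditional_price(beliefs, True))   # 6.0
print(fair_conditional_price(beliefs, False))  # 4.0
```

Comparing the two conditional prices (6.0 versus 4.0 here) is one way to read off the expected utility the market assigns to A versus not-A.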
If we want to do reward-learning in a setup like this, the (discounted) rewards can be incremental payouts of the U shares. But note that even if there is no feedback of any kind (i.e., the shares of U never actually pay out), the shares equilibrate to a subjective value on the market, like collector's items. The market nonetheless forces the changes in value over time, and the conditional beliefs about that value, to become increasingly coherent. This corresponds to fully subjective utility with no outside feedback.
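For the reward-learning case, here is a minimal sketch. It assumes U is the discounted return, the sum over t of gamma^t * r_t (the text only says "discounted", so the exact form is an assumption), and that each U-share receives gamma^t * r_t at step t. The hindsight value of a share before step t is then its payout at t plus its value before step t+1. That recursion is one hypothetical way to make "changes in value over time being coherent" concrete. The function names are illustrative.

```python
from typing import List


def incremental_payouts(rewards: List[float], gamma: float) -> List[float]:
    """Payout to one U-share at each step, assuming U = sum_t gamma**t * r_t
    is paid out as the rewards arrive."""
    return [gamma ** t * r for t, r in enumerate(rewards)]


def hindsight_share_values(rewards: List[float], gamma: float) -> List[float]:
    """values[t] = total payout a share still receives from step t onward;
    the last entry is 0 once every reward has been paid."""
    payouts = incremental_payouts(rewards, gamma)
    values = [0.0] * (len(payouts) + 1)
    for t in reversed(range(len(payouts))):
        # Coherence over time: value now = this step's payout + value next step.
        values[t] = payouts[t] + values[t + 1]
    return values


print(incremental_payouts([1.0, 0.0, 2.0], 0.5))     # [1.0, 0.0, 0.5]
print(hindsight_share_values([1.0, 0.0, 2.0], 0.5))  # [1.5, 0.5, 0.5, 0.0]
```

In the no-feedback case every payout is zero, so the recursion only says the value should not change from step to step. For a market price, that roughly means no predictable drift.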
If two traders disagree about whether to pull the lever, how is it determined which one gets the currency?

They make bets about what happens if the lever is or isn't pulled (including conditional buys and sells of shares of utility). These bets are evaluated as normal. In this setup we only get feedback on whichever action actually happens, but this may still be enough data to learn under certain assumptions (which I hope to discuss in a future post). We can also consider more exotic settings in which we do get feedback on both cases even though only one happens; this could be feasible through human feedback about counterfactuals. (I also hope to discuss this alternative in a future post.)
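To make the currency question concrete, here is a toy Python sketch of settling conditional bets on the lever. It assumes that bets conditioned on the branch that did not happen are called off and refunded; the traders, prices, and the `settle` helper are all hypothetical. Alice thinks pulling the lever is worth more than the market price of 5, so she buys a U-share conditional on pulling; Bob disagrees and shorts it. Only the branch that actually happens pays out, so whoever was closer to the realized utility on that branch gains currency from the other.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ConditionalBet:
    trader: str
    pulled: bool    # condition: True = "lever pulled", False = "not pulled"
    shares: float   # > 0 for bought U-shares, < 0 for shorted ones
    price: float    # price per share at which the trade happened


def settle(bets: List[ConditionalBet], lever_pulled: bool,
           utility: float) -> Dict[str, float]:
    """Net currency change per trader after the outcome is known."""
    balances: Dict[str, float] = {}
    for bet in bets:
        if bet.pulled == lever_pulled:
            # This branch happened: each share pays U, minus the trade price.
            pnl = bet.shares * (utility - bet.price)
        else:
            # Counterfactual branch: trade reverted, so no gain or loss.
            pnl = 0.0
        balances[bet.trader] = balances.get(bet.trader, 0.0) + pnl
    return balances


bets = [
    ConditionalBet("alice", True, 1.0, 5.0),    # buys U | pull at 5
    ConditionalBet("bob", True, -1.0, 5.0),     # shorts U | pull at 5
    ConditionalBet("alice", False, -1.0, 4.0),  # shorts U | no pull at 4
    ConditionalBet("bob", False, 1.0, 4.0),     # buys U | no pull at 4
]
print(settle(bets, lever_pulled=True, utility=7.0))
# {'alice': 2.0, 'bob': -2.0}
```

The bets on the unpulled branch net to zero, which is the "feedback only on whichever action actually happens" limitation in miniature.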
Suppose the utility trading commission discovers that a trader used forbidden methods to short a utility bet (e.g., insider trading, coercing other traders, or exploiting a flaw in the marketplace), and takes action to confiscate the illicit gains.
What actions transfer utility from the target? In systems that pay out money, their bank account is debited; in systems that use a blockchain, transactions are added or rolled back manually. What does it mean to take utility from a trader directly?