Ooops. To redeem my tarnished honor, I propose an algorithmic solution to the duplicate quote problem: a full list of quotes indexed by author (of the quote). Checking to see if a quote has already been posted would then be a fast operation.
Your honour remains intact! I predicted that the quote had been used, based primarily on how much I like it. Google didn’t find it in a quotes thread. I suppose that would mean my honour is tarnished. How much honour does one lose by assigning greater than 0.5 probability to something that turns out to be incorrect? Is there some kind of algorithm for that? ;)
You add the log of the probability you gave for what happened, so add ln(1-0.87) = −2.04 honor. Unfortunately, there’s no way to make it go up, and it’s pretty much guaranteed to go down a lot.
Just don’t assign anything a probability of 0. If you’re wrong, you lose infinite honor.
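For concreteness, here is a minimal Python sketch of that rule. The function name `log_honor` is made up for illustration; the only thing taken from the comment above is "add the log of the probability you gave for what happened".

```python
import math


def log_honor(p_of_what_happened: float) -> float:
    """Honor change for one prediction: ln of the probability you gave
    to the outcome that actually occurred (hypothetical helper name)."""
    if not 0.0 <= p_of_what_happened <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    if p_of_what_happened == 0.0:
        # You said it was impossible and it happened anyway.
        return -math.inf
    return math.log(p_of_what_happened)


# 0.87 was given to "already posted"; it was not, so the actual
# outcome had been given 1 - 0.87 = 0.13.
print(round(log_honor(1 - 0.87), 2))  # about -2.04, as in the comment above
print(log_honor(0.0))                 # -inf
```

The result is never positive, which is the "no way to make it go up" property: the best case is giving probability 1 to what happened and losing nothing.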
I like it, but that ‘no way to make it go up’ is a problem. It feels like we should have some sort of logarithmic representation of honour too, allowing for increasing honour if you get something right, mostly when your honour is currently low.
To what extent do we want ‘honour’ to be a measure of calibration and to what extent a measure of predictive power?
A naive suggestion could be to take log(x) - log(p), where p is the probability given by MAXENT. That is, honor is how much better you do than the “completely uninformed” maximal entropy predictor. This would enable better-than-average predictors to make their honor go up.
This of course has the shortcoming that maximal entropy may not be practical to actually calculate in many situations. It also may or may not produce incentives to strategically make certain predictions and not others. I haven’t analysed that very much.
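A rough sketch of the log(x) - log(p) idea, under a strong simplifying assumption: the maximum-entropy baseline is taken to be uniform over `n_outcomes` mutually exclusive outcomes. That is only one easy case, and the function name and parameters are invented for illustration.

```python
import math


def relative_honor(x: float, n_outcomes: int) -> float:
    """ln(x) - ln(p), where x is the probability you gave to what happened
    and p = 1 / n_outcomes is the ASSUMED (uniform) max-entropy baseline."""
    if not 0.0 < x <= 1.0:
        raise ValueError("x must be in (0, 1]")
    if n_outcomes < 1:
        raise ValueError("need at least one outcome")
    p = 1.0 / n_outcomes
    return math.log(x) - math.log(p)


# Binary question: the uninformed predictor says 0.5.
print(relative_honor(0.8, 2) > 0)  # beat the baseline, honor goes up
print(relative_honor(0.3, 2) < 0)  # did worse than the baseline, honor goes down
```

Matching the baseline exactly scores zero, so this puts zero at the "knows nothing" point rather than at the maximum.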
I can’t remember the post I got that from. It wasn’t talking about honor.
This is the only possible system in which you’re rewarded most for giving the answers accurately, and your honor remains the same regardless of how you count it. For example, predicting A and B loses the same honor as predicting A and predicting B given A.
Technically, you can use a different log base, but that just amounts to a scaling factor.
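Spelled out, the "regardless of how you count it" property is just the product rule under a logarithm, and changing the log base only multiplies every score by a constant (b here is any base greater than 1, a symbol introduced only for this note):

```latex
\begin{align*}
\ln P(A \wedge B) &= \ln\bigl(P(A)\,P(B \mid A)\bigr) = \ln P(A) + \ln P(B \mid A) \\
\log_b p &= \frac{\ln p}{\ln b}
\end{align*}
```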
I like it, but that ‘no way to make it go up’ is a problem.
I agree; the typical human brain balks and runs away when faced with a scale of merit whose max-point is 0.
To what extent do we want ‘honour’ to be a measure of calibration and to what extent a measure of predictive power?
Yes.
In other words, my honor as an epistemic rationalist should be a mix of calibration and predictive power. An amusing but arbitrary formula might be just to give yourself 2x honor when your binary prediction with probability x comes true and to dock yourself ln (1-x) honor when it doesn’t. If you make 20 predictions each at p = 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, and 0.95 for a total of 200 predictions a day and you are perfectly calibrated, you would expect to lose about 3.4 honor each day.
There’s gotta be a way to fix this so that a perfectly calibrated person would gain a tiny amount of honor each day rather than lose it. It might not be elegant, though. Got any ideas?
I agree; the typical human brain balks and runs away when faced with a scale of merit whose max-point is 0.
Zero does seem more appropriate either as a minimum or a midpoint. If everything is going to be negative then flip it around and say ‘less is good’! But the main problem I have with only losing honor based on making predictions is that it essentially rewards never saying anything of importance that could be contradicted. That sounds a bit too much like real life for some reason. ;)
There’s gotta be a way to fix this so that a perfectly calibrated person would gain a tiny amount of honor each day rather than lose it. It might not be elegant, though. Got any ideas?
The tricky part is not so much making up the equations as determining what criteria to rate the scale against. We would inevitably be injecting something arbitrary.
You’re supposed to have a probability for everything. The closest you can come to not guessing is to give every possibility equal probability, in which case you’d lose honor even faster than normal.
You could give yourself honor equal to the square of the probability you gave, but that means you’d have an incentive to phrase it in as many questions as possible. After all, if you gave a single probability for what happens for your entire life, you couldn’t get more than one point of honor. With the system I mentioned first, you’d lose exactly the same honor.
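A small sketch of that difference, with made-up numbers (0.6 for A, 0.5 for B given A) and hypothetical function names. It compares answering one combined question with answering it in two steps.

```python
import math


def squared_honor(probs):
    """Sum of squared probabilities given to what happened."""
    return sum(p * p for p in probs)


def log_honor_total(probs):
    """Sum of ln(probability given to what happened)."""
    return sum(math.log(p) for p in probs)


joint = [0.6 * 0.5]  # one question: "A and B"
split = [0.6, 0.5]   # two questions: "A", then "B given A"

# Squared scoring pays more for the same beliefs asked as two questions.
print(squared_honor(split) > squared_honor(joint))

# Log scoring comes out the same either way (up to rounding).
print(math.isclose(log_honor_total(split), log_honor_total(joint)))
```

Both lines print `True` for these numbers. Any way of splitting a question into steps leaves the log total unchanged, while the squared total depends on how finely it is split.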
Honour I don’t know about; I feel like any honour lost you could gain back by giving us a costly signal that you are recalibrating. But it does let us determine how badly calibrated you are, and then we can make judgements like pr(wedrifid is wrong | wedrifid is badly calibrated).
Particularly when the ‘prediction’ was largely my way of complimenting the quote in a non-boring way. :P
I was actually relieved when I found it wasn’t in the quotes thread. I wasn’t sure what I would update to if it was a double post. Slightly upward, but only a little; there were too many complications. I can even imagine lowering p(double post | a quote is awesome and relevant) based on finding that the instance is, in fact, a double post. (If the probability is particularly high and the underlying reasoning was such that I expected comments of that level of awesome to have been reposted half a dozen times.)
The tricky part now is to prevent my intuitive expectation from updating too much. I’ve paid particular attention to this instance, so by default I would expect my intuitions to base too much on the single case.
The hard part would then be making that list algorithmically. An easier algorithmic method would be to do approximate string matches with previous quote threads, using something like the Smith-Waterman algorithm for pairwise local sequence alignment. This is what biologists do when they have a gene sequence and want to know if something like it is already in the databases, and there’s no reason why the method shouldn’t also apply just as well to English text.
The way this would look to users is just a text box where you paste in the quote, and it’ll tell you if the quote has been posted before. Even easier to use than a full list of quotes.
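A minimal sketch of what that checker might look like, assuming character-level alignment with simple match/mismatch/gap scores. The scoring values, the 0.8 threshold, and the names `smith_waterman_score`, `normalize` and `already_posted` are all illustrative choices. Real bioinformatics tools use substitution matrices and do a traceback to recover the alignment; this version only computes the best local score.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -1) -> int:
    """Best Smith-Waterman local alignment score between a and b,
    using a linear gap penalty and keeping only two rows of the table."""
    prev = [0] * (len(b) + 1)
    best = 0
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ch_a == ch_b else mismatch)
            # Local alignment: scores are never allowed to drop below zero.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences don't matter."""
    return " ".join(text.lower().split())


def already_posted(quote: str, previous_quotes, threshold: float = 0.8,
                   match: int = 2) -> bool:
    """True if some earlier quote aligns with at least `threshold` of the
    best score the pasted quote could possibly get (a perfect self-match)."""
    q = normalize(quote)
    if not q:
        return False
    max_possible = match * len(q)
    return any(
        smith_waterman_score(q, normalize(old), match=match) / max_possible >= threshold
        for old in previous_quotes
    )


archive = ["The quick brown fox jumps over the lazy dog."]
print(already_posted("the quick brown fx jumps over the lazy dog", archive))
print(already_posted("Completely different words here", archive))
```

The first check prints `True` (the typo costs only one gap) and the second prints `False`. Each comparison takes time proportional to the product of the two lengths, so against a large archive some cheaper prefilter would probably be needed first.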