1. I am continuing to record predictions on PB.com, focusing right now on importing predictions from Intrade. I recently passed 1000 registered predictions (currently at more like 1070), and as I make more predictions, I’m taking notes for an article on what I’ve learned.
2. I am almost finished with my blind testing of Adderall pills; my main conclusion is currently shaping up to be ‘it works for me, but the pills are expensive and they can badly mess up my sleep; armodafinil is better’. (Once I’m done, I can put the daily predictions of placebo vs Adderall into PB.com as well; then I need to check my Zeo records to see whether I am imagining the sleep thing or not.)
3. I continue my alternate-day ‘standing on one leg before going to bed’ experiment, based on Seth Roberts’ results. So far I don’t see much difference, so I’m probably going to have to continue this for at least as long as my melatonin Zeo experiment. (EDIT: I’ve decided to stop it today. 70 nights of data ought to be enough. The analysis is at http://www.gwern.net/Zeo#one-legged-standing-analysis)
4. I’m a few characters away from finishing my half-Japanese character database; this is a silly non-rigorous project to see whether anime is biased and contrary to reality in presenting mostly characters with foreign mothers rather than foreign fathers. When I finish researching the last few characters, I can start posting requests for omitted characters in various forums and graph what I’ve found so far.
5. I have begun a new Wikipedia project: collecting all the instances where I posted a link or a link + excerpts on Wikipedia talk pages, so I can see whether they are almost universally ignored and rarely actually used in the article (as is my impression). I expect to find that my non-anime links/excerpts are incorporated more often than the anime ones, inasmuch as the anime community on Wikipedia has been decimated.
I look forward to the article about predictions.
It’s up at http://lesswrong.com/lw/7z9/1001_predictionbook_nights/ (hope you enjoyed it).
Regarding 1: It seems like for most predictions you are taking whatever estimate people are making and then putting the same value in but with 5-10% less confidence. Is that your primary approach? You seem to be getting extremely accurate calibration this way especially when I compare the overall calibration to your calibration.
It’s my primary approach when making predictions on things I’m not familiar with (any prediction starting with ‘I’...). My general view is that when people aren’t being outrageously incorrect in whatever fashion (like XiXiDu’s recent Khan Academy prediction), they tend to be overconfident; the solution to that is adjusting their prediction towards 50%.
If you look at my Intrade-based predictions, you’ll see that while sometimes I just punt and copy Intrade, sometimes I differ severely. It’s a case-by-case thing.
(Also, I’m not sure you’re interpreting the graphs right. My understanding is that the graphs show that I am substantially underconfident as compared to PB in general. EDIT: I seem to be wrong here.)
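The ‘adjust toward 50%’ heuristic described above can be sketched in a few lines of Python. This is only an illustration: the function name `shrink_toward_half` is hypothetical, and reading ‘5-10% less confidence’ as an absolute shift of 5 to 10 percentage points is an assumption, not something stated in the thread.

```python
def shrink_toward_half(p, amount=0.05):
    """Move a probability p toward 0.5 by `amount`, never crossing 0.5.

    Assumption: `amount` is an absolute shift (0.05 = 5 percentage points).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability between 0 and 1")
    if p > 0.5:
        return max(0.5, p - amount)
    if p < 0.5:
        return min(0.5, p + amount)
    return p


if __name__ == "__main__":
    # Someone else's 90% estimate, recorded with 10 points less confidence.
    print(round(shrink_toward_half(0.90, 0.10), 2))
    # Estimates below 50% move up toward 50% instead of down.
    print(round(shrink_toward_half(0.10, 0.05), 2))
```

Clamping at 0.5 keeps the adjustment from flipping which side of the question the prediction favours.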
Every 10% range should have an actual certainty about midway in the range, right? So for example, for the “50%” range a perfect calibration would be 55% (assuming equidistribution over the whole 50-60 range). For PB as a whole, nearly every category is at least 10% off. For PB as a whole we get: 55% compared to an abysmal 37%, 65% compared to 58%, 75% compared to 58%, 85% compared to 70%, 95% compared to 79%. And the real kicker is that the 100% category is wrong one fifth of the time. In contrast, your numbers are 55% going to 41%, 65% going to 51%, 75% going to 60%, 85% going to 86%, and 95% going to 92%. Your 100% goes to 93%. Thus, with the exception of the 60-70 range, every one of yours is better calibrated, and your 80-90 and 90-100 ranges are nearly spot on. Am I misinterpreting the graphs?
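The comparison above can be checked with a short Python sketch. The numbers are the ones quoted in the comment; the names (`PB_OVERALL`, `USER`, `calibration_gaps`, and so on) are made up for illustration, and taking ‘wrong one fifth of the time’ to mean exactly 80% correct in the 100% bucket is an assumption.

```python
# Expected % correct (bucket midpoint, or 100 for the 100% bucket)
# mapped to the observed % correct quoted in the comment.
PB_OVERALL = {55: 37, 65: 58, 75: 58, 85: 70, 95: 79, 100: 80}  # 80 is assumed
USER = {55: 41, 65: 51, 75: 60, 85: 86, 95: 92, 100: 93}


def calibration_gaps(observed):
    """Observed minus expected accuracy per bucket (negative = overconfident)."""
    return {stated: actual - stated for stated, actual in observed.items()}


def mean_abs_gap(observed):
    """Average absolute distance from perfect calibration, in percentage points."""
    gaps = calibration_gaps(observed)
    return sum(abs(g) for g in gaps.values()) / len(gaps)


def closer_buckets(a, b):
    """Buckets where `a` is strictly closer to perfect calibration than `b`."""
    gaps_a, gaps_b = calibration_gaps(a), calibration_gaps(b)
    return [k for k in sorted(gaps_a) if abs(gaps_a[k]) < abs(gaps_b[k])]


if __name__ == "__main__":
    print("PB gaps:  ", calibration_gaps(PB_OVERALL))
    print("User gaps:", calibration_gaps(USER))
    print("Mean |gap|:", mean_abs_gap(PB_OVERALL), "vs", mean_abs_gap(USER))
    print("User closer in buckets:", closer_buckets(USER, PB_OVERALL))
```

On these figures the mean absolute gap is 15.5 points for PB as a whole against 9.0 for the individual record, and the individual record is closer to the diagonal in every bucket except the 60-70 one.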
You know, I thought that I was supposed to have as flat a line as possible, but now I’m not sure. Re-reading the two axes, I guess the ideal graph is not the green line, but a line at a 45-degree angle going from 50%/50% in the middle-left to 100%/100% in the upper-right.
Have I been misreading the graphs this entire time? How embarrassing! I guess these graphs could be clearer, and explicitly graph the ‘ideal’ line...
You know, I sort of presumed you were one of the people who had been involved in setting up PB because you spend so much time with it and seem to know its ins and outs. But your comment suggests that’s not the case. Who does run it?
Tricycle runs it, like LW (see Eliezer’s ANN). Matthew Fallshaw seems to be the one most involved with it; at least, I’ve always corresponded with him about it.
How are you blind testing Adderall? Have you ever been mistaken about whether you were taking the placebo?
Both of those questions are answered in the link.