Why is downvoting disabled, for how long has it been like this, and when will it be back?
VincentYu
In support of your point, MIRI itself changed (in the opposite direction) from its former stance on AI research.
You’ve been around long enough to know this, but for others: The former ambition of MIRI in the early 2000s—back when it was called the SIAI—was to create artificial superintelligence, but that ambition changed to ensuring AI friendliness after considering the “terrible consequences [now] feared by the likes of MIRI”.
In the words of Zack_M_Davis 6 years ago:
(Disclaimer: I don’t speak for SingInst, nor am I presently affiliated with them.)
But recall that the old name was “Singularity Institute for Artificial Intelligence,” chosen before the inherent dangers of AI were understood. The unambiguous “for” is no longer appropriate, and “Singularity Institute about Artificial Intelligence” might seem awkward.
I seem to remember someone saying back in 2008 that the organization should rebrand as the “Singularity Institute For or Against Artificial Intelligence Depending on Which Seems to Be a Better Idea Upon Due Consideration,” but obviously that was only a joke.
I’ve always thought it’s a shame they picked the name MIRI over SIFAAIDWSBBIUDC.
Who on LessWrong tracks their predictions outside of PredictionBook, and what are their thoughts on that method?
Just adding to the other responses: I also use Metaculus and like it a lot. In another thread, I posted a rough note about its community’s calibration.
Compared to PredictionBook, the major limitation of Metaculus is that users cannot create and predict on arbitrary questions, because questions are curated. This is an inherent limitation/feature for a website like Metaculus because they want the community to focus on a set of questions of general interest. In Metaculus’s case, ‘general interest’ translates mostly to ‘science and technology’; for questions on politics, I suggest taking a look at GJ Open instead.
Here is the full-text article that was actually published by Kahneman et al. (2011) in Harvard Business Review, and here is the figure that appeared in HBR:
Is there any information on how well-calibrated the community predictions are on Metaculus?
Great question! Yes. There was a post on the official Metaculus blog that addressed this, though this was back in Oct 2016. In the past, they’ve also sent to subscribed users a few emails that looked at community calibration.
I actually did my own analysis of this around two months ago, in private communication. Let me just copy two of the plots I created and what I said there. You might want to ignore the plots and details, and just skip to the “brief summary” at the end.
(Questions on Metaculus go through an ‘open’ phase then a ‘closed’ phase; predictions can only be made and updated while the question is open. After a question closes, it gets resolved either positive or negative once the outcome is known. I based my analysis on the 71 questions that have been resolved as of 2 months ago; there are around 100 resolved questions now.)
First, here’s a plot for the 71 final median predictions. The elements of this plot:
Of all monotonic functions, the black line is the one that, when applied to this set of median predictions, performs the best (in mean score) under every proper scoring rule given the realized outcomes. This can be interpreted as a histogram with adaptive bin widths. So for instance, the figure shows that, binned together, predictions from 14% to 45% resolved positive around 0.11 of the time. This is also the maximum-likelihood monotonic function.
The confidence bands are for the null hypothesis that the 71 predictions are all perfectly calibrated and independent, so that we can sample the distribution of counterfactual outcomes simply by treating the outcome of each prediction with credence p as an independent coin flip with probability p of positive resolution. I sampled 80,000 sets of these 71 outcomes, and built the confidence bands by computing the corresponding maximum-likelihood monotonic function for each set. The inner band is pointwise 1 sigma, whereas the outer is familywise 2 sigma. So the corner of the black line that exceeds the outer band around predictions of 45% is a p < 0.05 event under perfect calibration, and it looks to me as though predictions around 30% to 40% are miscalibrated (underconfident).
The two rows of tick marks below the x-axis show the 71 predictions, with the upper green row comprising positive resolutions, and the lower red row comprising negatives.
The dotted blue line is a rough estimate of the proportion of questions resolving positive along the range of predictions, based on kernel density estimates of the distributions of predictions giving positive and negative resolutions.
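For the curious, the maximum-likelihood monotonic fit described above is what’s usually called isotonic regression, computable with the pool-adjacent-violators algorithm, and the confidence bands come from refitting simulated outcomes. Here is a rough sketch of both steps (the names and details are my own illustration, not the exact code behind the plots):

```python
import random

def pava(y):
    """Pool Adjacent Violators: non-decreasing least-squares fit.

    For 0/1 outcomes sorted by the associated prediction, this fit is
    also the maximum-likelihood monotonic calibration curve, and it is
    optimal under every proper scoring rule.
    """
    blocks = []  # each block is [mean, weight]
    for v in y:
        blocks.append([float(v), 1])
        # merge backwards while monotonicity is violated
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2 = blocks.pop()
            m1, w1 = blocks.pop()
            w = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / w, w])
    fit = []
    for m, w in blocks:
        fit.extend([m] * w)
    return fit

def simulate_null(preds, n_sims=1000):
    """Null distribution for the confidence bands: under perfect
    calibration, a prediction p resolves positive with probability p.
    `preds` should be sorted ascending, matching the fit's x-axis."""
    return [pava([1 if random.random() < p else 0 for p in preds])
            for _ in range(n_sims)]
```

The pointwise and familywise bands would then be read off from the quantiles of these simulated fits at each prediction.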
Now, a plot of all 3723 final predictions on the 71 questions.
The black line is again the monotonic function that minimizes mean proper score, but with the 1% and 99% predictions removed because—as I expected—they were especially miscalibrated (overconfident) compared to nearby predictions.
The two black dots indicate the proportion of questions resolving positive for the 1% and 99% predictions (around 0.4 and 0.8, respectively).
I don’t have any bands indicating dispersion here because these predictions are a correlated mess that I can’t deal with. But for predictions below 20%, the deviation from the diagonal looks large enough that I think it shows miscalibration (overconfidence).
Along the x-axis I’ve plotted kernel density estimates of the predictions resolving positive (green, solid line) and negative (red, dotted line). Kernel densities were computed under log-odds with Gaussian kernels, then converted back to probabilities in [0, 1].
The blue dotted line is again a rough estimate of the proportion resolving positive, using these two density estimates.
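As a sketch of the kernel-density step (again my own illustrative code, not the original): densities are estimated in log-odds space and mapped back to [0, 1] with the usual change-of-variables factor.

```python
import math

def logit(p):
    return math.log(p / (1 - p))

def kde_on_probabilities(preds, grid, bw=0.5):
    """Gaussian KDE computed in log-odds space, then converted back to a
    density over probabilities via |d logit(p)/dp| = 1 / (p * (1 - p))."""
    pts = [logit(p) for p in preds]
    norm = len(pts) * bw * math.sqrt(2 * math.pi)
    out = []
    for g in grid:
        x = logit(g)
        dens = sum(math.exp(-0.5 * ((x - t) / bw) ** 2) for t in pts) / norm
        out.append(dens / (g * (1 - g)))  # Jacobian of the logit transform
    return out
```

Working in log-odds keeps the kernels from leaking probability mass outside [0, 1], which matters when many predictions sit near the extremes.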
Brief summary:
Events given median predictions around 30% to 40% occur less often than the predictions claim.
Events given user predictions below around 20% occur more often than the predictions claim.
User predictions at 1% and 99% are obviously overconfident.
Other than these, calibration seems okay everywhere else; at least, the predictions aren’t obviously off.
I’m very surprised that user predictions look fairly accurate around 90% and 95% (resolving positive around 0.85 and 0.90 of the time). I expected strong overconfidence like that shown by the predictions below 20%.
Also, if one wanted to get into it, could you describe what your process is?
Is there anything in particular that you want to hear about? Or would you rather have a general description of 1) how I’d suggest starting out on Metaculus, and/or 2) how I approach making and updating predictions on the site, and/or 3) something else?
(The FAQ is handy for questions about the site. It’s linked to by the ‘help’ button at the bottom of every page.)
That’s some neat data and observation! Could there be other substantial moderating differences between the days when you generate ~900 kJ and the days when you don’t? (E.g., does your mental state before you ride affect how much energy you generate? This could suggest a different causal relationship.) If there are, maybe some of these effects can be removed if you independently randomize the energy you generate each time you ride, so that you don’t get to choose how much you ride.
To make this a single-blinded experiment, just wear a blindfold; to double blind, add a high-beam lamp to your bike; and to triple blind, equip and direct high beams both front and rear.
… okay, there will be no blinding.
Polled.
I generally do only a quick skim of post titles and open threads (edit: maybe twice a month on average; I’ll try visiting more often). I used to check LW compulsively prior to 2013, but now I think both LW and I have changed a lot and diverged from each other. No hard feelings, though.
I rarely click link posts on LW. I seldom find them interesting, but I don’t mind them as long as other LWers like them.
I mostly check LW through a desktop browser. Back in 2011–2012, I used Wei Dai’s “Power Reader” script to read all comments. I also used to rely on Dbaupp’s “scroll to new comments” script after they posted it in 2011, but these days I use Bakkot’s “comment highlight” script. (Thanks to all three of you!)
I’ve been on Metaculus a lot over the past year. It’s a prediction website focusing on science and tech (the site’s been mentioned a few times on LW, and in fact that’s how I heard of it). It’s sort of like a gamified and moderated PredictionBook. (Edit: It’s also similar to GJ Open, but IMO, Metaculus has way better questions and scoring.) It’s a more-work-less-talk kind of website, so it’s definitely not a site for general discussions.
I’ve been meaning to write an introductory post about Metaculus… I’ll get to that sometime.
Given that one of LW’s past focuses was biases, heuristics, and the Bayesian interpretation of probability, I think some of you might find it worthwhile and fun to get some real-world practice at manipulating subjective probabilities in response to evidence. Metaculus is all about that sort of stuff, so join us! (My username there is ‘v’. I recognize a few of you, especially WhySpace, over there.) The site itself is under continual improvement, and I know that the admins have high ambitions for it.
Edit: By the way, this is a great post and idea. Thanks!
I haven’t been around for a while, but I expect to start fulfilling the backlog of requests after Christmas. Sorry for the long wait.
Do we know which country Wright was living in during 2010?
Here.
Requested.
The article is available on various websites by exact phrase searching, but there are some minor transcription errors in these copies. I’ve transcribed it below using Google’s copy of the scanned article to correct these errors. There seems to be a relevant captioned figure (maybe a photo of Fuller?) on p. 63 of the magazine that is missing from the scan.
Dymaxion Sleep
Sleep is just a bad habit. So said Socrates and Samuel Johnson, and so for years has thought grey-haired Richard Buckminster Fuller, futurific [sic] inventor of the Dymaxion* house (Time, Aug. 22, 1932), the Dymaxion car and the Dymaxion globe. Fuller made a deliberate attempt to break the sleep habit, with excellent results. Last week he announced his Dymaxion system of sleeping. Two hours of sleep a day, he said firmly, is plenty.
Fuller reasoned that man has a primary store of energy, quickly replenished, and a secondary reserve (second wind) that takes longer to restore. Therefore, he thought, a man should be able to cut his rest periods shorter by relaxing as soon as he has used up his primary energy. Fuller trained himself to take a nap at the first sign of fatigue (i.e., when his attention to his work began to wander). These intervals came about every six hours; after a half-hour’s nap he was completely refreshed.
For two years Fuller thus averaged two hours of sleep in 24. Result: “The most vigorous and alert condition I have ever enjoyed.” Life-insurance doctors who examined him found him sound as a nut. Eventually he had to quit because his schedule conflicted with that of his business associates, who insisted on sleeping like other men. Now working for the Foreign Economic Administration, Buckminster Fuller finds Dymaxion working and sleeping out of the question. But he wishes the nation’s “key thinkers” could adopt his schedule; he is convinced it would shorten the war.
Intermittent sleeping was not originated by Fuller, has respectable scientific backing. [sic] Last week the Industrial Bulletin of Arthur D. Little, Inc., famed Cambridge, Mass. research firm, which published Fuller’s sleeping plan, noted a strong point in its favor: most sleep investigators agree that the first hours of sleep are the soundest. Some pro-Fuller evidence:
Photographs and electric devices to record movements show that the average sleeper, who changes position at least 40 times during an eight-hour stretch, is quietest in the first two hours, then grows progressively more restless.
At Colgate University sleep investigator Donald A. Laird found that people awakened after four hours’ sleep were just as alert, well-coordinated physically and resistant to fatigue as those who slept eight hours (but they did lose in accuracy and concentration).
* A Fuller word representing “dynamic” and “maximum service.”
Here. Figures 4 and 5 are missing from the scan that I received. Dope ads.
Requested.
From the linked Wired article:
The PGP key associated with Nakamoto’s email address and references to an upcoming “cryptocurrency paper” and “triple entry accounting” were added sometime after 2013.
Gwern’s comment in the Reddit thread:
[...] this is why we put our effort into nailing down the creation and modification dates of the blog post in third-party archives like the IA and Google Reader.
These comments seem to refer in part to the 2013 mass archive of Google Reader made just before it was discontinued. For others who want to examine the data: the relevant WARC records for gse-compliance.blogspot.com are in lines 110789824 to 110796183 of greader_20130604001315.megawarc.warc, which is about three-quarters of the way into the file. I haven’t checked the directory and stats grabs and don’t plan to, as I don’t want to spend any more time on this.
NB: As with any other large compressed archive, if you plan on saving the data, then I suggest decompressing the stream as you download it and recompressing it into a seekable structure. Btrfs with compression works well, but blocked compression implementations like bgzip should also work in a pinch. If you leave the archive as a single compressed stream, then you’ll pull all your hair out when you try to look through the data.
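To illustrate the seekable-recompression idea, here is a toy sketch using plain gzip blocks plus an index (bgzip’s BGZF format does essentially this; the function names here are my own):

```python
import gzip

BLOCK = 1 << 20  # uncompressed bytes per independently-compressed block

def recompress_blocked(data, block=BLOCK):
    """Recompress a byte stream into independently gzipped blocks,
    returning the blob and an index of (offset, compressed size) pairs."""
    chunks, index, pos = [], [], 0
    for i in range(0, len(data), block):
        comp = gzip.compress(data[i:i + block])
        index.append((pos, len(comp)))
        chunks.append(comp)
        pos += len(comp)
    return b"".join(chunks), index

def read_block(blob, index, i):
    """Decompress block i without touching the rest of the blob."""
    off, size = index[i]
    return gzip.decompress(blob[off:off + size])
```

With a single compressed stream, reading a record near the end means decompressing everything before it; with independent blocks and an index, a seek costs one small decompression.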
3. Here.
Huh. I never knew there were so many other plants that had similar effects on cats.
Anyway, best of luck getting Todd’s work… and getting cats high.
Why the interest in catnip?
Requested.
Sadly, I can’t request entire dissertations. I’m sure there are Harvard students on LW; maybe try asking for help in the open thread?
Requested.
Thanks for writing such a comprehensive explanation!