I’d say sample size is more important, since any experiment can reach statistical significance with the right sample size, but not every sample size can reach statistical significance with the right experiment. But you’re right, I overstated my case; amended; thank you.
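(To make the first half of that concrete: a minimal sketch with invented numbers, where a fixed tiny true effect becomes “statistically significant” purely by growing the sample. The 0.02 effect size and the specific ns are arbitrary, not from any real study.)

```python
# Minimal sketch: the same tiny true effect (mean 0.02 vs. a null of 0), three sample sizes.
# All numbers are invented for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
for n in (100, 10_000, 1_000_000):
    sample = rng.normal(loc=0.02, scale=1.0, size=n)
    result = stats.ttest_1samp(sample, popmean=0.0)
    print(f"n = {n:>9,}: p = {result.pvalue:.4f}")
# At the largest n the p-value is essentially zero, even though the effect is negligible.
```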
The original D-K papers also found different curves for different subject matter.
I can think of several explanations for this, all of which ~~might be true~~ are definitely at least a little true:
- Some subjects have higher variance in performance, resulting in steeper D-K curves.
- Some subjects have higher variance in test-ability-to-measure-performance, again resulting in steeper D-K curves.
- An actual D-K effect does exist, sometimes, superposed over the statistical mirage; and it’s stronger for some subjects than others.
- An anti-D-K effect exists, and it’s stronger for some subjects than others.
- Something else is happening I don’t know about.
And they made the unusual choice of dividing their populations into quartiles, throwing away quite a bit of resolution.
Doesn’t seem unusual to me ( . . . or suspicious, if that’s what you’re getting at). I get away with using deciles at my day job because I work on large datasets with low-variance data, and I get away with it here because I can just add zeroes to the number of elves simulated until my plots look as smooth as I want; Dunning & Kruger had a much smaller sample since they were studying college classes full of real round-eared human beings, and sensibly chose to bucket them into fewer buckets.
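(For concreteness, here’s a minimal sketch of the kind of simulation I mean, with hypothetical parameters rather than the ones behind my actual plots: self-assessment is deliberately pure noise, so any D-K-looking gap in the output is the statistical mirage, not a real effect, and quartiles-vs-deciles is the only knob being turned.)

```python
# Minimal sketch: elves whose self-assessed percentile is pure noise, bucketed two ways.
# All parameters are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n_elves = 1_000_000                                   # "add zeroes until it looks smooth"
test_score = rng.normal(size=n_elves)                 # measured performance
self_estimate = rng.uniform(0, 100, size=n_elves)     # percentile guess, zero information

actual_pct = 100 * test_score.argsort().argsort() / n_elves  # true percentile by score

for n_buckets in (4, 10):                             # quartiles vs. deciles
    print(f"{n_buckets} buckets:")
    bucket = (actual_pct * n_buckets // 100).astype(int)
    for b in range(n_buckets):
        mask = bucket == b
        print(f"  actual {actual_pct[mask].mean():5.1f}   "
              f"self-estimate {self_estimate[mask].mean():5.1f}")
# Bottom buckets "overestimate" and top buckets "underestimate" even though
# self-assessment carries no information at all; finer buckets just show the
# same pattern at higher resolution.
```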
You’re probably overestimating how well you understand Dunning-Kruger
It could, for a game with an unusually small & clean dataset (I’m thinking in particular of On The Construction Of Impossible Structures and How The Grinch Pessimized Christmas) . . . but realistically a LWer solving a problem like that on paper would spend the entire time lamenting that they weren’t using a computer, which doesn’t seem like a mental state conducive to personal growth. So nvm.
(I do have other thoughts on potential epistemic grounding activities but they’re all obvious: board games, 2-4-6 tests[1], pub quizzes with confidence intervals attached, etc.)
- ^
With different rules than the original 2-4-6 test, obviously.
Fwiw, the scenarios don’t have to be solved collaboratively online, and in fact most players play most of them solo. For that matter, they don’t need internet access: would-be players could make sure they have the problem description & the dataset & their favorite analysis tools downloaded, then cut the wifi.
(. . . unless “be fully present” rules out laptops too, in which case yeah nvm.)
At CFAR workshops, people often become conscious of new ways their minds can work, and new things they can try. But we don’t have enough “and now I’ll try to repair my beautiful electronic sculpture, which I need to do right now because the windstorm just blew it all apart, and which will incidentally give me a bunch of real-world grounding” mixed in.
I’d love suggestions here.
I’ll try to make sure I’m running a D&D.Sci scenario over both of the spans you mentioned: data-science-y attendees would get a chance to test their data-science-y skills against small but tricky problems with knowable right answers, and non-data-science-y attendees would probably still get something out of spectating (especially if they make a point of trying to predict which participants are closest to said right answer).
(. . . and if anyone else has some kind of [inference|decision]-centric moderately-but-not-excessively-demanding public puzzle/challenge they’ve been meaning to run, those spans look like the time to do it.)
Today, I estimate a 30–50% chance of significantly reshaping education for nearly 700,000 students and 50,000 staff.
I’d be interested to hear how that pans out a year from now.[1]
The lesson:
Don’t spend energy forcing people into actions they’re not already motivated to take.
I guess that’s a valid moral to this story? I think most LWers would see this as further evidence for “political stuff gets you a lot more impact per unit effort if you’re making significant use of your comparative advantages and/or taking stances orthogonal to existing party lines (‘pulling ropes sideways’)”.
Regardless, strong-upvoted for doing interesting things in the real world and then writing about them.
- ^
. . . how do we not already have a custom emoji for this sentiment?
Amended; ty for your honesty.
D&D.Sci: Serial Healers [Evaluation & Ruleset]
IIRC, the AI 2027 scenario is those researchers’ “median” outcome only in the sense that it’s a slightly pessimistic view of what they think could plausibly happen if nothing disruptive happens in the next two years; they expect disruptive things probably will happen and move the timeline back. 2027 might be their modal guess, but it’s not their median as most people use the term.
(Also, what Rana Dexsin said.)
I think I technically count as one of those? It’s not my day job, but I contributed a task to METR’s Long Tasks paper, and I’ve made minor contributions to a handful of other AI-Safety-ish papers.
Anyway, if it counts: I support a ban as well. (I don’t have a very high p(doom), but I don’t think it needs to be very high to be Too High.)
To what extent does LessWrong expect knowledge in the STEM fields?
I mean, it helps? I wouldn’t say it’s required.
My understanding is that rationalist thinking is based on Bayesian probability calculation
It’s less to do with Bayes as in actually-doing-the-calculation and more to do with Bayes as in recognizing-there’s-an-ideal-to-approximate. Letting evidence shift your position at all is the main thing. (If you do an explicit Bayesian calculation about something real, you’ll have done about as many explicit Bayesian calculations as the median LW user has this year.)
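(For a sense of scale, here’s what one such explicit calculation looks like, with numbers invented for illustration: a 1% base rate and an imperfect test.)

```python
# Minimal sketch of a single explicit Bayesian update; all numbers are made up.
prior = 0.01                    # P(condition)
p_pos_given_cond = 0.90         # P(positive test | condition)
p_pos_given_no_cond = 0.09      # P(positive test | no condition)

posterior = (p_pos_given_cond * prior) / (
    p_pos_given_cond * prior + p_pos_given_no_cond * (1 - prior)
)
print(f"P(condition | positive test) = {posterior:.1%}")  # ~9%, not 90%
```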
and being self-aware of cognitive biases
I mean, we were. Then we found out that most published scientific findings, including those in the psych literature (which includes the bias literature), don’t replicate. And AI, the topic most of us were trying to de-bias ourselves to think about, went kinda crazy over the last five years. So now we talk about AI more than biases. (If you can find something worthwhile to say about biases, please do!)
The problem is I’m more on the philosophical side if anything
If you pick two dozen or so posts at random, I’d expect you’ll get more Philosophical ones than STEMmy ones. (AI posts don’t count for either column imo; also, they usually don’t hard-require technical background other than “LLMs are a thing now” and “inhuman intellects being smarter than humans is kinda scary”.)
Given this, how appropriate is it for me to enter this community?
Extremely. Welcome aboard!
Upvoted; I think this was worth making, and more people should do more things like this.
Notes:
The Resource Selection is effectively another part of the select-the-difficulty-level component, but is implicitly treated as part of the regular game. I think this could be signposted better.
I suspect the game would be more engaging—and feel longer, and feel more like a game—if it were more serialized and less parallelized. Instead of one “which of these do we fund?” choice, you could present us with a chain of “do we fund this? how about this? how about this?” choices; instead of one nine-option choice of overall strategy, you could have a three-option choice of Big Picture Strategy followed by another three-option choice of And What Would That Actually Look Like Strategy.
You say we’re free to steal the idea but I can’t find an open-source license in the repo. (There is a “License and Attribution” section in the AI-generated README, but it contains neither licenses nor attributions.)
I’ve now had multiple people tell me that I shouldn’t have released anything game-shaped during what is apparently Silksong week. Accordingly, I’m changing the deadline to Sep 22nd; apologies for any inconvenience, and you’re welcome for any convenience.
(I assume N/A means “not in the city”?)
Correct.
Also, strong-upvoted for asking a good question. For a community that spends so much time thinking and talking about bets and prediction markets, we really haven’t engaged with practicalities like “but what about people who [have | worry about] life-ruining gambling addictions?”
As I see it, most of the epistemic benefit of betting comes from:
A) Having any kind of check on nonsense whatsoever.
B) The fact it forces people with different opinions to talk and agree about something (even if it’s just the form of their disagreement).
C) The way involving money requires people to operationalize exactly what they mean (incidentally revealing when they don’t mean anything at all).
None of this requires betting large amounts of money; afaict, most bets in the rat community have been small & symbolic amounts relative to the bettors’ annual income. So an easy way to 80⁄20 this would be to set yourself a modest monthly gambling budget (which doesn’t roll over from one month to the next, and doesn’t get winnings added back in), only use it for political/technological/economic/literary/etc questions (no slot machines & horse races, etc), and immediately stop gambling if you ever exceed it.
Then a valid response to your friend becomes “sorry, that’s over my gambling budget, but I would bet 50 reais at 2:1 odds in your favor, and you get to brag about it if it turns out I’m wrong”. (. . . and if you wouldn’t have made that bet either, you’d have learned something important and not even have had to risk the money.)
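(Arithmetic sketch, under one reading of that offer, where my 50 reais are staked against your 25; the stake sizes are illustrative, not a prescription.)

```python
# Minimal sketch of the break-even point for a bet at 2:1 odds in the other
# person's favor; stakes are one possible reading of the offer above.
my_stake, your_stake = 50, 25
break_even_p = my_stake / (my_stake + your_stake)
print(f"I only expect to profit if I'm right more than {break_even_p:.0%} of the time.")
# i.e. offering odds in your favor is itself a costly signal of how confident I am.
```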
You get scored based on the number of mages you correctly accuse; and a valid accusation requires you to specify at least one kind of illegal healing they’ve done. (So if you’ve already got Exampellius the Explanatory for healing Chucklepox, you don’t get anything extra for identifying that he’s also healed several cases of Rumblepox.)
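(In code-ish terms, this is how I’d summarize the rule; a minimal sketch with hypothetical data structures, not the actual grading script.)

```python
# Minimal sketch of the scoring rule: one point per mage accused with at least one
# correctly-identified kind of illegal healing; extra correct kinds earn nothing more.
true_healings = {"Exampellius the Explanatory": {"Chucklepox", "Rumblepox"}}

def score(accusations):
    return sum(
        1
        for mage, claimed in accusations.items()
        if claimed & true_healings.get(mage, set())
    )

print(score({"Exampellius the Explanatory": {"Chucklepox"}}))               # 1
print(score({"Exampellius the Explanatory": {"Chucklepox", "Rumblepox"}}))  # still 1
```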
Yes; edited to clarify; ty.
This seems like the sort of thing best addressed by me adding a warning / attention-conservation-notice at the start of the article, though I’m not sure what would be appropriate. “Content Note: Trolling”?