Dath Ilan is a parallel Earth on which human civilization has its act together, in ways that actual-Earth does not. As on actual-Earth, citizens of Dath Ilan sometimes take standardized tests: to figure out what sort of jobs they’d be suited for, to make sure that their educational institutions are functioning, and to give people guidance about what they might want to study. Unlike Earth’s, Dath Ilan’s tests have had a lot of thought put into the choice of topics: rather a lot more economics, rather a lot less trigonometry and literature. Topics are selected based on cost/benefit; something that takes a long time to learn would need to be a lot more useful, or have major positive externalities from more people knowing it.
I want to create a test that will tell people what topics they ought to learn, and enable people to make what they know legible.
What topics belong on it?
Intuitive and non-intuitive findings of systems theory, such as what happens when you add a second queue, increase the variability of a flow rate, or change the size of a buffer.
Relatedly and perhaps even more fundamentally, the basic discipline of thinking about a system and implementing a mathematical model or simulation to explore these topics, which drove the insights you mention. And in many ways, it’s easier to test without worrying about people gaming the system, because you can give new examples and require them to actually explore the question.
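As an example of that discipline, here is a minimal Python sketch of the flow-rate variability effect mentioned above. It assumes a single server, customers arriving exactly once per time unit, and service times that are either constant or exponentially distributed with the same mean of 0.9 (all illustrative choices, not part of the original comment). Waiting times are computed with Lindley's recursion; the function names are hypothetical.

```python
import random

def mean_wait(service_times, interarrival=1.0):
    """Mean time spent waiting in line at a single first-come-first-served
    server, with one arrival every `interarrival` time units (Lindley's recursion)."""
    wait, total = 0.0, 0.0
    for service in service_times:
        total += wait                                   # this customer's wait
        wait = max(0.0, wait + service - interarrival)  # next customer's wait
    return total / len(service_times)

def compare_variability(mean_service=0.9, n=100_000, seed=0):
    """Same average load (utilization 0.9 with the defaults), different variability."""
    rng = random.Random(seed)
    constant = [mean_service] * n
    variable = [rng.expovariate(1 / mean_service) for _ in range(n)]
    return mean_wait(constant), mean_wait(variable)

low, high = compare_variability()
print(f"mean wait, constant service: {low:.2f}; variable service: {high:.2f}")
```

The constant-service queue never builds a line, since each customer finishes before the next one arrives. With the same average load, the variable-service queue builds a substantial line. Adding a second server or capping the queue length (a finite buffer) is a small change to the same loop.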
Being presented a plan/system/technology/mathematical proof and trying to find reasons why it doesn’t/won’t work
Conducting simple cost-benefit analyses (mostly using the four basic arithmetic operations + dimensional analysis), with a time-limit of 5/30/60 minutes
This includes knowledge about expected value (and its possible shortcomings)
This might even include knowledge about the expected value of perfect information
Performing Monte Carlo calculations using something like Guesstimate, ideally on the same problem as a previous cost-benefit analysis.
Calibration training
On unknown quantities, such as “Which person was born earlier?” or “Which country has a bigger population?”
By forecasting events on different timescales, ranging from a day to a month (longer would be great, ideally even a decade, but the exam format probably doesn’t allow that)
Reading and writing Dath Ilan’s spoken language.
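The cost-benefit, expected-value, and Monte Carlo items above fit together in a few lines of code. Below is a minimal Python sketch (not Guesstimate itself) of a made-up decision: fund a project with a fixed cost and an uncertain benefit, where the benefit is entered as a 90% range and, as an assumption, modeled as lognormal. It estimates the expected value of funding and the expected value of perfect information (EVPI), i.e. how much it would be worth to learn the true benefit before deciding. All names and numbers are hypothetical.

```python
import math
import random
import statistics

Z90 = 1.6449  # a standard normal puts 90% of its mass within +/- 1.6449

def sample_from_90ci(rng, low, high):
    """Draw from a lognormal whose central 90% interval is roughly [low, high].
    (Assumption: a lognormal shape for a strictly positive quantity.)"""
    mu = (math.log(low) + math.log(high)) / 2
    sigma = (math.log(high) - math.log(low)) / (2 * Z90)
    return rng.lognormvariate(mu, sigma)

def analyse(cost, benefit_low, benefit_high, n=100_000, seed=0):
    """Cost and benefit must be in the same unit (e.g. dollars); that is the
    dimensional-analysis check. Returns expected values in that unit."""
    rng = random.Random(seed)
    net = [sample_from_90ci(rng, benefit_low, benefit_high) - cost for _ in range(n)]
    ev_fund = statistics.fmean(net)      # EV of funding; declining is worth 0
    ev_decide_now = max(ev_fund, 0.0)    # best option without more information
    ev_decide_informed = statistics.fmean(max(x, 0.0) for x in net)  # choose after seeing the outcome
    return {
        "ev_fund": ev_fund,
        "evpi": ev_decide_informed - ev_decide_now,
        "p_loss": sum(x < 0 for x in net) / n,  # share of draws that lose money
    }

result = analyse(cost=100, benefit_low=20, benefit_high=400)
print({key: round(value, 3) for key, value in result.items()})
```

With these made-up inputs, funding has a positive expected value even though more than half of the draws lose money, and the EVPI is positive because some of those losses could be avoided by learning the outcome first.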
Arithmetic.
Individual and collective decision making and execution
How to make a good decision.
Individually
As part of a collective
How to recognize a good decision made by others
How to execute a good decision when you or others have made it—especially in the face of akrasia.
Also signing Dath Ilan’s nonverbal language, because why would you only make use of the visual modality when you happen to have access to a pen and writing surface?
Given the following charts, statistics, and arguments based on those charts and statistics, point out the important flaws in the arguments and state what unjustified conclusions the arguers are trying to cause you to reach.
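One classic flaw such a question can target is Simpson's paradox: an aggregate comparison that reverses once the data are split by a confounding variable. A minimal Python sketch with made-up counts, in which treatment A does better on both mild and severe cases, yet B looks better overall because B was mostly given to the easier, mild cases:

```python
# Hypothetical (successes, trials) for two treatments, split by case severity.
DATA = {
    "A": {"mild": (81, 87), "severe": (192, 263)},
    "B": {"mild": (234, 270), "severe": (55, 80)},
}

def rate(successes, trials):
    return successes / trials

def overall(groups):
    """Pooled success rate, ignoring severity (the misleading comparison)."""
    successes = sum(s for s, _ in groups.values())
    trials = sum(t for _, t in groups.values())
    return rate(successes, trials)

for treatment, groups in DATA.items():
    by_group = {g: round(rate(*counts), 3) for g, counts in groups.items()}
    print(treatment, by_group, "overall:", round(overall(groups), 3))
```

An argument of the form "B has the higher overall success rate, so use B" is exactly the kind of unjustified conclusion the test-taker should catch.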
Eliezer wrote:
The thing I’m trying to get at, is not so much “a test that is required in order to be able to get any job” (as the US high school diploma accidentally became), but rather, “a test which can identify the Very Serious People”.
I would expect that Dath Ilan doesn’t have a concept of Very Serious People. That seems to be a concept focused on status instead of getting things done.
Why would you expect it to have such a concept and what kind of decisions do you think would be made based on whether someone is part of Very Serious People?
There’s more that goes into a good test. A good test is not about whether a person learned a topic but about whether the person is likely to perform well in whatever they want to do.
Dath Ilan is going to have many different tests for different jobs. There will be regular evaluations of how the test predicts performance in the job and the test will be adapted to optimize for predicting performance.
Instead of picking topics by putting in “a lot of thought”, the main approach will be to pick topics empirically, by looking at whether the knowledge predicts performance.
The only thing that does need thought is thinking about how to prevent goodharting.
That completely depends on the job. It doesn’t make sense to have a uniform test that everyone takes, given that as a society you want diversity of skills in your population.
Partly agree with your criticism of the quoted claim, but there are two things I think you should consider.
First, evaluating tests for long-term outcomes is fundamentally hard. The extent to which a 5th-grade civics or math test predicts performance in policy or engineering is negligible. In fact, I would expect the feedback from test scores, by shaping what a child focuses on, to have a far larger impact on the child’s trajectory than the object-level prediction captures.
Second, standardizing tests greatly reduces the cost of development and allows larger sample sizes for validation. Either reason alone makes it sensible to use standardized tests as much as possible.
I don’t believe that 5th-grade civics or math tests are a good idea. At that age you want to encourage children to learn by following their curiosity. If you teach math in a structured way, you likely want instant, computer-driven feedback, not the idea that children are supposed to have a certain level of math knowledge at a certain age and get tested on it.
That’s fine, but choosing the question set for self-motivated children, on which you provide the instant computer-driven feedback, raises the same type of question: what is it that we want a child interested in X to learn?
Concretely, my 8-year-old son likes math. He’s fine with multiplication and division, and enjoys thinking about math. If I want him to be successful applying math later in life, should I start him on knot theory, pre-algebra equation solving, adding and subtracting unlike fractions, or coding in Python? I see real advantages to each of these, respectively: proof-based thinking and abstraction from concrete to theoretical ideas; more abstract thinking about and manipulation of numbers; getting ahead on the math he will need to learn next; or other tools that will expand his ability to think and apply ideas.
I’d love feedback about which of these (or which combination of these) is most likely to ensure he’s learning the things that are useful in helping him apply math in a decade, but I can’t get useful feedback without trying it on large samples over the course of decades. Or, since I don’t live in Dath Ilan, I can use my best judgement and ask others for feedback in an ad-hoc fashion.
My case for trigonometry: we want people to understand social cycles. For example, heroin becomes fashionable among young people because it feels good. Time goes by and problems emerge with tolerance, addiction, and overdose. The next cohort of young people see what happened to aunts and uncles etc., and give heroin a miss. The cohort after that see their aunts and uncles living clean lives, lives that give no warning. They experiment and find that heroin feels good. The cycle repeats.
These cycles can arise because the fixed points of the dynamics are unstable. The classic simple example uses a second-order linear differential equation as a model, with a solution such as $e^{at} \sin kt$ (oscillations that grow when $a > 0$). We really want people to have some sense of cycles arising from instabilities without anyone driving them, and we probably cannot give simple examples of what we mean without trigonometric functions.
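For concreteness, here is one linear second-order equation whose solution is exactly $e^{at}\sin kt$ (the coefficients are chosen for illustration, not taken from any particular social model):

$$
\begin{aligned}
&\ddot{x} - 2a\,\dot{x} + (a^2 + k^2)\,x = 0, \qquad x(0) = 0,\quad \dot{x}(0) = k,\\
&r^2 - 2a\,r + (a^2 + k^2) = 0 \;\Rightarrow\; r = a \pm i k,\\
&x(t) = e^{at}\left(c_1 \cos kt + c_2 \sin kt\right), \quad c_1 = 0,\ c_2 = 1 \;\Rightarrow\; x(t) = e^{at} \sin kt .
\end{aligned}
$$

The fixed point $x = 0$ is unstable when $a > 0$: any small disturbance grows into ever larger oscillations, with no outside force driving the cycle. For $a < 0$ the oscillations die out, and for $a = 0$ they persist unchanged.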
I’d say that this is a better argument for calculus and differential equations than for trigonometry: the sine function can be defined purely from a calculus point of view, and that definition is closer to what you describe than the trigonometric perspective.
Well, before we can discuss the “answers”, the first step is to come up with an algorithm that measures how much a given piece of civics knowledge pays rent. Is it useful to know the nominal structure of the government (vs the ‘actual’ structure in practice)? Is it useful to know the retold tales of George Washington, victorious general and American hero? (Not to mention the high probability of errors in the ‘knowledge’, given the time that has passed and the likely bias.)
A reasonable argument could be made that in our form of democracy, civics knowledge is of little use to the average citizen. This is because each of us has such an infinitesimal ‘vote’ that the vote of each person well educated in civics is drowned out. If 1 in 1,000 citizens is genuinely well educated in civics, and nearly all elections are decided by a margin greater than 0.1% (or this is below the noise floor of the voting machinery itself...), civics knowledge is useless.
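To illustrate the calculus-first view, a minimal Python sketch that never calls a trigonometric function: it numerically integrates $\ddot{x} = 2a\dot{x} - (a^2 + k^2)x$ with $x(0) = 0$, $\dot{x}(0) = k$, using the classical fourth-order Runge-Kutta method (a choice made here for simplicity). With $a = 0$, $k = 1$ this is $\ddot{x} = -x$, $x(0) = 0$, $\dot{x}(0) = 1$, which can serve as a definition of sine; with $a > 0$ it produces the growing oscillation $e^{at}\sin kt$. The function name is hypothetical.

```python
import math

def oscillator(t_end, a=0.0, k=1.0, steps=10_000):
    """Integrate x'' = 2a x' - (a^2 + k^2) x from x(0) = 0, x'(0) = k with
    classical RK4 and return x(t_end). No trigonometric functions are used."""
    h = t_end / steps
    x, v = 0.0, k  # position and velocity

    def deriv(x, v):
        return v, 2 * a * v - (a * a + k * k) * x

    for _ in range(steps):
        k1x, k1v = deriv(x, v)
        k2x, k2v = deriv(x + h / 2 * k1x, v + h / 2 * k1v)
        k3x, k3v = deriv(x + h / 2 * k2x, v + h / 2 * k2v)
        k4x, k4v = deriv(x + h * k3x, v + h * k3v)
        x += h / 6 * (k1x + 2 * k2x + 2 * k3x + k4x)
        v += h / 6 * (k1v + 2 * k2v + 2 * k3v + k4v)
    return x

# The library sine is used only to check the result.
for t in (0.5, 1.0, 3.0):
    print(t, oscillator(t), math.sin(t))
```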
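The arithmetic behind this claim can be made explicit under a simplified, hypothetical two-candidate model: if the informed bloc would otherwise split its votes evenly, then moving the entire bloc to the trailing candidate shrinks the winner's lead by exactly the bloc's share of the vote (the leader loses half of it and the runner-up gains half). So the bloc can only change the outcome when the lead is smaller than its share. A minimal Python sketch; the function name and the example margins are made up:

```python
def bloc_can_flip(lead, bloc_share):
    """`lead`: winner's margin over the runner-up as a fraction of all votes,
    assuming the bloc splits evenly. Moving the whole bloc to the runner-up
    reduces the lead by `bloc_share`; a lead exactly equal to it becomes a tie."""
    return lead < bloc_share

informed_share = 1 / 1000  # 1 in 1,000 citizens, as in the paragraph above
for lead in (0.0005, 0.002, 0.05):  # hypothetical winning margins
    print(f"lead {lead:.2%}: informed bloc can flip it? {bloc_can_flip(lead, informed_share)}")
```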
If this hypothesis is true (not saying it is or isn’t), the culprit would be a failure of colleges and others to be accountable to a measurable cost/benefit ratio for the things that they teach. Instead of using measurable metrics they use arguments like “this knowledge is clearly worth it because of what it is” or “tradition says we have to keep teaching it”.
This is one reason why we all had to waste some of our lifespan on trigonometry and literature instead of, say, learning to use Python more effectively. Arguably, knowing how to tell a machine to do your math for you effectively is thousands of times more valuable than useless derivations you won’t need to know unless you become a mathematician.
IMO, the assumption that civics knowledge is only useful when voting is itself a concerning failure of civics education. Above-average civics knowledge might reveal high-value opportunities such as advocacy, focused policy submissions, talking to friends about particular policies, raising public awareness of important problems, etc.
Increasing the average level of civics knowledge is also (again, IMO) very valuable. The most obvious benefit is that it disproportionately favors good policymaking; beyond that I’d also expect volunteering to become both more common and more effective, along with improved coordination generally. Civics is basically the study of “how does our society coordinate”, after all!
Note that being a volunteer super-agitator is also not being an average citizen...
In a sane society it would be a task that an average citizen understood and could take on if necessary.
A point in defence of George Washington and literature: having a shared culture, with a common background knowledge of legends and sacred texts, is extremely important for maintaining high trust and coordination. These stories have an enormous utility as such, even if the information is not directly useful. The reason why we teach children about George Washington is not because the historical facts about him are directly, instrumentally important, but because we want to maintain the mythopoetic commons by ensuring that everyone has a common grounding in the founding myths and sacred values of the civic religion.
The same is doubtlessly true of Dath Ilan, unless part of the fiction is that humans there are psychologically very different from our own humans.
The important distinction here is between “does this provide any value” and “does this provide the most value it could, per learning-hour”.
A piece of literature, or a founding myth, can in principle provide shared cultural context, which makes interfacing with other people easier. But, first, the fact that it could provide cultural context doesn’t mean that it is doing that. And, second, being-shared-culture is a property that anything can have, by being common enough, including things that already provided value in a different way. And third—things that are independently valuable tend to do a better job of being-shared-context.
My school made everyone read The Great Gatsby. Never once in my life did I ever encounter a reference to it, or make a reference to it, or even remember that it existed at all until I just now queried my memory for “books that seemed kinda pointless to have read”.
By contrast, I’ve heard and made a fair number of references to HPMoR—because the people around me have read it, and because it is actually optimized for having lessons and analogies worth referring to.
George Washington and the American Civic Religion falls somewhere in between. It’s not Great Gatsby-level pointless, but it’s still conspicuously unoptimized.
Part of teaching people about George Washington is teaching people that the US Constitution is not just a piece of paper but sacred. A shared understanding of the US Constitution as sacred does help with coordination.
The problem with optimizing the teaching for sacredness is that the optimization itself makes the teaching less sacred.
No, perceived sacredness isn’t only a byproduct; it’s something that’s actually pretty easy to optimize for directly. See for example Petrov Day or Winter Solstice.
I take your points but your arguments are in the form of:
a. “this knowledge is clearly worth it because of what it is”
Instead of showing a mathematical estimate of rent paid, you argue that “having a shared culture is required for high trust and coordination and mythopoetic commons”. But you don’t have a measurement of this. (One may not exist yet; I am just explaining the flaw in your argument.)
b. Similarly, by saying “reason why we teach children” you are implicitly just saying “tradition says we have to keep teaching it”, by referring to something that has been taught over and over.
One note: there are highly successful individuals and entire nations who know absolutely none of these specific bits of knowledge about American civics. That is a pretty strong argument that it is non-essential, and possibly doesn’t pay any rent.
Note that knowledge of the characters and superpowers of the MCU is also a way to gain access to a “mythopoetic commons”. Yet I think we can both agree that knowing about the MCU, no matter how cool, doesn’t pay rent for almost all of us.
No, it’s a weak argument. A nation benefits from common knowledge of its own history, which gives it a sense of patriotism and of the sacredness of the shared societal order. Knowing about the civics of other countries has much less value to society.
The fact that you can be successful by defecting in a prisoner’s dilemma also suggests that individuals who don’t focus on coordinating well might be successful while their society is still worse off for it.
How do you propose that one could measure this? Do you have a counter-example of a highly coordinated society that DOESN’T have a shared mythic/legendary/literary canon? I reject the implication that I have to have a quantitative measure on hand in order to suggest that something is valuable.
Irrelevant. I am not suggesting that American civics are important to every society ever, but that they’re important to American society, precisely because these legends are part of what give American society its American character. (If you aren’t American, feel free to substitute the founding legends of your own country.)
I agree, but I don’t see how this is a counter-argument. The exact content of your society’s legendarium is always to some degree arbitrary (though it certainly has downstream effects), but its arbitrariness doesn’t prevent it from functioning as cultural glue.
Is not “enables socialization” a form of rent?
For the first paragraph, I acknowledge that measurement is difficult.
For the second bit, my definition of “pays rent” is: a fact or algorithm that enables a human being to make a decision that has a higher expected value. Information that doesn’t pay rent is by definition worthless. (As an aside, in most contexts “trivia” knowledge doesn’t pay any rent.)
For the third bit: you haven’t taken it far enough. If a living being needs to ‘socialize’ to survive, the facts/techniques that give them the greatest chance of socializing successfully with the most valuable subgroup of humans are the items that pay the highest rent. Not all knowledge is equal, even for the purpose of a shared “mythopoetic” base.
While I acknowledge that I don’t have a way to directly measure this, knowledge of popular TV shows or bands is likely more useful to most humans than knowledge of civics.
Which is rather interesting, because this actually has some explaining power.
In favor of this particular point, I know about the MCU despite disliking superhero movies and comics (except Watchmen) precisely because it is helpful in my social circles.
Regarding @jaspax’s main point, it is not obvious that formal education is necessary to generate a shared mythopoetic structure. OTOH I can’t think of an example of a long-lasting one that does not have a group actively involved in educating people about it. So, it is not obvious that it is a poor candidate for formal education either.
Suppose it were possible to write down a list of every fact or algorithm known by a living human being. This isn’t impossible: if you could use an AI system to transcribe audio recordings of someone’s entire life, plus everything they ever read, you would have it in a file. Then you would map from [text] to [common fact or algorithm] by comparing thousands of these files with the facts and algorithms written in sources like encyclopedias. More likely, you would find the commonalities with a clustering algorithm.
The “knowledge that pays the most rent” is either the most common fact or algorithm known by all humans, or the most common one that separates successful humans from failures. (if there is a measurable difference)
Then an education system adds the most value by teaching the facts or algorithms in order from greatest to least value, or by sorting them into complexity tiers and, within each tier, teaching the most valuable elements first, until the time allotted for education is over.
If civics has any value at all, this algorithm would find it. (though as described it is subject agnostic)
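The ordering step can be sketched as a greedy procedure. A minimal Python sketch, assuming the hard parts (finding the facts and estimating how much rent each pays) are already done; the `Item` fields, the rule of skipping items that no longer fit in the remaining time, and all numbers are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Item:
    name: str
    tier: int      # complexity tier: lower tiers are taught first
    value: float   # estimated rent paid (made-up units)
    hours: float   # teaching time required

def plan_curriculum(items, hours_available):
    """Within each complexity tier, teach the highest-value items first;
    skip anything that no longer fits in the remaining time."""
    plan, remaining = [], hours_available
    for item in sorted(items, key=lambda i: (i.tier, -i.value)):
        if item.hours <= remaining:
            plan.append(item.name)
            remaining -= item.hours
    return plan

ITEMS = [  # made-up numbers for illustration only
    Item("arithmetic", 0, 100, 300),
    Item("reading", 0, 120, 400),
    Item("programming", 1, 50, 120),
    Item("cost-benefit analysis", 1, 60, 40),
    Item("elective A", 2, 10, 80),
    Item("elective B", 2, 2, 300),
]
print(plan_curriculum(ITEMS, hours_available=900))
```

Sorting by value per hour instead of raw value would be a one-line change to the sort key, and arguably closer to the "most value per learning-hour" framing elsewhere in this thread.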
This is how you get Latin courses.
Because higher-class humans know it, and so it would “differentiate” them from less successful humans, even though it is likely not causative? Yeah, you would need some method to detect causation.
It’s worse than that. If everyone is accurately judging which information is valuable and studying things in descending order of priority, then what distinguishes successful and unsuccessful people is that the successful ones got further down the list. So if you compare them, the differences will be things that people explicitly judged to be low priority, i.e., Latin classes.
Again, this requires Latin classes to pay rent: there must be a measurable difference in personal success that is not attributable to inherited opportunities and resources, and that is not found in people who skipped Latin but did everything else.
I think specifically they’re getting at the point that the “steady state” isn’t stable. As soon as everyone uses the algorithm uniformly, it falls apart within a few generations at most. You’d have to never stop A/B testing the importance of various subjects; your control group for “do/don’t teach this subject” could never shrink all the way to zero, for roughly the same reason that Bayesian probability updates don’t work on probabilities of 0 and 1.
It sounds like you’re counting on natural human variation to temper that, but to the extent the algorithm actually worked with a large effect size, it’s not clear that would be sufficient. Undeniably good ideas do have a way of eventually getting fixed in a population.
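The point about probabilities of 0 and 1 can be checked directly with Bayes' rule. A minimal Python sketch; the hypothesis ("this subject is worth teaching") and the likelihoods are made up for illustration:

```python
def bayes_update(prior, p_evidence_if_true, p_evidence_if_false):
    """Posterior P(H | E) via Bayes' rule."""
    numerator = prior * p_evidence_if_true
    return numerator / (numerator + (1 - prior) * p_evidence_if_false)

# H: "this subject is worth teaching". Each observation is evidence against H:
# it is 99x more likely if H is false (hypothetical numbers).
for prior in (0.0, 0.5, 1.0):
    p = prior
    for _ in range(5):
        p = bayes_update(p, 0.01, 0.99)
    print(f"prior {prior}: posterior after 5 observations = {p:.3g}")
```

Only the priors of exactly 0 and 1 ignore the evidence entirely, which is the analogue of a control group that has shrunk to zero.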
It’s just a thought experiment. It is improbable to ever come up. Once we have a way to create lists of all the things humans should know in order to be the best possible humans, well...
An AI system (in the 20-100 years when we can do this) could probably consume the same list. And it could ‘think with’ massive chunks of digital machinery that are very close to error-free, don’t need to sleep, don’t age, don’t have agendas that aren’t inherent in their utility function, and run at 4-5 GHz instead of 1 kHz. It could learn in parallel from thousands or millions of instances of itself. And its algorithms could be structured so that it objectively “knows if it knows” and “knows if it doesn’t know” what to do to achieve the desired outcome (so it doesn’t act at all if it doesn’t know).
Anyway, with all these advantages, your AI software would have to be almost infinitely stupid not to beat humans at any task where it is possible to score whether or not you have succeeded.
(‘scoring’ generally requires a way to simulate the task, both the actor and the environment, and give the machine many tries to find a way to get a good score. So essentially any task on earth where it is possible for a computer to accurately and mostly completely determine if the task criterion was met or not. All tasks that involve manipulating some physical object with reasonably modelable properties fall into this class)
The tasks that don’t fit are ones with humans as the direct target, or that appeal to aspects only humans can perceive: art, politics, teaching, counseling, cleaning tasks where a robot can’t perceive what needs cleaning (such as bad smells), landscape design, architecture for appearance (but not structural, electrical, or plumbing work), and so on.
May want a further filter, to look specifically for facts/algorithms that people know because they received explicit instruction or training (or some measure of knowing it better and more deeply because of explicit instruction)
Otherwise you end up duplicating things that people were already learning informally, potentially taking those things into the “ownership” of formal teaching and convincing people that they need to be taught them in a classroom for it to count.