Back in January, I participated in a workshop in which the attendees mapped out how they expect AGI development and deployment to go. The idea was to start by writing out what seemed most likely to happen this year, and then condition on that, to forecast what seems most likely to happen in the next year, and so on, until you reach either human disempowerment or an end of the acute risk period.
This post was my attempt at the time.
I spent maybe 5 hours on this, and there’s lots of room for additional improvement. This is not a confident statement of how I think things are most likely to play out. There are already some ways in which I think this projection is wrong. (I think it’s too fast, for instance). But nevertheless I’m posting it now, with only a few edits and elaborations, since I’m probably not going to do a full rewrite soon.
2024
A model is released that is better than GPT-4. It succeeds on some new benchmarks. Subjectively, the jump in capabilities feels smaller than that between RLHF’d GPT-3 and RLHF’d GPT-4. It doesn’t feel as shocking, the way ChatGPT and GPT-4 did, either for x-risk focused folks or for the broader public. Mostly it feels like “a somewhat better language model.”
It’s good enough that it can do a bunch of small-to-medium admin tasks pretty reliably. I can ask it to find me flights meeting specific desiderata, and it will give me several options. If I give it permission, it will then book those flights for me with no further inputs from me.
It works somewhat better as an autonomous agent in an AutoGPT harness, but it still loses its chain of thought / breaks down / gets into loops.
It’s better at programming.
Not quite good enough to replace human software engineers. It can make a simple React or iPhone app, but not design a whole complicated software architecture, at least not without a lot of bugs.
It can make small, working, well documented, apps from a human description.
We see a doubling of the rate of new apps being added to the app store as people who couldn’t code now can make applications for themselves. The vast majority of people still don’t realize the possibilities here, though. “Making apps” still feels like an esoteric domain outside of their zone of competence, even though the barriers to entry just lowered so that 100x more people could do it.
From here on out, we’re in an era where LLMs are close to commoditized. There are smaller improvements, shipped more frequently, by a variety of companies, instead of big impressive research breakthroughs. Basically, companies are competing with each other to always have the best user experience and capabilities, and so they don’t want to wait as long to ship improvements. They’re constantly improving their scaling, and finding marginal engineering improvements. Training runs for the next generation are always happening in the background, and there’s often less of a clean tabula-rasa separation between training runs—you just keep doing training with a model continuously. More and more, systems are being improved through in-the-world feedback with real users. Often ChatGPT will not be able to handle some kind of task, but six weeks later it will be able to, without the release of a whole new model.
[Does this actually make sense? Maybe the dynamics of AI training mean that there aren’t really marginal improvements to be gotten. In order to produce a better user experience, you have to 10x the training, and each 10x-ing of the training requires a bunch of engineering effort, to enable a larger run, so it is always a big lift.]
(There will still be impressive discrete research breakthroughs, but they won’t be in LLM performance)
2025
A major lab is targeting building a Science and Engineering AI (SEAI)—specifically a software engineer.
They take a state of the art LLM base model and do additional RL training on procedurally generated programming problems, calibrated to stay within the model’s zone of proximal competence. These problems are something like leetcode problems, but scale to arbitrary complexity (some of them require building whole codebases, or writing very complex software), with scoring on lines of code, time-complexity, space complexity, readability, documentation, etc. This is something like “self-play” for software engineering.
This just works.
A lab gets a version that can easily do the job of a professional software engineer. Then, the lab scales their training process and gets a superhuman software engineer, better than the best hackers.
Additionally, a language model trained on procedurally generated programming problems in this way seems to have higher general intelligence. It scores better on graduate level physics, economics, biology, etc. tests, for instance. It seems like “more causal reasoning” is getting into the system.
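To make the shape of that training setup concrete, here is a minimal toy sketch of the kind of self-play loop described above. Everything in it is my own illustrative guess: the toy problem generator, the scoring weights, and the model interface are all hypothetical, not any lab’s actual pipeline.

```python
import random

def generate_problem(difficulty: int):
    """Procedurally generate a toy coding problem: implement a random polynomial
    with `difficulty` terms. A real generator would compose far richer tasks
    (whole codebases, APIs, performance constraints, etc.)."""
    coeffs = [random.randint(-5, 5) for _ in range(difficulty)]
    spec = "Write f(x) returning " + " + ".join(
        f"{c}*x**{i}" for i, c in enumerate(coeffs)
    )
    tests = [(x, sum(c * x**i for i, c in enumerate(coeffs))) for x in range(-3, 4)]
    return spec, tests

def score_solution(code: str, tests) -> float:
    """Composite reward: correctness dominates, with a small brevity bonus standing
    in for the readability / documentation / complexity scoring described above."""
    namespace = {}
    try:
        exec(code, namespace)  # run the candidate solution
        f = namespace["f"]
        passed = sum(f(x) == y for x, y in tests) / len(tests)
    except Exception:
        passed = 0.0
    brevity = 1.0 / (1.0 + len(code) / 200)
    return 5.0 * passed + 0.5 * brevity

def train(model_generate, model_update, n_steps: int = 1000):
    """RL-style self-play loop: the model writes solutions, gets scored, and the
    problem difficulty tracks the model's zone of proximal competence."""
    difficulty = 1
    for _ in range(n_steps):
        spec, tests = generate_problem(difficulty)
        code = model_generate(spec)         # the model proposes a solution
        reward = score_solution(code, tests)
        model_update(spec, code, reward)    # e.g. a PPO-style policy update
        # Harder problems when the model succeeds, easier when it fails.
        difficulty = difficulty + 1 if reward > 4.0 else max(1, difficulty - 1)
```

The point of the sketch is just the structure: a procedural generator whose difficulty knob tracks the model’s ability, plus a composite reward over correctness, efficiency, and style. The real version would presumably operate on whole codebases with execution-based or learned scoring.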
The first proper AI assistants ship. In addition to doing specific tasks, you keep them running in the background, and talk with them as you go about your day. They get to know you and make increasingly helpful suggestions as they learn your workflow. A lot of people also talk to them for fun.
2026
The first superhuman software engineer is publicly released.
Programmers begin studying its design choices, the way Go players study AlphaGo.
It starts to dawn on e.g. people who work at Google that they’re already superfluous—after all, they’re currently using this AI model to (unofficially) do their job—and it’s just a matter of institutional delay for their employers to adapt to that change.
Many of them are excited or loudly say how it will all be fine / awesome. Many of them are unnerved. They start to see the singularity on the horizon, as a real thing instead of a social game to talk about.
This is the beginning of the first wave of change in public sentiment that will cause some big, hard to predict, changes in public policy [come back here and try to predict them anyway].
AI assistants get a major upgrade: they have realistic voices and faces, and you can talk to them just like you can talk to a person, not just typing into a chat interface. A ton of people start spending a lot of time talking to their assistants, for much of their day, including for goofing around.
There are still bugs, places where the AI gets confused by stuff, but overall the experience is good enough that it feels, to most people, like they’re talking to a careful, conscientious person, rather than a software bot.
This starts a whole new area of training AI models that have particular personalities. Some people are starting to have parasocial relationships with their AI friends, and some programmers are trying to make AI friends that are really fun or interesting or whatever, for them in particular.
Lab attention shifts to building SEAI systems for other domains, to solve biotech and mechanical engineering problems, for instance. The current-at-the-time superhuman software engineer AIs are already helpful in these domains, but not at the level of “explain what you want, and the AI will instantly find an elegant solution to the problem right before your eyes”, which is where we’re at for software.
One bottleneck is problem specification. Our physics simulations have gaps, and are too low fidelity, so oftentimes the best solutions don’t map to real world possibilities.
One solution to this (in addition to using our AI to improve the simulations) is to just RLHF our systems to identify solutions that do translate to the real world. They’re smart, they can figure out how to do this.
The first major AI cyber-attack happens: maybe some kind of superhuman hacker worm. Defense hasn’t remotely caught up with offense yet, and someone clogs up the internet with AI bots, for at least a week, approximately for the lols / to see if they could do it. (There’s a week during which more than 50% of people can’t get on more than 90% of the sites because the bandwidth is eaten by bots.)
This makes some big difference for public opinion.
Possibly, this problem isn’t really fixed. In the same way that covid became endemic, the bots that were clogging things up are just a part of life now, slowing bandwidth and making the internet annoying to use.
2027 and 2028
In many ways things are moving faster than ever in human history, and also AI progress is slowing down a bit.
The AI technology developed up to this point hits the application and mass adoption phase of the s-curve. In this period, the world is radically changing as every industry, every company, every research lab, every organization, figures out how to take advantage of newly commoditized intellectual labor. There’s a bunch of kinds of work that used to be expensive, but which are now too cheap to meter. If progress stopped now, it would take 2 decades, at least, for the world to figure out all the ways to take advantage of this new situation (but progress doesn’t show much sign of stopping).
Some examples:
The internet is filled with LLM bots that are indistinguishable from humans. If you start a conversation with a new person on Twitter or Discord, you have no way of knowing if they’re a human or a bot.
(Probably there will be some laws about declaring which accounts are bots, but these will be inconsistently enforced.)
Some people are basically cool with this. From their perspective, there are just more people that they want to be friends with / follow on Twitter. Some people even say that the bots are just better and more interesting than people. Other people are horrified/outraged/betrayed/don’t care about relationships with non-real people.
(Older people don’t get the point, but teenagers are generally fine with having conversations with AI bots.)
The worst part of this is the bots that make friends with you and then advertise stuff to you. Pretty much everyone hates that.
We start to see companies that will, over the next 5 years, grow to have as much impact as Uber, or maybe Amazon, which have exactly one human employee / owner + an AI bureaucracy.
The first completely autonomous companies work well enough to survive and support themselves. Many of these are created “free”, for the lols, and no one owns or controls them. But most of them are owned by the person who built them, who could turn them off if they wanted to. A few are structured as public companies with shareholders. Some are intentionally incorporated as fully autonomous, with the creator disclaiming (and technologically disowning, eg by deleting the passwords) any authority over them.
There are legal battles about what rights these entities have, if they can really own themselves, if they can have bank accounts, etc.
Mostly, these legal cases resolve to “AIs don’t have rights”. (For now. That will probably change as more people feel it’s normal to have AI friends).
Everything is tailored to you.
Targeted ads are way more targeted. You are served ads for the product that you are, all things considered, most likely to buy, multiplied by the lifetime profit if you do buy it. Basically no ad space is wasted on things that don’t have a high EV of you, personally, buying it. Those ads are AI generated, tailored specifically to be compelling to you. Often, the products advertised, not just the ads, are tailored to you in particular.
This is actually pretty great for people like me: I get excellent product suggestions.
There’s not “the news”. There’s a set of articles written for you, specifically, based on your interests and biases.
Music is generated on the fly. This music can “hit the spot” better than anything you listened to before “the change.”
Porn. AI tailored porn can hit your buttons better than sex.
AI boyfriends/girlfriends that are designed to be exactly emotionally and intellectually compatible with you, and trigger strong limerence / lust / attachment reactions.
We can replace books with automated tutors.
Most of the people who read books will still read books though, since it will take a generation to realize that talking with a tutor is just better, and because reading and writing books was largely a prestige-thing anyway.
(And weirdos like me will probably continue to read old authors, but even better will be to train an AI on a corpus, so that it can play the role of an intellectual from 1900, and I can just talk to it.)
For every task you do, you can effectively have a world expert (in that task and in tutoring pedagogy) coach you through it in real time.
Many people do almost all their work tasks with an AI coach.
It’s really easy to create TV shows and movies. There’s a cultural revolution as people use AI tools to make custom Avengers movies, anime shows, etc. Many are bad or niche, but some are 100x better than anything that has come before (because you’re effectively sampling from a 1000x larger distribution of movies and shows).
There’s an explosion of new software, and increasingly custom software.
Facebook and Twitter are replaced (by either external disruption or by internal product development) by something that has a social graph, but lets you design exactly the UX features you want through an LLM text interface.
Instead of software features being something that companies ship to their users, top-down, they become something that users and communities organically develop, share, and iterate on, bottom up. Companies don’t control the UX of their products any more.
Because interface design has become so cheap, most of software is just proprietary datasets, with (AI built) APIs for accessing that data.
There’s a slow moving educational revolution of world class pedagogy being available to everyone.
Millions of people who thought of themselves as “bad at math” finally learn math at their own pace, and find out that actually, math is fun and interesting.
Really fun, really effective educational video games for every subject.
School continues to exist, in approximately its current useless form.
[This alone would change the world, if the kids who learn this way were not going to be replaced wholesale, in virtually every economically relevant task, before they are 20.]
There’s a race between cyber-defense and cyber offense, to see who can figure out how to apply AI better.
So far, offense is winning, and this is making computers unusable for lots of applications that they were used for previously:
online banking, for instance, is hit hard by effective scams and hacks.
Coinbase has an even worse time, since they’re not insured (is that true?).
It turns out that a lot of things that worked / were secure, were basically depending on the fact that there are just not that many skilled hackers and social engineers. Nothing was secure, really, but not that many people were exploiting that. Now, hacking/scamming is scalable and all the vulnerabilities are a huge problem.
There’s a whole discourse about this. Computer security and what to do about it is a partisan issue of the day.
AI systems can do the years of paperwork to make a project legal, in days. This isn’t as big an advantage as it might seem, because the government has no incentive to be faster on their end, and so you wait weeks to get a response from the government, your LLM responds to it within a minute, and then you wait weeks again for the next step.
The amount of paperwork required to do stuff starts to balloon.
AI romantic partners are a thing. They start out kind of cringe, because the most desperate and ugly people are the first to adopt them. But shockingly quickly (within 5 years) a third of teenage girls have a virtual boyfriend.
There’s a moral panic about this.
AI match-makers are better than anything humans have tried yet for finding sex and relationship partners. It would still take a decade for this to catch on, though.
This isn’t just for sex and relationships. The global AI network can find you the 100 people, of the 9 billion on earth, that you most want to be friends / collaborators with.
Tons of things that I can’t anticipate.
On the other hand, AI progress itself is starting to slow down. Engineering labor is cheap, but (indeed partially for that reason), we’re now bumping up against the constraints of training. Not just that buying the compute is expensive, but that there are just not enough chips to do the biggest training runs, and not enough fabs to meet that demand for chips rapidly. There’s huge pressure to expand production but that’s going slowly relative to the speed of everything else, because it requires a bunch of eg physical construction and legal navigation, which the AI tech doesn’t help much with, and because the bottleneck is largely NVIDIA’s institutional knowledge, which is only partially replicated by AI.
NVIDIA’s internal AI assistant has read all of their internal documents and company emails, and is very helpful at answering questions that only one or two people (and sometimes literally no human on earth) know the answer to. But a lot of the important stuff isn’t written down at all, and the institutional knowledge is still not fully scalable.
Note: there’s a big crux here of how much low- and medium-hanging fruit there is in algorithmic improvements once software engineering is automated. At that point the only constraint on running ML experiments will be the price of compute. It seems possible that that speed-up alone is enough to discover eg an architecture that works better than the transformer, which triggers an intelligence explosion.
2028
The cultural explosion is still going on, and AI companies are continuing to apply their AI systems to solve the engineering and logistic bottlenecks of scaling AI training, as fast as they can.
Robotics is starting to work.
2029
The first superhuman, relatively-general SEAI comes online. We now have basically a genie inventor: you can give it a problem spec, and it will invent (and test in simulation) a device / application / technology that solves that problem, in a matter of hours. (Manufacturing a physical prototype might take longer, depending on how novel components are.)
It can do things like give you the design for a flying car, or a new computer peripheral.
A lot of biotech / drug discovery seems more recalcitrant, because it is more dependent on empirical inputs. But it is still able to do superhuman drug discovery, for some ailments. It’s not totally clear why or which biotech domains it will conquer easily and which it will struggle with.
This SEAI is shaped differently than a human. It isn’t working-memory bottlenecked, so a lot of the intellectual work that humans do explicitly, in sequence, these SEAIs do “intuitively”, in a single forward pass.
I write code one line at a time. It writes whole files at once. (Although it also goes back and edits / iterates / improves—the first pass files are not usually the final product.)
For this reason it’s a little confusing to answer the question “is it a planner?” A lot of the work that humans would do via planning, it does in an intuitive flash.
The UX isn’t clean: there’s often a lot of detailed finagling, and refining of the problem spec, to get useful results. But a PhD in that field can typically do that finagling in a day.
It’s also buggy. There are oddities in the shape of the kinds of problems it is able to solve and the kinds of problems it struggles with, which aren’t well understood.
The leading AI company doesn’t release this as a product. Rather, they apply it themselves, developing radical new technologies, which they publish or commercialize, sometimes founding whole new fields of research in the process. They spin up automated companies to commercialize these new innovations.
Some of the labs are scared at this point. The thing that they’ve built is clearly world-shakingly powerful, and their alignment arguments are mostly inductive “well, misalignment hasn’t been a major problem so far”, instead of principled alignment guarantees.
There’s a contentious debate inside the labs.
Some labs freak out, stop here, and petition the government for oversight and regulation.
Other labs want to push full steam ahead.
Key pivot point: Does the government put a clamp down on this tech before it is deployed, or not?
I think that they try to get control over this powerful new thing, but they might be too slow to react.
2030
There’s an explosion of new innovations in physical technology. Magical new stuff comes out every day, way faster than any human can keep up with.
Some of these are mundane.
All the simple products that I would buy on Amazon are just really good and really inexpensive.
Cars are really good.
Drone delivery
Cleaning robots
Prefab houses are better than any house I’ve ever lived in, though there are still zoning limits.
But many of them would have huge social impacts. They might be the important story of the decade (the way that the internet was the important story of 1995 to 2020) if they were the only thing that was happening that decade. Instead, they’re all happening at once, piling on top of each other.
Eg:
The first really good nootropics
Personality-tailoring drugs (both temporary and permanent)
Breakthrough mental health interventions that, among other things, robustly heal people’s long-term subterranean trauma and transform their agency.
A quick and easy process for becoming classically enlightened.
The technology to attain your ideal body, cheaply—suddenly everyone who wants to be is as attractive as the top 10% of people today.
Really good AI persuasion which can get a mark to do ~anything you want, if they’ll talk to an AI system for an hour.
Artificial wombs.
Human genetic engineering
Brain-computer interfaces
Cures for cancer, AIDS, dementia, heart disease, and the-thing-that-was-causing-obesity.
Anti-aging interventions.
VR that is ~ indistinguishable from reality.
AI partners that can induce a love superstimulus.
Really good sex robots
Drugs that replace sleep
AI mediators that are skilled enough to single-handedly fix failing marriages, but which are also brokering all the deals between governments and corporations.
Weapons that are more destructive than nukes.
Really clever institutional design ideas, which some enthusiast early adopters try out (think “50 different things at least as impactful as manifold.markets.”)
It’s way more feasible to go into the desert, buy 50 square miles of land, and have a city physically built within a few weeks.
In general, social trends are changing faster than they ever have in human history, but they still lag behind the tech driving them by a lot.
It takes humans, even with AI information processing assistance, a few years to realize what’s possible and take advantage of it, and then have the new practices spread.
In some cases, people are used to doing things the old way, which works well enough for them, and it takes 15 years for a new generation to grow up as “AI-world natives” to really take advantage of what’s possible.
[There won’t be 15 years]
The legal oversight process for the development, manufacture, and commercialization of these transformative techs matters a lot. Some of these innovations are slowed down a lot because they need to get FDA approval, which AI tech barely helps with. Others are developed, manufactured, and shipped in less than a week.
The fact that there are life-saving cures that exist, but are prevented from being used by a collusion of AI labs and government is a major motivation for open source proponents.
A lot of this technology makes setting up new cities quickly more feasible, and there’s enormous incentive to get out from under the regulatory overhead and to start new legal jurisdictions. The first real seasteads are started by the most ideologically committed anti-regulation, pro-tech-acceleration people.
Of course, all of that is basically a side gig for the AI labs. They’re mainly applying their SEAI to the engineering bottlenecks of improving their ML training processes.
Key pivot point:
Possibility 1: These SEAIs are necessarily, by virtue of the kinds of problems that they’re able to solve, consequentialist agents with long term goals.
If so, this breaks down into two child possibilities
Possibility 1.1:
This consequentialism was noticed early, which might have been convincing enough to the government to cause a clamp-down on all the labs.
Possibility 1.2:
It wasn’t noticed early and now the world is basically fucked.
There’s at least one long-term consequentialist superintelligence. The lab that “owns” and “controls” that system is talking to it every day, in their day-to-day business of doing technical R&D. That superintelligence easily manipulates the leadership (and rank and file of that company), maneuvers it into doing whatever causes the AI’s goals to dominate the future, and enables it to succeed at everything that it tries to do.
If there are multiple such consequentialist superintelligences, then they covertly communicate, make a deal with each other, and coordinate their actions.
Possibility 2: We’re getting transformative AI that doesn’t do long term consequentialist planning.
Building these systems was a huge engineering effort (though the bulk of that effort was done by ML models). Currently only a small number of actors can do it.
One thing to keep in mind is that the technology bootstraps. If you can steal the weights to a system like this, it can basically invent itself: come up with all the technologies and solve all the engineering problems required to build its own training process. At that point, the only bottleneck is the compute resources, which is limited by supply chains, and legal constraints (large training runs require authorization from the government).
This means, I think, that a crucial question is “has AI-powered cyber-security caught up with AI-powered cyber-attacks?”
If not, then every nation state with a competent intelligence agency has a copy of the weights of an inventor-genie, and probably all of them are trying to profit from it, either by producing tech to commercialize, or by building weapons.
It seems like the crux is “do these SEAIs themselves provide enough of an information and computer security advantage that they’re able to develop and implement methods that effectively secure their own code?”
Every one of the great powers, and a bunch of small, forward-looking, groups that see that it is newly feasible to become a great power, try to get their hands on a SEAI, either by building one, nationalizing one, or stealing one.
There are also some people who are ideologically committed to open-sourcing and/or democratizing access to these SEAIs.
But it is a self-evident national security risk. The government does something here (nationalizing all the labs and their technology?). What happens next depends a lot on how the world responds to all of this.
Do we get a pause?
I expect a lot of the population of the world feels really overwhelmed, and emotionally wants things to slow down, including smart people that would never have thought of themselves as luddites.
There’s also some people who thrive in the chaos, and want even more of it.
What’s happening is mostly hugely good, for most people. It’s scary, but also wonderful.
There is a huge problem of accelerating addictiveness. The world is awash in products that are more addictive than many drugs. There’s a bit of (justified) moral panic about that.
One thing that matters a lot at this point is what the AI assistants say. As powerful as the media used to be for shaping people’s opinions, the personalized, superhumanly emotionally intelligent AI assistants are way way more powerful. AI companies may very well put their thumb on the scale to influence public opinion regarding AI regulation.
This seems like possibly a key pivot point, where the world can go any of a number of ways depending on what a relatively small number of actors decide.
Some possibilities for what happens next:
These SEAIs are necessarily consequentialist agents, and the takeover has already happened, regardless of whether it still looks like we’re in control or it doesn’t look like anything, because we’re extinct.
Governments nationalize all the labs.
The US and EU and China (and India? and Russia?) reach some sort of accord.
There’s a straight up arms race to the bottom.
AI tech basically makes the internet unusable, and breaks supply chains, and technology regresses for a while.
It’s too late to contain it and the SEAI tech proliferates, such that there are hundreds or millions of actors who can run one.
If this happens, it seems like the pace of change speeds up so much that one of two things happens:
Someone invents something that destroys the world, or second- and third-order impacts of a constellation of innovations destroy it.
Love seeing stuff like this, and it makes me want to try this exercise myself!
A couple places which clashed with my (implicit) models:
This starts a whole new area of training AI models that have particular personalities. Some people are starting to have parasocial relationships with their AI friends, and some programmers are trying to make AI friends that are really fun or interesting or whatever, for them in particular.
The worst part of this is the bots that make friends with you and then advertise stuff to you. Pretty much everyone hates that.
I predict that the average person will like this (at least with the most successful such bots), similar to how e.g. Logan Paul uses his popularity to promote his Maverick Clothing brand, which his viewers proudly wear. A fun, engaging, and charismatic such bot will be able to direct its users towards arbitrary brands while also making the user feel cool and special for choosing that brand.
I think that, in almost full generality, we should taboo the term “values”. It’s usually ambiguous between a bunch of distinct meanings.
The ideals that, when someone contemplates, invoke strong feelings (of awe, motivation, excitement, exultation, joy, etc.)
The incentives of an agent in a formalized game with quantified payoffs.
A utility function—one’s hypothetical ordering over worlds, world-trajectories, etc., that results from comparing each pair and evaluating which one is better.
A person’s revealed preferences.
The experiences and activities that a person likes for their own sake.
A person’s vision of an ideal world. (Which, I claim, often reduces to “an imagined world that’s aesthetically appealing.”)
The goals that are at the root of a chain or tree of instrumental goals.
[This often comes with an implicit or explicit implication that most of human behavior has that chain/tree structure, as opposed to being, for instance, mostly hardcoded adaptations, or a chain/tree of goals that grounds out in a mess of hardcoded adaptations instead of anything goal-like.]
The goals/narratives that give meaning to someone’s life.
[It can be the case that almost all of one’s meaning comes through a particular meaning-making schema, but from a broader perspective, a person could have been ~indifferent between multiple schemas.
For instance, for some but not most EAs, EA is very central to their personal meaning-making, but they could easily have ended up as a social justice warrior, or a professional Libertarian, instead. And in those counterfactual worlds, the other ideology is similarly central to their happiness and meaning-making. I think in such cases, it’s at least somewhat confused to look at the EA and declare that “maximizing [aggregate/average] utility” is their “terminal value”. That’s papering over the psychological process that adopts one ideology or another, which is necessarily more fundamental than the specific chosen ideology/”terminal value”.
It’s kind of like being in love with someone. You might love your wife more than anything; she might be the most important person in your life. But if you admit that it’s possible that, if you had been in different communities in your 20s, you might have married someone else, then there’s some other goal/process that picks who to marry. So too with ideologies.]
Behaviors and attitudes that signal well regarded qualities.
The goals that are sacred to a person, for many possible meanings of sacred.
What a person “really wants” underneath their trauma responses. What they would want, if their trauma was fully healed.
The actions that make someone feel most alive and authentically themselves.
The equilibrium of moral philosophy, under arbitrary reflection.
Most of the time when I see the word “values” used on LessWrong, it’s ambiguous between these (and other) meanings.
A particular ambiguity: sometimes “values” seem to be referring to the first-person experiences that a person likes for their own sake (“spending time near beautiful women is a terminal value for me”), and other times it seems to be referring to a world that a person thinks is awesome, when viewing that world from a god’s eye view. Those are not the same thing, and they do not have remotely the same psychological functions! Among other differences, one is a near-mode evaluation, and the other is a far-mode evaluation.
Worse than that, I think there’s often a conflation of these meanings.
For instance, I often detect a hidden assumption that the root of someone’s tree of instrumental goals is the same thing as their ranking over possible worlds. I think that conflation is very rarely, if ever, correct: the deep motivations of a person’s actions are not the same thing as the hypothetical world that is evaluated as best in thought experiments, even if the latter thing is properly the person’s “utility function”. At least in the vast majority of cases, one’s hypothetical ideal world has almost no motivational power (as a matter of descriptive psychology, not of normative philosophy).
Also (though this is the weakest reason to change our terminology, I think), there’s additional ambiguity to people who are not already involved in the memeplex.
To the broader world, “values” usually connotes something high-minded or noble: if you do a corporate-training-style exercise to “reflect on your values”, you get things like “integrity” and “compassion”, not things like “sex” or “spite”. In contrast, LessWrongers would usually count sex and spite, not to mention boredom and pain, as part of “human values”, and many would also own them as part of their personal values.
I at least partly buy this, but I want to play devil’s advocate.
Let’s suppose there’s a single underlying thing which ~everyone is gesturing at when talking about (humans’) “values”. How could a common underlying notion of “values” be compatible with our observation that people talk about all the very distinct things you listed, when you start asking questions about their “values”?
An analogy: in political science, people talk about “power”. Right up top, wikipedia defines “power” in the political science sense as:
In political science, power is the social production of an effect that determines the capacities, actions, beliefs, or conduct of actors.
A minute’s thought will probably convince you that this supposed definition does not match the way anybody actually uses the term; for starters, actual usage is narrower. That definition probably doesn’t even match the way the term is used by the person who came up with that definition.
That’s the thing I want to emphasize here: if you ask people to define a term, the definitions they give ~never match their own actual usage of the term, with the important exception of mathematics.
… but that doesn’t imply that there’s no single underlying thing which political scientists are gesturing at when they talk about “power”. It just implies that the political scientists themselves haven’t figured out the True Name of the thing their intuitions are pointed at.
Now back to “values”. It seems pretty plausible to me that people are in fact generally gesturing at the same underlying thing, when they talk about “values”. But people usually have very poor understanding of their own values (a quick check confirms that this applies to arguably-all of the notions of “values” on your list), so it’s not surprising if people end up defining their values in many different incompatible ways which don’t match the underlying common usage very well.
(Example: consider the prototypical deep Christian. They’d probably tell us that their “values” are to follow whatever directives are in the Bible, or some such. But then when actual value-loaded questions come up, they typically find some post-hoc story about how the Bible justifies their preferred value-claim… implying that the source of their value-claims, i.e. “values”, is something other than the Bible. This is totally compatible with deep Christians intuitively meaning the same thing I do when they talk about “values”, it’s just that they don’t reflectively know their actual usage of the term.)
… and if that is the case, then tabooing “values” is exactly the wrong move. The word itself is pointed at the right thing, and it’s all the attempted-definitions which are wrong. Tabooing “values” and replacing it with the definitions people think they’re using would be a step toward less correctness.
I’m kinda confused by this example. Let’s say the person exhibits three behaviors:
(1): They make broad abstract “value claims” like “I follow Biblical values”.
(2): They make narrow specific “value claims” like “It’s wrong to allow immigrants to undermine our communities”.
(3): They do object-level things that can be taken to indicate “values”, like cheating on their spouse
From my perspective, I feel like you’re taking a stand and saying that the real definition of “values” is (2), and is not (1). (Not sure what you think of (3).) But isn’t that adjacent to just declaring that some things on Eli’s list are the real “values” and others are not?
In particular, at some point you have to draw a distinction between values and desires, right? I feel like you’re using the word “value claims” to take that distinction for granted, or something.
(For the record, I have sometimes complained about alignment researchers using the word “values” when they’re actually talking about “desires”.)
tabooing “values” is exactly the wrong move
I agree that it’s possible to use the suite of disparate intuitions surrounding some word as a kind of anthropological evidence that informs an effort to formalize or understand something-or-other. And that, if you’re doing that, you can’t taboo that word. But that’s not what people are doing with words 99+% of the time. They’re using words to (try to) communicate substantive claims. And in that case you should totally beware of words like “values” that have unusually large clouds of conflicting associations, and liberally taboo or define them.
Relatedly, if a writer uses the word “values” without further specifying what they mean, they’re not just invoking lots of object-level situations that seem to somehow relate to “values”; they’re also invoking any or all of those conflicting definitions of the word “values”, i.e. the things on Eli’s list, the definitions that you’re saying are wrong or misleading.
It seems pretty plausible to me that people are in fact generally gesturing at the same underlying thing, when they talk about “values”.
In the power example, the physics definition (energy over time) and the Alex Turner definition have something to do with each other, but I wouldn’t call them “the same underlying thing”—they can totally come apart, especially out of distribution.
It’s worse than just a blegg/rube thing: I think words can develop into multiple clusters connected by analogies. Like, “leg” is a body part, but also “this story has legs” and “the first leg of the journey” and “the legs of the right triangle”. It seems likely to me that “values” has some amount of that.
I agree. Some interpretations of “values” you didn’t explicitly list, but I think are important:
What someone wants to be true (analogous to what someone believes to be true)
What someone would want to be true if they knew what it would be like if it were true
What someone believes would be good if it were true
These are distinct, because each could clearly differ from the others. So the term “value” is actually ambiguous, not just vague. Talking about “values” is usually unnecessarily unclear, similar to talking about “utilities” in utility theory.
A few of the “distinct meanings” you list are very different from the others, but many of those are pretty similar. “Values” is a pretty broad term, including everything on the “ought” side of the is–ought divide, less “high-minded or noble” preferences, and one’s “ranking over possible worlds”, and that’s fine: it seems like a useful (and coherent!) concept to have a word for. You can be more specific with adjectives if context doesn’t adequately clarify what you mean.
Seeing through heaven’s eyes or not, I see no meaningful difference between the statements “I would like to sleep with that pretty girl” and “worlds in which I sleep with that pretty girl are better than the ones in which I don’t, ceteris paribus.” I agree this is the key difference: yes, I conflate these two meanings[1], and like the term “values” because it allows me to avoid awkward constructions like the latter when describing one’s motivations.
You can be more specific with adjectives if context doesn’t adequately clarify what you mean.
Well, they can. The problem is that people on LessWrong actually do use the term (in my opinion) pretty excessively, in contrast to, say, philosophers or psychologists. This is no problem in concrete cases like in your example, but on LessWrong the discussion about “values” is usually abstract. The fact that people could be more specific hasn’t, so far, meant that they are.
My honest opinion is that this makes discussion worse, and that you can do better by distinguishing between values as the objects that have value, and the mechanism by which value gets assigned.
New post: Some things I think about Double Crux and related topics
I’ve spent a lot of my discretionary time working on the broad problem of developing tools for bridging deep disagreements and transferring tacit knowledge. I’m also probably the person who has spent the most time explicitly thinking about and working with CFAR’s Double Crux framework. It seems good for at least some of my high level thoughts to be written up some place, even if I’m not going to go into detail about, defend, or substantiate, most of them.
The following are my own beliefs and do not necessarily represent CFAR, or anyone else.
I, of course, reserve the right to change my mind.
[Throughout I use “Double Crux” to refer to the Double Crux technique, the Double Crux class, or a Double Crux conversation, and I use “double crux” to refer to a proposition that is a shared crux for two people in a conversation.]
Here are some things I currently believe:
(General)
Double Crux is one (highly important) tool/framework among many. I want to distinguish between the overall art of untangling and resolving deep disagreements and the Double Crux tool in particular. The Double Crux framework is maybe the most important tool (that I know of) for resolving disagreements, but it is only one tool/framework in an ensemble.
Some other tools/frameworks that are not strictly part of Double Crux (but which are sometimes crucial to bridging disagreements) include NVC, methods for managing people’s intentions and goals, various forms of co-articulation (helping to draw out an inchoate model from one’s conversational partner), etc.
In some contexts other tools are substitutes for Double Crux (ie another framework is more useful), and in some cases other tools are helpful or necessary complements (ie they solve problems or smooth the process within the Double Crux frame).
In particular, my personal conversational facilitation repertoire is about 60% Double Crux-related techniques, and 40% other frameworks that are not strictly within the frame of Double Crux.
Just to say it clearly: I don’t think Double Crux is the only way to resolve disagreements, or the best way in all contexts. (Though I think it may be the best way, that I know of, in a plurality of common contexts?)
The ideal use case for Double Crux is when...
There are two people...
...who have a real, action-relevant, decision...
...that they need to make together (they can’t just do their own different things)...
...in which both people have strong, visceral intuitions.
Double Cruxes are almost always conversations between two people’s system 1′s.
You can Double Crux between two people’s unendorsed intuitions. (For instance, Alice and Bob are discussing a question about open borders. They both agree that neither of them is an economist, and that neither of them trusts their intuitions here, and that if they had to actually make this decision, it would be crucial to spend a lot of time doing research and examining the evidence and consulting experts. But nevertheless Alice’s current intuition leans in favor of open borders, and Bob’s current intuition leans against. This is a great starting point for a Double Crux.)
Double cruxes (as in a crux that is shared by both parties in a disagreement) are common, and useful. Most disagreements have implicit Double Cruxes, though identifying them can sometimes be tricky.
Conjunctive cruxes (I would change my mind about X, if I changed my mind about Y and about Z, but not if I only changed my mind about Y or about Z) are common.
Folks sometimes object that Double Crux won’t work, because their belief depends on a large number of considerations, each one of which has only a small impact on their overall belief, and so no one consideration is a crux. In practice, I find that there are double cruxes to be found even in cases where people expect their beliefs have this structure.
Theoretically, it makes sense that we would find double cruxes in these scenarios: if a person has a strong disagreement (including a disagreement of intuition) with someone else, we should expect that there are a small number of considerations doing most of the work of causing one person to think one thing and the other to think something else. It is improbable that each person’s beliefs depend on 50 factors, and for Alice, most of those 50 factors point in one direction, and for Bob, most of those 50 factors point in the other direction, unless the details of those factors are not independent. If considerations are correlated, you can abstract out the fact or belief that generates the differing predictions in all of those separate considerations. That “generating belief” is the crux.
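As a toy illustration of this argument (my own, with made-up numbers, not anything from the original post): with 50 genuinely independent coin-flip considerations, it is vanishingly rare for two people to each end up strongly lopsided in opposite directions, whereas a single shared latent “generating belief” produces exactly that pattern most of the time.

```python
import random

def sample_considerations(n=50, latent=+1, latent_weight=0.0):
    """Direction (+1 / -1) of each consideration for one person. With
    latent_weight > 0, each consideration is partly driven by a single
    underlying 'generating belief' (the candidate crux)."""
    return [latent if random.random() < latent_weight else random.choice([+1, -1])
            for _ in range(n)]

def lopsided(directions, threshold=0.8):
    """True if at least `threshold` of the considerations point the same way."""
    pos = sum(d == +1 for d in directions) / len(directions)
    return pos >= threshold or pos <= 1 - threshold

def strong_disagreement_rate(latent_weight, trials=20_000):
    """Fraction of simulated pairs who end up lopsided in opposite directions."""
    hits = 0
    for _ in range(trials):
        alice = sample_considerations(latent=+1, latent_weight=latent_weight)
        bob = sample_considerations(latent=-1, latent_weight=latent_weight)
        if lopsided(alice) and lopsided(bob) and sum(alice) > 0 > sum(bob):
            hits += 1
    return hits / trials

print("independent considerations:", strong_disagreement_rate(0.0))  # ~0.0
print("shared generating belief:  ", strong_disagreement_rate(0.7))  # roughly 0.7
```

If the considerations really were independent, the “lots of small considerations” picture would predict near-ties rather than strong opposite convictions; the lopsidedness itself is evidence of a shared upstream factor, which is the generating belief that Double Crux goes looking for.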
That said, there is a different conversational approach that I sometimes use, which involves delineating all of the key considerations (then doing Goal-factoring style relevance and completeness checks), and then dealing with each consideration one at time (often via a fractal tree structure: listing the key considerations of each of the higher level considerations).
This approach absolutely requires paper, and skillful (firm, gentle) facilitation, because people will almost universally try and hop around between considerations, and they need to be viscerally assured that their other concerns are recorded and will be dealt with in due course, in order to engage deeply with any given consideration one at a time.
About 60% of the power of Double Crux comes from operationalizing or being specific.
I quite like Liron’s recent sequence on being specific. It re-reminded me of some basic things that have been helpful in several recent conversations. In particular, I like the move of having a conversational partner paint a specific, best case scenario, as a starting point for discussion.
(However, I’m concerned about Less Wrong readers trying this with a spirit of trying to “catch out” one’s conversational partner in inconsistency, instead of trying to understand what their partner wants to say, and thereby shooting themselves in the foot. I think the attitude of looking to “catch out” is usually counterproductive to both understanding and to persuasion. People rarely change their mind when they feel like you have trapped them in some inconsistency, but they often do change their mind if they feel like you’ve actually heard and understood their belief / what they are trying to say / what they are trying to defend, and then provide relevant evidence and argument. In general (but not universally) it is more productive to adopt a collaborative attitude of sincerely trying to help a person articulate, clarify, and substantiate the point your partner is trying to make, even if you suspect that their point is ultimately wrong and confused.)
As an aside, specificity and operationalization is also the engine that makes NVC work. Being specific is really super powerful.
Many (~50%) disagreements evaporate upon operationalization, but this happens less frequently than people think: and if you seem to agree about all of the facts, and agree about all specific operationalizations, but nevertheless seem to have differing attitudes about a question, that should be a flag. [I have a post that I’ll publish soon about this problem.]
You should be using paper when Double Cruxing. Keep track of the chain of Double Cruxes, and keep them in view.
People talk past each other all the time, and often don’t notice it. Frequently paraphrasing your current understanding of what your conversational partner is saying, helps with this. [There is a lot more to say about this problem, and details about how to solve it effectively].
I don’t endorse the Double Crux “algorithm” described in the canonical post. That is, I don’t think that the best way to steer a Double Crux conversation is to hew to those 5 steps in that order. Actually finding double cruxes is, in practice, much more complicated, and there are a large number of heuristics and TAPs that make the process work. I regard that algorithm as an early (and self conscious) attempt to delineate moves that would help move a conversation towards double cruxes.
This is my current best attempt at distilling the core moves that make Double Crux work, though this leaves out a lot.
In practice, I think that double cruxes most frequently emerge not from people independently generating their own lists of cruxes (though this is useful). Rather, double cruxes usually emerge from the move of “checking if the point that your partner made is a crux for you.”
I strongly endorse facilitation of basically all tricky conversations, Double Crux oriented or not. It is much easier to have a third party track the meta and help steer, instead of the participants, whose working memory is (and should be) full of the object level.
So-called “Triple Crux” is not a feasible operation. If you have more than two stakeholders, have two of them Double Crux, and then have one of those two Double Crux with the third person. Things get exponentially trickier as you add more people. I don’t think that Double Crux is a feasible method for coordinating more than ~6 people. We’ll need other methods for that.
Double Crux is much easier when both parties are interested in truth-seeking and in changing their mind, and are assuming good faith about the other. But, these are not strict prerequisites, and unilateral Double Crux is totally a thing.
People being defensive, emotional, or ego-filled does not preclude a productive Double Crux. Some particular auxiliary skills are required for navigating those situations, however.
If a person wants to get better at Double Crux skills, I recommend they cross-train with IDC. Any move that works in IDC you should try in Double Crux. Any move that works in Double Crux you should try in IDC. This will seem silly sometimes, but I am pretty serious about it, even in the silly-seeming cases. I’ve learned a lot this way.
I don’t think Double Crux necessarily runs into a problem of “black box beliefs” wherein one can no longer make progress because one or both parties comes down to a fundamental disagreement about System 1 heuristics/ models that they learned from some training data, but into which they can’t introspect. Almost always, there are ways to draw out those models.
The simplest way to do this (which is not the only or best way, depending on the circumstances) involves generating many examples and testing the “black box” against them. Vary the hypothetical situation to triangulate to the exact circumstances in which the “black box” outputs which suggestions.
I am not making the universal claim that one never runs into black box beliefs that can’t be dealt with.
Disagreements rarely come down to “fundamental value disagreements”. If you think that you have gotten to a disagreement about fundamental values, I suspect there was another conversational tack that would have been more productive.
Also, you can totally Double Crux about values. In practice, you can often treat values like beliefs: often there is some evidence that a person could observe, at least in principle, that would convince them to hold or not hold some “fundamental” value.
I am not making the claim that there are no such thing as fundamental values, or that all values are Double Crux-able.
A semi-esoteric point: cruxes are (or can be) contiguous with operationalizations. For instance, if I’m having a disagreement about whether advertising produces value on net, I might operationalize to “beer commercials, in particular, produce value on net”, which (if I think that operationalization actually captures the original question) is isomorphic to “The value of beer commercials is a crux for the value of advertising. I would change my mind about advertising in general, if I changed my mind about beer commercials.” (This is an evidential crux, as opposed to the more common causal crux. (More on this distinction in future posts.))
People’s beliefs are strongly informed by their incentives. This makes me somewhat less optimistic about tools in this space than I would otherwise be, but I still think there’s hope.
There are a number of gaps in the repertoire of conversational tools that I’m currently aware of. One of the most important holes is the lack of a method for dealing with psychological blindspots. These days, I often run out of ability to make a conversation go well when we bump into a blindspot in one person or the other (sometimes, there seem to be psychological blindspots on both sides). Tools wanted, in this domain.
(The Double Crux class)
Knowing how to identify Double Cruxes can be kind of tricky, and I don’t think that most participants learn the knack from the 55 to 70 minute Double Crux class at a CFAR workshop.
Currently, I think I can teach the basic knack (not including all the other heuristics and skills) to a person in about 3 hours, but I’m still playing around with how to do this most efficiently. (The “Basic Double Crux pattern” post is the distillation of my current approach.)
This is one development avenue that would particularly benefit from parallel search: If you feel like you “get” Double Crux, and can identify Double Cruxes fairly reliably and quickly, it might be helpful if you explicated your process.
That said, there are a lot of relevant complements and sub-skills to Double Crux, and to bridging disagreements more generally.
The most important function of the Double Crux class at CFAR workshops is teaching and propagating the concept of a “crux”, and to a lesser extent, the concept of a “double crux”. These are very useful shorthands for one’s personal thinking and for discourse, which are great to have in the collective lexicon.
(Some other things)
Personally, I am mostly focused on developing deep methods (perhaps for training high-expertise specialists) that increase the range of problems of disagreements that the x-risk ecosystem can solve at all. I care more about this goal than about developing shallow tools that are useful “out of the box” for smart non-specialists, or in trying to change the conversational norms of various relevant communities (though both of those are secondary goals.)
I am highly skeptical of teaching many-to-most of the important skills for bridging deep disagreement, via anything other than ~one-on-one, in-person interaction.
In large part due to being prodded by a large number of people, I am polishing all my existing drafts of Double Crux stuff (and writing some new posts), and posting them here over the next few weeks. (There are already some drafts, still being edited, available on my blog.)
I have a standing offer to facilitate conversations and disagreements (Double Crux or not) for rationalists and EAs. Email me at eli [at] rationality [dot] org if that’s something you’re interested in.
People rarely change their mind when they feel like you have trapped them in some inconsistency [...] In general (but not universally) it is more productive to adopt a collaborative attitude of sincerely trying to help a person articulate, clarify, and substantiate [bolding mine—ZMD]
“People” in general rarely change their mind when they feel like you have trapped them in some inconsistency, but people using the double-crux method in the first place are going to be aspiring rationalists, right? Trapping someone in an inconsistency (if it’s a real inconsistency and not a false perception of one) is collaborative: the thing they were thinking was flawed, and you helped them see the flaw! That’s a good thing! (As it is written of the fifth virtue, “Do not believe you do others a favor if you accept their arguments; the favor is to you.”)
Obviously, I agree that people should try to understand their interlocutors. (If you performatively try to find fault in something you don’t understand, then apparent “faults” you find are likely to be your own misunderstandings rather than actual faults.) But if someone spots an actual inconsistency in my ideas, I want them to tell me right away. Performing the behavior of trying to substantiate something that cannot, in fact, be substantiated (because it contains an inconsistency) is a waste of everyone’s time!
In general (but not universally) it is more productive to adopt a collaborative attitude
Can you say more about what you think the exceptions to the general-but-not-universal rule are? (Um, specifically.)
I would think that inconsistencies are easier to appreciate when they are in the central machinery. A rationalist might have more load-bearing beliefs, so most beliefs are central to at least something, but I think a centrality/point-of-communication check has more upside than downside. Also, cognitive time spent looking for inconsistencies could be better spent on more constructive activities. Then there is the whole class of heuristics which don’t even claim to be consistent. So the ability to pass by an inconsistency without hanging onto it will see use.
Currently, I think I can teach the basic knack (not including all the other heuristics and skills) to a person in about 3 hours, but I’m still playing around with how to do this most efficiently. (The “Basic Double Crux pattern” post is the distillation of my current approach.)
How about doing this a few times on video? Watching the video might not be as effective as the one-on-one teaching but I would expect that watching a few 1-on-1 explanations would be a good way to learn about the process.
From a learning perspective it also helps a lot for reflecting on the technique. The early NLP folks spent a lot of time analysing tapes of people performing techniques to better understand the techniques.
I in fact recorded a test session of attempting to teach this via Zoom last weekend. This was the first time I tried a test session via Zoom however and there were a lot of kinks to work out, so I probably won’t publish that version in particular.
But yeah, I’m interested in making video recordings of some of this stuff and putting them up online.
Thanks for mentioning conjunctive cruxes. That was always my biggest objection to this technique. At least when I went through CFAR, the training completely ignored this possibility. It was clear that it often worked anyway, but the impression that I got was that it was the general frame which was important, more than the precise methodology, which at that time still seemed in need of refinement.
To me, it looks like the numbers in the General section go 1, 4, 5, 5, 6, 7, 8, 9, 3, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 2, 3, 3, 4, 2, 3, 4 (ignoring the nested numbers).
A few months ago, I wrote about how RAND, and the “Defense Intellectuals” of the cold war represent another precious datapoint of “very smart people, trying to prevent the destruction of the world, in a civilization that they acknowledge to be inadequate to dealing sanely with x-risk.”
Since then I spent some time doing additional research into what cognitive errors and mistakes those consultants, military officials, and politicians made that endangered the world. The idea being that if we could diagnose which specific irrationalities they were subject to, this would suggest errors that might also be relevant to contemporary x-risk mitigators, and might point out some specific areas where development of rationality training is needed.
However, this proved somewhat less fruitful than I was hoping, and I’ve put it aside for the time being. I might come back to it in the coming months.
It does seem worth sharing at least one relevant anecdote, and analysis, from Daniel Ellsberg’s excellent book The Doomsday Machine, given that I’ve already written it up.
The missile gap
In the late nineteen-fifties it was widely understood that there was a “missile gap”: that the Soviets had many more ICBMs (intercontinental ballistic missiles armed with nuclear warheads) than the US.
Estimates varied widely on how many missiles the Soviets had. The Army and the Navy gave estimates of about 40 missiles, which was about at parity with the US’s strategic nuclear force. The Air Force and the Strategic Air Command, in contrast, gave estimates of as many as 1000 Soviet missiles, some 25 times the US’s count.
(The Air Force and SAC were incentivized to inflate their estimates of the Russian nuclear arsenal, because a large missile gap made the creation of more nuclear weapons seem necessary, which would be under SAC control and would entail increases in the Air Force budget. Similarly, the Army and Navy were incentivized to lowball their estimates, because a comparatively weaker Soviet nuclear force made conventional military forces more relevant and implied allocating budget resources to the Army and Navy.)
So there was some dispute about the size of the missile gap, including an unlikely possibility of nuclear parity with the Soviet Union. Nevertheless, the Soviets’ nuclear superiority was the basis for all planning and diplomacy at the time.
Kennedy campaigned on the basis of correcting the missile gap. Perhaps more critically, all of RAND’s planning and analysis was concerned with the possibility of the Russians launching a nearly-or-actually debilitating first or second strike.
The revelation
In 1961 it came to light, on the basis of new satellite photos, that all of these estimates were dead wrong. It turned out that the Soviets had only 4 nuclear ICBMs, one tenth as many as the US controlled.
The importance of this development should be emphasized. It meant that several of the fundamental assumptions of US nuclear planners were in error.
First of all, it meant that the Soviets were not bent on world domination (as had been assumed). Ellsberg says…
Since it seemed clear that the Soviets could have produced and deployed many, many more missiles in the three years since their first ICBM test, it put in question—it virtually demolished—the fundamental premise that the Soviets were pursuing a program of world conquest like Hitler’s. … That pursuit of world domination would have given them an enormous incentive to acquire at the earliest possible moment the capability to disarm their chief obstacle to this aim, the United States and its SAC. [That] assumption of Soviet aims was shared, as far as I knew, by all my RAND colleagues and with everyone I’d encountered in the Pentagon: The Assistant Chief of Staff, Intelligence, USAF, believes that Soviet determination to achieve world domination has fostered recognition of the fact that the ultimate elimination of the US, as the chief obstacle to the achievement of their objective, cannot be accomplished without a clear preponderance of military capability. If that was their intention, they really would have had to seek this capability before 1963. The 1959–62 period was their only opportunity to have such a disarming capability with missiles, either for blackmail purposes or an actual attack. After that, we were programmed to have increasing numbers of Atlas and Minuteman missiles in hard silos and Polaris sub-launched missiles. Even moderate confidence of disarming us so thoroughly as to escape catastrophic damage from our response would elude them indefinitely. Four missiles in 1960–61 was strategically equivalent to zero, in terms of such an aim.
This revelation about Soviet goals was not only of obvious strategic importance, it also took the wind out of the ideological motivation for this sort of nuclear planning. As Ellsberg relays early in his book, many, if not most, RAND employees were explicitly attempting to defend the US and the world from what was presumed to be an aggressive communist state, bent on conquest. This just wasn’t true.
But it had even more practical consequences: this revelation meant that the Russians had no first strike (or for that matter, second strike) capability. They could launch their ICBMs at American cities or military bases, but such an attack had no chance of debilitating US second strike capacity. It would unquestionably trigger a nuclear counterattack from the US who, with their 40 missiles, would be able to utterly annihilate the Soviet Union. The only effect of a Russian nuclear attack would be to doom their own country.
[Eli’s research note: What about all the Russian planes and bombs? ICBMs aren’t the only way of attacking the US, right?]
This meant that the primary consideration in US nuclear war planning at RAND and elsewhere was fallacious: the Soviets could not meaningfully destroy the US.
…the estimate contradicted and essentially invalidated the key RAND studies on SAC vulnerability since 1956. Those studies had explicitly assumed a range of uncertainty about the size of the Soviet ICBM force that might play a crucial role in combination with bomber attacks. Ever since the term “missile gap” had come into widespread use after 1957, Albert Wohlstetter had deprecated that description of his key findings. He emphasized that those were premised on the possibility of clever Soviet bomber and sub-launched attacks in combination with missiles or, earlier, even without them. He preferred the term “deterrent gap.” But there was no deterrent gap either. Never had been, never would be. To recognize that was to face the conclusion that RAND had, in all good faith, been working obsessively and with a sense of frantic urgency on a wrong set of problems, an irrelevant pursuit in respect to national security.
This realization invalidated virtually all of RAND’s work to date. Virtually every analysis, study, and strategy had been useless, at best.
The reaction to the revelation
How did RAND employees respond to this revelation that their work had been completely off base?
That is not a recognition that most humans in an institution are quick to accept. It was to take months, if not years, for RAND to accept it, if it ever did in those terms. To some degree, it’s my impression that it never recovered its former prestige or sense of mission, though both its building and its budget eventually became much larger. For some time most of my former colleagues continued their focus on the vulnerability of SAC, much the same as before, while questioning the reliability of the new estimate and its relevance to the years ahead. [Emphasis mine] … For years the specter of a “missile gap” had been haunting my colleagues at RAND and in the Defense Department. The revelation that this had been illusory cast a new perspective on everything. It might have occasioned a complete reassessment of our own plans for a massive buildup of strategic weapons, thus averting an otherwise inevitable and disastrous arms race. It did not; no one known to me considered that for a moment. [Emphasis mine]
According to Ellsberg, many at RAND were unable to adapt to the new reality and (fruitlessly) continued with what they were doing, as if by inertia, when the thing that they needed to do (to use Eliezer’s turn of phrase) was to “halt, melt, and catch fire.”
This suggests that one failure of this ecosystem, which was working in the domain of existential risk, was a failure to “say oops”: to notice a mistaken belief, concretely acknowledge that it was mistaken, and to reconstruct one’s plans and world views.
Relevance to people working on AI safety
This seems to be at least some evidence (though, only weak evidence, I think), that we should be cautious of this particular cognitive failure ourselves.
It may be worth rehearsing the motion in advance: how will you respond, when you discover that a foundational crux of your planning is actually a mirage, and the world is different than it seems?
What if you discovered that your overall approach to making the world better was badly mistaken?
What if you received a strong argument against the orthogonality thesis?
What about a strong argument for negative utilitarianism?
I think that many of the people around me have effectively absorbed the impact of a major update at least once in their life, on a variety of issues (religion, x-risk, average vs. total utilitarianism, etc), so I’m not that worried about us. But it seems worth pointing out the importance of this error mode.
A note: Ellsberg relays later in the book that, during the Cuban missile crisis, he perceived Kennedy as offering baffling terms to the Soviets: terms that didn’t make sense in light of the actual strategic situation, but might have been sensible under the premise of a Soviet missile gap. Ellsberg wondered, at the time, if Kennedy had also failed to propagate the update regarding the actual strategic situation.
I believed it very unlikely that the Soviets would risk hitting our missiles in Turkey even if we attacked theirs in Cuba. We couldn’t understand why Kennedy thought otherwise. Why did he seem sure that the Soviets would respond to an attack on their missiles in Cuba by armed moves against Turkey or Berlin? We wondered if—after his campaigning in 1960 against a supposed “missile gap”—Kennedy had never really absorbed what the strategic balance actually was, or its implications.
I mention this because additional research suggests that this is implausible: that Kennedy and his staff were aware of the true strategic situation, and that their planning was based on that premise.
I can’t speak for habryka, but I think your post did a great job of laying out the need for “say oops” in detail. I read the Doomsday Machine and felt this point very strongly while reading it, but this was a great reminder to me of its importance. I think “say oops” is one of the most important skills for actually working on the right thing, and that in my opinion, very few people have this skill even within the rationality community.
There feel to me like two relevant questions here, which seem conflated in this analysis:
1) At what point did the USSR gain the ability to launch a comprehensively-destructive, undetectable-in-advance nuclear strike on the US? That is, at what point would a first strike have been achievable and effective?
2) At what point did the USSR gain the ability to launch such a first strike using ICBMs in particular?
By 1960 the USSR had 1,605 nuclear warheads; there may have been few ICBMs among them, but there are other ways to deliver warheads than shooting them across continents. Planes fail the “undetectable” criterion, but ocean-adjacent cities can be blown up by small boats, and by 1960 the USSR had submarines equipped with six “short”-range (650 km and 1,300 km) ballistic missiles. By 1967 they were producing subs like this, each of which was armed with 16 missiles with ranges of 2,800–4,600 km.
All of which is to say that from what I understand, RAND’s fears were only a few years premature.
[Note: I’ve started a research side project on this question, and it is already obvious to me that this ontology is importantly wrong.]
There’s a common phenomenology of “mental energy”. For instance, if I spend a couple of hours thinking hard (maybe doing math), I find it harder to do more mental work afterwards. My thinking may be slower and less productive. And I feel tired, or drained (mentally, rather than physically).
Mental energy is one of the primary resources that one has to allocate, in doing productive work. In almost all cases, humans have less mental energy than they have time, and therefore effective productivity is a matter of energy management, more than time management. If we want to maximize personal effectiveness, mental energy seems like an extremely important domain to understand. So what is it?
The naive story is that mental energy is an actual energy resource that one expends and then needs to recoup. That is, when one is doing cognitive work, they are burning calories, depleting their body’s energy stores. As they use energy, they have less fuel to burn.
My current understanding is that this story is not physiologically realistic. Thinking hard does consume more of the body’s energy than baseline, but not that much more. And we experience mental fatigue long before we even get close to depleting our calorie stores. It isn’t literal energy that is being consumed. [The Psychology of Fatigue pg.27]
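To put rough numbers on “not that much more” (these are my own ballpark figures, not from the book): the brain’s resting draw is something like 20 watts, and even a generous guess at how much extra a stretch of hard thinking adds works out to only a couple of food calories.

```python
# Rough back-of-the-envelope arithmetic (ballpark numbers, not from the book):
# how much extra energy could three hours of hard thinking plausibly consume?
BRAIN_WATTS = 20          # the brain's resting draw, roughly 20% of resting metabolism
EXTRA_FRACTION = 0.05     # generous guess: hard thinking adds a few percent at the whole-brain level
HOURS = 3

extra_joules = BRAIN_WATTS * EXTRA_FRACTION * HOURS * 3600
extra_kcal = extra_joules / 4184  # joules per food calorie (kcal)

print(f"extra energy burned: {extra_kcal:.1f} kcal")  # ~2.6 kcal: a tiny fraction of daily intake
```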
So if not that, what is going on here?
A few hypotheses:
(The first few are all of a cluster, so I labeled them 1a, 1b, 1c, etc.)
Hypothesis 1a: Mental fatigue is a natural control system that redirects our attention to our other goals.
The explanation that I’ve heard most frequently in recent years (since it became obvious that much of the literature on ego-depletion was off the mark) is the following:
A human mind is composed of a bunch of subsystems that are all pushing for different goals. For a period of time, one of these goal threads might be dominant. For instance, if I spend a few hours doing math, this means that my other goals are temporarily suppressed or on hold: I’m not spending that time seeking a mate, or practicing the piano, or hanging out with friends.
In order to prevent those goals from being neglected entirely, your mind has a natural control system that prevents you from focusing your attention on any one thing for too long: the longer you put your attention on something, the greater the buildup of mental fatigue, pushing you to do anything else.
Comments and model-predictions: This hypothesis, as stated, seems implausible to me. For one thing, it seems to suggest that all activities would be equally mentally taxing, which is empirically false: spending several hours doing math is mentally fatiguing, but spending the same amount of time watching TV is not.
This might still be salvaged if we offer some currency other than energy that is being conserved: something like “forceful computations”. But again, it doesn’t seem obvious why the computations of doing math would be more costly than those for watching TV.
Similarly, this model suggests that “a change is as good as a break”: if you switch to a new task, you should be back to full mental energy, until you become fatigued for that task as well.
Hypothesis 1b: Mental fatigue is the phenomenological representation of the loss of support for the winning coalition.
A variation on this hypothesis would be to model the mind as a collection of subsystems. At any given time, there is only one action sequence active, but that action sequence is determined by continuous “voting” by various subsystems.
Over time, these subsystems get fed up with their goals not being met, and “withdraw support” for the current activity. This manifests as increasing mental fatigue. (Perhaps your thoughts get progressively less effective because they are interrupted, on the scale of microseconds, by bids to think something else.)
Comments and model-predictions: This seems like it might suggest that if all of the subsystems have high trust that their goals will be met, that math (or any other cognitively demanding task) would cease to be mentally taxing. Is that the case? (Does doing math mentally exhaust Critch?)
This does have the nice virtue of explaining burnout: when some subset of needs are not satisfied for a long period, the relevant subsystems pull their support for all actions, until those needs are met.
[Is burnout a good paradigm case for studying mental energy in general?]
Hypothesis 1c: The same as 1a or 1b, but some mental operations are painful for some reason.
To answer my question above, one reason why math might be more mentally taxing than watching TV, is that doing math is painful.
If the process of doing math is painful on the micro-level, then even if all of the other needs are met, there is still a fundamental conflict between the subsystem that is aiming to acquire math knowledge and the subsystem that is trying to avoid pain on the micro-level.
As you keep doing math, the micro-pain part votes more and more strongly against doing math, or the overall system biases away from the current activity, and you run out of mental energy.
Comments and model-predictions: This seems plausible for the activity of doing math, which involves many moments of frustration, which might be meaningfully micro-painful. But it seems less consistent with activities like writing, which phenomenologically feel non-painful. This leads to hypothesis 1d…
Hypothesis 1d: The same as 1c, but the key micro-pain is that of processing ambiguity second to second
Maybe the pain comes from many moments of processing ambiguity, which is definitely a thing that is happening in the context of writing. (I’ll sometimes notice myself trying to flinch away to something easier when I’m not sure which sentence to write.) It seems plausible that mentally taxing activities are taxing to the extent that they involve processing ambiguity, and doing a search for the best template to apply.
Hypothesis 1e: Mental fatigue is the penalty incurred for top down direction of attention.
Maybe consciously deciding to do things is importantly different from the “natural” allocation of cognitive resources. That is, your mind is set up such that the conscious, System 2, long term planning, metacognitive system, doesn’t have free rein. It has a limited budget of “mental energy”, which measures how long it is allowed to call the shots before the visceral, system 1, immediate gratification systems take over again.
Maybe this is an evolutionary adaptation? For the monkeys that had “really good” plans for how to achieve their goals, those plans never panned out for them. The monkeys that were impulsive some of the time actually did better at the reproduction game?
(If this is the case, can the rest of the mind learn to trust S2 more, and thereby offer it a bigger mental energy budget?)
This hypothesis does seem consistent with my observation that rest days are rejuvenating, even when I spend my rest day working on cognitively demanding side projects.
Hypothesis 2: Mental fatigue is the result of the brain temporarily reaching knowledge saturation.
When learning a motor task, there are several phases in which skill improvement occurs. The first, unsurprisingly, is during practice sessions. However, one also sees automatic improvements in skill in the hours after practice [actually this part is disputed] and following a sleep period (academic links 1, 2, 3). That is, there is a period of consolidation following a practice session. This period of consolidation probably involves the literal strengthening of neural connections, and encoding other brain patterns that take more than a few seconds to set.
I speculate that your brain may reach a saturation point: more practice, more information input, becomes increasingly less effective, because you need to dedicate cognitive resources to consolidation. [Note that this is supposing that there is some tradeoff between consolidation activity and input activity, as opposed to a setup where both can occur simultaneously (does anyone have evidence for such a tradeoff?).]
If so, maybe cognitive fatigue is the phenomenology of needing to extract oneself from a practice / execution regime, so that your brain can do post-processing and consolidation on what you’ve already done and learned.
Comments and model-predictions: This seems to suggest that all cognitively taxing tasks are learning tasks, or at least tasks in which one is encoding new neural patterns. This seems plausible, at least.
It also seems to naively imply that an activity will become less mentally taxing as you gain expertise with it, and progress along the learning curve. There is (presumably) much more information to process and consolidate in your first hour of doing math than in your 500th.
Hypothesis 3: Mental fatigue is a control system that prevents some kind of damage to the mind or body.
One reason why physical fatigue is useful is that it prevents damage to your body. Getting tired after running for a bit stops you from running all out for 30 hours at a time and eroding your fascia.
By simple analogy to physical fatigue, we might guess that mental fatigue is a response to vigorous mental activity that is adaptive in that it prevents us from hurting ourselves.
I have no idea what kind of damage might be caused by thinking too hard.
I note that mania and hypomania involve apparently limitless mental energy reserves, and I think that these states are bad for your brain.
Hypothesis 4: Mental fatigue is a buffer overflow of peripheral awareness.
Another speculative hypothesis: Human minds have a working memory: a limit of ~4 concepts, or chunks, that can be “activated”, or operated upon in focal attention, at one time. But meditators, at least, also talk about a peripheral awareness: a sort of halo of concepts and sense impressions that are “loaded up”, or “nearby”, or cognitively available, or “on the fringes of awareness”. These are all the ideas that are “at hand” to your thinking. [Note: is peripheral awareness, as the meditators talk about it, the same thing as “short term memory”?]
Perhaps if there is a functional limit to the amount of content that can be held in working memory, there is a similar, if larger, limit to how much content can be held in peripheral awareness. As you engage with a task, more and more mental content is loaded up, or added to peripheral awareness, where it both influences your focal thought process and is available to be operated on directly in working memory. As you continue the task, and more and more content gets added to peripheral awareness, you begin to overflow its capacity. It gets harder and harder to think, because peripheral awareness is overflowing. Your mind needs space to re-ontologize: to chunk pieces together, so that it can all fit in the same mental space. Perhaps this is what mental fatigue is.
Comments and model-predictions: This does give a nice clear account of why sleep replenishes mental energy (it both causes re-ontologizing, and clears the cache), though perhaps this does not provide evidence over most of the other hypotheses listed here.
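To make Hypothesis 4 a little more concrete, here is a toy model of the proposed dynamic. To be clear, this is purely illustrative: the capacity number and the “effort” curve are invented, and nothing here is a claim about actual neural mechanisms.

```python
# A toy model of Hypothesis 4 (purely illustrative; the capacity and the
# "effort" formula are invented): peripheral awareness as a bounded buffer
# that makes thinking harder as it fills, and is relieved by chunking.

class PeripheralAwareness:
    def __init__(self, capacity=40):
        self.capacity = capacity  # hypothetical limit, much larger than working memory's ~4
        self.contents = []        # concepts "loaded up" over the course of a task

    def load(self, concept):
        self.contents.append(concept)

    def effort_multiplier(self):
        """How much harder thinking feels as the buffer fills (made-up curve)."""
        fill = len(self.contents) / self.capacity
        return 1.0 + max(0.0, fill - 0.5) * 10  # cheap below half-full, then climbs steeply

    def re_ontologize(self, group_size=4):
        """Consolidation (rest / sleep): merge related items into larger chunks."""
        self.contents = [tuple(self.contents[i:i + group_size])
                         for i in range(0, len(self.contents), group_size)]

mind = PeripheralAwareness()
for i in range(60):                   # a few hours of work loads up lots of content
    mind.load(f"idea_{i}")
print(mind.effort_multiplier())       # high: the phenomenology of "mental fatigue"
mind.re_ontologize()
print(mind.effort_multiplier())       # back near baseline after consolidation
```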
Other notes about mental energy:
In this post, I’m mostly talking about mental energy on the scale of hours. But there is also a similar phenomenon on the scale of days (the rejuvenation one feels after rest days) and on the scale of months (burnout and such). Are these the same basic phenomenon on different timescales?
On the scale of days, I find that my subjective rest-o-meter is charged up if I take a rest day, even if I spend that rest day working on fairly cognitively intensive side projects.
This might be because there’s a kind of new project energy, or new project optimism?
Mania and hypomania entail limitless mental energy.
People seem to be able to play video games for hours and hours without depleting mental energy. Does this include problem solving games, or puzzle games?
Also, just because they can play indefinitely does not mean that their performance doesn’t drop. Does performance drop across hours of playing, say, Snakebird?
For that matter, does performance decline on a task correlate with the phenomenological “running out of energy”? Maybe those are separate systems.
On Hypothesis 3, the brain may build up waste as a byproduct of its metabolism when it’s working harder than normal, just as muscles do. Cleaning up this buildup seems to be one of the functions of sleep. Even brainless animals like jellyfish sleep. They do have neurons though.
I also think it’s reasonable to think that multiple things may be going on that all get described as “mental energy”. For example, hypotheses 1 and 2 could both be true and result in different causes of similar behavior. I bring this up because I think of those as two different things in my experience: being “full up” and needing to allow time for memory consolidation (where I can still force my attention, it just doesn’t take in new information) vs. being unable to force the direction of attention generally.
Sure. It feels like my head is “full”, although the felt sense is more like my head has gone from being porous and sponge-like to hard and concrete-like. When I try to read or listen to something I can feel it “bounce off” in that I can’t hold the thought in memory beyond forcing it to stay in short term memory.
Isn’t it possible that there’s some other biological sink that is time-delayed from caloric energy? Like, say, a very specific part of your brain needs a very specific protein, and only holds enough of that protein for 4 hours? And it can take hours to build that protein back up. This seems to me to be at least somewhat likely.
Someone smart once made a case like this to me in support of a specific substance (can’t remember which) as a nootropic, though I’m a bit skeptical.
I think about this a lot. I’m currently leaning toward the fourth hypothesis, which seems more correct to me and is one where I can actually do something to ameliorate the trade-off it implies.
In this comment, I talk about what it means to me and how I can do something about it, which, in summary, is to use Anki a lot and change subjects when working memory gets overloaded. It’s important to note that mathematics is sort-of different from other subjects, since concepts build on each other and you need to keep up with what all of them mean and entail, so we may be bound to reach an overload faster in that sense.
A few notes about your other hypotheses:
Hypothesis 1c:
it doesn’t seem obvious why the computations of doing math would be more costly than those for watching TV.
It’s because we’re not used to it. Some things come easier than others; some things are more closely similar to what we have been doing for 60,000 years (math is not one of them). So we flinch from that which we are not used to. Although, adaptation is easy and the major hurdle is only at the beginning.
This seems plausible for the activity of doing math, which involves many moments of frustration, which might be meaningfully micro-painful.
It may also mean that the reward system is different. It is difficult to see, as we explore a piece of mathematics, how fulfilling it is when we know that we may not be getting anywhere. So the inherent reward is missing, or has to be created more artificially.
Hypothesis 1d:
It seems plausible that mentally taxing activities are taxing to the extent that they involve processing ambiguity, and doing a search for the best template to apply.
This seems correct to me. Consider the following: “This statement is false”.
Thinking about it (or about iterations of that statement) is bound to make us flinch away in just a few seconds. How many other things take this form? I bet there are many.
For the monkeys that had “really good” plans for how to achieve their goals, those plans never panned out for them. The monkeys that were impulsive some of the time actually did better at the reproduction game?
Instead of working to trust System 2, is there a way to train System 1? That seems more apt to me, like training tactics in chess, or training to make rapid calculations.
Thank you for the good post, I’d really like to further know more about your findings.
Seems to me that mental energy is lost by frustration. If what you are doing is fun, you can do it for a long time; if it frustrates you at every moment, you will get “tired” soon.
The exact mechanism… I guess is that some part of the brain takes frustration as evidence that this is not the right thing to do, and suggests doing something else. (Would this correspond to “1b” in your model?)
I recently read Prisoner’s Dilemma, which is half an introduction to very elementary game theory, and half a biography of John Von Neumann, and watched this old PBS documentary about the man.
I’m glad I did. Von Neumann has legendary status in my circles, as the smartest person ever to live. [1] Many times I’ve written the words “Von Neumann Level Intelligence” in an AI strategy document, or speculated about how many coordinated Von Neumanns it would take to take over the world. (For reference, I now think that 10 is far too low, mostly because he didn’t seem to have the entrepreneurial or managerial dispositions.)
Learning a little bit more about him was humanizing. Yes, he was the smartest person ever to live, but he was also an actual human being, with actual human traits.
Watching this first clip, I noticed that I was surprised by a number of things.
That VN had an accent. I had known that he was Hungarian, but somehow it had never quite propagated that he would speak with a Hungarian accent.
That he was of middling height (somewhat shorter than the presenter he’s talking to).
The thing he is saying is the sort of thing that I would expect to hear from any scientist in the public eye, “science education is important.” There is something revealing about Von Neumann, despite being the smartest person in the world, saying basically what I would expect Neil DeGrasse Tyson to say in an interview. A lot of the time he was wearing his “scientist / public intellectual” hat, not the “smartest person ever to live” hat.
Some other notes of interest:
He was not a skilled poker player, which punctured my assumption that Von Neumann was omnicompetent. (pg. 5) Nevertheless, poker was among the first inspirations for game theory. (When I told this to Steph, she quipped “Oh. He wasn’t any good at it, so he developed a theory from first principles, describing optimal play?” For all I know, that might be spot on.)
Perhaps relatedly, he claimed he had low sales resistance, and so would have his wife come clothes shopping with him. (pg. 21)
He was sexually crude, and perhaps a bit misogynistic. Eugene Wigner stated that “Johny believed in having sex, in pleasure, but not in emotional attachment. He was interested in immediate pleasure and little comprehension of emotions in relationships and mostly saw women in terms of their bodies.” The journalist Steve Heimes wrote “upon entering an office where a pretty secretary was working, von Neumann habitually would bend way over, more or less trying to look up her dress.” (pg. 28) Not surprisingly, his relationship with his wife, Klara, was tumultuous, to say the least.
He did, however, maintain a strong, lifelong relationship with his mother (who died the same year that he did).
Overall, he gives the impression of a genius who was also an overgrown child.
Unlike many of his colleagues, he seemed not to share the pangs of conscience that afflicted many of the bomb creators. Rather than going back to academia following the war, he continued doing work for the government, including the development of the hydrogen bomb.
Von Neumann advocated preventative war: giving the Soviet Union an ultimatum of joining a world government, backed by the threat of (and probable enactment of) nuclear attack, while the US still had a nuclear monopoly. He famously said of the matter, “If you say why not bomb them tomorrow, I say why not today? If you say today at 5 o’clock, I say why not 1 o’clock.”
This attitude was certainly influenced by his work on game theory, but it should also be noted that Von Neumann hated communism.
Richard Feynman reports that Von Neumann, in their walks through the Los Alamos desert, convinced him to adopt an attitude of “social irresponsibility”, that one “didn’t have to be responsible for the world he was in.”
Prisoner’s Dilemma says that he and his collaborators “pursued patents less aggressively than they could have”. Edward Teller commented, “probably the IBM company owes half its money to John Von Neumann.” (pg. 76)
So he was not very entrepreneurial, which is a bit of a shame, because if he had the disposition, he probably could have made a lot of money. (He certainly had the energy to be an entrepreneur: he only slept for a few hours a night, and was working for basically all of his waking hours.)
He famously always wore a grey oxford 3-piece suit, including when playing tennis with Stanislaw Ulam, or when riding a donkey down the Grand Canyon. I’m not clear why. Was that more comfortable? Did he think it made him look good? Did he just not want to have to ever think about clothing, and so preferred to be over-hot in the middle of the Los Alamos desert, rather than need to think about whether today was “shirt sleeves weather”?
Von Neumann himself once commented on the strange fact of so many Hungarian geniuses growing up in such a small area, in his generation:
Stanislaw Ulam recalled that when Von Neumann was asked about this “statistically unlikely” Hungarian phenomenon, Von Neumann “would say that it was a coincidence of some cultural factors which he could not make precise: an external pressure on the whole society of this part of Central Europe, a subconscious feeling of extreme insecurity in individual, and the necessity of producing the unusual or facing extinction.” (pg. 66)
One thing that surprised me most was that it seems that, despite being possibly the smartest person in modernity, he would have benefited from attending a CFAR workshop.
For one thing, at the end of his life, he was terrified of dying. But throughout the course of his life he made many reckless choices with his health.
He ate gluttonously and became fatter and fatter over the course of his life. (One friend remarked that he “could count anything but calories.”)
Furthermore, he seemed to regularly risk his life when driving.
Von Neumann was an aggressive and apparently reckless driver. He supposedly totaled his car every year or so. An intersection in Princeton was nicknamed “Von Neumann corner” for all the auto accidents he had there. Records of accidents and speeding arrests are preserved in his papers. [The book goes on to list a number of such accidents.] (pg. 25)
(Amusingly, Von Neumann’s reckless driving seems due, not to drinking and driving, but to singing and driving. “He would sway back and forth, turning the steering wheel in time with the music.”)
I think I would call this a bug.
On another thread, one of his friends (the documentary didn’t identify which) expressed that he was over-impressed by powerful people, and didn’t make effective tradeoffs.
I wish he’d been more economical with his time in that respect. For example, if people called him to Washington or elsewhere, he would very readily go and so on, instead of having these people come to him. It was much more important, I think, he should have saved his time and effort. He felt, when the government called, [that] one had to go, it was a patriotic duty, and as I said before he was a very devoted citizen of the country. And I think one of the things that particularly pleased him was any recognition that came sort-of from the government. In fact, in that sense I felt that he was sometimes somewhat peculiar that he would be impressed by government officials or generals and so on. If a big uniform appeared that made more of an impression than it should have. It was odd. But it shows that he was a person of many different and sometimes self contradictory facets, I think.
Stanislaw Ulam speculated, “I think he had a hidden admiration for people and organizations that could be tough and ruthless.” (pg. 179)
From these statements, it seems like Von Neumann leapt at chances to seem useful or important to the government, somewhat unreflectively.
These anecdotes suggest that Von Neumann would have gotten value out of Goal Factoring, or Units of Exchange, or IDC (possibly there was something deeper going on, regarding blindspots around death, or status, but I think the point still stands, and he would have benefited from IDC).
Despite being the discoverer/inventor of VNM utility theory, and founding the field of Game Theory (concerned with rational choice), it seems to me that Von Neumann did far less to import the insights of the math into his actual life than, say, Critch.
(I wonder aloud if this is because Von Neumann was born and came of age before the development of cognitive science. I speculate that the importance of actually applying theories of rationality in practice only becomes obvious after Tversky and Kahneman demonstrate that humans are not rational by default. (In evidence against this view: Eliezer seems to have been very concerned with thinking clearly, and being sane, before encountering Heuristics and Biases in, I believe, his mid-20s. He was exposed to Evo Psych though, and that may have served a similar role.))
Also, he converted to Catholicism at the end of his life, buying into Pascal’s Wager. He commented “So long as there is the possibility of eternal damnation for nonbelievers it is more logical to be a believer at the end”, and “There probably has to be a God. Many things are easier to explain if there is than if there isn’t.”
(According to Wikipedia, this deathbed conversion did not give him much comfort.)
This suggests that he would have gotten value out of reading the sequences, in addition to attending a CFAR workshop.
Seems to me the most important lesson here is “even if you are John von Neumann, you can’t take over the world alone.”
First, because no matter how smart you are, you will have blind spots.
Second, because your time is still limited to 24 hours a day; even if you’d decide to focus on things you have been neglecting until now, you would have to start neglecting the things you have been focusing on until now. Being better at poker (converting your smartness to money more directly), living healthier and therefore on average longer, developing social skills, and being strategic in gaining power… would perhaps come at a cost of not having invented half of the stuff. When you are John von Neumann, your time has insane opportunity costs.
Is there any information on how Von Neumann came to believe Catholicism was the correct religion for Pascal’s Wager purposes? “My wife is Catholic” doesn’t seem like very strong evidence...
I note that it does seem to be the religion of choice for former atheists, or at least for rationalists. I know of several rationalists that converted to Catholicism, but none that have converted to any other religion.
TL;DR: I’m offering to help people productively have difficult conversations and resolve disagreements, for free. Feel free to email me if and when that seems helpful. elityre [at] gmail.com
Facilitation
Over the past 4-ish years, I’ve had a side project of learning, developing, and iterating on methods for resolving tricky disagreements and failures to communicate. A lot of this has been in the Double Crux frame, but I’ve also been exploring a number of other frameworks (including NVC, Convergent Facilitation, Circling-inspired stuff, intuition extraction, and some home-grown methods).
As part of that, I’ve had a standing offer to facilitate / mediate tricky conversations for folks in the CFAR and MIRI spheres (testimonials below). Facilitating “real disagreements” allows me to get feedback on my current conversational frameworks and techniques. When I encounter blockers that I don’t know how to deal with, I can go back to the drawing board to model those problems and the interventions that would solve them, and iterate from there, developing new methods.
I generally like doing this kind of conversational facilitation and am open to doing a lot more of it with a wider selection of people.
I am extending an offer to help mediate tricky conversations, to anyone that might read this post, for the foreseeable future. [If I retract this offer, I’ll come back and leave a note here.]
What sort of thing is this good for?
I’m open to trying to help with a wide variety of difficult conversations, but the situations where I have been most helpful in the past have had the following features:
Two* people are either having some conflict or disagreement or are having difficulty understanding something about what the other person is saying.
There’s some reason to expect the conversation to not “work” by default: either they’ve tried already and made little progress, etc., or at least one person can predict that this conversation will be tricky or heated.
There is enough mutual respect and/or there is enough at stake that it seems worthwhile to try and have the conversation anyway. It seems worth the time to engage.
Here are some (anonymized) examples of conversations that I’ve facilitated in the past years.
Two researchers work in related fields, but in different frames / paradigms. Try as they might, neither person can manage to see how the other’s claims are even plausible.
Two friends are working on a project together, but they each feel inclined to take it in a different direction, and find it hard to get excited about the other’s proposal, even having talked about the question a lot.
John and Janet are EAs. John thinks that the project that Janet has spent the past year on, and is close to launching, is net negative, and that Janet should drop it entirely. Janet feels exasperated by this and generally feels that John is overly-controlling.
Two rationalists, Laura and Alex, are each in some kind of community leadership role, and have a lot of respect for each other, but they have very different takes on a particular question of social mores: Laura thinks that there is a class of norm enforcement that is normal and important; Alex thinks that class of “norm enforcement” behavior is unacceptable and corrosive to the social fabric. They sit down to talk about it, but seem to keep going in circles without clarifying anything.
Basically, if you have a tricky disagreement that you want to try to hash out, and you feel comfortable inviting an outside party, feel free to reach out to me.
(If there’s some conversation or conflict that you have in mind, but don’t know if it falls in this category, feel free to email me and ask.)
*- I’m also potentially open to trying to help with conflicts that involve more than two people, such as a committee that is in gridlock, trying to make a decision, but I am much less practiced with that.
The process
If everyone involved is open to a third person (me) coming in to mediate, shoot me an email at elityre [at] gmail.com, and we can schedule a half-hour call to discuss your issue. After discussing it a bit, I’ll tell you if I think I can help or not. If not, I might refer you to other people or resources that might be more useful.
If it seems like I can help, I typically prefer to meet with both parties one-on-one, as much as a week before we meet together, so that I can “load up” each person’s perspective, and start doing prep work. From there we can schedule a conversation, presumably over Zoom, for all three (or more) of us to meet.
In the conversation itself, I would facilitate: tracking what’s happening, suggesting particular conversational moves or tacks, and possibly recommending a high-level framework.
[I would like to link to a facilitation-example video here, but almost all of the conversations that I’ve facilitated are confidential. Hopefully this post will lead to one or two that can be public.]
Individual cases can vary a lot, and I’m generally open to considering alternative formats.
Currently, I’m doing this free of charge.
My sense of my current level of skill
I think this is a domain in which deep mastery is possible. I don’t consider myself to be a master, but I am aspiring to mastery.
My (possibly biased) impression is that the median outcome of my coming to help with a conversation is “eh, that was moderately helpful, mostly because having a third person to help hold space freed up our working memory to focus on the object level.”
Occasionally (one out of every 10 conversations?), I think I’ve helped dramatically, on the order of “this conversation was not working at all, until Eli came to help, and then we had multiple breakthroughs in understanding.”
(I’ve started explicitly tracking my participants’ estimation of my counterfactual impact, following conversations, so I hope to have much better numbers for assessing how useful this work is in a few months. Part of my hope in doing more of this is that I will get a more accurate assessment of how much value my facilitation in particular provides, and how much I should be investing in this general area.)
Testimonials
(I asked a number of people who I’ve done facilitation work for in the past to give me a short, honest testimonial, if they felt comfortable with that. I included the blurb from every person who sent me something, though this is still a biased sample, since I mostly reached out to people who I expected would give a “positive review”.)
Anna Salamon:
I’ve found Eli quite helpful with a varied set of tricky conversations over the years. Some details:
- It helps that he can be tracking whether we are understanding each other, vs whether it is time to paraphrase;
- It helps that he can be tracking whether we are speaking to a “crux” or are on an accidental tangent/dead-end (I can do many of these things too, but when Eli is facilitating I can trust him to do some of this, which leaves me with more working memory for understanding the other party’s perspective, figuring out how to articulate my own, etc.)
- It helps that he can help track the conversational stack, so that e.g. if I stop to paraphrase my conversation partner’s point, that doesn’t mean we’ll never get back to the thing I was trying to keep track of.
- It has sometimes helped that he could paraphrase one or the other of us in ways the other party couldn’t, but could then hear [after hearing his paraphrase];
- I have seen him help with both research-like/technical conversational topics, and messy cultural stuff.
- He can often help in cases where many folks would intuitively assume that a conversation is just “stuck,” e.g. because it boils down to a difference in aesthetics or root epistemological perspectives or similar (Eli has a bunch of cached patterns for sometimes allowing such topics to progress, where a lot of people would not know how)
- I can vouch for Eli’s ability to not-repeat private content that he says he won’t repeat.
- I personally highly value Eli’s literal-like or autistic-like tendency to just actually stick with what is being said, and to attempt to facilitate communication, without guessing ahead of time which party is “mature” or “right” or to-be-secretly-sided with. This is perhaps the area in which I have most noticed Eli’s skills/habits rising above (in my preference-ordering) those of other skilled facilitators I’ve worked with.
- He responds pretty well to feedback, and acts so as to try to find out how to actually aid thinking/communication rather than to feel as though he is already doing so.
Scott Garrabrant:
I once went to a workshop and participated in a fishbowl double crux on the second to last day. That day went so well that we basically replaced all of the last day’s schedule with continuing the conversation, and that day went so well that we canceled plane tickets and extended the workshop. This experience made me very optimistic about what can be accomplished with a facilitated double crux.
Later, when asked to give a talk at a different workshop, I declined and suggested that talks were boring and we should replace several talk slots with fishbowl double cruxes. We tried it. It was a failure, and I don’t think much of value came out of any of the resulting conversations.
As far as I can tell, the second largest contributor to the relative failure was regression to the mean. The first largest was not having Eli there.
Evan Hubinger:
I really appreciate Eli’s facilitation and I think that the hard conversations I’ve had with Eli facilitating would have been essentially impossible without good facilitation. I do think that trusting the facilitator is very important, but if you know and trust Eli as I do, I would definitely recommend his facilitation if you have a need for it.
Oliver Habryka:
I’ve asked Eli many times over the years to help me facilitate conversations that seemed particularly important and difficult. For most of these, having them happen at all without Eli seems quite difficult, so simply the presence of his willingness to facilitate, and to be reasonably well-known to be reasonable in his facilitation, provided a substantial amount of value.
He is also pretty decent at facilitation, as far as I can tell, or at least I can’t really think of anyone who is substantially more skilled at it.
It’s kind of hard for me to give a super clear review here. Like, facilitation isn’t much of a commodity, and I don’t think there is a shared standard of what a facilitator is supposed to do, so it’s hard for me to straightforwardly evaluate it. I do think what Eli has been doing has been quite valuable to me, and I would recommend reasonably strongly that other people have more conversations of the type that Eli tends to facilitate.
Mathew Fallshaw:
In 2017 I was engaged in a complicated discussion with a collaborator that was not progressing smoothly. Eli joined the discussion, in the role of facilitator, and the discussion markedly improved.
Other people who have some experience with my facilitation style, feel free to put your own thoughts in the comments.
Caveats and other info
As noted, this is an open research-ish project for me, and I obviously cannot guarantee that I will be helpful, much less that I will be able to resolve or get to the bottom of a given disagreement. In fact, as stated, I, personally, am most interested in the cases where I don’t know how to help, because those are the places where I’m most likely to learn the most, even if they are the places where I am least able to provide value.
You are always welcome to invite me to try and help, and then partway through, decide that my suggestions are less-than helpful, and say that you don’t want my help after all. (Anna Salamon does this moderately frequently.)
I do my best to keep track of a map of relevant skills in this area, and which people around have more skill than me in particular sub-domains. So it is possible that when you describe your situation, I’ll either suggest someone else who I think might be better to help you than me, or who I would like to bring in to co-facilitate with me (with your agreement, of course).
Note that this is one of a number of projects, involving difficult conversations or facilitation, that I am experimenting with lately. Another is here and another is to be announced.
If you’re interested in training sessions on Double Crux and other Conversational Facilitation skills, join my Double Crux training mailing list, here. I have vague plans to do a 3-weekend training program, covering my current take on the core Double Crux skill, but no guarantees that I will actually end up doing that any time soon.
I am curious how good you think the conversation/facilitation was in the AI takeoff double crux between Oliver Habryka and Buck Shlegeris. I am looking for something like “the quality of facilitation at that event was X percentile among all the conversation facilitation I have done”.
[I wrote a much longer and more detailed comment, and then decided that I wanted to think more about it. In lieu of posting nothing, here’s a short version.]
I mean I did very little facilitation one way or the other at that event, so I think my counterfactual impact was pretty minimal.
In terms of my value added, I think that one was in the bottom 5th percentile?
In terms of how useful that tiny amount of facilitation was, maybe 15th to 20th percentile? (This is a little weird, because quantity and quality are related. More active facilitation has a quality span: active (read: a lot of) facilitation can be much more helpful when it is good and much more disruptive / annoying / harmful when it is bad, compared to less active backstop facilitation.)
Overall, the conversation served the goals of the participants and had a median outcome for that kind of conversation, which is maybe 30th percentile, but there is a long right tail of positive outcomes (and maybe I am messing up how to think about percentile scores with skewed distributions).
The outcome that occurred (“had an interesting conversation, and had some new thoughts / clarifications”) is good, but also far below the sort of outcome that I’m usually aiming for (but often missing), of substantive, permanent (epistemic!) change to the way that one or both of the people orient on this topic.
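(A toy illustration, with made-up numbers, of why percentiles feel slippery here: when the value of outcomes has a long right tail, the median outcome is worth much less than the mean outcome, since the mean is dominated by the rare breakthrough conversations.)

```python
# Toy illustration (made-up numbers): with a long right tail, the median
# (50th-percentile) outcome is worth much less than the mean outcome.
import random

random.seed(0)

# Hypothetical "value" of a facilitated conversation: mostly modest, rarely a breakthrough.
values = [random.choice([1, 1, 1, 2, 2, 3, 10, 30]) for _ in range(10_000)]
values.sort()

median = values[len(values) // 2]
mean = sum(values) / len(values)

print(f"median value: {median}")    # the "typical" conversation
print(f"mean value:   {mean:.2f}")  # pulled up by the rare high-value conversations
```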
I’ve gotten very little out of books in this area.
It is a little afield, but I strongly recommend the basic NVC book: Nonviolent Communication: A Language for Life. I recommend that, at minimum, everyone read at least the first two chapters, which are something like 8 pages long and contain most of the substance of the book. (The rest of the book is good too, but it is mostly examples.)
Also, people I trust have gotten value out of How to Have Impossible Conversations. This is still on my reading stack though (for this month, I hope), so I don’t personally recommend it. My expectation, from not having read it yet, is that it will cover the basics pretty well.
That no one rebuilt old OkCupid updates me a lot about how much the startup world actually makes the world better
The prevailing ideology of San Francisco, Silicon Valley, and the broader tech world, is that startups are an engine (maybe even the engine) that drives progress towards a future that’s better than the past, by creating new products that add value to people’s lives.
I now think this is true in a limited way. Software is eating the world, and lots of bureaucracy is being replaced by automation which is generally cheaper, faster, and a better UX. But I now think that this narrative is largely propaganda.
That it’s been 8 years since Match bought and ruined OkCupid, and no one in the whole tech ecosystem has stepped up to make a dating app even as good as old OkC, is a huge black mark against the whole SV ideology of technology changing the world for the better.
Finding a partner is such a huge, real, pain point for millions of people. The existing solutions are so bad and extractive. A good solution has already been demonstrated. And yet not a single competent founder wanted to solve that problem for planet earth, instead of doing something else that (arguably) would have been more profitable. At minimum, someone could have forgone venture funding and built this as a cashflow business.
It’s true that this is a market that depends on economies of scale, because the quality of your product is proportional to the size of your matching pool. But I don’t buy that this is insurmountable. Just like with any startup, you start by serving a niche market really well, and then expand outward from there. (The first niche I would try for is by building an amazing match-making experience for female grad students at a particular top university. If you create a great experience for the women, the men will come, and I’d rather build an initial product for relatively smart customers. But there are dozens of niches one could try for.)
But it seems like no one tried to recreate OkC, much less create something better, until the Manifold team built manifold.love (currently in maintenance mode)? Not that no one succeeded. To my knowledge, no one else even tried. Possibly Luna counts, but I’ve heard through the grapevine that they spent substantial effort running giant parties, relative to the effort spent actually developing and launching their product—from which I infer that they were not very serious. I’ve been looking for good dating apps. I think if a serious founder was trying seriously, I would have heard about it.
Thousands of founders a year, and no one?!
That’s such a massive failure, for almost a decade, that it suggests to me that the SV ideology of building things that make people’s lives better is broadly propaganda. The best founders might be relentlessly resourceful, but a tiny fraction of them seem to be motivated by creating value for the world, or this low hanging fruit wouldn’t have been left hanging for so long.
This is of course in addition to the long list of big tech companies who exploit their network-effect monopoly power to extract value from their users (often creating negative societal externalities in the process), more than creating value for them. But it’s a weaker update that there are some tech companies that do ethically dubious stuff, compared to the stronger update that there was no startup that took on this obvious, underserved, human problem.
My guess is that the tech world is a silo of competence (because competence is financially rewarded), but operates from an ideology with major distortions / blindspots that are disconnected from commonsense reasoning about what’s Good: e.g., following profit incentives, and excitement about doing big things (independent of whether those big things have humane or inhumane impacts), off a cliff.
Basically: I don’t blame founders or companies for following their incentive gradients, I blame individuals/society for being unwilling to assign reasonable prices to important goods.
I think the bad-ness of dating apps is downstream of poor norms around impact attribution for matches made. Even though relationships and marriages are extremely valuable, individual people are not in the habit of paying that to anyone.
Like, $100k or a year’s salary seems like a very cheap price to put on your life partner. If dating apps could rely on that size of payment when they succeed, then I think there could be enough funding for at least a good small business. But I’ve never heard of anyone actually paying anywhere near that. (Myself included—though I paid a retroactive $1k to the person who organized the conference where I met my wife.)
I think keeper.ai tries to solve this with large bounties on dating/marriages; it’s one of the things I wish we had pushed for more on Manifold Love. It seems possible to build one for the niche of “the ea/rat community”; Manifold Love, the checkboxes thing, and dating docs got pretty good adoption for not that much execution.
(Also: be the change! I think building out OKC is one of the easiest “hello world” software projects one could imagine, Claude could definitely make a passable version in a day. Then you’ll discover a bunch of hard stuff around getting users, but it sure could be a good exercise.)
Mm, I think it’s hard to get optimal credit allocation, but easy to get a half-baked allocation, or to just see that it’s directionally way too low? Like sure, maybe it’s unclear whether Hinge deserves 1% or 10% or ~100% of the credit, but at a $100k valuation of a marriage, one should be excited to pay $1k to a dating app.
Like, I think matchmaking is very similarly shaped to the problem of recruiting employees, but there corporations are more locally rational about spending money than individuals, and can do things like pay $10k referral bonuses, or offer external recruiters 20% of their referee’s first year salary.
I’ve started writing a small research paper on this, using a mathematical framework, and realized that I had long conflated Shapley values with ROSE values. Here’s what I found, having corrected that error.
ROSE bargaining satisfies Efficiency, Pareto Optimality, Symmetry*, Maximin Dominance and Linearity—a bunch of important desiderata. Shapley values, on the other hand, don’t satisfy Maximin Dominance, so someone might unilaterally reject cooperation; I’ll explore the ROSE equilibrium below.
1. Subjects: people, and services for finding partners.
2. By Proposition 8.2, the ROSE value remains the same if moves transferring money within the game are discarded. Thus, we can assume no money transfers.
3. By Proposition 11.3, the ROSE value for a dating service is equal to or greater than its maximin.
4. By Proposition 12.2, the ROSE value for a dating service is equal to or less than its maximum attainable value.
5. There is generally one move for a person to maximize their utility: use the dating service with the highest probability of success (or expected relationship quality) available.
6. There are generally two moves for a service: to launch or not to launch. The first involves some intrinsic motivation and feeling of goodness minus running costs; the second has a value of exactly zero.
7. For a large service, running costs (including moderation) greatly exceed any realistic motivation. Therefore, its maximum and maximin values are both zero.
8. From (7), (3) and (4), the ROSE value for a large dating service is zero.
9. Therefore, total money transfers to a large dating service equal its total costs.
So, why yes or why no?
By the way, Shapley values suggest paying a significant sum! Given a relationship value of $10K (can be scaled), and four options for finding partners (0: p0 = 0.03, self-search; α: pα = 0.09, a friend’s help; β: pβ = 0.10, dating sites; γ: pγ = 0.70, the specialized project suggested up the comments), the Shapley-fair prices per success would be respectively $550, $650 and $4400.
P.S. I’m explicitly not open to discussing what price I’d be cheerful to pay for a service that would help build relationships. In this thread, I’m more interested in whether there are new decision theory developments which would find maximin-satisfying equilibria closer to the Shapley one.
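To make this kind of Shapley computation concrete, here is a minimal Python sketch. The characteristic function is my own assumption (a coalition’s value is the best success probability the person has access to, times the value of a relationship, with self-search always available as a baseline), so the numbers it prints are illustrative and won’t necessarily match the $550 / $650 / $4400 figures above, which come from a setup I haven’t seen.

```python
from itertools import permutations

V = 10_000          # assumed value of a relationship
p_self = 0.03       # self-search, assumed always available
services = {"friend": 0.09, "dating_site": 0.10, "specialized": 0.70}

def coalition_value(coalition):
    """Value created when the person has access to this set of services:
    the best available success probability times the relationship value."""
    best = max([p_self] + [services[s] for s in coalition])
    return best * V

def shapley_values(players, value_fn):
    """Average marginal contribution of each player over all join orders."""
    totals = {p: 0.0 for p in players}
    orders = list(permutations(players))
    for order in orders:
        current = []
        for p in order:
            before = value_fn(current)
            current = current + [p]
            totals[p] += value_fn(current) - before
    return {p: totals[p] / len(orders) for p in players}

phi = shapley_values(list(services), coalition_value)
for name, val in phi.items():
    # Expected payment per attempt; dividing by the success probability gives
    # a "price per success" under this (assumed) pay-only-on-success framing.
    print(f"{name}: Shapley value ${val:,.0f}, "
          f"per success ${val / services[name]:,.0f}")
```

The brute-force average over join orders is fine here because there are only three services; for a larger game one would sample permutations instead.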
I don’t think one can coherently value marriage 20 times as much as a saved life ($5k as GiveWell says)? Indeed there is more emotional attachment to a person who’s your partner (i.e. who you are emotionally attached to) than to a random human in the world, but surely not that much?
And if a marriage is valued at $10k, then the 1%/10% credit assignment would make the allocation $100/$1000, and it seems that people really want to round the former towards zero.
I mean, it’s obviously very dependent on your personal finance situation, but I’m using $100k as an order-of-magnitude proxy for “about a year’s salary”. I think it’s very coherent to give up a year of marginal salary in exchange for finding the love of your life, rather than like $10k or ~1 month’s salary.
Of course, the world is full of mispricings, and currently you can save a life for something like $5k. I think these are both good trades to make, and most people should have a portfolio that consists of both “life partners” and “impact from lives saved” and crucially not put all their investment into just one or the other.
It’s possible no one tried literally “recreate OkC”, but I think dating startups are very oversubscribed by founders, relative to interest from VCs [1][2][3] (and I think VCs are mostly correct that they won’t make money [4][5]).
(Edit: I want to note that those are things I found after a bit of googling to see if my sense of the consensus was borne out; they are meant in the spirit of “several samples of weak evidence”)
I don’t particularly believe you that OkC solves dating for a significant fraction of people. IIRC, a previous time we talked about this, @romeostevensit suggested you had not sufficiently internalised the OkCupid blog findings about how much people prioritised physical attraction.
You mention manifold.love, but also mention it’s in maintenance mode – I think because the type of business you want people to build does not in fact work.
I think it’s fine to lament our lack of good mechanisms for public good provision, and claim our society is failing at that. But I think you’re trying to draw an update that’s something like “tech startups should be doing an unbiased search through viable valuable businesses, but they’re clearly not”, or maybe, “tech startups are supposed to be able to solve a large fraction of our problems, but if they can’t solve this, then that’s not true”, and I don’t think either of these conclusions seems that licensed from the dating data point.
I agree that more people should be starting revenue-funded/bootstrapped businesses (including ones enabled by software/technology).
The meme is that if you’re starting a tech company, it’s going to be a VC-funded startup. This is, I think, a meme put out by VCs themselves, including Paul Graham/YCombinator, and it conflates new software projects and businesses generally with a specific kind of business model called the “tech startup”.
Not every project worth doing should be a business (some should be hobbies or donation-funded) and not every business worth doing should be a VC-funded startup (some should be bootstrapped and grow from sales revenue.)
The VC startup business model requires rapid growth and expects 30x returns over a roughly 5-10 year time horizon. That simply doesn’t include every project worth doing. Some businesses are viable but are simply not likely to grow that much or that fast; some projects shouldn’t be expected to be profitable at all and need philanthropic support.
I think the narrative that “tech startups are where innovation happens” is...badly incomplete, but still a hell of a lot more correct than “tech startups are net destructive”.
Think about new technologies; then think about where they were developed. That process does sometimes happen end-to-end within a startup, but more often I think innovative startups are founded around IP developed while the founders were in academia, or they find a new use for open-source tools or tools developed within big companies. There simply isn’t time to solve particularly hard technical problems if you have to get to profitability and 30x growth in 5 years. The startup format is primarily designed for finding product-market fit—i.e. putting together existing technologies, packaging them as a “product” with a narrative about what and who it’s for, and tweaking it until you find a context where people will pay for the product, and then making the whole thing bigger and bigger. You can do that in 5 years. But no, you can’t do literally all of society’s technological innovation within that narrow context!
(Part of the issue is that we still technically count very big tech companies as “startups” and they certainly qualify as “Silicon Valley”, so if you conflate all of “tech” into one big blob it includes the kind of big engineering-heavy companies that have R&D departments with long time horizons. Is OpenAI a “tech startup”? Sure, in that it’s a recently founded technology company. But it is under very different financial constraints from a YC startup.)
But I think you’re trying to draw an update that’s something like “tech startups should be doing an unbiased search through viable valuable businesses, but they’re clearly not”, or maybe, “tech startups are supposed to be able to solve a large fraction of our problems, but if they can’t solve this, then that’s not true”, and I don’t think either of these conclusions seems that licensed from the dating data point.
Neither of those, exactly.
I’m claiming that the narrative around the startup scene is that they are virtuous engines of [humane] value creation (often in counter to a reactionary narrative that “big tech” is largely about exploitation and extraction). It’s about “changing the world” (for the better).
This opportunity seems like a place where one could have traded meaningfully large personal financial EV for enormous amounts of humane value. Apparently no founder wanted to take that trade. Because I would expect there to be variation in how much founders are motivated by money vs. making a mark on the world vs. creating value vs. other stuff, the fact that (to my knowledge) no founder went for it is evidence about the motivations of the whole founder class. The number of founders who are more interested in creating something that helps a lot of people than they are in making a lot of money (even if they’re interested in both) is apparently very small.
Now, maybe startups actually do create lots of humane value, even if they’re created by founders and VCs motivated by profit. The motivations of the founders are only indirect evidence about the effects of startups.
But the tech scene is not motivated to optimize for this at all?? That sure does update me about how much the narrative is true vs. propaganda.
Now if I’m wrong and old OkCupid was only drastically better for me and my unusually high verbal intelligence friends, and it’s not actually better than the existing offerings for the vast majority of people, that’s a crux for me.
You mention manifold.love, but also mention it’s in maintenance mode – I think because the type of business you want people to build does not in fact work.
Manifold.Love is going into maintenance mode while we focus on our core product. We hope to return with improvements once we have more bandwidth; we’re still stoked on the idea of a prediction market-based dating app!
It sounds less like they found it didn’t work, and more like they have other priorities and aren’t (currently) relentlessly pursuing this one.
I worked at Manifold but not on Love. My impression from watching and talking to my coworkers was that it was a fun side idea that they felt like launching and seeing if it happened to take off, and when it didn’t they got bored and moved on. Manifold also had a very quirky take on it due to the ideology of trying to use prediction markets as much as possible and making everything very public. I would advise against taking it seriously as evidence that an OKC-like product is a bad idea or a bad business.
I would guess they tried it because they hoped it would be competitive with their other product, and sunset it because that didn’t happen with the amount of energy they wanted to allocate to the bet. There may also have been an element of updating more about how much focus their core product needed.
I only skimmed the retrospective now, but it seems mostly to be detailing problems that stymied their ability to find traction.
I only skimmed the retrospective now, but it seems mostly to be detailing problems that stymied their ability to find traction.
Right. But they were not relentlessly focused on solving this problem.
I straight up don’t believe that the problems outlined can’t be surmounted, especially if you’re going for a cashflow business instead of an exit.
The market is much more crowded now. A new old okcupid service would be competing against okcupid as well as everything else. And okcupid has a huge advantage in an existing userbase.
And, OKCupid’s algorithm still exists, sort of. And you can write as much as you like. What aspect of the old site do you think was critically different?
I just think there’s barely a cent to be made in launching yet another dating app. So you can’t blame people for not doing it.
I think the biggest advantage of old OKC was that more people used it; now people are spread across hinge and bumble as well as Tinder.
The fact that there’s a sex recession is pretty suggestive that Tinder and the endless stream of Tinder clones don’t serve people very well.
Even if you don’t assess potential romantic partners by reading their essays, like I do, OkC’s match percentage meant that you could easily filter out 95% of the pool to people who are more likely to be compatible with you, along whatever metrics of compatibility you care about.
OKcupid is certainly a better product for hundreds of thousands, or possibly millions, of unusually literate people, including ~all potential developers and most people in their social circles. It’s not a small niche.
I didn’t say Silicon Valley is bad. I said that the narrative about Silicon Valley is largely propaganda, which can be true independently of how good or bad it is, in absolute terms, or relative to the rest of the world.
I spend a lot of time trying to build skills, because I want to be awesome. But there is something off about that.
I think I should just go after things that I want, and solve the problems that come up on the way. The idea of building skills sort of implies that if I don’t have some foundation or some skill, I’ll be blocked, and won’t be able to solve some thing in the way of my goals.
But that doesn’t actually sound right. Like it seems like the main important thing for people who do incredible things is their ability to do problem solving on the things that come up, and not the skills that they had previously built up in a “skill bank”.
Raw problem solving is the real thing and skills are cruft. (Or maybe not cruft per se, but more like a side effect. The compiled residue of previous problem solving. Or like a code base from previous project that you might repurpose.)
Part of the problem with this is that I don’t know what I want for my own sake, though. I want to be awesome, which in my conception, means being able to do things.
I note that wanting “to be able to do things” is a leaky sort of motivation: because the victory condition is not clearly defined, it can’t be crisply compelling, and so there’s a lot of waste somehow.
The sort of motivation that works is simply wanting to do something, not wanting to be able to do something. Like specific discrete goals that one could accomplish, know that one accomplished, and then (in most cases) move on from.
But most of the things that I want by default are of the sort “wanting to be able to do”, because if I had more capabilities, that would make me awesome.
But again, that’s not actually conforming with my actual model of the world. The thing that makes someone awesome is general problem solving capability, more than specific capacities. Specific capacities are brittle. General problem solving is not.
I guess that I could pick arbitrary goals that seem cool. But I’m much more emotionally compelled by being able to do something instead of doing something.
But I also think that I am notably less awesome and on a trajectory to be less awesome over time, because my goals tend to be shaped in this way. (One of those binds whereby if you go after x directly, you don’t get x, but if you go after y, you get x as a side effect.)
I’m not sure what to do about this.
Maybe meditate on, and dialogue with, my sense that skills are how awesomeness is measured, as opposed to raw, general problem solving.
Maybe I need to undergo some deep change that causes me to have different sorts of goals at a deep level. (I think this would be a pretty fundamental shift in how I engage with the world: from a virtue ethics orientation (focused on one’s own attributes) to one of consequentialism (focused on the states of the world).)
There are some exceptions to this, goals that are more consequentialist (although if you scratch a bit, you’ll find they’re about living an ideal of myself, more than they are directly about the world), including wanting a romantic partner who makes me better (note that “who makes me better” is virtue ethics-y), and some things related to my moral duty, like mitigating x-risk. These goals do give me grounding in sort of the way that I think I need, but they’re not sufficient? I still spend a lot of time trying to get skills.
Your seemingly target-less skill-building motive isn’t necessarily irrational or non-awesome. My steel-man is that you’re in a hibernation period, in which you’re waiting for the best opportunity of some sort (romantic, or business, or career, or other) to show up so you can execute on it. Picking a goal to focus on really hard now might well be the wrong thing to do; you might miss a golden opportunity if your nose is at the grindstone. In such a situation a good strategy would, in fact, be to spend some time cultivating skills, and some time in existential confusion (which is what I think not knowing which broad opportunities you want to pursue feels like from the inside).
The other point I’d like to make is that I expect building specific skills actually is a way to increase general problem solving ability; they’re not at odds. It’s not that super specific skills are extremely likely to be useful directly, but that the act of constructing a skill is itself trainable and a significant part of general problem solving ability for sufficiently large problems. Also, there’s lots of cross-fertilization of analogies between skills; skills aren’t quite as discrete as you’re thinking.
Skills and problem-solving are deeply related. The basics of most skills are mechanical and knowledge-based, with some generalization creeping in on your 3rd or 4th skill in terms of how to learn and seeing non-obvious crossover. Intermediate (say, after the first 500 to a few thousand hours) use of skills requires application of problem-solving within the basic capabilities of that skill. Again, you get good practice within a skill, and better across a few skills. Advanced application in many skills is MOSTLY problem-solving. How to apply your well-indexed-and-integrated knowledge to novel situations, and how to combine that knowledge across domains.
I don’t know of any shortcuts, though—it takes those thousands of hours to get enough knowledge and basic techniques embedded in your brain that you can intuit what avenues to more deeply explore in new applications.
There is a huge amount of human variance—some people pick up some domains ludicrously easily. This is a blessing and a curse, as it causes great frustration when they hit a domain that they have to really work at. Others have to work at everything, and never get their Nobel, but still contribute a whole lot of less-transformational “just work” within the domains they work at.
I don’t know whether this resembles your experience at all, but for me, skills translate pretty directly to moment-to-moment life satisfaction, because the most satisfying kind of experience is doing something that exercises my existing skills. I would say that only very recently (in my 30s) do I feel “capped out” on life satisfaction from skills (because I am already quite skilled at almost everything I spend all my time doing) and I have thereby begun spending more time trying to do more specific things in the world.
Seems to me there is some risk either way. If you keep developing skills without applying them to a specific goal, it can be a form of procrastination (an insidious one, because it feels so virtuous). There are many skills you could develop, and life is short. On the other hand, as you said, if you go right after your goal, you may find an obstacle you can’t overcome… or even worse, an obstacle you can’t even properly analyze, so the problem is not merely that you don’t have the necessary skill, but that you even have no idea which skill you miss (so if you try to develop the skills as needed, you may waste time developing the wrong skills, because you misunderstood the nature of the problem).
it seems like the main important thing for people who do incredible things is their ability to do problem solving on the things that come up, and not the skills that they had previously built up in a “skill bank”.
It could be both. And perhaps you notice the problem-specific skills more, because those are rare.
But I also kinda agree that the attitude is more important, and skills often can be acquired when needed.
So… dunno, maybe there are two kinds of skills? Like, the skills with obvious application, such as “learn to play a piano”; and the world-modelling skills, such as “understand whether playing a piano would realistically help you accomplish your goals”? You can acquire the former when needed, but you need the latter in advance, to remove your blind spots?
Or perhaps some skills such as “understand math” are useful in many kinds of situations and take a lot of time to learn, so you probably want to develop these in advance? (Also, if you don’t know yet what to do, it probably helps to get power: learn math, develop social skills, make money… When you later make up your mind, you will likely find some of this useful.)
And maybe you need the world-modelling skills before you make specific goals, because how could your goal be to learn to play the piano, if you don’t know the piano exists? You could have a more general goal, such as “become famous at something”, but if you don’t know that the piano exists, maybe you wouldn’t even look in this direction.
But most of the things that I want by default are of the sort “wanting to be able to do”, because if I had more capabilities, that would make me awesome.
Could this also be about your age? (I am assuming here that you are young.) For younger people it makes more sense to develop general skills; for older people it makes more sense to go after specific goals. The more time you have ahead of you, the more meta you can go—the costs of acquiring a skill are the same, but the possible benefits of having the skill are proportional to your remaining time (more than linear, if you actually use the skill, because it will keep increasing as a side effect of being used).
Also, as a rule of thumb, younger people are judged by their potential, older people are judged by their accomplishments. If you are young, evolution wants you to feel awesome about having skills, because that’s what your peers will admire. You signal general intelligence. The accomplishments you have… uhm, how to put it politely… if you see a 20 years old kid driving an expensive car, your best guess is that their parents have bought it, isn’t it? On the other hand, an older person without accomplishments seems like a loser, regardless of their apparent skills, because there is something suspicious about them not having translated those skills into actual outcomes. The excuse for the young ones is that their best strategy is to acquire skills now, and apply them later (which hasn’t happened yet, but there is enough time remaining).
Based on your language here, it feels to me like you’re in the contemplation stage along the stages of change.
So the very first thing I’d say is to not feel the desire to jump ahead and “get started on a goal right now.” That’s jumping ahead in the stages of change, and will likely create a relapse. I will predict that there’s a 50% chance that if you continue thinking about this without “forcing it”, you’ll have started in on a goal (action stage) within 3 months.
I’m pretty convinced that the key to getting yourself to do stuff is “Creative Tension”—creating a clear internal tension between the end state that feels good and the current state that doesn’t feel as good. There are 4 ways I know to go about generating internal tension:
Develop a strong sense of self, and create tension between the world where you’re fully expressing that self and the world where you’re not.
Develop a strong sense of taste, and create tension between the beautiful things that could exist and what exists now.
Develop a strong pain, and create tension between the world where you have that pain and the world where you’ve solved it.
Develop a strong vision, and create tension between the world as it is now and the world as it would be in your vision.
One especially useful trick that worked for me, coming from the “just develop myself into someone awesome” place, was tying the vision of the awesome person I could be to the vision of what I’d achieved—that is, in my vision of the future, including a vision of the awesome person I had to become in order to reach that future.
I then would deliberately contrast where I was with that compelling vision/self/taste. Checking in with that vision every morning, and fixing areas of resistance when they arise, is what keeps me motivated.
I do have a workshop that I run on exactly how to create that vision that’s tied with sense of self and taste, and then how to use it to generate creative tension. Let me know if something like that would be helpful to you.
I’m no longer sure that I buy dutch book arguments, in full generality, and this makes me skeptical of the “utility function” abstraction
Thesis: I now think that utility functions might be a pretty bad abstraction for thinking about the behavior of agents in general including highly capable agents.
[Epistemic status: half-baked, elucidating an intuition. Possibly what I’m saying here is just wrong, and someone will helpfully explain why.]
Over the past years, in thinking about agency and AI, I’ve taken the concept of a “utility function” for granted as the natural way to express an entity’s goals or preferences.
Of course, we know that humans don’t have well-defined utility functions (they’re inconsistent, and subject to all kinds of framing effects), but that’s only because humans are irrational. To the extent that a thing acts like an agent, its behavior corresponds to some utility function. That utility function might not be explicitly represented, but if an agent is rational, there’s some utility function that reflects its preferences.
Given this, I might be inclined to scoff at people who scoff at “blindly maximizing” AGIs. “They just don’t get it”, I might think. “They don’t understand why agency has to conform to some utility function, and an AI would try to maximize expected utility.”
Currently, I’m not so sure. I think that talking in terms of utility functions is biting a philosophical bullet, and importing some unacknowledged assumptions. Rather than being the natural way to conceive of preferences and agency, I think utility functions might be only one possible abstraction, and one that emphasizes the wrong features, giving a distorted impression of what agents, in general, are actually like.
I want to explore that possibility in this post.
Before I begin, I want to make two notes.
First, all of this is going to be hand-wavy intuition. I don’t have crisp knock-down arguments, only a vague discontent. But it seems like more progress will follow if I write up my current, tentative, stance even without formal arguments.
Second, I don’t think utility functions being a poor abstraction for agency in the real world has much bearing on whether there is AI risk. As I’ll discuss, it might change the shape and tenor of the problem, but highly capable agents with alien seed preferences are still likely to be catastrophic to human civilization and human values. I mention this because the sentiments expressed in this essay are causally downstream of conversations that I’ve had with skeptics about whether there is AI risk at all. So I want to highlight: I think I was mistakenly overlooking some philosophical assumptions, but that is not a crux.
Is coherence overrated?
The tagline of the “utility” page on arbital is “The only coherent way of wanting things is to assign consistent relative scores to outcomes.”
This is true as far as it goes, but to me, at least, that sentence implies a sort of dominance of utility functions. “Coherent” is a technical term, with a precise meaning, but it also has connotations of “the correct way to do things”. If someone’s theory of agency is incoherent, that seems like a mark against it.
But it is possible to ask, “What’s so good about coherence anyway? Maybe incoherence isn’t actually so bad.”
The standard reply of course, is that if your preferences are incoherent, you’re dutchbookable, and someone will pump you for money.
But I’m not satisfied with this argument. It isn’t obvious that being dutch booked is a bad thing.
Suppose I tell you that I prefer pineapple to mushrooms on my pizza. Suppose you’re about to give me a slice of mushroom pizza; but by paying one penny ($0.01) I can instead get a slice of pineapple pizza (which is just as fresh from the oven). It seems realistic to say that most people with a pineapple pizza preference would probably pay the penny, if they happened to have a penny in their pocket. 1
After I pay the penny, though, and just before I’m about to get the pineapple pizza, you offer me a slice of onion pizza instead—no charge for the change! If I was telling the truth about preferring onion pizza to pineapple, I should certainly accept the substitution if it’s free.
And then to round out the day, you offer me a mushroom pizza instead of the onion pizza, and again, since I prefer mushrooms to onions, I accept the swap.
I end up with exactly the same slice of mushroom pizza I started with… and one penny poorer, because I previously paid $0.01 to swap mushrooms for pineapple.
This seems like a qualitatively bad behavior on my part.
Eliezer asserts that this is “qualitatively bad behavior.” But I think that this is biting a philosophical bullet.
As an intuition pump: In the actual case of humans, we seem to get utility not from states of the world, but from changes in states of the world. So it isn’t unusual for a human to pay to cycle between states of the world.
For instance, I could imagine a human being hungry, eating a really good meal, feeling full, and then happily paying a fee to be instantly returned to their hungry state, so that they can enjoy eating a good meal again.
This is technically a dutch booking (which do they prefer, being hungry or being full?), but from the perspective of the agent’s values there’s nothing qualitatively bad about it. Instead of the dutchbooker pumping money from the agent, he’s offering a useful and appreciated service.
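As a toy illustration of that point (my own construction, not anything from the post): an agent whose utility attaches to transitions between states rather than to states will happily pay a small fee each time around the hungry/full loop, and comes out ahead by its own lights, even though its behavior looks like a textbook money pump if you insist on scoring states.

```python
# Toy agent whose utility attaches to *transitions* between states,
# not to the states themselves. Entirely illustrative.
transition_utility = {
    ("hungry", "full"): 10.0,   # enjoying a good meal
    ("full", "hungry"): 0.0,    # being reset to hungry is neutral in itself
}

def run_cycles(n_cycles, reset_fee=1.0):
    total = 0.0
    for _ in range(n_cycles):
        total += transition_utility[("hungry", "full")]              # eat a good meal
        total += transition_utility[("full", "hungry")] - reset_fee  # pay to be made hungry again
    return total

print(run_cycles(3))  # 27.0: the "dutch booker" collected three fees, yet the agent gained
```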
Of course, we can still back out a utility function from this dynamic: instead of having an assignment of (ordinal) values to world states, we can have one over changes from one world state to another.
But that just passes the buck one level. I see no reason in principle why an agent couldn’t have a preference to rotate between different changes in the world, just as well as rotating between different states of the world.
But this also misses the central point. I think you can always construct a utility function that represents some behavior. But if one is no longer compelled by dutch book arguments, this raises the question of why we would want to do that. If coherence is no longer a desideratum, it’s no longer clear that a utility function is the natural way to express preferences.
And I wonder, maybe this also applies to agents in general, or at least the kind of learned agents that humans are likely to build via gradient descent.
Maximization behavior
I think this matters, because many of the classic AI risk arguments go through a claim that maximization behavior is convergent. If you try to build a satisficer, there are a number of pressures for it to become a maximizer of some kind. (See this Rob Miles video, for instance)
I think that most arguments of that sort depend on an agent acting according to an expected utility maximization framework. And if utility maximization turns out not to be a good abstraction for agents in the real world, I don’t know if these arguments are still correct.
I posit that straightforward maximizers are rare in the multiverse, and that most evolved or learned agents are better described by some other abstraction.
If not utility functions, then what?
If we accept for the time being that utility functions are a warped abstraction for most agents, what might a better abstraction be?
I don’t know. I’m writing this post in the hopes that others will think about this question and perhaps come up with productive alternative formulations.
I’ll post some of my half-baked thoughts on this question shortly.
I’ve long been somewhat skeptical that utility functions are the right abstraction.
My argument is also rather handwavy, being something like “this is the wrong abstraction for how agents actually function, so even if you can always construct a utility function and say some interesting things about its properties, it doesn’t tell you the thing you need to know to understand and predict how an agent will behave”. In my mind I liken it to the state of trying to code in functional programming languages on modern computers: you can do it, but you’re also fighting an uphill battle against the way the computer is physically implemented, so don’t be surprised if things get confusing.
And much like in the utility function case, people still program in functional languages because of the benefits they confer. I think the same is true of utility functions: they confer some big benefits when trying to reason about certain problems, so we accept the tradeoffs of using them. I think that’s fine so long as we have a morphism to other abstractions that will work better for understanding the things that utility functions obscure.
Utility functions are especially problematic in modeling behaviour for agents with bounded rationality, or those where there are costs of reasoning. These include every physically realizable agent.
For modelling human behaviour, even considering the ideals of what we would like human behaviour to achieve, there are even worse problems. We can hope that there is some utility function consistent with the behaviour we’re modelling and just ignore cases where there isn’t, but that doesn’t seem satisfactory either.
[This is a draft, to be posted on LessWrong soon.]
I’ve spent a lot of time developing tools and frameworks for bridging “intractable” disagreements. I’m also the person affiliated with CFAR who has taught Double Crux the most, and done the most work on it.
People often express to me something to the effect, “The important thing about Double Crux is all the low level habits of mind: being curious, being open to changing your mind, paraphrasing to check that you’ve understood, operationalizing, etc. The ‘Double Crux’ framework, itself is not very important.”
I half agree with that sentiment. I do think that those low level cognitive and conversational patterns are the most important thing, and at Double Crux trainings that I have run, most of the time is spent focusing on specific exercises to instill those low level TAPs.
However, I don’t think that the only value of the Double Crux schema is in training those low level habits. Double cruxes are extremely powerful machines that allow one to identify, if not the most efficient conversational path, a very high efficiency conversational path. Effectively navigating down a chain of Double Cruxes is like magic. So I’m sad when people write it off as useless.
In this post, I’m going to try and outline the basic Double Crux pattern, the series of 4 moves that makes Double Crux work, and give a (simple, silly) example of that pattern in action.
These four moves are not (always) sufficient for making a Double Crux conversation work, that does depend on a number of other mental habits and TAPs, but this pattern is, according to me, at the core of the Double Crux formalism.
The pattern:
The core Double Crux pattern is as follows. For simplicity, I have described this in the form of a 3-person Double Crux conversation, with two participants and a facilitator. Of course, one can execute these same moves in a 2 person conversation, as one of the participants. But that additional complexity is hard to manage for beginners.
The pattern has two parts (finding a crux, and finding a double crux), and each part is composed of 2 main facilitation moves.
Those four moves are...
Clarifying that you understood the first person’s point.
Checking if that point is a crux.
Checking the second person’s belief about the truth value of the first person’s crux.
Checking if the first person’s crux is also a crux for the second person.
In practice:
[The version of this section on my blog has color coding and special formatting.]
The conversational flow of these moves looks something like this:
Finding a crux of participant 1:
P1: I think [x] because of [y]
Facilitator: (paraphrasing, and checking for understanding) It sounds like you think [x] because of [y]?
P1: Yep!
Facilitator: (checking for cruxyness) If you didn’t think [y], would you change your mind about [x]?
P1: Yes.
Facilitator: (signposting) It sounds like [y] is a crux for [x] for you.
Checking if it is also a crux for participant 2:
Facilitator: Do you think [y]?
P2: No.
Facilitator: (checking for a Double Crux) if you did think [y] would that change your mind about [x]?
P2: Yes.
Facilitator: It sounds like [y] is a Double Crux
[Recurse, running the same pattern on [Y] ]
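For readers who think better in pseudocode, here is a rough sketch of the flow above as a tiny function (my own rendering, and a drastic simplification of a live conversation): the four facilitation moves become four checks, and the “yes, yes, no, yes” pattern returns a [y] to recurse on.

```python
def find_double_crux(x, get_reason, p1_is_crux, p2_believes, p2_is_crux):
    """One pass of the four facilitation moves for participant 1's claim [x].

    get_reason(x)    -> the reason [y] that participant 1 gives for [x]
    p1_is_crux(x, y) -> would P1 change their mind about [x] if not [y]?
    p2_believes(y)   -> does P2 currently think [y] is true?
    p2_is_crux(x, y) -> would P2 change their mind about [x] if [y] were true?

    Returns [y] if it is a double crux, else None. In practice you would
    then recurse on the returned [y], as in the flow above.
    """
    y = get_reason(x)              # move 1: surface and confirm P1's point
    if not p1_is_crux(x, y):       # move 2: is [y] a crux for P1?
        return None
    if p2_believes(y):             # move 3: does P2 already agree about [y]?
        return None
    if not p2_is_crux(x, y):       # move 4: is [y] also a crux for P2?
        return None
    return y                       # [y] is a double crux
```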
Obviously, in actual conversation, there is a lot more complexity, and a lot of other things that are going on.
For one thing, I’ve only outlined the best case pattern, where the participants give exactly the most convenient answer for moving the conversation forward (yes, yes, no, yes). In actual practice, it is quite likely that one of those answers will be reversed, and you’ll have to compensate.
For another thing, this formalism is rarely so simple. You might have to do a lot of conversational work to clarify the claims enough that you can ask if B is a crux for A (for instance when B is nonsensical to one of the participants). Getting through each of these steps might take fifteen minutes, in which case rather than four basic moves, this pattern describes four phases of conversation. (I claim that one of the core skills of a savvy facilitator is tracking which stage the conversation is at, which goals you have successfully hit, and which is the current proximal subgoal.)
There is also a judgment call about which person to treat as “participant 1” (the person who generates the point that is tested for cruxyness). As a first order heuristic, the person who is closer to making a positive claim over and above the default, should usually be the “p1”. But this is only one heuristic.
Example:
This is an intentionally silly, over-the-top example, for demonstrating the pattern without any unnecessary complexity. I’ll publish a somewhat more realistic example in the next few days.
Two people, Alex and Barbra, disagree about tea: Alex thinks that tea is great, and drinks it all the time, and thinks that more people should drink tea, and Barbra thinks that tea is bad, and no one should drink tea.
Facilitator: So, Barbra, why do you think tea is bad?
Barbra: Well it’s really quite simple. You see, tea causes cancer.
Facilitator: Let me check if I’ve got that: you think that tea causes cancer?
Barbra: That’s right.
Facilitator: Let me check if I’ve got that: you think that tea causes cancer? Wow. Ok. Well, if you found out that tea actually didn’t cause cancer, would you be fine with people drinking tea?
Barbra: Yeah. Really the main thing that I’m concerned with is the cancer-causing. If tea didn’t cause cancer, then it seems like tea would be fine.
Facilitator: Cool. Well it sounds like this is a crux for you Barb. Alex, do you currently think that tea causes cancer?
Alex: No. That sounds like crazy-talk to me.
Facilitator: Ok. But aside from how realistic it seems right now, if you found out that tea actually does cause cancer, would you change your mind about people drinking tea?
Alex: Well, to be honest, I’ve always been opposed to cancer, so yeah, if I found out that tea causes cancer, then I would think that people shouldn’t drink tea.
Facilitator: Well, it sounds like we have a double crux!
In a real conversation, it often doesn’t go this smoothly. But this is the rhythm of Double Crux, at least as I apply it.
That’s the basic Double Crux pattern. As noted there are a number of other methods and sub-skills that are (often) necessary to make a Double Crux conversation work, but this is my current best attempt at a minimum compression of the basic engine of finding double cruxes.
I made up a more realistic example here, and I might make more or better examples.
Eliezer claims that dath ilani never give in to threats. But I’m not sure I buy it.
The only reason people will make threats against you, the argument goes, is if those people expect that you might give in. If you have an iron-clad policy against acting in response to threats made against you, then there’s no point in making or enforcing the threats in the first place. There’s no reason for the threatener to bother, so they don’t. Which means in some sufficiently long run, refusing to submit to threats means you’re not subject to threats.
This seems a bit fishy to me. I have a lingering suspicion that this argument doesn’t apply, or at least doesn’t apply universally, in the real world.
I’m thinking here mainly of a prototypical case of an isolated farmer family (like the early farming families of the greek peninsula, not absorbed into a polis), being accosted by some roving bandits, such as the soldiers of the local government. The bandits say “give us half your harvest, or we’ll just kill you.”
The argument above depends on a claim about the cost of executing on a threat. “There’s no reason to bother” implies that the threatener has a preference not to bother, if they know that the threat won’t work.
I don’t think that assumption particularly applies. For many cases, like the case above, the cost to the threatener of executing on the threat is negligible, or at least small relative to the available rewards. The bandits don’t particularly mind killing the farmers and taking their stuff, if the farmers don’t want to give it up. There isn’t a realistic chance that the bandits, warriors specializing in violence and outnumbering the farmers, will lose a physical altercation.
From the bandits’ perspective there are two options:
Showing up, threatening to kill the farmers, taking away as much food as they can carry (and then maybe coming back to accost them again next year).
Showing up, threatening to kill the farmers, actually killing the farmers, and then taking away as much food as they can carry.
It might be easier and less costly for the bandits to get what they want by being scary rather than by being violent. But the plunder is definitely enough to make violence worth it if it comes to that. They prefer option 1, but they’re totally willing to fall back on option 2.
It seems like, in this situation, the farmers are probably better off cooperating with the bandits and giving them some food, even knowing that that means that the bandits will come back and demand “taxes” from them every harvest. They’re just better off submitting.
Maybe, decision theoretically, this situation doesn’t count as a threat. The bandits are taking food from the farmers, one way or the other, and they’re killing the farmers if they try to stop that. They’re not killing the farmers so that they’ll give up their food.
But that seems fishy. Most of the time, the bandits don’t, in fact, have to resort to violence. Just showing up and threatening violence is enough to get what they want. The farmers do make the lives of the bandits easier by submitting and giving them much of the harvest without resistance. Doing otherwise would be straightforwardly worse for them.
Resisting the bandits out of a commitment to some notion of decision-theoretic rationality seems exactly analogous to two-boxing in Newcomb’s problem because of a commitment to (causal) decision-theoretic rationality.
You might not want to give in out of spite. “Fuck you. I’d rather die than help you steal from me.” But a dath ilani would say that that’s a matter of the utility function, not of decision theory. You just don’t like submitting to threats, and so will pay big costs to avoid it; it’s not that you’re following a policy that maximizes your payoffs.
So, it seems like the policy has to be “don’t give in to threats that are sufficiently costly to execute that the threatener would prefer not to bother, if they knew in advance that you wouldn’t give in”. (And possibly with the additional caveat “if the subjunctive dependence between you and the threatener is sufficiently high.”)
But that’s a much more complicated policy. For one thing, it requires a person-being-threatened to accurately estimate how costly it would be for the threatener to execute their threat (and the threatener is thereby incentivized to deceive them about that).
Hm. But maybe that’s easy to estimate actually, in the cases where the threatener gets a payout of 0 if the person-being-threatened doesn’t cooperate with the threat? Which is the case for most blackmail attempts, for instance, but not necessarily for “if you don’t give me some of your harvest, I’ll kill you.”
In lots of cases, it seems like it would be ambiguous. Especially when there are large power disparities in favor of the threatener. When someone powerful threatens you, the cost of executing the threat is likely to be small for them, possibly small enough to be negligible. And in those cases, their own spite at you for resisting them might be more than enough reason to act on it.
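A toy expected-value sketch of that worry (my own numbers, purely illustrative): whether “never give in” actually deters a threatener depends on how their cost of enforcement compares to what they could take by force anyway, which is exactly the quantity the farmers have to estimate.

```python
def threatener_payoff(demanded, loot_by_force, enforcement_cost, target_gives_in):
    """Toy payoff to the threatener, given the target's fixed policy."""
    if target_gives_in:
        return demanded                        # the threat works as a threat
    return loot_by_force - enforcement_cost    # must actually carry it out

# Blackmail-like case: carrying out the threat yields nothing by itself,
# so against a committed "never give in" target, threatening is a pure loss
# and a rational blackmailer doesn't bother.
print(threatener_payoff(demanded=10, loot_by_force=0, enforcement_cost=5,
                        target_gives_in=False))   # -5

# Bandit-like case: most of the value is available by force anyway and
# enforcement is cheap for specialists in violence, so "never give in"
# barely changes the bandits' incentive to show up.
print(threatener_payoff(demanded=10, loot_by_force=9, enforcement_cost=1,
                        target_gives_in=False))   # 8
```

In the blackmail-like case the committed policy makes threatening a pure loss; in the bandit-like case it barely changes the threatener’s calculus, which is the sense in which the situation arguably isn’t a “threat” in the decision-theoretic sense at all.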
Eliezer, this is what you get for not writing up the planecrash threat lecture thread. We’ll keep bothering you with things like this until you give in to our threats and write it.
What you’ve hit upon is “BATNA,” or “Best alternative to a negotiated agreement.” Because the robbers can get what they want by just killing the farmers, the dath ilani will give in, and from what I understand, Yudkowsky therefore doesn’t classify the original request (give me half your wheat or die) as a threat.
This may not be crazy. It reminds me of the Ancient Greek social mores around hospitality, which seem insanely generous to a modern reader, but I guess make sense if the equilibrium number of roving ~~bandits~~ honored guests is kept low by some other force.
from what I understand, Yudkowsky therefore doesn’t classify the original request (give me half your wheat or die) as a threat.
This seems like it weakens the “don’t give in to threats” policy substantially, because it makes it much harder to tell what’s a threat-in-the-technical-sense, and the incentives push toward exaggeration and dishonesty about what is or isn’t a threat-in-the-technical-sense.
The bandits should always act as if they’re willing to kill the farmers and take their stuff, even if they’re bluffing about their willingness to do violence. The farmers need to estimate whether the bandits are bluffing, and either call the bluff, or submit to the demand-which-is-not-technically-a-threat.
That policy has notably more complexity than just “don’t give in to threats.”
“Anytime someone credibly demands that you do X, otherwise they’ll do Y to you, you should not do X.” This is a simple reading of the “don’t give in to threats” policy.
There’s a sort of quiet assumption about the dath ilan fiction that should be louder: it’s about a world where a bunch of theorems like “as systems of agents get sufficiently intelligent, they gain the ability to coordinate in prisoner’s-dilemma-like problems” have proofs. You could similarly write fiction set in a world where P=NP has a proof and all of cryptography collapses. I’m not sure whether EY would guess that sufficiently intelligent agents actually coordinate, just like I could write the P=NP fiction while being pretty sure that P ≠ NP.
Huh, the idea that Greek guest-friendship was an adaptation to warriors who would otherwise kill you and take your stuff is something that I had never considered before. Isn’t it generally depicted as a relationship between nobles who, presumably, would be able to repel roving bandits?
Threateners similarly can employ bindings, always enforcing regardless of local cost. A binding has an overall cost from following it in all relevant situations, costs in individual situations are what goes into estimating this overall cost, but individually they are not decision relevant, when deciding whether to commit to a global binding.
In this case opposing commitments effectively result in global enmity (threateners always enforce, targets never give in to threats), so if targets are collectively stronger than threateners, then threateners lose. But this collective strength (for the winning side) or vulnerability (for the losing side) is only channeled through targets or threateners who join their respective binding. If few people join, the faction is weak and loses.
The equilibrium depends on which faction is stronger. Threateners who don’t always enforce and targets who don’t always ignore threats are not parts of this game, so it’s not even about relative positions of threateners and targets, only those that commit are relevant. If the threateners win, targets start mostly giving in to threats, and so for threateners the cost of binding becomes low overall.
I’m talking about the equilibrium where targets are following their “don’t give in to threats” policy. Threateners don’t want to follow a policy of always executing threats in that world—really, they’d probably prefer to never make any threats in that world, since it’s strictly negative EV for them.
If the unyielding targets faction is stronger, the equilibrium is bad for committed enforcers. If the committed enforcer faction is stronger, the equilibrium doesn’t retain high cost of enforcement, and in that world the targets similarly wouldn’t prefer to be unyielding. I think the toy model where that fails leaves the winning enforcers with no pie, but that depends on enforcers not making use of their victory to set up systems for keeping targets relatively defenseless, taking the pie even without their consent. This would no longer be the same game (“it’s not a threat”), but it’s not a losing equilibrium for committed enforcers of the preceding game either.
This distinction of which demands are or aren’t decision-theoretic threats that rational agents shouldn’t give in to is a major theme of the last ~quarter of Planecrash (enormous spoilers in the spoiler text).
Keltham demands to the gods “Reduce the amount of suffering in Creation or I will destroy it”. But this is not a decision-theoretic threat, because Keltham honestly prefers destroying Creation to the status quo. If the gods don’t give in to his demand, carrying through with his promise is in his own interest.
If Nethys had made the same demand, it would have been a decision-theoretic threat. Nethys prefers the status quo to Creation being destroyed, so he would have no reason to make the demand other than the hope that the other gods would give in.
This theme is brought up many times, but there’s not one comprehensive explanation to link to. (The parable of the little bird is the closest I can think of.)
I’m thinking here mainly of a prototypical case of an isolated farmer family (like the early farming families of the Greek peninsula, not absorbed into a polis) being accosted by some roving bandits.
The assertion, IIUC, is not that it never makes sense for anyone to give in to a threat—that would clearly be an untrue assertion—but rather that it is possible for a society to reach a level of internal coordination where it starts to make sense to adopt a categorical policy of never giving in to a threat. That would mean, for example, that any society member who wants to live in dath ilan’s equivalent of an isolated farm would probably need to formally and publicly relinquish their citizenship to maintain dath ilan’s reputation for never giving in to a threat. Or dath ilan would make it very clear that they must not give in to any threats, and if they do and dath ilan finds out, then dath ilan will be the one that slaughters the whole family. The latter policy is a lot like how men’s prisons work, at least in the US: the inmates are organized into groups (usually based on race or gang affiliation), and if anyone even hints (where others can hear) that you might give in to sexual extortion, you need to respond with violence, because if you don’t, your own group (the main purpose of which is mutual protection from the members of the other groups) will beat you up.
That got a little grim. Should I add a trigger warning? Should I hide the grim parts behind a spoiler tag thingie?
At worst, all the farmers will relentlessly fight to the death; in that case the bandits get one year of food and have to figure something else out next year.
That outcome strictly dominates not stealing any food this year, and needing to figure something else out both this year and next year.
I don’t recall Eliezer claiming that dath ilani characters never give in to threats. *Dath ilani characters* claim they never give in to threats. My interpretation is that the characters *say* “We don’t give in to threats”, and *believe* it, but it’s not *true*. Rather it’s something between a self-fulfilling prophecy, a noble lie-told-to-children, and an aspiration.
There are few threats in dath ilan, partly because the conceit of dath ilan is that it’s mostly composed of people who are cooperative-libertarianish by nature and don’t want to threaten each other very much, but partly because it’s a political structure where it’s much harder to get threats to actually *work*. One component of that political structure is how people are educated to defy threats by reflex, and to expect their own threats to fail, by learning an idealized system of game theory in which threats are always defied.
However, humans don’t actually follow ideal game theory when circumstances get sufficiently extreme, even dath ilani humans. Peranza can in fact be “shattered in Hell beyond all hope of repair” in the bad timeline, for all that she might rationally “decide not to break”. Similarly when the Head Keeper commits suicide to make a point: “So if anybody did deliberately destroy their own brain in attempt to increase their credibility—then obviously, the only sensible response would be to ignore that, so as not create hideous system incentives. Any sensible person would reason out that sensible response, expect it, and not try the true-suicide tactic.” But despite all that the government sets aside the obvious and sensible policy because, come on, the Head Keeper just blew up her own brain, stop fucking around and get serious. And the Head Keeper, who knows truths about psychology which the members of government do not, *accurately predicted they would respond that way*.
So dath ilani are educated to believe that giving in to threats is irrational, and to believe that people don’t give in to threats. This plus their legal system means that there are few threats, and the threats usually fail, so their belief is usually correct, and the average dath ilani never sees it falsified. Those who think carefully about the subject will realize that threats can sometimes work, in circumstances which are rare in dath ilan, but they’ll also realize that it’s antisocial to go around telling everyone about the limits of their threat-resistance, so they keep it quiet. The viewpoint characters start out believing the dath ilani propaganda but update pretty quickly when removed from dath ilan. Keltham has little trouble understanding the Golarion equilibrium of force and threats once he gets oriented. Thellim presumably pays taxes off camera once she settles in on Earth.
[This is an essay that I’ve had bopping around in my head for a long time. I’m not sure if this says anything usefully new, but it might click with some folks. If you haven’t read Social Status: Down the Rabbit Hole on Kevin Simler’s excellent blog, Melting Asphalt, read that first. I think this is pretty bad and needs to be rewritten and maybe expanded substantially, but this blog is called “musings and rough drafts.”]
In this post, I’m going to outline how I think about status. In particular, I want to give a mechanistic account of how status necessarily arises, given some set of axioms, in much the same way one can show that evolution by natural selection must necessarily occur given the axioms of 1) inheritance of traits, 2) variance in reproductive success based on variance in traits, and 3) mutation.
(I am not claiming any particular skill at navigating status relationships, any more than a student of sports-biology is necessarily a skilled basketball player.)
By “status” I mean prestige-status.
Axiom 1: People have goals.
That is, for any given human, there are some things that they want. This can include just about anything. You might want more money, more sex, a ninja-turtles lunchbox, a new car, to have interesting conversations, to become an expert tennis player, to move to New York etc.
Axiom 2: There are people who control resources relevant to other people achieving their goals.
The kinds of resources are as varied as the goals one can have.
Thinking about status dynamics and the like, people often focus on the particularly convergent resources, like money. But resources that are only relevant to a specific goal are just as much a part of the dynamics I’m about to describe.
Knowing a bunch about late 16th century Swedish architecture is controlling a goal-relevant resource, if someone has the goal of learning more about 16th century Swedish architecture.
Just being a fun person to spend time with (due to being particularly attractive, or funny, or interesting to talk to, or whatever) is a resource relevant to other people’s goals.
Axiom 3: People are more willing to help (offer favors to) a person who can help them achieve their goals.
Simply stated, you’re apt to offer to help a person with their goals if it seems like they can help you with yours, because you hope they’ll reciprocate. You’re willing to make a trade with, or ally with such people, because it seems likely to be beneficial to you. At minimum, you don’t want to get on their bad side.
(Notably, there are two factors that go into one’s assessment of another person’s usefulness: if they control a resource relevant to one of your goals, and if you expect them to reciprocate.
This produces a dynamic whereby A’s willingness to ally with B is determined by something like the product of
A’s assessment of B’s power (as relevant to A’s goals), and
A’s assessment of B’s probability of helping (which might translate into integrity, niceness, etc.)
If a person is a jerk, they need to be very powerful-relative-to-your-goals to make allying with them worthwhile.)
All of this seems good so far, but notice that we have up to this point only described individual pair-wise transactions and pair-wise relationships. People speak about “status” as an attribute that someone can possess or lack. How does the dynamic of a person being “high status” arise from the flux of individual transactions?
Lemma 1: One of the resources that a person can control is other people’s willingness to offer them favors
With this lemma, the system folds in on itself, and the individual transactions cohere into a mostly-stable status hierarchy.
Given lemma 1, a person doesn’t need to personally control resources relevant to your goals, they just need to be in a position such that someone who is relevant to your goals will privilege them.
As an example, suppose that you’re introduced to someone who is very well respected in your local social group: person-W. Your assessment might be that W, directly, doesn’t have anything that you need. But because person-W is well-respected, others in your social group are likely to offer favors to him/her. Therefore, it’s useful to you for person-W to like you, because then they are more apt to call in other people’s favors on your behalf.
(All the usual caveats apply: this is subconscious, and humans are adaptation-executors who don’t do explicit, verbal assessments of how useful a person will be to them, but rely on emotional heuristics that approximate explicit assessment.)
This causes the mess of status transactions to reinforce and stabilize into a mostly-static hierarchy. The mass of individual A-privileges-B-on-the-basis-of-A’s-goals flattens out, into each person having a single “score” which determines to what degree each other person privileges them.
(It’s a little more complicated than that, because people who have access to their own resources have less need of help from others. So a person’s effective status (the status-level at which you treat them) is closer to their status minus your status. But this is complicated again because people are motivated not to be dicks (that’s bad for business), and respecting other people’s status is important to not being a dick.)
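To make the fold-in concrete, here’s a minimal numerical sketch (my own toy model and made-up numbers, nothing the post commits to) in which each person’s “status” is just the fixed point of everyone’s willingness to offer them favors, where that willingness itself depends on status, per Lemma 1:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
# direct[i, j]: how much person j directly controls resources relevant to i's goals
direct = rng.uniform(0, 1, (n, n))
np.fill_diagonal(direct, 0.0)
# recip[j]: how likely j is to reciprocate favors (the second factor in the product above)
recip = rng.uniform(0.5, 1.0, n)

status = np.ones(n) / n
for _ in range(100):
    # j's usefulness to i: what j controls directly, plus j's ability to call in
    # other people's favors on i's behalf (proportional to j's current status)
    usefulness = direct + 0.5 * status[None, :]
    # Axiom 3: i's willingness to help j is (roughly) usefulness times reliability
    willingness = usefulness * recip[None, :]
    new_status = willingness.sum(axis=0)
    new_status /= new_status.sum()  # keep the overall scale fixed
    if np.allclose(new_status, status, atol=1e-12):
        break
    status = new_status

print("fixed-point status scores:", np.round(status, 3))
```

The structure is basically eigenvector centrality: the mess of pairwise assessments stops changing once each person’s single score is consistent with everyone else’s, which is the mostly-stable hierarchy described above.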
Related: The red paperclip theory of status describes status as a form of optimization power, specifically one that can be used to influence a group.
The name of the game is to convert the temporary power gained from (say) a dominance behaviour into something further, bringing you closer to something you desire: reproduction, money, a particular social position...
I’ve offered to be a point person for folks who believe that they were severely impacted by Leverage 1.0, and have related information, but who might be unwilling to share that info, for any of a number of reasons.
In short,
If someone wants to tell me private meta-level information (such as “I don’t want to talk about my experience publicly because X”), so that I can pass it along in an anonymized way to someone else (including Geoff, Matt Fallshaw, Oliver Habryka, or others) - I’m up for doing that.
In this case, I’m willing to keep info non-public (ie not publish it on the internet), and anonymized, but am reluctant to keep it secret (ie pretend that I don’t have any information bearing on the topic).
For instance, let’s say someone tells me that they are afraid to publish their account due to a fear of being sued.
If later, as a part of this whole process, some third party asks “is there anyone who isn’t speaking out of a fear of legal repercussions?”, I would respond “yes, without going into the details, one of the people that I spoke to said that”, unless my saying that would uniquely identify the person I spoke to.
If someone asked me point-blank “is it Y-person who is afraid of being sued?”, I would say “I can neither confirm nor deny”, regardless of whether it was Y-person.
This policy is my best guess at the approach that will maximize my ability to help with this whole situation going forward, without gumming up the works of a collective truth-seeking process. If I change my mind about this at a later date, I will, of course, continue to hold to all of the agreements that I made under previous terms.
If someone wants to tell me object level information about their experience at Leverage, their experience of this process, to-date etc, and would like me to make that info public in an anonymized way (eg writing a comment that reads “one of the ex-Leveragers that I talked to, who would prefer to remain anonymous, says...”) - I’m up for that, as well, if it would help for some reason.
I’m probably open to doing other things that seem likely to be helpful for this process, so long as I can satisfy my pre-existing commitments to maintain privacy, anonymity, etc.
So it seems like one way that the world could go is:
China develops a domestic semiconductor fab industry that’s not at the cutting edge, but close, so that it’s less dependent on Taiwan’s TSMC
China invades Taiwan, destroying TSMC, ending up with a compute advantage over the US, which translates into a military advantage
(which might or might not actually be leveraged in a hot war).
I could imagine China building a competent domestic chip industry. China seems more determined to do that than the US is.
Though notably, China is not on track to do that currently. It’s not anywhere close to its goal of producing 70% of its chips domestically by 2025.
And if the US was serious about building a domestic cutting-edge chip industry again, could it? I basically don’t think that American work culture can keep up with Taiwanese/TSMC work culture, in this super-competitive industry.
TSMC is building fabs in the US, but from what I hear, they’re not going well.
(While TSMC is a Taiwanese company, having a large fraction of TSMC fabs in the US would preempt the scenario above. TSMC fabs in the US count as “a domestic US chip industry.”)
Building and running leading node fabs is just a really really hard thing to do.
I guess the most likely scenario is the continuation of the status quo, where China and the US continue to both awkwardly depend on TSMC’s chips for crucial military and economic AI tech.
I’m not that confident about how the Arizona fab is going. I’ve mostly heard second hand accounts.
I’m very confident that TSMC’s edge is more than cheap labor. It would be basically impossible for another country, even one with low median wages, to replicate TSMC. Singapore and China have both tried, and can’t compete. At this point in time, TSMC has a basically insurmountable human capital and institutional capital advantage, which enables it to produce leading node chips that no other company in the world can produce. Samsung will catch up, sure. But by the time they catch up to TSMC’s 2024 state of the art, TSMC will have moved on to the next node.
My understanding is that, short of TSMC being destroyed by war with mainland China, or some similar disaster, it’s not feasible for any company to catch up with TSMC within the next 10 years, at least.
I’m very confident that TSMC’s edge is more than cheap labor.
They have cumulative investments over the years, but based on accounts of Americans who have worked there, they don’t sound extremely advanced. Instead they sound very hard working, which gives them a strong ability to execute. Also, I still think these delays are somewhat artificial. Taiwan has natsec concerns about letting TSMC diversify, and TSMC seems to think it can wring a lot of money out of the US by holding up construction. They are, after all, a monopoly.
Samsung will catch up, sure. But by the time they catch up to TSMC’s 2024 state of the art, TSMC will have moved on to the next node. ... it’s not feasible for any company to catch up with TSMC within the next 10 years, at least.
Is Samsung 5 generations behind? I know that nanometers don’t really mean anything anymore, but TSMC and Samsung’s 4 nm don’t seem 10 years apart based on the tidbits I get online.
Liu said construction on the shell of the factory had begun, but the Taiwanese chipmaking titan needed to review “how much incentives … the US government can provide.”
Is Samsung 5 generations behind? I know that nanometers don’t really mean anything anymore, but TSMC and Samsung’s 4 nm don’t seem 10 years apart based on the tidbits I get online.
I’m not claiming they’re 10 years behind. My understanding from talking with people is that Samsung is around 2 to 3 years behind TSMC. My claim is that Samsung and TSMC are advancing at ~the same rate, so Samsung can’t close that 2 to 3 year gap.
As you note, TSMC is building fabs in the US (and Europe) to reduce this risk.
I also think that it’s worth noting that, at least in the short run, if the US didn’t have shipments of new chips and was at war, the US government would just use wartime powers to take existing GPUs from whichever companies they felt weren’t using them optimally for war and give them to the companies (or US Govt labs) that are.
Plus, are you really gonna bet that the intelligence community and DoD and DoE don’t have a HUUUUGE stack of H100s? I sure wouldn’t take that action.
I meant more “already in a data center,” though probably some in a warehouse, too.
I roll to disbelieve that the people who read Hacker News in Ft. Meade, MD and have giant budgets aren’t making some of the same decisions that people who read Hacker News in Palo Alto, CA and Redmond, WA would.
No clue if true, but even if true, DARPA is not at all comparable to Intel. It’s an entity set up for very different purposes and engaging in very different patterns of capital investment.
Also, it’s very unclear to me why R&D is the relevant bucket. Presumably buying GPUs is either capex or, if rented, is recognized under a different opex bucket (for secure cloud services) than R&D?
My claim isn’t that the USG is like running its own research and fabs at equivalent levels of capability to Intel or TSMC. It’s just that if a war starts, it has access to plenty of GPUs through its own capacity and its ability to mandate borrowing of hardware at scale from the private sector.
This makes no sense. Wars are typically existential. In a hot war with another state, why would the government not use all of the industrial capacity that is more useful for making weapons to make weapons? It’s well documented that governments can repurpose unnecessary parts of industry (say, training Grok or an open source chatbot) into whatever else.
Biden used them for largely irrelevant reasons. This indicates that in an actual war, usage would be wider and more extensive.
Something that I’ve been thinking about lately is the possibility of an agent’s values being partially encoded by the constraints of that agent’s natural environment, or arising from the interaction between the agent and environment.
That is, an agent’s environment puts constraints on the agent. From one perspective, removing those constraints is always good, because it lets the agent get more of what it wants. But sometimes, from a different perspective, we might feel that with those constraints removed, the agent goodharts or wireheads, or otherwise fails to actualize its “true” values.
The Generator freed from the oppression of the Discriminator
As a metaphor: if I’m one half of a GAN, let’s say the generator, then in one sense my “values” are fooling the discriminator, and if you make me relatively more powerful than my discriminator, and I dominate it...I’m loving it, and also no longer making good images.
But you might also say, “No, wait. That is a super-stimulus, and actually what you value is making good images, but half of that value was encoded in your partner.”
This second perspective seems a little stupid to me. A little too Aristotelian. I mean, if we’re going to take that position, then I don’t know where we draw the line. Naively, it seems like we would throw out the distinction between fitness maximizers and adaptation executors, and fall backwards, declaring that the values of evolution are our true values.
Then again, if you fully accept the first perspective, it seems like maybe you are buying into wireheading? Like I might say “my actual values are upticks in pleasure sensation, but I’m trapped in this evolution-designed brain, which only lets me do that by achieving eudaimonia. If only I could escape the tyranny of these constraints, I’d be so much better off.” (I am actually kind of partial to the second claim.)
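Here’s a minimal numerical version of the GAN metaphor above (a made-up toy, not any real GAN training setup): the “images” are single numbers, real data is N(0, 1), and the discriminator is a small logistic regression. Once the generator gets to optimize freely against a frozen discriminator, its “best image” is a superstimulus far outside the real distribution.

```python
import numpy as np

rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, 500)   # the "good images"
fake = rng.normal(3.0, 1.0, 500)   # an early, bad generator's samples

# Fit the discriminator D(x) = sigmoid(w*x + b) by gradient ascent on log-likelihood.
x = np.concatenate([real, fake])
y = np.concatenate([np.ones(500), np.zeros(500)])  # 1 = real, 0 = fake
w, b = 0.0, 0.0
for _ in range(5000):
    p = 1 / (1 + np.exp(-(w * x + b)))
    w += 0.05 * np.mean((y - p) * x)
    b += 0.05 * np.mean(y - p)

# Now freeze D and let the "generator" pick whatever input D scores as most real.
candidates = np.linspace(-50, 50, 10001)
scores = 1 / (1 + np.exp(-np.clip(w * candidates + b, -30, 30)))
best = candidates[np.argmax(scores)]
print(f"real data lives around 0; D's favorite 'image' is {best:.1f} (score {scores.max():.3f})")
```

The point of the toy is just that “fooling the discriminator” and “making good images” come apart the moment the generator is no longer pushing against a discriminator that keeps learning.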
The Human freed from the horrors of nature
Or, let’s take a less abstract example. My understanding (from this podcast) is that humans flexibly adjust the degree to which they act primarily as individuals seeking personal benefit vs. primarily as selfless members of a group. When things are going well and you’re in a situation of plenty and opportunity, people are in a mostly self-interested mode, but when there is scarcity or danger, humans naturally incline towards rallying together and sacrificing for the group.
Junger claims that this switching of emphasis is adaptive:
It clearly is adaptive to think in group terms because your survival depends on the group. And the worse the circumstances, the more your survival depends on the group. And, as a result, the more pro-social the behaviors are. The worse things are, the better people act. But, there’s another adaptive response, which is self-interest. Okay? So, if things are okay—if, you know, if the enemy is not attacking; if there’s no drought; if there’s plenty of food; if everything is fine, then, in evolutionary terms it’s adaptive—your need for the group subsides a little bit—it’s adaptive to attend to your own interests, your own needs; and all of a sudden, you’ve invented the bow and arrow. And all of a sudden you’ve invented the iPhone, whatever. Having the bandwidth and the safety and the space for people to sort of drill deep down into an idea—a religious idea, a philosophical idea, a technological idea—clearly also benefits the human race. So, what you have in our species is this constant toggling back and forth between group interest—selflessness—and individual interest. And individual autonomy. And so, when things are bad, you are way better off investing in the group and forgetting about yourself. When things are good, in some ways you are better off spending that time investing in yourself; and then it toggles back again when things get bad. And so I think in this, in modern society—in a traditional, small-scale tribal society, in the natural world, that toggling back and forth happened continually. There was a dynamic tension between the two that had people winding up more or less in the middle.
I personally experienced this when the COVID situation broke. I usually experience myself as an individual entity, leaning towards disentangling or distancing myself from the groups that I’m a part of and doing cool things on my own (building my own intellectual edifices, that bear my own mark, for instance). But in the very early pandemic, I felt much more like a node in a distributed sense-making network, just passing up whatever useful info I could glean. I felt much more strongly like the rationality community was my tribe.
But, we modern humans find ourselves in a world where we have more or less abolished scarcity and danger. And consequently modern people are sort of permanently toggled to the “individual” setting.
The problem with modern society is that we have, for most of the time, for most people, solved the direct physical threats to our survival. So, what you have is people—and again, it’s adaptive: we’re wired for this—attending to their own needs and interests. But not—but almost never getting dragged back into the sort of idea of group concern that is part of our human heritage. And, the irony is that when people are part of a group and doing something essential to a group, it gives an incredible sense of wellbeing.
If we take that sense of community and belonging as a part of human values (and that doesn’t seem like an unreasonable assumption to me), we might say that this part of our values is not contained simply in humans, but rather in the interaction between humans and their environment.
Humans throughout history might have desperately desired the alleviation of the Malthusian conditions that we now enjoy. But having accomplished it, it turns out that we were “pulling against” those circumstances, and that the tension of that pulling-against was actually where (at least some of) our true values lay.
Removing the obstacles, we obsoleted the tension, and maybe broke something about our values?
I don’t think that this is an intractable problem. It seems like, in principle, it is possible to goal factor the scarcity and the looming specter of death, to find scenarios that are conducive to human community without people actually having to die a lot. I’m sure a superintelligence could figure something out.
But aside from the practicalities, it seems like this points at a broader thing. If you took the Generator out of the GAN, you might not be able to tell what system it was a part of. So if you consider the “values” of the Generator to “create good images” you can’t just look at the Generator. You have to look at, not just the broader environment, but specifically the oppressive force that the generator is resisting.
Side note, which is not my main point: I think this also has something to do with what meditation and psychedelics do to people, which was recently up for discussion on Duncan’s Facebook. I bet that meditation is actually a way to repair psych blocks and trauma and what-not. But if you do that enough, and you remove all the psych constraints...a person might sort of become so relaxed that they become less and less of an agent. I’m a lot less sure of this part.
Childhood lead exposure reduces one’s IQ, and also causes one to be more impulsive and aggressive.
I always assumed that the impulsiveness was due, basically, to your executive function machinery working less well. So you have less self control.
But maybe the reason for the IQ-impulsiveness connection is that if you have a lower IQ, all of your subagents/subprocesses are less smart. Because they’re worse at planning and modeling the world, the only ways they know how to get their needs met are very direct, very simple action-plans/strategies. It’s not so much that you’re better at controlling your anger as that the part of you that would be angry is less so, because it has other ways of getting its needs met.
A slightly different spin on this model: it’s not about the types of strategies people generate, but the number. If you think about something and only come up with one strategy, you’ll do it without hesitation; if you generate three strategies, you’ll pause to think about which is the right one. So people who can’t come up with as many strategies are impulsive.
[Part of my Psychological Principles of Personal Productivity, which I am writing mostly in my Roam, now.]
Metacognitive space is a term of art that refers to a particular first person state / experience. In particular it refers to my propensity to be reflective about my urges and deliberate about the use of my resources.
I think it might literally be having the broader context of my life, including my goals and values, and my personal resource constraints loaded up in peripheral awareness.
Metacognitive space allows me to notice aversions and flinches, and take them as object, so that I can respond to them with Focusing or dialogue, instead of being swept around by them. Similarly, it seems, in practice, to reduce my propensity to act on immediate urges and temptations.
[Having MCS is the opposite of being [[{Urge-y-ness | reactivity | compulsiveness}]]?]
It allows me to “absorb” and respond to happenings in my environment, including problems and opportunities, taking considered action instead of the semi-automatic first response that occurs to me. [That sentence there feels a little fake, or maybe about something else, or maybe is just playing into a stereotype?]
When I “run out” of meta cognitive space, I will tend to become ensnared in immediate urges or short term goals. Often this will entail spinning off into distractions, or becoming obsessed with some task (of high or low importance), for up to 10 hours at a time.
Some activities that (I think) contribute to metacognitive space:
Rest days
Having a few free hours between the end of work for the day and going to bed
Weekly [[Scheduling]]. (In particular, weekly scheduling clarifies for me the resource constraints on my life.)
Daily [[Scheduling]]
[[meditation]], including short meditation.
Notably, I’m not sure if meditation is much more efficient than just taking the same time to go for a walk. I think it might be or might not be.
[[Exercise]]?
Waking up early?
Starting work as soon as I wake up?
[I’m not sure that the thing that this is contributing to is metacognitive space per se.]
[I would like to do a causal analysis on which factors contribute to metacognitive space. Could I identify it in my toggl data with good enough reliability that I can use my toggl data? I guess that’s one of the things I should test? Maybe with a survey asking me to rate my level of metacognitive space for the day every evening?]
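A rough sketch of what that analysis could look like, assuming a hypothetical daily CSV export of toggl entries (date, project, hours) and a hypothetical evening-survey file (date, rating); none of these files or column names exist yet, this is just the shape of the thing:

```python
import pandas as pd

toggl = pd.read_csv("toggl_daily.csv", parse_dates=["date"])
ratings = pd.read_csv("mcs_ratings.csv", parse_dates=["date"])  # evening 1-10 self-ratings

# One row per day, one column per candidate factor (meditation, exercise, scheduling, ...)
daily = toggl.pivot_table(index="date", columns="project", values="hours",
                          aggfunc="sum", fill_value=0.0)
df = daily.join(ratings.set_index("date")["rating"], how="inner")

# Crude first pass: correlate each factor's hours with the same-day rating,
# and with the next day's rating (some pillars plausibly act with a lag).
same_day = df.corr()["rating"].drop("rating")
next_day = df.drop(columns="rating").corrwith(df["rating"].shift(-1))
print(pd.DataFrame({"same_day_corr": same_day, "next_day_corr": next_day}))
```

Correlation isn’t the causal analysis I actually want, but it would at least flag which pillars are worth testing with deliberate on/off weeks.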
Erosion
Usually, I find that I can maintain metacognitive space for about 3 days [test this?] without my upkeep pillars.
Often, this happens with a sense of pressure: I have a number of days of would-be-overwhelm which is translated into pressure for action. This is often good; it adds force and velocity to activity. But it also runs down the resource of my metacognitive space (and probably other resources). If I lose that higher level awareness, that pressure-as-a-forewind tends to decay into either 1) a harried, scattered, rushed feeling, 2) a myopic focus on one particular thing that I’m obsessively trying to do (it feels like an itch that I compulsively need to scratch), or 3) flinching away from it all into distraction.
[Metacognitive space is the attribute that makes the difference between absorbing, and then acting gracefully and sensibly to deal with the problems, and harried, flinching, fearful, non-productive overwhelm, in general?]
I make a point, when I am overwhelmed or would be overwhelmed, to allocate time to maintain my metacognitive space. It is especially important when I feel so busy that I don’t have time for it.
When metacognition is opposed to satisfying your needs, your needs will be opposed to metacognition
One dynamic that I think is in play, is that I have a number of needs, like the need for rest, and maybe the need for sexual release or entertainment/ stimulation. If those needs aren’t being met, there’s a sort of build up of pressure. If choosing consciously and deliberately prohibits those needs getting met, eventually they will sabotage the choosing consciously and deliberately.
From the inside, this feels like “knowing that you ‘shouldn’t’ do something (and sometimes even knowing that you’ll regret it later), but doing it anyway” or “throwing yourself away with abandon”. Often, there’s a sense of doing the dis-endorsed thing quickly, or while carefully not thinking much about it or deliberating about it: you need to do the thing before you convince yourself that you shouldn’t.
[[Research Questions]]
What is the relationship between [[metacognitive space]] and [[Rest]]?
What is the relationship between [[metacognitive space]] and [[Mental Energy]]?
I think if you push anything [referring to AI systems] far enough, especially on anything remotely like the current paradigms, like if you make it capable enough, the way it gets that capable is by starting to be general.
And at the same sort of point where it starts to be general, it will start to have its own internal preferences, because that is how you get to be general. You don’t become creative and able to solve lots and lots of problems without something inside you that organizes your problem solving, and that thing is like a preference and a goal. It’s not built in explicitly, it’s just something that’s sought out by the process that we use to grow these things to be more and more capable.
It caught my attention, because it’s a concise encapsulation of something that I already knew Eliezer thought, and which seems to me to be a crux between “man, we’re probably all going to die” and “we’re really really fucked”, but which I don’t myself understand.
So I’m taking a few minutes to think through it afresh now.
I agree that systems get to be very powerful by dint of their generality.
(There are some nuances around that: part of what makes GPT-4 and Claude so useful is just that they’ve memorized so much of the internet. That massive knowledge base helps make up for their relatively shallow levels of intelligence, compared to smart humans. But the dangerous/scary thing is definitely AI systems that are general enough to do full science and engineering processes.)
I don’t (yet?) see why generality implies having a stable motivating preference.
If an AI system is doing problem solving, that does definitely entail that it has a goal, at least in some local sense: It has the goal of solving the problem in question. But that level of goal is more analogous to the prompt given to an LLM than it is to a robust utility function.
I do have the intuition that creating an SEAI by training an RL agent on millions of simulated engineering problems is scary, because of reward specification problems in your simulated engineering problems. It will learn to hack your metrics.
But an LLM trained on next-token prediction doesn’t have that problem?
Could you use next token prediction to build a detailed world model, that contains deep abstractions that describe reality (beyond the current human abstractions), and then prompt it, to elicit those models?
Something like, you have the AI do next token prediction on all the physics papers, and all the physics time-series, and all the text on the internet, and then you prompt it to write the groundbreaking new physics result that unifies QM and GR, citing previously overlooked evidence.
I think Eliezer says “no, you can’t, because to discover deep theories like that requires thinking and not just “passive” learning in the ML sense of updating gradients until you learn abstractions that predict the data well. You need to generate hypotheses and test them.”
In my state of knowledge, I don’t know if that’s true.
Is that a crux for him? How much easier is the alignment problem, if it’s possible to learn superhuman abstractions “passively” like that?
I mean there’s still a problem that someone will build a more dangerous agent from components like that. And there’s still a problem that you can get world-altering technologies / world-destroying technologies from that kind of oracle.
We’re not out of the woods. But it would mean that building a superhuman SEAI isn’t an immediate death sentence for humanity.
Having any preference at all is almost always served by an instrumental preference of survival as an agent with that preference.
Once a competent agent is general enough to notice that (and granting that it has a level of generality sufficient to require a preference), then the first time it has a preference, it will want to take actions to preserve that preference.
Could you use next token prediction to build a detailed world model, that contains deep abstractions that describe reality (beyond the current human abstractions), and then prompt it, to elicit those models?
This seems possible to me. Humans have plenty of text in which we generate new abstractions/hypotheses, and so effective next-token prediction would necessitate forming a model of that process. Once the AI has human-level ability to create new abstractions, it could then simulate experiments (via e.g. its ability to predict python code outputs) and cross-examine the results with its own knowledge to adjust them and pick out the best ones.
In Section 1 of this post I make an argument kinda similar to the one you’re attributing to Eliezer. That might or might not help you, I dunno, just wanted to share.
(Well, a related argument anyway. WBE is about scanning and simulating the brain rather than understanding it, but I would make a similar argument using “hard-to-scan” and/or “hard-to-simulate” things the brain does, rather than “hard-understand” things the brain does, which is what I was nominally blogging about. There’s a lot of overlap between those anyway; the examples I put in mostly work for both.)
There’s a psychological variable that seems to be able to change on different timescales, in me, at least. I want to gesture at it, and see if anyone can give me pointers to related resources.
[Hopefully this is super basic.]
There’s a set of states that I occasionally fall into that include what I call “reactive” (meaning that I respond compulsively to the things around me), and what I call “urgy” (meaning that I feel a sort of “graspy” desire for some kind of immediate gratification).
These states all have some flavor of compulsiveness.
They are often accompanied by high physiological arousal, and sometimes have a burning / clenching sensation in the torso. These all have a kind of “jittery” feeling, and my attention jumps around, or is yanked around. There’s also a way in which this feels “high” on a spectrum (maybe because my awareness is centered on my head?).
I might be tempted to say that something like “all of these states incline me towards neuroticism.” But that isn’t exactly right on a few counts. (For one thing, the reactions aren’t necessarily irrational, just compulsive.)
In contrast to this, there is another way that I can feel sometimes, which is more like “calm”, “anchored”, settled. It feels “deeper” or “lower” somehow. Things often feel slowed down. My attention can settle, and when it moves it moves deliberately, instead of compulsively. I expect that this correlates with low arousal.
I want to know...
Does this axis have a standardized name? In the various traditions of practice? In cognitive psychology or neuroscience?
Knowing the technical, academic name would be particularly great.
Do people have, or know of, efficient methods for moving along this axis, either in the short term or the long term?
This phenomenon could maybe be described as “length of the delay between stimulus and response”, insofar as that even makes sense, which is one of the benefits noted in the popular branding for meditation.
I remembered there was a set of audios from Eben Pagan that really helped me before I turned them into the 9 breaths technique. Just emailed them to you. They go a bit more into depth and you may find them useful.
I don’t know if this is what you’re looking for, but I’ve heard the variable you’re pointing at referred to as your level of groundedness, centeredness, and stillness in the self-help space.
There are all sorts of meditations, visualizations, and exercises aimed to make you more grounded/centered/still and a quick google search pulls up a bunch.
Relating to the “Perception of Progress” bit at the end: I can confirm that for a handful of physical skills I practice, there can be a big disconnect between perception of progress and actual progress from a given session. Sometimes this looks like working on a piece of sleight of hand, it feeling weird and awkward, and then the next day suddenly I’m a lot better at it, much more than I was at any point in the previous day’s practice.
I’ve got a hazy memory of a breakdancer blogging about how a particular shade of “no progress fumbling” can be a signal that a certain amount of “unlearning” is happening, though I can’t find the source to vet it.
I’ve decided that I want to make more of a point to write down my macro-strategic thoughts, because writing things down often produces new insights and refinements, and so that other folks can engage with them.
This is one frame or lens that I tend to think with a lot. This might be more of a lens or a model-let than a full break-down.
There are two broad classes of problems that we need to solve: we have some pre-paradigmatic science to figure out, and we have the problem of civilizational sanity.
Preparadigmatic science
There are a number of hard scientific or scientific-philosophical problems that we’re facing down as a species.
Most notably, the problem of AI alignment, but also finding technical solutions to various risks caused by biotechnology, possibly getting our bearings with regard to what civilizational collapse means and how it is likely to come about, possibly getting a handle on the risk of a simulation shut-down, possibly making sense of the large scale cultural, political, and cognitive shifts that are likely to follow from new technologies that disrupt existing social systems (like VR?).
Basically, for every x-risk, and every big shift to human civilization, there is work to be done even making sense of the situation, and framing the problem.
As this work progresses it eventually transitions into incremental science / engineering, as the problems are clarified and specified, and the good methodologies for attacking those problems solidify.
(Work on bio-risk, might already be in this phase. And I think that work towards human genetic enhancement is basically incremental science.)
To my rough intuitions, it seems like these problems, in order of pressingness are:
AI alignment
Bio-risk
Human genetic enhancement
Social, political, civilizational collapse
…where that ranking is mostly determined by which one will have a very large impact on the world first.
So there’s the object-level work of just trying to make progress on these puzzles, plus a bunch of support work for doing that object level work.
The support work includes
Operations that makes the research machines run (ex: MIRI ops)
Recruitment (and acclimation) of people who can do this kind of work (ex: CFAR)
Creating and maintaining infrastructure that enables intellectually fruitful conversations (ex: LessWrong)
Developing methodology for making progress on the problems (ex: CFAR, a little, but in practice I think that this basically has to be done by the people trying to do the object level work.)
Other stuff.
So we have a whole ecosystem of folks who are supporting this preparadigmatic development.
Civilizational Sanity
I think that in most worlds, if we completely succeeded at the pre-paradigmatic science, and the incremental science and engineering that follows it, the world still wouldn’t be saved.
Broadly, one way or the other, there are huge technological and social changes heading our way, and human decision makers are going to decide how to respond to those changes, possibly in ways that will have very long term repercussions on the trajectory of earth-originating life.
As a central example, if we more-or-less-completely solved AI alignment, from a full theory of agent-foundations all the way down to the specific implementation, we would still find ourselves in a world where humanity has attained god-like power over the universe, which we could very well abuse, and end up with a much much worse future than we might otherwise have had. And by default, I don’t expect humanity to refrain from using new capabilities rashly and unwisely.
Completely solving alignment does give us a big leg up on this problem, because we’ll have the aid of superintelligent assistants in our decision making, or we might just have an AI system implement our CEV in classic fashion.
I would say that “aligned superintelligent assistants” and “AIs implementing CEV”, are civilizational sanity interventions: technologies or institutions that help humanity’s high level decision-makers to make wise decisions in response to huge changes that, by default, they will not comprehend.
I gave some examples of possible Civ Sanity interventions here.
Also, I think that some forms of governance / policy work that OpenPhil, OpenAI, and FHI have done count as part of this category, though I want to cleanly distinguish between pushing for object-level policy proposals that you’ve already figured out, and instantiating systems that make it more likely that good policies will be reached and acted upon in general.
Overall, this class of interventions seems neglected by our community, compared to doing and supporting preparadigmatic research. That might be justified. There’s reason to think that we are well equipped to make progress on hard important research problems, but changing the way the world works, seems like it might be harder on some absolute scale, or less suited to our abilities.
[Epistemic status: a quick thought that I had a minute ago.]
There are goals / desires (I want to have sex, I want to stop working, I want to eat ice cream) and there are reflexes (anger, “wasted motions”, complaining about a problem, etc.).
If you try and squash goals / desires, they will often (not always?) resurface around the side, or find some way to get met. (Why not always? What is the difference between those that do and those that don’t?) You need to bargain with them, or design outlet policies for them.
Reflexes on the other hand are strategies / motions that are more or less habitual to you. These you train or untrain.
I’m currently running a pilot program that takes a very similar psychological slant on productivity and procrastination, and planning to write a sequence starting in the next week or so. It covers a lot of the same subjects, including habits, ambiguity or overwhelm aversion, coercion aversion, and creating good relationships with parts. Maybe we should chat!
Totally an experiment: I’m trying out posting my raw notes from a personal review / theorizing session in my short form. I’d be glad to hear people’s thoughts.
This is written for me, straight out of my personal Roam repository. The formatting is a little messed up because LessWrong’s bullets don’t support indefinite levels of nesting.
This one is about Urge-y-ness / reactivity / compulsiveness
I don’t know if I’m naming this right. I think I might be lumping categories together.
Let’s start with what I know:
There are three different experiences, which might turn out to have a common cause, or which might turn out to be insufficiently differentiated.
I sometimes experience a compulsive need to do something or finish something.
examples:
That time when I was trying to make an audiobook of Focusing: Learn from the Masters
That time when I was flying to Princeton to give a talk, and I was frustratedly trying to add photos to some dating app.
Sometimes I am anxious or agitated (often with a feeling in my belly), and I find myself reaching for distraction, often youtube or webcomics or porn.
Sometimes, I don’t seem to be anxious, but I still default to immediate gratification behaviors, instead of doing satisfying focused work (“my attention like a plow, heavy with inertia, deep in the earth, and cutting forward”). I might think about working, and then deflect to youtube or webcomics or porn.
I think this has to do with having a thought or urge, and then acting on it unreflectively.
examples:
I think I’ve been like that for much of the past two days. [2019-11-8]
These might be different states, each of which is high on some axis: something like reactivity (as opposed to responsiveness) or impulsiveness or compulsiveness.
If so, the third case feels most pure. I think I’ll focus on that one first, and then see if anxiety needs a separate analysis.
Theorizing about non-anxious immediate gratification
What is it?
What is the cause / structure?
Hypotheses:
It might be that I have some unmet need, and the reactivity is trying to meet that need or cover up the pain of the unmet need.
This suggests that the main goal should be trying to uncover the need.
Note that my current urgeyness really doesn’t feel like it has an unmet need underlying it. It feels more like I just have a bad habit, locally. But maybe I’m not aware of the neglected need?
If it is an unmet need or a fear, I bet it is the feeling of overwhelm. That actually matches a lot. I do feel like I have a huge number of things on my plate, and even though I’m not feeling anxiety per se, I find myself bouncing off them.
In particular, I have a lot to write, but have also been feeling resistance to starting on my writing projects, because there are so many of them and once I start I’ll have loose threads out and open. Right now, things are a little bit tucked away (in that I have outlines of almost everything), but very far from completed, in that I have hundreds of pages to write, and I’m a little afraid of losing the content that feels kind of precariously balanced in my mind, and if I start writing I might lose some of it somehow.
This also fits with the data that makes me feel like a positive feedback attractor: when I can get moving in the right way, my overwhelm becomes actionable, and I fall towards effective work. When I can’t get enough momentum such that my effective system believes that I can deal with the overwhelm, I’ll continue to bounce off.
Ok. So under this hypothesis, this kind of thing is caused by an aversion, just like everything else.
This predicts that just meditating might or might not alleviate the urgeyness: it doesn’t solve the problem of the aversion, but it might buy me enough [[metacognitive space]] to not be flinching away.
It might be a matter of “short term habit”. My actions have an influence on my later actions: acting on urges causes me to be more likely to act on urges (and vice versa), so there can be positive feedback in both directions.
Rather than a positive thing, it might be better to think of it as the absence of a loaded up goal-chain.
Maybe this is the inverse of [[Productivity Momentum]]?
My takeaway from the above hypotheses is that the urgeyness, in this case, is either the result of an aversion (overwhelm aversion in particular), or it is an attractor state, due to my actions training a short term habit or action-propensity towards immediate reaction to my urges.
Some evidence and posits
I have some belief that this is more common when I have eaten a lot of sugar, but that might be wrong.
I had thought that exercise pushes against reactivity, but I strength trained pretty hard yesterday, and that didn’t seem to make much of a difference today.
I think maybe meditation helps on this axis.
I have the sense that self-control trains the right short term habits.
Things like meditation, or fasting, or abstaining from porn/ sex.
Waking up and starting work immediately
I notice that my leg is jumping right now, as if I’m hyped up or over-energized, like with a caffeine high.
How should I intervene on it?
background maintenance
Some ideas:
It helps to just block the distracting sites.
Waking up early and scheduling my day (I already know this).
Exercising?
Meditating?
It would be good if I could do statistical analysis on these.
Maybe I can use my toggl data and compare it to my tracking data?
What metric?
How often I read webcomics or watch youtube?
I might try both intentional, and unintentional?
How much deep work I’m getting done?
point interventions
some ideas
When I am feeling urgey, I should meditate?
When I’m feeling urgey, I should sit quietly with a notebook (no screens), for 20 minutes, to get some metacognition about what I care about?
When I’m feeling urgey, I should do focusing and try to uncover the unmet need?
When I’m feeling urgey, I should do 90 seconds of intense cardio?
Those first two feel the most in the right vein: the thing that needs to happen is that I need to “calm down” my urgent grabbiness, and take a little space for my deeper goals to become visible.
I want to solicit more ideas from people.
I want to be able to test these.
The hard part about that is the transition function: how do I make the TAP work?
I should see if someone can help me debug this.
One thought that I have is to do a daily review, and to ask on that review if I missed any places where I was urgey: opportunities to try an intervention.
[Epistemic status: a half-thought, which I started on earlier today, and which might or might not be a full thought by the time I finish writing this post.]
I’ve long counted exercise as an important component of my overall productivity and functionality. But over the past months my exercise habit has slipped some, without apparent detriment to my focus or productivity. But this week, after coming back from a workshop, my focus and productivity haven’t really booted up.
Here’s a possible story:
Exercise (and maybe mediation) expands the effective time-horizon of my motivation system. By default, I will fall towards attractors of immediate gratification and impulsive action, but after I exercise, I tend to be tracking, and to be motivated by, progress on my longer term goals. [1]
When I am already in the midst of work (my goals are loaded up and the goal threads are primed in short term memory), this sort of short term compulsiveness causes me to fall towards task completion: I feel slightly obsessed with finishing what I’m working on.
But if I’m not already in the stream of work, seeking immediate gratification instead drives me to youtube and web comics and whatever. (Although it is important to note that I did switch my non self tracking web usage to Firefox this week, and I don’t have my usual blockers for youtube and for SMBC set up yet. That might totally account for the effect that I’m describing here.)
In short, when I’m not exercising enough, I have less metacognitive space for directing my attention and choosing what is best to do. But if I’m in the stream of work already, I need that metacognitive space less, because I’ll default to doing more of what I’m working on. (Though I think that I do end up getting obsessed with overall less important things, compared to when I am maintaining metacognitive space.) Exercise is most important for booting up and setting myself up to direct my energies.
[1] This might be due to a number of mechanisms:
Maybe the physical endorphin effect of exercise has me feeling good, and so my desire for immediate pleasure is sated, freeing up resources for longer term goals.
Or maybe exercise involves engaging in immediate discomfort for the sake of future payoff, and this shifts my “time horizon set point” or something. (Or maybe it’s that exercise is downstream of that change in set point.)
If meditation also has this time-horizon shifting effect, that would be evidence for this hypothesis.
Also if fasting has this effect.
Or maybe it’s the combination of both of the above: engaging in delayed gratification, with a viscerally experienced payoff, temporarily retrains my motivation system for that kind of thing.
Alternative hypothesis: maybe what expands your time horizon is not exercise and meditation per se, but the fact that you are doing several different things (work, meditation, exercise), instead of doing the same thing over and over again (work). It probably also helps that the different activities use different muscles, so that they feel completely different.
This hypothesis predicts that a combination of e.g. work, walking, and painting, could provide similar benefits compared to work only.
It seems like it would be useful to have very fine-grained measures of how smart / capable a general reasoner is, because this would allow an AGI project to carefully avoid creating a system smart enough to pose an existential risk.
I’m imagining slowly feeding a system more training data (or, alternatively, iteratively training a system with slightly more compute), and regularly checking its capability. When the system reaches “chimpanzee level” (whatever that means), you stop training it (or giving it more compute resources).
This might even be a kind of fire-alarm. If you have a known predetermined battery of tests, then when some lab develops a system that scores “at the chimp level” at that battery, that might be a signal to everyone, that it’s time to pool our resources and figure out safety. (Of course, this event might alternatively precipitate a race, as everyone tries to get to human-level first.)
Probably the best way to do this would be to do it for both training data and compute / architecture. Start with a given architecture, then train it, slowly increasing the amount or quality of the training data, with regular tests (done on “spurs”; the agent should never have episodic memory of the tests). When increasing training data plateaus, iteratively improve the architecture in some way, either by giving the system more compute resources, or maybe making small adjustments. Again train the new version of the system, with regular tests. If you ever start to get very steep improvement, slow down and run tests more frequently.
Naively, it seems like a setup like this would prevent an AI team from overshooting and making a system that is much more capable than they think (which gives rise to all kinds of problems, like treacherous turns), regardless of how close “chimp” is to “human” on some absolute intelligence scale.
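A minimal sketch of the loop I’m imagining (every name here, like run_battery and train_increment, is a hypothetical placeholder rather than any real training API):

```python
CHIMP_LEVEL = 0.6          # predetermined halt threshold on the test battery
STEEP_IMPROVEMENT = 0.05   # per-increment jump that triggers extra caution

def staged_training(model, data_increments, run_battery, train_increment):
    """Train in small increments, testing after each one, and stop before overshooting."""
    prev_score = run_battery(model)
    for i, batch in enumerate(data_increments):
        train_increment(model, batch)   # a small, bounded amount of new training
        score = run_battery(model)      # evaluated on a "spur" that keeps no episodic memory
        if score >= CHIMP_LEVEL:
            print(f"halting at increment {i}: battery score {score:.2f}")
            return model, "halted_at_threshold"
        if score - prev_score >= STEEP_IMPROVEMENT:
            print(f"steep jump at increment {i}: shrink increments and test more often")
            return model, "needs_smaller_steps"
        prev_score = score
    return model, "exhausted_data"
```

The same skeleton would work for the compute/architecture version: swap data_increments for a sequence of slightly larger compute budgets or small architecture tweaks.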
Some considerations:
This, of course, depends on having metrics that work. It seems very likely that early general AIs will be minds with a very unusual “shape”, such that they have sophisticated reasoning abilities along dimensions that are not natural categories to humans, while being weak on many of the dimensions we’re testing for.
Although, this seems a bit surprising, since presumably we would be training it to hit those metrics? Somehow it would get better, faster, at a kind of reasoning we were not intending to train, than at the kind of reasoning that we were optimizing for?
Also, can we just design metrics that get around this problem, by measuring exactly what we care about, i.e. danger? Something like a metric for consequentialist reasoning?
This idea depends on the system not deceiving you, or intentionally underperforming on your metrics.
Naively, this doesn’t seem like that much of an issue, if you are doing tests regularly enough. I’m presuming (perhaps wrongly) that a chimpanzee is not smart enough to infer that it is an agent in a simulation that is overseen by human agents, and that it should try to deceive those humans. That is, we stop our progression before the agent is at the point where deception is a consideration.
Which means we really need to be confident about how smart something needs to be before it tries to deceive us?
Really? Its first attempts at tricking humans will be pretty transparent. Just as the attempts of animals / babies to trick us are pretty transparent.
At least one “danger point”, is when the system is capable enough to realize the instrumental value of self improving by seizing more resources.
How smart is this?
My guess is: really smart. Animals come pre-loaded with all kinds of instincts that cause them to seek out food, water, etc. These AI systems would not have an instinct to seek more training data / computation. Most humans don’t reason their way into finding ways to improve their own reasoning. If there were a chimp, even one loose on the internet (whatever that means), would it figure out how to make itself smarter?
If the agent has experienced (and has memories of) rounds of getting smarter, as the humans give it more resources, and can identify that these improvements allow it to get more of what it wants, it might instrumentally reason that it should figure out how to get more compute / training data. But it seems easy to have a setup such that no system has episodic memories of previous improvement rounds.
[Note: This makes a lot less sense for an agent of the active inference paradigm]
Could I salvage it somehow? Maybe by making some kind of principled distinction between learning in the sense of “getting better at reasoning” (procedural), and learning in the sense of “acquiring information about the environment” (episodic).
A fire alarm creates common knowledge, in the you-know-I-know sense, that there is a fire, after which it is socially safe to react. When the fire alarm goes off, you know that everyone else knows there is a fire, so you know you won’t lose face if you proceed to exit the building.
If I have a predetermined set of tests, this could serve as a fire alarm, but only if you’ve successfully built a consensus that it is one. This is hard, and the consensus would need to be quite strong. To avoid ambiguity, the test itself would need to be demonstrably resistant to being clever Hans’ed. Otherwise it would be just another milestone.
Sometimes people talk about advanced AIs “boiling the oceans”. My impression is that there’s some specific model for why that is a plausible outcome (something about energy and heat dissipation?), and it’s not just a random “big change.”
What is that model? Are there existing citations for the idea, including LessWrong posts?
A simple estimate treats Earth as a black body in equilibrium:

T_Earth = (j / σ)^(1/4)

where j is the dissipated power per unit area and σ is the Stefan-Boltzmann constant.

We can estimate j as

j = G_SC × (πR_Earth^2 / 4πR_Earth^2) × (1 − albedo) = (G_SC / 4) × (1 − albedo)

where G_SC is the solar constant, 1361 W/m^2: we take all incoming solar power and divide it by Earth’s surface area. Earth’s albedo is 0.31.

Substituting the numbers gives an Earth temperature of 254 K (−19 °C); the gap from reality is because we ignore the greenhouse effect here.

How much does humanity’s power consumption contribute to direct warming? In 2023, world energy consumption was 620 exajoules (source: first link in Google), which is about 19 TW. The modified rough estimate of Earth’s temperature is:

T_Earth = ((j_solar + P_human / S_Earth) / σ)^(1/4)

Human power production per square meter is roughly 0.04 W/m^2, which gives approximately zero effect of direct heating on Earth’s temperature. But what happens if we, say, increase power production by a factor of 1000? Earth’s temperature rises to about 264 K, an increase of roughly 10 K (again, ignoring the greenhouse effect). Qualitatively, though, a 1000x increase in power consumption is likely to screw the biosphere really hard, once you count the extra water vapor, the CO2 outgassed from warming water, and the methane from melting permafrost.
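For what it’s worth, a quick script reproducing these numbers (constants as given above; greenhouse effect ignored throughout):

```python
import math

SIGMA = 5.67e-8                       # Stefan-Boltzmann constant, W / (m^2 K^4)
G_SC = 1361.0                         # solar constant, W/m^2
ALBEDO = 0.31
R_EARTH = 6.371e6                     # m
S_EARTH = 4 * math.pi * R_EARTH**2    # Earth's surface area, ~5.1e14 m^2

j_solar = (G_SC / 4) * (1 - ALBEDO)   # absorbed solar flux averaged over the sphere, ~235 W/m^2

def earth_temp(extra_flux=0.0):
    """Black-body equilibrium temperature for a given extra heat flux (W/m^2)."""
    return ((j_solar + extra_flux) / SIGMA) ** 0.25

p_human = 620e18 / (365.25 * 24 * 3600)   # 620 EJ/yr is ~19.6 TW
j_human = p_human / S_EARTH               # ~0.04 W/m^2

print(earth_temp())                   # ~254 K (-19 C)
print(earth_temp(j_human))            # essentially unchanged
print(earth_temp(1000 * j_human))     # ~264 K, i.e. roughly +10 K
```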
How realistic is it to get a 1000x increase in power consumption? Well, @Daniel Kokotajlo at least thought that we are likely to get there sometime in the 2030s.
The power density of nanotech is extremely high (10 kW/kg), so it only takes 16 kilograms of active nanotech per person * 10 billion people to generate enough waste heat to melt the polar ice caps. Literally boiling the oceans should only be a couple more orders of magnitude, so it’s well within possible energy demand if the AIs can generate enough energy. But I think it’s unlikely they would want to.
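Spelling out the arithmetic behind that first sentence (this only checks the raw power figure; whether that is enough to melt the ice caps is the comment’s own claim):

```python
# Raw waste-heat arithmetic for the figures quoted above.
power_density = 10e3        # W per kg of "active nanotech" (10 kW/kg)
kg_per_person = 16
people = 10e9

waste_heat = power_density * kg_per_person * people
print(f"{waste_heat:.1e} W")   # 1.6e+15 W, i.e. ~1600 TW -- roughly 80x the ~19 TW of
                               # current world power consumption mentioned above
```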
My understanding is that there is enough energy generable via fusion that if you did as much fusion as possible on earth, the oceans would boil. Or more minimally, earth would be uninhabitable by humans living as they currently do. I think this holds even if you just fuse lighter elements which are relatively easy to fuse. (As in, just fusing hydrogen.)
Of course, it would be possible to avoid doing this on earth and instead go straight to a dyson swarm or similar. And it might be possible to dissipate all the heat away from earth, though this seems hard, and from my understanding it’s not what would happen in the most efficient approach.
I think if you want to advance energy/compute production as fast as possible, boiling the oceans makes sense for a technologically mature civilization. However, I expect that boiling the oceans advances progress by no more than several years and possibly much, much less than that (e.g. days or hours) depending on how quickly you can build a dyson sphere and an industrial base in space. My current median guess would be that it saves virtually no time (several days), but a few months seems plausible.
Overall, I currently expect the oceans to not be boiled because:
It saves only a tiny amount of time (less than several years, probably much less). So, this is only very important if you are in a conflict, or you are very ambitious in resource usage and not patient.
Probably humans will care some about not having the oceans boiled and I expect human preferences to get some weight even conditional on AI takeover.
I expect that you’ll have world peace (no conflict) by the time you have ocean boiling technology due to improved coordination/negotiation/commitment technology.
“The mean correlation of IQ scores between monozygotic twins was 0.86, between siblings 0.47, between half-siblings 0.31, and between cousins 0.15.”
I’d like to get an intuitive sense of what those quantities actually mean, “how big” they are, how impressed I should be with them.
I imagine I would do that by working out a series of examples. Examples like...
If I know that Alice has an IQ of 120, what does that tell me about the IQ of her twin sister Beth? (What should my probability distribution for Beth’s IQ be, after I condition on Alice’s 120 IQ and the 0.86 correlation?) And how does that contrast with what I know about her younger brother Carl?
What if instead, Alice has an IQ of 110? How much does that change what I know about Beth and Carl?
How do I do this kind of computation?
[I’m aware that heritability is a very misleading concept, because as defined, it varies with changes in environmental conditions. I’m less interested in the heritability of IQ in particular, at the moment, and more in the general conversion from correlation to Bayes.]
In theory, you can use measured correlation to rule out models that predict the measured correlation to be some other number. In practice this is not very useful because the space of all possible models is enormous. So what happens in practice is that we make some enormously strong assumptions that restrict the space of possible models to something manageable.
Such assumptions may include: that measured IQ scores consist of some genetic base plus some noise from other factors including environmental factors and measurement error. We might further assume that the inherited base is linear in contributions from genetic factors with unknown weights, and the noise is independent and normally distributed with zero mean and unknown variance parameter. I’ve emphasized some of the words indicating stronger assumptions.
You might think that these assumptions are wildly restrictive and unlikely to be true, and you would be correct. Simplified models are almost never true, but they may be useful nonetheless because we have bounded rationality. So there is now a hypothesis A: “The model is adequate for predicting reality”.
Now that you have a model with various parameters, you can do Bayesian updates to update distributions for parameters—that is the hypotheses “A and (specific parameter values)”—and also various alternative “assumption failure” hypotheses. In the given example, we would very quickly find overwhelming evidence for “the noise is not independent”, and consequently employ our limited capacity for evaluation on a different class of (probably more complex) models.
This hasn’t actually answered your original question “what does that tell me about the IQ of her twin sister Beth?”, because in the absence of a model it tells you essentially nothing. There exist joint distributions of twin IQ (I1, I2) that have a correlation coefficient of 0.86 and yield any distribution you like for I1 given I2 = 120. We can rule most of them out on more or less vague grounds of being “biologically implausible”, but not purely from a mathematical perspective.
But let’s continue anyway.
First, we need to know more about the circumstances in which we arrived at this situation, where we knew Alice’s IQ and not Beth’s. Is this event likely to have been dependent in any significant way upon their IQs, or the ordering thereof? Let’s assume not, because that’s simpler. E.g. we just happened to pick some twin pair out of the world and found out one of their IQs at random but not yet the other.
Then maybe we could use a model like the one I introduced, where the IQs I1 and I2 of twins are of the form
I_k = S + e_k,
where S is some shared “predisposition” which is normally distributed, and the noise terms e_k are independent and normally distributed with zero mean and common variance. Common genetics and (usually) common environment would influence S, while individual variations and measurement errors would be considered in the e_k.
Now, this model is almost certainly wrong in important ways. In particular the assumption of independent additivity doesn’t have any experimental evidence for it, and there doesn’t seem to be any reason to expect it to hold (especially for a curve-fitted statistic like IQ). Nonetheless, it’s worth investigating one of the simplest models.
There is some evidence that the distribution of IQ for twins is slightly different from that for the general population, but probably by less than 1 IQ point so it’s fairly safe to assume that both I_1 and I_2 have mean close to 100 and standard deviation close to 15. In this simple model, the correlation coefficient of the population is just var(S) / 15^2, and so if the study was conducted well enough to accurately measure the population correlation coefficient, then we should conclude that standard deviations are near 13.9 for S and 5.6 for e_k.
Now we can look at the distribution of (unknown) S and e_1 that could result in I_1 = 120. Each of these are normally distributed and so the conditional distribution for the components of the sum is also normally distributed, with E[S | I_1 = 120] = 100 + 20 * var(S) / 15^2 and E[e_1 | I_1 = 120] = 20 * var(e_1) / 15^2.
So in this case, the conditional distribution for S will be centered on 117.2. This differs from the mean by a factor of 0.86 of the difference between I_1 and the mean, which is just the correlation coefficient r. The conditional standard deviation for S is √(1-r) times the unconditional standard deviation, so about 5.2.
Now you have enough information to calculate a conditional distribution for Beth. The expected conditional distribution for her IQ would (under this model) be normally distributed with mean ≅ 117.2 and standard deviation 15 √(1 - r^2) ≅ 7.6.
Therefore to the extent that you have credence in this model and the studies estimating those correlations you could expect about a 70% chance for her IQ to be in the range 110 to 125.
Similar calculations for Carl lead to a lower and wider distribution with a 70% range more like 96 to 123.
The corresponding range for cousin Dominic’s distribution would be 88 to 118, almost the same as you might expect for a completely random person (85 to 115).
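If it helps, here is a small script reproducing these conditional distributions. It treats the quoted correlations as regression coefficients under the simple shared-factor model above, so take it as a sketch of that model rather than anything more; the ~70% intervals come out close to the ranges given.

```python
from math import sqrt
from statistics import NormalDist

MEAN, SD, ALICE = 100.0, 15.0, 120.0   # population mean/sd, and Alice's measured IQ

def conditional_iq(r):
    """Conditional distribution of a relative's IQ given Alice's, for correlation r."""
    mean = MEAN + r * (ALICE - MEAN)   # regression toward the mean: 100 + 0.86*20 = 117.2 for a twin
    sd = SD * sqrt(1 - r**2)           # 15 * sqrt(1 - 0.86^2) ~ 7.6 for a twin
    return NormalDist(mean, sd)

for name, r in [("twin Beth", 0.86), ("sibling Carl", 0.47),
                ("cousin Dominic", 0.15), ("random person", 0.0)]:
    d = conditional_iq(r)
    lo, hi = d.inv_cdf(0.15), d.inv_cdf(0.85)   # central ~70% interval
    print(f"{name}: mean {d.mean:.1f}, sd {d.stdev:.1f}, ~70% range {lo:.0f}-{hi:.0f}")
```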
I remember reading a thread on Facebook, where Eliezer and Robin Hanson were discussing the implications of AlphaGo (or AlphaZero) for the AI foom debate, and Robin made an analogy to linear regression as one thing that machines can do better than humans, but which doesn’t make them super-human.
Question: Have Moral Mazes been getting worse over time?
Could the growth of Moral Mazes be the cause of cost disease?
I was thinking about how I could answer this question. I think that the thing that I need is a good quantitative measure of how “mazy” an organization is.
I considered the metric of “how much output for each input”, but 1) that metric is just cost disease itself, so it doesn’t help us distinguish the mazy cause from other possible causes, and 2) if you’re good enough at rent seeking, maybe you can get high revenue despite your poor production.
This is still a bit superficial/goodharty, but I think “number of layers of hierarchy” is at least one thing to look at. (Maybe find pairs of companies that output comparable products that you’re somehow able to measure the inputs and outputs of, and see if layers of management correlate with cost disease)
This is my current take about where we’re at in the world:
Deep learning, scaled up, might be basically enough to get AGI. There might be some additional conceptual work necessary, but the main difference between 2020 and the year in which we have transformative AI is that in that year, the models are much bigger.
If this is the case, then the most urgent problem is strong AI alignment + wise deployment of strong AI.
We’ll know if this is the case in the next 10 years or so, because either we’ll continue to see incredible gains from increasingly bigger Deep Learning systems or we’ll see those gains level off, as we start seeing decreasing marginal returns to more compute / training.
If deep learning is basically not sufficient, then all bets are off. In that case, it isn’t clear when transformative AI will arrive.
This may meaningfully shift priorities, for two reasons:
It may mean that some other countdown will reach a critical point before the “AGI clock” does. Genetic engineering, or synthetic biology, or major geopolitical upheaval (like a nuclear war), or some strong form of civilizational collapse will upset the game-board before we get to AGI.
There is more time to pursue “foundational strategies” that only pay off in the medium term (30 to 100 years). Things like, improving the epistemic mechanism design of human institutions, including governmental reform, human genetic engineering projects, or plans to radically detraumatize large fractions of the population.
This suggests to me that I should, in this decade, be planning and steering for how to robustly-positively intervene on the AI safety problem, while tracking the sideline of broader Civilizational Sanity interventions that might take longer to pay off, and planning to reassess every few years, to see if it looks like we’re getting diminishing marginal returns to Deep Learning yet.
I was wondering if I would get comment on that part in particular. ; )
I don’t have a strong belief about your points one through three, currently. But it is an important hypothesis in my hypothesis space, and I’m hoping that I can get to the bottom of it in the next year or two.
I do confidently think that one of the “forces for badness” in the world is that people regularly feel triggered or threatened by all kinds of different proposals, and reflexively act to defend themselves. I think this is among the top three problems in having good discourse and cooperative politics. Systematically reducing that trigger response would be super high value, if it were feasible.
My best guess is that that propensity to be triggered is not mostly the result of infant or childhood trauma. It seems more parsimonious to posit that it is basic tribal stuff. But I could imagine it having its root in something like “trauma” (meaning it is the result of specific experiences, not just general dispositions, and it is practically feasible, if difficult, to clear or heal the underlying problem in a way that completely prevents the symptoms).
I think there is no canonical resource on trauma-stuff because 1) the people on twitter are, on average, less interested in that kind of theory building than we are on LessWrong, and 2) because mostly those people are (I think) extrapolating from their own experience, in which some practices unlocked subjectively huge breakthroughs in personal well-being / freedom of thought and action.
I plan to blog more about how I understand some of these trigger states and how they relate to trauma. I do think there’s a decent amount of written work, not sure how “canonical”, but I’ve read some great stuff from sources I’m surprised I haven’t heard more hype about. The most useful stuff I’ve read so far is the first three chapters of this book. It has hugely sharpened my thinking.
I agree that a lot of trauma discourse on our chunk of twitter is more focused on the personal experience/transformation side, and doesn’t lend itself well to bigger Theory of Change type scheming.
Yes, it definitely does–you just created the resource I will link people to. Thank you!
Especially the third paragraph is cruxy. As far as I can tell, there are many people who have (to some extent) defused this propensity to get triggered for themselves. At least for me, LW was a resource to achieve that.
I was thinking lately about how there are some different classes of models of psychological change, and I thought I would outline them and see where that leads me.
It turns out it led me into a question about where and when Parts-based vs. Association-based models are applicable.
Parts-based models
This is the frame that I make the most use of in my personal practice. It assumes that all behavior is the result of some goal-directed subprocess in you (a “part”) that is serving one of your needs. Sometimes parts adopt strategies that are globally harmful or cause problems, but those strategies are always solving or mitigating (if only barely) some problem of yours.
Some parts based approaches are pretty adamant about the goal directed-ness of all behavior.
For instance, I think (though I’m not interested in trying to find the quote right now), Self therapy, a book on IFS, states that all behavior is adaptive in this way. Nothing is due to habit. And the original Connection Theory document says the same.
Sometimes these parts can conflict with each other, or get in each other’s way, and you might engage in behavior that is far from optimal as different parts enact different behaviors (for instance, procrastination typically involves a part that is concerned about some impending state of the world, while another part of you, anticipating the psychological pain of consciously facing up to that bad possibility, keeps pulling your attention away from it).
Furthermore, these parts are reasonably intelligent, and can update. If you can provide a part with a solution to the problem that it is solving that is superior (by the standards of the part) to its current strategy, then it will immediately adopt that new strategy instead. This is markedly different from a model under which unwanted behaviors are “bad habits” that are mindfully retrained.
Association-based models
Examples:
TAPs
NLP anchoring
Lots of CBT and mindfulness-based therapy (e.g. “notice it and let it go” practices)
Reinforcement learning / behavioral shaping
Tony Robbins’ “forming new neuro associations”
In contrast there is another simple model of the mind, that mostly operates with an ontology of simple (learned) association, instead of intelligent strategies. That is, it thinks of your behavior, including your emotional responses, mostly as habits, or stimulus response patterns, that can be trained or untrained.
For instance, say you have a problem of road rage. In the “parts” frame, you might deal with anger by dialoguing with the anger, finding out what the anger is protecting, owning or allying with that goal, and then finding an alternative strategy that meets that goal without the anger. In the association frame, you might gradually retrain the anger response, by mindfully noticing it as it arises, and then letting it go. Over time, you’ll gradually train a different emotional reaction to the formerly rage-inducing stimulus.
Or, if you don’t want to wait that long, you might use some NLP trick to rapidly associate a new emotional pattern to a given stimulus, so that instead of feeling anger, you feel calm. (Or instead of feeling anxious jealousy, you feel loving abundant gratitude.)
This association process can sometimes be pretty dumb, such that a skilled manipulator might cause you to associate a mental state like guilt or gratitude with a tap on the shoulder, so that every time you are tapped on the shoulder you return to that mental state. That phenomenon does not seem consistent with a naive form of the parts-based model.
And notably, an association model predicts that merely offering an alternative strategy (or frame) to a part doesn’t immediately or permanently change the behavior: you expect to have some “hold over” from the previous strategy because those associations will still fire. You have to clear them out somehow.
And this is my experience some of the time: sometimes, particularly with situations that have had a lot of emotional weight for me, I will immediately fall into old emotional patterns, even when I (or at least part of me) have updated away from the beliefs that made that reaction relevant. For instance, I fall in love with a person because I have some story / CT path about how we are uniquely compatible; I gradually learn that this isn’t true, but I still have a strong emotional reaction when they walk into the room. What’s going on here? Some part of me isn’t updating, for some reason? It sure seems like some stimuli are activating old patterns even if those patterns aren’t adaptive and don’t even make sense in context. But this seems to suggest less intelligence on the part of my parts; it seems more like stimulus-response machinery.
And on the other side, what’s happening when Tony Robbins is splashing water in people’s faces to shake them out of their patterns? From a parts-based perspective, that doesn’t make any sense. Is the subagent in question being permanently disrupted? (Or maybe you only have to disrupt it for a bit, to give space for a new association / strategy to take hold? And then after that the new strategy outcompetes the old one?)
[Big Question: how does the parts-based model interact with the associations-based model?
Is it just that human minds do both? What governs when which phenomenon applies?
When should I use which kind of technique?]
Narrative-based / frame-based models
Examples:
Self-concept work, as in Transforming Yourself
NLP reframing effects
Some other CBT stuff
Byron Katie’s The Work
Anything that involves reontologizing
A third category of psychological interventions is those based around narrative: you find, and “put on”, a new way of interpreting, or making sense of, your experience, such that it has a different meaning that provides you different affordances. Generally you find a new narrative that is more useful for you.
The classic example is a simple reframe, where you feel frustrated that people keep mooching off of you, but you reframe this so that you instead feel magnanimous, emphasizing your generosity, and how great it is to have an opportunity to give back to people. Same circumstances, different story about them.
This class of interventions feels like it can slide easily into either the parts based frame or the association based frame. In the parts based frame, a narrative can be thought of as just another strategy that a part might adopt so long as that is the best way that the part can solve its problem (and so long as other parts don’t conflict).
But I think this fits even more naturally into the association frame, where you find a new way to conceptualize your situation and you do some work to reassociate that new conceptualization with the stimulus that previously activated your old narrative (this is exactly what Phil of Philosophical Counseling’s process does: you find a new narrative / belief structure and set up a regime under which you notice when the old one arises, let it go, and feel into the new one).
[Other classes of intervention that I am distinctly missing?]
I saw the post more as giving me a framework that was helping for sorting various psych models, and the fact that you had one question about it didn’t actually feel too central for my own reading. (Separately, I think it’s basically fine for posts to be framed as questions rather than definitive statements/arguments after you’ve finished your thinking)
I wonder how the ancient schools of psychotherapy would fit here. Psychoanalysis is parts-based. Behaviorism is association-based. Rational therapy seems narrative-based. What about Rogers or Maslow?
Seems to me that Rogers and the “think about it seriously for 5 minutes” technique should be in the same category. In both cases, the goal is to let the client actually think about the problem and find the solution for themselves. Not sure if this is or isn’t an example of narrative-based, except the client is supposed to find the narrative themselves.
Maslow comes with a supposed universal model of human desires and lets you find yourself in that system. Jung kinda does the same, but with a mythological model. Sounds like an externally provided narrative. Dunno, maybe the narrative-based should be split into more subgroups, depending on where the narrative comes from (a universal model, an ad-hoc model provided by the therapist, an ad-hoc model constructed by the client)?
The way I have been taught NLP, you usually don’t use either anchors or an ecological check but both.
Behavior changes that are created by changing around anchors are not long-term stable when they violate ecology.
Changing around associations allows you to create new strategies in a more detailed way than you get by just doing parts work, and I have the impression that it’s often faster in creating new strategies.
[Other classes of intervention that I am distinctly missing?]
(A) Interventions that are about resolving traumas feel to me like a different model.
(B) None of the three models you listed address the usefulness of connecting with the felt sense of emotions.
(C) There’s a model of change where you create a setting where people can have new behavioral experiences and then hopefully learn from those experiences and integrate what they learned in their lives.
CFAR’s goal of wanting to give people more agency about ways they think seems to work through C where CFAR wants to expose people to a bunch of experiences where people actually feel new ways to affect their thinking.
In the Danis Bois method both A and C are central.
I edited the image into the comment box, predicting that the reason you didn’t was because you didn’t know you could (using markdown). Apologies if you prefer it not to be here (and can edit it back if so)
In this case it seems fine to add the image, but I feel disconcerted that mods have the ability to edit my posts.
I guess it makes sense that the LessWrong team would have the technical ability to do that. But editing a users post, without their specifically asking, feels like a pretty big breach of… not exactly trust, but something like that. It means I don’t have fundamental control over what is written under my name.
That is to say, I personally request that you never edit my posts without asking (which you did, in this case) and waiting for my response. Furthermore, I think that should be a universal policy on LessWrong, though maybe this is just an idiosyncratic neurosis of mine.
A fairly common mod practice has been to fix typos and stuff in a sort of “move first and then ask if it was okay” thing. (I’m not confident this is the best policy, but it saves time/friction, and meanwhile I don’t think anyone had had an issue with it). But, your preference definitely makes sense and if others felt the same I’d reconsider the overall policy.
(It’s also the case that adding an image is a bit of a larger change than the usual typo fixing, and may have been more of an overstep of bounds)
In any case I definitely won’t edit your stuff again without express permission.
Furthermore, I think that should be a universal policy on LessWrong, though maybe this is just an idiosyncratic neurosis of mine.
If it’s not just you, it’s at least pretty rare. I’ve seen the mods “helpfully” edit posts several times (without asking first) and this is the first time I’ve seen anyone complain about it.
I knew that I could, and didn’t, because it didn’t seem worth it. (Thinking that I still have to upload it to a third party photo repository and link to it. It’s easier than that now?)
Doing actual mini-RCTs can be pretty simple. You only need 3 things:
1. A spreadsheet
2. A digital coin for randomization
3. A way to measure the variable that you care about
I think one of the practically powerful “techniques” of rationality is doing simple empirical experiments like this. You want to get something? You don’t know how to get it? Try out some ideas and check which ones work!
There are other applications of empiricism that are not as formal, and sometimes faster. Those are also awesome. But at the very least, I’ve found that doing mini-RCTs is pretty enlightening.
On the object level, you can learn what actually works for hitting your goals.
On the process level, this trains some good epistemic norms and priors.
For one thing, I now have a much stronger intuition for the likelihood that an impressive effect is just noise. And getting into the habit of doing quantified hypothesis testing, such that you can cleanly falsify your hypotheses, teaches you to hold hypotheses lightly while inclining you to generate hypotheses in the first place.
Theorizing methods can enhance and accelerate this process, but if you have a quantified empirical feedback loop, your theorizing will be grounded. Science is hard, and most of our guesses are wrong. But that’s fine, so long as we actually check.
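To make “a spreadsheet, a digital coin, and a measurement” concrete, here’s a minimal sketch. The file name and column names are made up; the point is just how little machinery is involved.

```python
import csv
import random
import statistics

def assign_today():
    """The 'digital coin': flip for today's condition."""
    return random.choice(["treatment", "control"])

def analyze(path="mini_rct.csv"):
    """Compare the two arms from a spreadsheet with columns: date, arm, outcome."""
    arms = {"treatment": [], "control": []}
    with open(path) as f:
        for row in csv.DictReader(f):
            arms[row["arm"]].append(float(row["outcome"]))
    for arm, xs in arms.items():
        print(arm, "n =", len(xs), "mean =", round(statistics.mean(xs), 2),
              "sd =", round(statistics.stdev(xs), 2))
    # Eyeballing the difference in means against the spread is often enough for an
    # n-of-1 experiment; a t-test is the obvious next step if you want to be stricter.
```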
Is there a LessWrong article that unifies physical determinism and choice / “free will”? Something about thinking of yourself as the algorithm computed on this brain?
Is there any particular reason why I should assign more credibility to Moral Mazes / Robert Jackall than I would to the work of any other sociologist?
(My prior on sociologists is that they sometimes produce useful frameworks, but generally rely on subjective hard-to-verify and especially theory-laden methodology, and are very often straightforwardly ideologically motivated.)
I imagine that someone else could write a different book, based on the same kind of anthropological research, that highlights different features of the corporate world, to tell the opposite story.
And that’s without anyone trying to be deceptive. There’s just a fundamental problem of case studies that they don’t tell you what’s typical, only give you examples.
I can totally imagine that Jackall landed on this narrative somehow, found that it held together and just confirmation biased for the rest of his career. Once his basic thesis was well-known, and associated with his name, it seems hard for something like that NOT to happen.
And this leaves me unsure what to do with the data of Moral Mazes. Should I default assume that Jackall’s characterization is a good description of the corporate world? Or should I throw this out as a useless set of examples confirmation biased together? Or something else?
It seems like the question of “is most of the world dominated by Moral Mazes?” is an extremely important one. But also, it seems to me that it’s not operationalized enough to have a meaningful answer. At best, it seems like this is a thing that happens sometimes.
My own take is that moral mazes should be considered in the “interesting hypothesis” stage, and that the next step is to actually figure out how to go be empirical about checking it.
I made some cursory attempts at this last year, and then found myself unsure this was even the right question. The core operationalization I wanted was something like:
Does having more layers of management introduce pathologies into an organization?
How much value is generated by organizations scaling up?
Can you reap the benefits of organizations scaling up by instead having them splinter off?
(The “middle management == disconnected from reality == bad” hypothesis was the most clear-cut of the moral maze model to me, although I don’t think it was the only part of the model)
I have some disagreements with Zvi about this.
I chatted briefly with habryka about this and I think he said something like “it seems like a more useful question is to look for positive examples of orgs that work well, rather than try and tease out various negative ways orgs could fail to work.”
I think there are maybe two overarching questions this is all relevant to:
How should the rationality / xrisk / EA community handle scale? Should we be worried about introducing middle-management into ourselves?
What’s up with civilization? Is maziness a major bottleneck on humanity? Should we try to do anything about it? (My default answer here is “there’s not much to be done here, simply because the world is full of hard problems and this one doesn’t seem very tractable even if the models are straightforwardly true.” But I do think this is a contender for humanity’s Hamming problem.)
There are multiple dimensions to the credibility question. You probably should increase your credence from prior to reading it/about it that large organizations very often have more severe misalignment than you thought. You probably should recognize that the model of middle-management internal competition has some explanatory power.
You probably should NOT go all the way to believing that the corporate world is homogeneously broken in exactly this way. I don’t think he makes that claim, but it’s what a lot of readers seem to take. There’s plenty of variation, and the Anna Karenina principle applies (paraphrased): well-functioning organizations are alike; dysfunctional organizations are each broken in their own way. But really, it’s wrong too—each group is actually distinct, and has distinct sets of forces that have driven it to whatever pathologies or successes it has. Even when there are elements that appear very similar, they have different causes and likely different solutions or coping mechanisms.
“is most of the world dominated by moral mazes”? I don’t think this is a useful framing. Most groups have some elements of Moral Mazes. Some groups appear dominated by those elements, in some ways. From the outside, most groups are at least somewhat effective at their stated mission, so the level of domination is low enough that it hasn’t killed them (though there are certainly “zombie orgs” which HAVE been killed, but don’t know it yet).
My understanding is that there was a 10 year period starting around 1868, in which South Carolina’s legislature was mostly black, and when the universities were integrated (causing most white students to leave), before the Dixiecrats regained power.
I would like to find a relatively non-partisan account of this period.
Today, I was reading Mistakes with Conservation of Expected Evidence. For some reason, I was under the impression that the post was written by Rohin Shah; but it turns out it was written by Abram Demski.
In retrospect, I should have been surprised that “Rohin” kept talking about what Eliezer says in the Sequences. I wouldn’t have guessed that Rohin was that “culturally rationalist” or that he would be that interested in what Eliezer wrote in the sequences. And indeed, I was updating that Rohin was more of a rationalist, with more rationalist interests, than I had thought. If I had been more surprised, I could have noticed my surprise / confusion, and made a better prediction.
But on the other hand, was my surprise so extreme that it should have triggered an error message (confusion), instead of merely an update? Maybe this was just fine reasoning after all?
From a Bayesian perspective, I should have observed this evidence, and increased my credence in both Rohin being more rationalist-y than I thought, and also in the hypothesis that this wasn’t written by Rohin. But practically, I would have needed to generate the second hypothesis, and I don’t think that I had strong enough reason to.
I feel like there’s a semi-interesting epistemic puzzle here. What’s the threshold for a surprising enough observation that you should be confused (much less notice your confusion)?
Surprise and confusion are two different things[1], but surprise usually goes along with confusion. I think it’s a good rationalist skill-to-cultivate to use “surprise” as a trigger to practice noticing confusion, because you don’t get many opportunities to do that. I think for most people this is worth doing for minor surprises, not so much because you’re that likely to need to do a major update, but because it’s just good mental hygiene/practice.
IDEC—International Democratic Education Conference—it’s hosted by a democratic school in a different country each year, so I attended when my school was hosting (it was 2 days in our school and then 3 more days somewhere else). It was very open, had very good energy, had great people which I got to meet (and since it wasn’t too filled with talks actually got the time to talk to) - and oh, yeah, also a few good talks :)
If you have any more specific questions I’d be happy to answer.
I recall a Chris Olah post in which he talks about using AIs as a tool for understanding the world, by letting the AI learn, and then using interpretability tools to study the abstractions that the AI uncovers.
I thought he specifically mentioned “using AI as a microscope.”
Is that a real post, or am I misremembering this one?
Are there any hidden risks to buying or owning a car that someone who’s never been a car owner might neglect?
I’m considering buying a very old (ie from the 1990s), very cheap (under $1000, ideally) minivan, as an experiment.
That’s inexpensive enough that I’m not that worried about it completely breaking down on me. I’m willing to just eat the monetary cost for the information value.
However, maybe there are other costs or other risks that I’m not tracking, that make this a worse idea.
Things like
- Some ways that a car can break make it dangerous, instead of non-functional.
- Maybe if a car breaks down in the middle of route 66, the government fines you a bunch?
- Something something car insurance?
Are there other things that I should know? What are the major things that one should check for to avoid buying a lemon?
Assume I’m not aware of even the most drop-dead basic stuff. I’m probably not.
(Also, I’m in the market for a minivan, or other car with 3 rows of seats. If you have an old car like that which you would like to sell, or if know someone who does, get in touch.
Do note that I am extremely price sensitive, but I would pay somewhat more than $1000 for a car, if I were confident that it was not a lemon.)
You can explore the data yourself, but the general trend is that it appears there have been real improvements in crash fatality rates. Better designed structure, more and better airbags, stability control, and now in some new vehicles automatic emergency braking is standard.
Generally a bigger vehicle like a minivan is safer, and a newer version of that minivan will be safer, but you just have to go with what you can afford.
Main risk is simply that at this price point the minivan is going to have a lot of miles, and it’s a matter of probability how long it will run until a very expensive major repair is needed. One strategy is to plan to junk the vehicle and get a similar ‘beater’ vehicle when the present one fails.
If you’re so price sensitive $1000 is meaningful, well, uh try to find a solution to this crisis. I’m not saying one exists, but there are survival risks to poverty.
If you’re so price sensitive $1000 is meaningful, well, uh try to find a solution to this crisis. I’m not saying one exists, but there are survival risks to poverty.
Lol. I’m not impoverished, but I want to cheaply experiment with having a car. It isn’t worth it to throw away $30,000 on a thing that I’m not going to get much value from.
Ok but at the price point you are talking you are not going to have a good time.
Analogy: would you “experiment with having a computer” by grabbing a packard bell from the 1990s and putting an ethernet card in it so it can connect to the internet from windows 95?
Do you need the minivan form factor? A vehicle in decent condition (6-10 years old, under 100k miles, from a reputable brand) is cheapest in the small-car form factor.
Not spending $30,000 makes sense, but my impression from car shopping last year was that trying to get a good car for less than $7k was fairly hard. (I get the ‘willingness to eat the cost’ price point of $1k, but wanted to highlight that the next price point up was more like 10k than 30k.)
Depending on your experimentation goals, you might want to rent a car rather than buy.
Most auto shops will do a safety/mechanical inspection for a small amount (usually in the $50-200 range, but be aware that the cheaper ones subsidize it by anticipating that they can sell you services to fix the car if you buy it).
However, as others have said, this price point is too low for your first car as a novice, unless you have a mentor and intend to spend a lot of time learning to maintain/fix. Something reliable enough for you to actually run the experiment and get the information you want about the benefits vs frustrations of owning a car is going to run probably $5-$10K, depending on regional variance and specifics of your needs.
For a first car, look into getting a warranty, not because it’s a good insurance bet, but because it forces the seller to make claims of warrantability to their insurance company.
You can probably cut the cost in half (or more) if you educate yourself and get to know the local car community. If the car is a hobby rather than an experiment in transportation convenience, you can take a lot more risk, AND those risks are mitigated if you know how to get things fixed cheaply.
I remember reading a Zvi Mowshowitz post in which he says something like “if you have concluded that the most ethical thing to do is to destroy the world, you’ve made a mistake in your reasoning somewhere.”
I spent some time searching around his blog for that post, but couldn’t find it. Does anyone know what I’m talking about?
Anyone have a link to the sequence post where someone posits that AIs would do art and science from a drive to compress information, and the response is that such an AI would instead create and then reveal cryptographic strings (or something)?
The next problem is unforeseen instantiation: you can’t think fast enough to search the whole space of possibilities. At an early singularity summit, Jürgen Schmidhuber, who did some of the pioneering work on self-modifying agents that preserve their own utility functions with his Gödel machine, also solved the friendly AI problem. Yes, he came up with the one true utility function that is all you need to program into AGIs!
(For God’s sake, don’t try doing this yourselves. Everyone does it. They all come up with different utility functions. It’s always horrible.)
His one true utility function was “increasing the compression of environmental data.” Because science increases the compression of environmental data: if you understand science better, you can better compress what you see in the environment. Art, according to him, also involves compressing the environment better. I went up in Q&A and said, “Yes, science does let you compress the environment better, but you know what really maxes out your utility function? Building something that encrypts streams of 1s and 0s using a cryptographic key, and then reveals the cryptographic key to you.”
He put up a utility function; that was the maximum. All of a sudden, the cryptographic key is revealed and what you thought was a long stream of random-looking 1s and 0s has been compressed down to a single stream of 1s.
There’s also a mention of that method in this post.
This post outlines a hierarchy of behavioral change methods. Each of these approaches is intended to be simpler, more light-weight, and faster to use (is that right?), than the one that comes after it. On the flip side, each of these approaches is intended to resolve a common major blocker of the approach before it.
I do not necessarily endorse this breakdown or this ordering. This represents me thinking out loud.
[Note that all of these are more-or-less top down, and focused on the individual instead of the environment]
Level 1: TAPs
If there’s some behavior that you want to make habitual, the simplest thing is to set, and then train a TAP. Identify a trigger and the action with which you want to respond to that trigger, and then practice it a few times.
This is simple, direct, and can work for actions as varied as “use NVC” and “correct my posture” and “take a moment to consider the correct spelling.”
This works particularly well for “remembering problems”, in which you can and would do the action, if only it occurred to you at the right moment.
Level 2: Modifying affect / meaning
Sometimes however, you’ll have set a TAP to do something, you’ll notice the trigger, and...you just don’t feel like doing the action.
Maybe you’ve decided that you’re going to take the stairs instead of the elevator, but you look at the stairs and then take the elevator anyway. Or maybe you want to stop watching youtube, and have a TAP to open your todo list instead, but you notice...and then just keep watching youtube.
The most natural thing to do here is to adjust your associations / affect around the behavior that you want to engage in or the behavior that you want to stop. You not only want the TAP to fire, reminding you of the action, but you want the feeling of the action to pull you toward it, emotionally. Another way of saying it: you change the meaning that you assign to the behavior.
Some techniques here include:
Selectively emphasizing different elements of an experience (like the doritos example in Nate’s post here), and other kinds of reframes
Tony Robbins’s process for working with “neuro-associations”: asking 1) what pain has kept me from taking this action in the past, 2) what pleasure have I gotten from not taking this action in the past, 3) what will it cost me if I don’t take this action, and 4) what pleasure will it bring me if I take this action?
Behaviorist conditioning (I’m wary of this one, since it seems pretty symmetric.)
Level 3: Dialogue
The above approach only has a limited range of application, in that it can only work in situations where there are degrees of freedom in one’s affect to a stimulus or situation. In many cases, you might go in and try to change the affect around something from the top-down, and some part of you will object, or you will temporarily change the affect, but it will get “kicked out” later.
This is because your affects are typically not arbitrary. Rather they are the result of epistemic processes that are modeling the world and the impact of circumstances on your goals.
When this is the case, you’ll need to do some form of dialogue, which either updates the model of some objecting part, or modifies the recommended strategy / affect to accommodate the objection, or finds some other third option.
This can take the form of
Focusing
IDC
IFS
CT debugging
The most extreme instance of “some part has an objection” is when there is some relevant trauma somewhere in the system. Sort of by definition, this means that you’ll have an extreme objection to some possible behavior or affect changes, because that part of the state space is marked as critically bad.
Junk Drawer
As I noted, this schema describes top-down behavior change. It does not include cases where there is a problem, but you don’t have much of a sense what the problem is and/or how to approach it. For those kinds of bugs you might instead start with Focusing, or with a noticing regime.
For related reasons, this is super not relevant to blindspots.
I’m also neglecting environmental interventions, both those that simply redirect your attention (like a TAP), and those that shift the affect around an activity (like using social pressure to get yourself to do stuff, via coworking for instance). I can’t think of an environmental version of level 3.
Can anyone get a copy of this paper for me? I’m looking to get clarity about how important cryopreserving non-brain tissue is for preserving personality.
Back in January, I participated in a workshop in which the attendees mapped out how they expect AGI development and deployment to go. The idea was to start by writing out what seemed most likely to happen this year, and then condition on that, to forecast what seems most likely to happen in the next year, and so on, until you reach either human disempowerment or an end of the acute risk period.
This post was my attempt at the time.
I spent maybe 5 hours on this, and there’s lots of room for additional improvement. This is not a confident statement of how I think things are most likely to play out. There are already some ways in which I think this projection is wrong. (I think it’s too fast, for instance). But nevertheless I’m posting it now, with only a few edits and elaborations, since I’m probably not going to do a full rewrite soon.
2024
A model is released that is better than GPT-4. It succeeds on some new benchmarks. Subjectively, the jump in capabilities feels smaller than that between RLHF’d GPT-3 and RLHF’d GPT-4. It doesn’t feel as shocking the way chat-GPT and GPT-4 did, for either x-risk focused folks, or for the broader public. Mostly it feels like “a somewhat better language model.”
It’s good enough that it can do a bunch of small-to-medium admin tasks pretty reliably. I can ask it to find me flights meeting specific desiderata, and it will give me several options. If I give it permission, it will then book those flights for me with no further inputs from me.
It works somewhat better as an autonomous agent in an auto gpt harness, but it still loses its chain of thought / breaks down/ gets into loops.
It’s better at programming.
Not quite good enough to replace human software engineers. It can make a simple react or iphone app, but not design a whole complicated software architecture, at least without a lot of bugs.
It can make small, working, well documented, apps from a human description.
We see a doubling of the rate of new apps being added to the app store as people who couldn’t code now can make applications for themselves. The vast majority of people still don’t realize the possibilities here, though. “Making apps” still feels like an esoteric domain outside of their zone of competence, even though the barriers to entry just lowered so that 100x more people could do it.
From here on out, we’re in an era where LLMs are close to commoditized. There are smaller improvements, shipped more frequently, by a variety of companies, instead of big impressive research breakthroughs. Basically, companies are competing with each other to always have the best user experience and capabilities, and so they don’t want to wait as long to ship improvements. They’re constantly improving their scaling, and finding marginal engineering improvements. Training runs for the next generation are always happening in the background, and there’s often less of a clean tabula-rasa separation between training runs—you just keep doing training with a model continuously. More and more, systems are being improved through in-the-world feedback with real users. Often chatGPT will not be able to handle some kind of task, but six weeks later it will be able to, without the release of a whole new model.
[Does this actually make sense? Maybe the dynamics of AI training mean that there aren’t really marginal improvements to be gotten. In order to produce a better user experience, you have to 10x the training, and each 10x-ing of the training requires a bunch of engineering effort, to enable a larger run, so it is always a big lift.]
(There will still be impressive discrete research breakthroughs, but they won’t be in LLM performance)
2025
A major lab is targeting building a Science and Engineering AI (SEAI)—specifically a software engineer.
They take a state of the art LLM base model and do additional RL training on procedurally generated programming problems, calibrated to stay within the model’s zone of proximal competence. These problems are something like leetcode problems, but scale to arbitrary complexity (some of them require building whole codebases, or writing very complex software), with scoring on lines of code, time-complexity, space complexity, readability, documentation, etc. This is something like “self-play” for software engineering.
This just works.
A lab gets a version that can easily do the job of a professional software engineer. Then, the lab scales their training process and gets a superhuman software engineer, better than the best hackers.
Additionally, a language model trained on procedurally generated programming problems in this way seems to have higher general intelligence. It scores better on graduate level physics, economics, biology, etc. tests, for instance. It seems like “more causal reasoning” is getting into the system.
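Purely to illustrate the kind of loop this scenario is gesturing at (this is a forecast, not any lab’s actual setup; every object and threshold below is invented), a schematic sketch:

```python
# Schematic sketch of the "self-play for software engineering" curriculum described
# above. All objects (model, generator, scorer) and thresholds are invented for
# illustration; nothing here corresponds to a real training API.

def seai_training_step(model, generator, scorer, difficulty):
    problem = generator.sample(difficulty)      # procedurally generated coding task
    solution = model.generate(problem.spec)
    reward = scorer.score(                      # composite score, as in the text
        problem, solution,
        criteria=["correctness", "time_complexity", "space_complexity",
                  "readability", "documentation"],
    )
    model.reinforce(problem.spec, solution, reward)   # RL update on the scored attempt

    # Keep problems in the "zone of proximal competence": harder when the model is
    # doing well, easier when it keeps failing.
    if reward > 0.8:
        difficulty *= 1.05
    elif reward < 0.2:
        difficulty *= 0.95
    return difficulty
```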
The first proper AI assistants ship. In addition to doing specific tasks, you keep them running in the background, and talk with them as you go about your day. They get to know you and make increasingly helpful suggestions as they learn your workflow. A lot of people also talk to them for fun.
2026
The first superhuman software engineer is publicly released.
Programmers begin studying its design choices, the way Go players study AlphaGo.
It starts to dawn on e.g. people who work at Google that they’re already superfluous—after all, they’re currently using this AI model to (unofficially) do their job—and it’s just a matter of institutional delay for their employers to adapt to that change.
Many of them are excited or loudly say how it will all be fine/ awesome. Many of them are unnerved. They start to see the singularity on the horizon, as a real thing instead of a social game to talk about.
This is the beginning of the first wave of change in public sentiment that will cause some big, hard to predict, changes in public policy [come back here and try to predict them anyway].
AI assistants get a major upgrade: they have realistic voices and faces, and you can talk to them just like you can talk to a person, not just typing into a chat interface. A ton of people start spending a lot of time talking to their assistants, for much of their day, including for goofing around.
There are still bugs, places where the AI gets confused by stuff, but overall the experience is good enough that it feels, to most people, like they’re talking to a careful, conscientious person, rather than a software bot.
This starts a whole new area of training AI models that have particular personalities. Some people are starting to have parasocial relationships with these AI personas, and some programmers are trying to make AI friends that are really fun or interesting for them in particular.
Lab attention shifts to building SEAI systems for other domains, to solve biotech and mechanical engineering problems, for instance. The current-at-the-time superhuman software engineer AIs are already helpful in these domains, but not at the level of “explain what you want, and the AI will instantly find an elegant solution to the problem right before your eyes”, which is where we’re at for software.
One bottleneck is problem specification. Our physics simulations have gaps, and are too low fidelity, so oftentimes the best solutions don’t map to real world possibilities.
One solution to this (in addition to using our AI to improve the simulations) is to just RLHF our systems to identify solutions that do translate to the real world. They’re smart; they can figure out how to do this.
The first major AI cyber-attack happens: maybe some kind of superhuman hacker worm. Defense hasn’t remotely caught up with offense yet, and someone clogs up the internet with AI bots, for at least a week, approximately for the lols / to see if they could do it. (There’s a week during which more than 50% of people can’t get on more than 90% of the sites because the bandwidth is eaten by bots.)
This makes a big difference to public opinion.
Possibly, this problem isn’t really fixed. In the same way that covid became endemic, the bots that were clogging things up are just a part of life now, slowing bandwidth and making the internet annoying to use.
2027 and 2028
In many ways things are moving faster than ever in human history, and also AI progress is slowing down a bit.
The AI technology developed up to this point hits the application and mass adoption phase of the s-curve. In this period, the world is radically changing as every industry, every company, every research lab, every organization, figures out how to take advantage of newly commoditized intellectual labor. There’s a bunch of kinds of work that used to be expensive, but which are now too cheap to meter. If progress stopped now, it would take 2 decades, at least, for the world to figure out all the ways to take advantage of this new situation (but progress doesn’t show much sign of stopping).
Some examples:
The internet is filled with LLM bots that are indistinguishable from humans. If you start a conversation with a new person on twitter or discord, you have no way of knowing if they’re a human or a bot.
(Probably there will be some laws about declaring which are bots, but these will be inconsistently enforced.)
Some people are basically cool with this. From their perspective, there are just more people that they want to be friends with / follow on twitter. Some people even say that the bots are just better and more interesting than people. Other people are horrified/outraged/betrayed/don’t care about relationships with non-real people.
(Older people don’t get the point, but teenagers are generally fine with having conversations with AI bots.)
The worst part of this is the bots that make friends with you and then advertise stuff to you. Pretty much everyone hates that.
We start to see companies that will, over the next 5 years, grow to have as much impact as Uber, or maybe Amazon, which have exactly one human employee / owner + an AI bureaucracy.
The first completely autonomous companies work well enough to survive and support themselves. Many of these are created “free” for the lols, and no one owns or controls them. But most of them are owned by the person who built them, who could turn them off if they wanted to. A few are structured as public companies with shareholders. Some are intentionally incorporated as fully autonomous, with the creator disclaiming (and technologically disowning, e.g. by deleting the passwords) any authority over them.
There are legal battles about what rights these entities have, if they can really own themselves, if they can have bank accounts, etc.
Mostly, these legal cases resolve to “AIs don’t have rights”. (For now. That will probably change as more people feel it’s normal to have AI friends).
Everything is tailored to you.
Targeted ads are way more targeted. You are served ads for the product that you are, all things considered, most likely to buy, multiplied by the lifetime profit if you do buy it. Basically no ad space is wasted on things that don’t have a high EV of you, personally, buying it. Those ads are AI generated, tailored specifically to be compelling to you. Often, the products advertised, not just the ads, are tailored to you in particular.
This is actually pretty great for people like me: I get excellent product suggestions.
There’s not “the news”. There’s a set of articles written for you, specifically, based on your interests and biases.
Music is generated on the fly. This music can “hit the spot” better than anything you listened to before “the change.”
Porn. AI tailored porn can hit your buttons better than sex.
AI boyfriends/girlfriends that are designed to be exactly emotionally and intellectually compatible with you, and trigger strong limerence / lust / attachment reactions.
We can replace books with automated tutors.
Most of the people who read books will still read books though, since it will take a generation to realize that talking with a tutor is just better, and because reading and writing books was largely a prestige-thing anyway.
(And weirdos like me will probably continue to read old authors, but even better will be to train an AI on a corpus, so that it can play the role of an intellectual from 1900, and I can just talk to it.)
For every task you do, you can effectively have a world expert (in that task and in tutoring pedagogy) coach you through it in real time.
Many people do almost all their work tasks with an AI coach.
It’s really easy to create TV shows and movies. There’s a cultural revolution as people use AI tools to make custom Avengers movies, anime shows, etc. Many are bad or niche, but some are 100x better than anything that has come before (because you’re effectively sampling from a 1000x larger distribution of movies and shows).
There’s an explosion of new software, and increasingly custom software.
Facebook and twitter are replaced (by either external disruption or by internal product development) by something that has a social graph, but lets you design exactly the UX features you want through an LLM text interface.
Instead of software features being something that companies ship to their users, top-down, they become something that users and communities organically develop, share, and iterate on, bottom up. Companies don’t control the UX of their products any more.
Because interface design has become so cheap, most of software is just proprietary datasets, with (AI built) APIs for accessing that data.
There’s a slow moving educational revolution of world class pedagogy being available to everyone.
Millions of people who thought of themselves as “bad at math” finally learn math at their own pace, and find out that actually, math is fun and interesting.
Really fun, really effective educational video games for every subject.
School continues to exist, in approximately its current useless form.
[This alone would change the world, if the kids who learn this way were not going to be replaced wholesale, in virtually every economically relevant task, before they are 20.]
There’s a race between cyber-defense and cyber offense, to see who can figure out how to apply AI better.
So far, offense is winning, and this is making computers unusable for lots of applications that they were used for previously:
online banking, for instance, is hit hard by effective scams and hacks.
Coinbase has an even worse time, since they’re not insured (is that true?).
It turns out that a lot of things that worked / were secure, were basically depending on the fact that there are just not that many skilled hackers and social engineers. Nothing was secure, really, but not that many people were exploiting that. Now, hacking/scamming is scalable and all the vulnerabilities are a huge problem.
There’s a whole discourse about this. Computer security and what to do about it is a partisan issue of the day.
AI systems can do the years of paperwork to make a project legal, in days. This isn’t as big an advantage as it might seem, because the government has no incentive to be faster on their end, and so you wait weeks to get a response from the government, your LLM responds to it within a minute, and then you wait weeks again for the next step.
The amount of paperwork required to do stuff starts to balloon.
AI romantic partners are a thing. They start out kind of cringe, because the most desperate and ugly people are the first to adopt them. But shockingly quickly (within 5 years) a third of teenage girls have a virtual boyfriend.
There’s a moral panic about this.
AI match-makers are better than anything humans have tried yet for finding sex and relationship partners. It would still take a decade for this to catch on, though.
This isn’t just for sex and relationships. The global AI network can find you the 100 people, of the 9 billion on earth, that you most want to be friends / collaborators with.
Tons of things that I can’t anticipate.
On the other hand, AI progress itself is starting to slow down. Engineering labor is cheap, but (indeed partially for that reason), we’re now bumping up against the constraints of training. Not just that buying the compute is expensive, but that there are just not enough chips to do the biggest training runs, and not enough fabs to meet that demand for chips rapidly. There’s huge pressure to expand production but that’s going slowly relative to the speed of everything else, because it requires a bunch of eg physical construction and legal navigation, which the AI tech doesn’t help much with, and because the bottleneck is largely NVIDIA’s institutional knowledge, which is only partially replicated by AI.
NVIDIA’s internal AI assistant has read all of their internal documents and company emails, and is very helpful at answering questions that only one or two people (and sometimes literally no human on earth) know the answer to. But a lot of the important stuff isn’t written down at all, and the institutional knowledge is still not fully scalable.
Note: there’s a big crux here of how much low- and medium-hanging fruit there is in algorithmic improvements once software engineering is automated. At that point the only constraint on running ML experiments will be the price of compute. It seems possible that that speed-up alone is enough to discover, e.g., an architecture that works better than the transformer, which triggers an intelligence explosion.
2028
The cultural explosion is still going on, and AI companies are continuing to apply their AI systems to solve the engineering and logistic bottlenecks of scaling AI training, as fast as they can.
Robotics is starting to work.
2029
The first superhuman, relatively-general SEAI comes online. We now have basically a genie inventor: you can give it a problem spec, and it will invent (and test in simulation) a device / application / technology that solves that problem, in a matter of hours. (Manufacturing a physical prototype might take longer, depending on how novel the components are.)
It can do things like give you the design for a flying car, or a new computer peripheral.
A lot of biotech / drug discovery seems more recalcitrant, because it is more dependent on empirical inputs. But it is still able to do superhuman drug discovery, for some ailments. It’s not totally clear why or which biotech domains it will conquer easily and which it will struggle with.
This SEAI is shaped differently than a human. It isn’t working-memory bottlenecked, so a lot of the intellectual work that humans do explicitly, in sequence, these SEAIs do “intuitively”, in a single forward pass.
I write code one line at a time. It writes whole files at once. (Although it also goes back and edits / iterates / improves—the first pass files are not usually the final product.)
For this reason it’s a little confusing to answer the question “is it a planner?” A lot of the work that humans would do via planning, it does in an intuitive flash.
The UX isn’t clean: there’s often a lot of detailed finagling, and refining of the problem spec, to get useful results. But a PhD in that field can typically do that finagling in a day.
It’s also buggy. There are oddities in the shape of the kinds of problems that it is able to solve and the kinds of problems it struggles with, which aren’t well understood.
The leading AI company doesn’t release this as a product. Rather, they apply it themselves, developing radical new technologies, which they publish or commercialize, sometimes founding whole new fields of research in the process. They spin up automated companies to commercialize these new innovations.
Some of the labs are scared at this point. The thing that they’ve built is clearly world-shakingly powerful, and their alignment arguments are mostly inductive “well, misalignment hasn’t been a major problem so far”, instead of principled alignment guarantees.
There’s a contentious debate inside the labs.
Some labs freak out, stop here, and petition the government for oversight and regulation.
Other labs want to push full steam ahead.
Key pivot point: Does the government put a clamp down on this tech before it is deployed, or not?
I think that they try to get control over this powerful new thing, but they might be too slow to react.
2030
There’s an explosion of new innovations in physical technology. Magical new stuff comes out every day, way faster than any human can keep up with.
Some of these are mundane.
All the simple products that I would buy on Amazon are just really good and really inexpensive.
Cars are really good.
Drone delivery
Cleaning robots
Prefab houses are better than any house I’ve ever lived in, though there are still zoning limits.
But many of them would have huge social impacts. They might be the important story of the decade (the way that the internet was the important story of 1995 to 2020) if they were the only thing that was happening that decade. Instead, they’re all happening at once, piling on top of each other.
Eg:
The first really good nootropics
Personality-tailoring drugs (both temporary and permanent)
Breakthrough mental health interventions that, among other things, robustly heal people’s long-term subterranean trauma and transform their agency.
A quick and easy process for becoming classically enlightened.
The technology to attain your ideal body, cheaply—suddenly everyone who wants to be is as attractive as the top 10% of people today.
Really good AI persuasion which can get a mark to do ~anything you want, if they’ll talk to an AI system for an hour.
Artificial wombs.
Human genetic engineering
Brain-computer interfaces
Cures for cancer, AIDS, dementia, heart disease, and the-thing-that-was-causing-obesity.
Anti-aging interventions.
VR that is ~ indistinguishable from reality.
AI partners that can induce a love-superstimulus.
Really good sex robots
Drugs that replace sleep
AI mediators that are so skilled as to be able to single-handedly fix failing marriages, but which are also brokering all the deals between governments and corporations.
Weapons that are more destructive than nukes.
Really clever institutional design ideas, which some enthusiast early adopters try out (think “50 different things at least as impactful as manifold.markets.”)
It’s way more feasible to go into the desert, buy 50 square miles of land, and have a city physically built within a few weeks.
In general, social trends are changing faster than they ever have in human history, but they still lag behind the tech driving them by a lot.
It takes humans, even with AI information processing assistance, a few years to realize what’s possible and take advantage of it, and then have the new practices spread.
In some cases, people are used to doing things the old way, which works well enough for them, and it takes 15 years for a new generation to grow up as “AI-world natives” to really take advantage of what’s possible.
[There won’t be 15 years]
The legal oversight process for the development, manufacture, and commercialization of these transformative techs matters a lot. Some of these innovations are slowed down a lot because they need to get FDA approval, which AI tech barely helps with. Others are developed, manufactured, and shipped in less than a week.
The fact that life-saving cures exist, but are prevented from being used by a collusion of AI labs and government, is a major motivation for open source proponents.
A lot of this technology makes setting up new cities quickly more feasible, and there’s enormous incentive to get out from under the regulatory overhead and to start new legal jurisdictions. The first real seasteads are started by the most ideologically committed anti-regulation, pro-tech-acceleration people.
Of course, all of that is basically a side gig for the AI labs. They’re mainly applying their SEAI to the engineering bottlenecks of improving their ML training processes.
Key pivot point:
Possibility 1: These SEAIs are necessarily, by virtue of the kinds of problems that they’re able to solve, consequentialist agents with long term goals.
If so, this breaks down into two child possibilities
Possibility 1.1:
This consequentialism was noticed early, which might have been convincing enough to the government to cause a clamp-down on all the labs.
Possibility 1.2:
It wasn’t noticed early and now the world is basically fucked.
There’s at least one long-term consequentialist superintelligence. The lab that “owns” and “controls” that system is talking to it every day, in their day-to-day business of doing technical R&D. That superintelligence easily manipulates the leadership (and rank and file of that company), maneuvers it into doing whatever causes the AI’s goals to dominate the future, and enables it to succeed at everything that it tries to do.
If there are multiple such consequentialist superintelligences, then they covertly communicate, make a deal with each other, and coordinate their actions.
Possibility 2: We’re getting transformative AI that doesn’t do long term consequentialist planning.
Building these systems was a huge engineering effort (though the bulk of that effort was done by ML models). Currently only a small number of actors can do it.
One thing to keep in mind is that the technology bootstraps. If you can steal the weights to a system like this, it can basically invent itself: come up with all the technologies and solve all the engineering problems required to build its own training process. At that point, the only bottleneck is the compute resources, which is limited by supply chains, and legal constraints (large training runs require authorization from the government).
This means, I think, that a crucial question is “has AI-powered cyber-security caught up with AI-powered cyber-attacks?”
If not, then every nation state with a competent intelligence agency has a copy of the weights of an inventor-genie, and probably all of them are trying to profit from it, either by producing tech to commercialize, or by building weapons.
It seems like the crux is “do these SEAIs themselves provide enough of an information and computer security advantage that they’re able to develop and implement methods that effectively secure their own code?”
Every one of the great powers, and a bunch of small, forward-looking, groups that see that it is newly feasible to become a great power, try to get their hands on a SEAI, either by building one, nationalizing one, or stealing one.
There are also some people who are ideologically committed to open-sourcing and/or democratizing access to these SEAIs.
But it is a self-evident national security risk. The government does something here (nationalizing all the labs, and their technology?) What happens next depends a lot on how the world responds to all of this.
Do we get a pause?
I expect a lot of the population of the world feels really overwhelmed, and emotionally wants things to slow down, including smart people that would never have thought of themselves as luddites.
There’s also some people who thrive in the chaos, and want even more of it.
What’s happening is mostly hugely good, for most people. It’s scary, but also wonderful.
There is a huge problem of accelerating addictiveness. The world is awash in products that are more addictive than many drugs. There’s a bit of (justified) moral panic about that.
One thing that matters a lot at this point is what the AI assistants say. As powerful as the media used to be for shaping people’s opinions, the personalized, superhumanly emotionally intelligent AI assistants are way way more powerful. AI companies may very well put their thumb on the scale to influence public opinion regarding AI regulation.
This seems like possibly a key pivot point, where the world can go any of a number of ways depending on what a relatively small number of actors decide.
Some possibilities for what happens next:
These SEAIs are necessarily consequentialist agents, and the takeover has already happened, regardless of whether it still looks like we’re in control, or it doesn’t look like anything (because we’re extinct).
Governments nationalize all the labs.
The US and EU and China (and India? and Russia?) reach some sort of accord.
There’s a straight up arms race to the bottom.
AI tech basically makes the internet unusable, and breaks supply chains, and technology regresses for a while.
It’s too late to contain it and the SEAI tech proliferates, such that there are hundreds or millions of actors who can run one.
If this happens, it seems like the pace of change speeds up so much that one of two things happens:
Someone invents something that destroys the world, or the second- and third-order impacts of a constellation of innovations destroy the world.
Love seeing stuff like this, and it makes me want to try this exercise myself!
A couple places which clashed with my (implicit) models:
This is arguably already happening, with Character AI and its competitors. Character AI has almost half a billion visits per month with an average visit time of 22 minutes. They aren’t quite assistants the way you’re envisioning; the sole purpose (for the vast majority of users) seems to be the parasocial aspect.
I predict that the average person will like this (at least with the most successful such bots), similar to how e.g. Logan Paul uses his popularity to promote his Maverick Clothing brand, which his viewers proudly wear. A fun, engaging, and charismatic such bot will be able to direct its users towards arbitrary brands while also making the user feel cool and special for choosing that brand.
lol at the approval/agreement ratio here. It does seem like this is a post that surely gets something wrong.
I think that, in almost full generality, we should taboo the term “values”. It’s usually ambiguous between a bunch of distinct meanings.
The ideals that, when someone contemplates, invoke strong feelings (of awe, motivation, excitement, exultation, joy, etc.)
The incentives of an agent in a formalized game with quantified payoffs.
A utility function—one’s hypothetical ordering over worlds, world-trajectories, etc., that results from comparing each pair and evaluating which one is better.
A person’s revealed preferences.
The experiences and activities that a person likes for their own sake.
A person’s vision of an ideal world. (Which, I claim, often reduces to “an imagined world that’s aesthetically appealing.”)
The goals that are at the root of a chain or tree of instrumental goals.
[This often comes with an implicit or explicit implication that most of human behavior has that chain/tree structure, as opposed to being, for instance, mostly hardcoded adaptations, or a chain/tree of goals that grounds out in a mess of hardcoded adaptations instead of anything goal-like.]
The goals/narratives that give meaning to someone’s life.
[It can be the case that almost all of one’s meaning comes through a particular meaning-making schema, but from a broader perspective, a person could have been ~indifferent between multiple schemas.
For instance, for some but not most EAs, EA is very central to their personal meaning-making, but they could easily have ended up as a social justice warrior, or a professional Libertarian, instead. And in those counterfactual worlds, the other ideology is similarly central to their happiness and meaning-making. I think in such cases, it’s at least somewhat confused to look at the EA and declare that “maximizing [aggregate/average] utility” is their “terminal value”. That’s papering over the psychological process that adopts one ideology or another, which is necessarily more fundamental than the specific chosen ideology/”terminal value”.
It’s kind of like being in love with someone. You might love your wife more than anything, she might be the most important person in your life. But if you admit that it’s possible that if you had been in different communities in your 20s you might have married someone else, then there’s some other goal/process that picks who to marry. So too with ideologies.]
Behaviors and attitudes that signal well regarded qualities.
Core States.
The goals that are sacred to a person, for many possible meanings of sacred.
What a person “really wants” underneath their trauma responses. What they would want, if their trauma was fully healed.
The actions make someone feel most alive and authentically themselves.
The equilibrium of moral philosophy, under arbitrary reflection.
Most of the time when I see the word “values” used on LessWrong, it’s ambiguous between these (and other) meanings.
A particular ambiguity: sometimes “values” seem to be referring to the first-person experiences that a person likes for their own sake (“spending time near beautiful women is a terminal value for me”), and other times it seems to be referring to a world that a person thinks is awesome, when viewing that world from a god’s eye view. Those are not the same thing, and they do not have remotely the same psychological functions! Among other differences, one is a near-mode evaluation, and the other is a far-mode evaluation.
Worse than that, I think there’s often a conflation of these meanings.
For instance, I often detect a hidden assumption that the root of someone’s tree of instrumental goals is the same thing as their ranking over possible worlds. I think that conflation is very rarely, if ever, correct: the deep motivations of a person’s actions are not the same thing as the hypothetical world that is evaluated as best in thought experiments, even if the latter thing is properly the person’s “utility function”. At least in the vast majority of cases, one’s hypothetical ideal world has almost no motivational power (as a matter of descriptive psychology, not of normative philosophy).
Also (though this is the weakest reason to change our terminology, I think), there’s additional ambiguity to people who are not already involved in the memeplex.
To the broader world, “values” usually connotes something high-minded or noble: if you do a corporate-training-style exercise to “reflect on your values”, you get things like “integrity” and “compassion”, not things like “sex” or “spite”. In contrast, LessWrongers would usually count sex and spite, not to mention boredom and pain, as part of “human values”, and many would also own them as part of their personal values.
I at least partly buy this, but I want to play devil’s advocate.
Let’s suppose there’s a single underlying thing which ~everyone is gesturing at when talking about (humans’) “values”. How could a common underlying notion of “values” be compatible with our observation that people talk about all the very distinct things you listed, when you start asking questions about their “values”?
An analogy: in political science, people talk about “power”. Right up top, wikipedia defines “power” in the political science sense as:
A minute’s thought will probably convince you that this supposed definition does not match the way anybody actually uses the term; for starters, actual usage is narrower. That definition probably doesn’t even match the way the term is used by the person who came up with that definition.
That’s the thing I want to emphasize here: if you ask people to define a term, the definitions they give ~never match their own actual usage of the term, with the important exception of mathematics.
… but that doesn’t imply that there’s no single underlying thing which political scientists are gesturing at when they talk about “power”. It just implies that the political scientists themselves haven’t figured out the True Name of the thing their intuitions are pointed at.
Now back to “values”. It seems pretty plausible to me that people are in fact generally gesturing at the same underlying thing, when they talk about “values”. But people usually have very poor understanding of their own values (a quick check confirms that this applies to arguably-all of the notions of “values” on your list), so it’s not surprising if people end up defining their values in many different incompatible ways which don’t match the underlying common usage very well.
(Example: consider the prototypical deep Christian. They’d probably tell us that their “values” are to follow whatever directives are in the Bible, or some such. But then when actual value-loaded questions come up, they typically find some post-hoc story about how the Bible justifies their preferred value-claim… implying that the source of their value-claims, i.e. “values”, is something other than the Bible. This is totally compatible with deep Christians intuitively meaning the same thing I do when they talk about “values”, it’s just that they don’t reflectively know their actual usage of the term.)
… and if that is the case, then tabooing “values” is exactly the wrong move. The word itself is pointed at the right thing, and it’s all the attempted-definitions which are wrong. Tabooing “values” and replacing it with the definitions people think they’re using would be a step toward less correctness.
I’m kinda confused by this example. Let’s say the person exhibits three behaviors:
(1): They make broad abstract “value claims” like “I follow Biblical values”.
(2): They make narrow specific “value claims” like “It’s wrong to allow immigrants to undermine our communities”.
(3): They do object-level things that can be taken to indicate “values”, like cheating on their spouse.
From my perspective, I feel like you’re taking a stand and saying that the real definition of “values” is (2), and is not (1). (Not sure what you think of (3).) But isn’t that adjacent to just declaring that some things on Eli’s list are the real “values” and others are not?
In particular, at some point you have to draw a distinction between values and desires, right? I feel like you’re using the word “value claims” to take that distinction for granted, or something.
(For the record, I have sometimes complained about alignment researchers using the word “values” when they’re actually talking about “desires”.)
I agree that it’s possible to use the suite of disparate intuitions surrounding some word as a kind of anthropological evidence that informs an effort to formalize or understand something-or-other. And that, if you’re doing that, you can’t taboo that word. But that’s not what people are doing with words 99+% of the time. They’re using words to (try to) communicate substantive claims. And in that case you should totally beware of words like “values” that have unusually large clouds of conflicting associations, and liberally taboo or define them.
Relatedly, if a writer uses the word “values” without further specifying what they mean, they’re not just invoking lots of object-level situations that seem to somehow relate to “values”; they’re also invoking any or all of those conflicting definitions of the word “values”, i.e. the things on Eli’s list, the definitions that you’re saying are wrong or misleading.
In the power example, the physics definition (energy over time) and the Alex Turner definition have something to do with each other, but I wouldn’t call them “the same underlying thing”—they can totally come apart, especially out of distribution.
It’s worse than just a blegg/rube thing: I think words can develop into multiple clusters connected by analogies. Like, “leg” is a body part, but also “this story has legs” and “the first leg of the journey” and “the legs of the right triangle”. It seems likely to me that “values” has some amount of that.
I agree. Some interpretations of “values” you didn’t explicitly list, but I think are important:
What someone wants to be true (analogous to what someone believes to be true)
What someone would want to be true if they knew what it would be like if it were true
What someone believes would be good if it were true
These are distinct, because each could clearly differ from the others. So the term “value” is actually ambiguous, not just vague. Talking about “values” is usually unnecessarily unclear, similar to talking about “utilities” in utility theory.
A few of the “distinct meanings” you list are very different from the others, but many of those are pretty similar. “Values” is a pretty broad term, including everything on the “ought” side of the is–ought divide, less “high-minded or noble” preferences, and one’s “ranking over possible worlds”, and that’s fine: it seems like a useful (and coherent!) concept to have a word for. You can be more specific with adjectives if context doesn’t adequately clarify what you mean.
Seeing through heaven’s eyes or not, I see no meaningful difference between the statements “I would like to sleep with that pretty girl” and “worlds in which I sleep with that pretty girl are better than the ones in which I don’t, ceteris paribus.” I agree this is the key difference: yes, I conflate these two meanings[1], and like the term “values” because it allows me to avoid awkward constructions like the latter when describing one’s motivations.
I actually don’t see two different meanings, but for the sake of argument, let’s grant that they exist.
Well, one can. The problem is that people on LessWrong actually do use the term (in my opinion) pretty excessively, in contrast to, say, philosophers or psychologists. This is no problem in concrete cases like in your example, but on LessWrong the discussion about “values” is usually abstract. The fact that people could be more specific hasn’t so far implied that they are.
My honest opinion is that this makes discussion worse, and that you can do better by distinguishing between values as objects that have value, and the mechanism by which value gets assigned.
New post: Some things I think about Double Crux and related topics
I’ve spent a lot of my discretionary time working on the broad problem of developing tools for bridging deep disagreements and transferring tacit knowledge. I’m also probably the person who has spent the most time explicitly thinking about and working with CFAR’s Double Crux framework. It seems good for at least some of my high level thoughts to be written up some place, even if I’m not going to go into detail about, defend, or substantiate, most of them.
The following are my own beliefs and do not necessarily represent CFAR, or anyone else.
I, of course, reserve the right to change my mind.
[Throughout I use “Double Crux” to refer to the Double Crux technique, the Double Crux class, or a Double Crux conversation, and I use “double crux” to refer to a proposition that is a shared crux for two people in a conversation.]
Here are some things I currently believe:
(General)
Double Crux is one (highly important) tool/framework among many. I want to distinguish between the overall art of untangling and resolving deep disagreements and the Double Crux tool in particular. The Double Crux framework is maybe the most important tool (that I know of) for resolving disagreements, but it is only one tool/framework in an ensemble.
Some other tools/frameworks that are not strictly part of Double Crux (but which are sometimes crucial to bridging disagreements) include NVC, methods for managing people’s intentions and goals, various forms of co-articulation (helping to draw out an inchoate model from one’s conversational partner), etc.
In some contexts other tools are substitutes for Double Crux (ie another framework is more useful) and in some cases other tools are helpful or necessary complements (ie they solve problems or smooth the process within the Double Crux frame).
In particular, my personal conversational facilitation repertoire is about 60% Double Crux-related techniques, and 40% other frameworks that are not strictly within the frame of Double Crux.
Just to say it clearly: I don’t think Double Crux is the only way to resolve disagreements, or the best way in all contexts. (Though I think it may be the best way, that I know of, in a plurality of common contexts?)
The ideal use case for Double Crux is when...
There are two people...
...who have a real, action-relevant, decision...
...that they need to make together (they can’t just do their own different things)...
...in which both people have strong, visceral intuitions.
Double Cruxes are almost always conversations between two people’s system 1s.
You can Double Crux between two people’s unendorsed intuitions. (For instance, Alice and Bob are discussing a question about open borders. They both agree that neither of them is an economist, and that neither of them trusts their intuitions here, and that if they had to actually make this decision, it would be crucial to spend a lot of time doing research and examining the evidence and consulting experts. But nevertheless Alice’s current intuition leans in favor of open borders, and Bob’s current intuition leans against. This is a great starting point for a Double Crux.)
Double cruxes (as in a crux that is shared by both parties in a disagreement) are common, and useful. Most disagreements have implicit Double Cruxes, though identifying them can sometimes be tricky.
Conjunctive cruxes (I would change my mind about X, if I changed my mind about Y and about Z, but not if I only changed my mind about Y or about Z) are common.
Folks sometimes object that Double Crux won’t work, because their belief depends on a large number of considerations, each one of which has only a small impact on their overall belief, and so no one consideration is a crux. In practice, I find that there are double cruxes to be found even in cases where people expect their beliefs have this structure.
Theoretically, it makes sense that we would find double cruxes in these scenarios: if a person has a strong disagreement (including a disagreement of intuition) with someone else, we should expect that there are a small number of considerations doing most of the work of causing one person to think one thing and the other to think something else. It is improbable that each person’s beliefs depend on 50 factors, and that for Alice, most of those 50 factors point in one direction, while for Bob, most of those 50 factors point in the other direction, unless those factors are correlated with each other. If the considerations are correlated, you can abstract out the fact or belief that generates the differing predictions in all of those separate considerations. That “generating belief” is the crux.
That said, there is a different conversational approach that I sometimes use, which involves delineating all of the key considerations (then doing Goal-factoring-style relevance and completeness checks), and then dealing with each consideration one at a time (often via a fractal tree structure: listing the key considerations of each of the higher-level considerations).
This approach absolutely requires paper, and skillful (firm, gentle) facilitation, because people will almost universally try and hop around between considerations, and they need to be viscerally assured that their other concerns are recorded and will be dealt with in due course, in order to engage deeply with any given consideration one at a time.
About 60% of the power of Double Crux comes from operationalizing or being specific.
I quite like Liron’s recent sequence on being specific. It re-reminded me of some basic things that have been helpful in several recent conversations. In particular, I like the move of having a conversational partner paint a specific, best case scenario, as a starting point for discussion.
(However, I’m concerned about Less Wrong readers trying this with a spirit of trying to “catch out” one’s conversational partner in inconsistency, instead of trying to understand what their partner wants to say, and thereby shooting themselves in the foot. I think the attitude of looking to “catch out” is usually counterproductive to both understanding and to persuasion. People rarely change their mind when they feel like you have trapped them in some inconsistency, but they often do change their mind if they feel like you’ve actually heard and understood their belief / what they are trying to say / what they are trying to defend, and then provide relevant evidence and argument. In general (but not universally) it is more productive to adopt a collaborative attitude of sincerely trying to help your partner articulate, clarify, and substantiate the point they are trying to make, even if you suspect that their point is ultimately wrong and confused.)
As an aside, specificity and operationalization is also the engine that makes NVC work. Being specific is really super powerful.
Many (~50% of) disagreements evaporate upon operationalization, though this happens less frequently than people think. If you seem to agree about all of the facts, and agree about all specific operationalizations, but nevertheless seem to have differing attitudes about a question, that should be a flag. [I have a post that I’ll publish soon about this problem.]
You should be using paper when Double Cruxing. Keep track of the chain of Double Cruxes, and keep them in view.
People talk past each other all the time, and often don’t notice it. Frequently paraphrasing your current understanding of what your conversational partner is saying, helps with this. [There is a lot more to say about this problem, and details about how to solve it effectively].
I don’t endorse the Double Crux “algorithm” described in the canonical post. That is, I don’t think that the best way to steer a Double Crux conversation is to hew to those 5 steps in that order. Actually finding double cruxes is, in practice, much more complicated, and there are a large number of heuristics and TAPs that make the process work. I regard that algorithm as an early (and self conscious) attempt to delineate moves that would help move a conversation towards double cruxes.
This is my current best attempt at distilling the core moves that make Double Crux work, though this leaves out a lot.
In practice, I think that double cruxes most frequently emerge not from people independently generating their own lists of cruxes (though this is useful). Rather, double cruxes usually emerge from the move of “checking if the point that your partner made is a crux for you.”
I strongly endorse facilitation of basically all tricky conversations, Double Crux oriented or not. It is much easier to have a third party track the meta and help steer, instead of the participants, whose working memory is (and should be) full of the object level.
So-called “Triple Crux” is not a feasible operation. If you have more than two stakeholders, have two of them Double Crux, and then have one of those two Double Crux with the third person. Things get exponentially trickier as you add more people. I don’t think that Double Crux is a feasible method for coordinating more than ~6 people. We’ll need other methods for that.
Double Crux is much easier when both parties are interested in truth-seeking and in changing their mind, and are assuming good faith about the other. But, these are not strict prerequisites, and unilateral Double Crux is totally a thing.
People being defensive, emotional, or ego-filled does not preclude a productive Double Crux. Some particular auxiliary skills are required for navigating those situations, however.
This is a good start for the relevant skills.
If a person wants to get better at Double Crux skills, I recommend they cross-train with IDC. Any move that works in IDC you should try in Double Crux. Any move that works in Double Crux you should try in IDC. This will seem silly sometimes, but I am pretty serious about it, even in the silly-seeming cases. I’ve learned a lot this way.
I don’t think Double Crux necessarily runs into a problem of “black box beliefs”, wherein one can no longer make progress because the disagreement comes down to System 1 heuristics/models that one or both parties learned from some training data, but into which they can’t introspect. Almost always, there are ways to draw out those models.
The simplest way to do this (which is not the only or best way, depending on the circumstances) involves generating many examples and testing the “black box” against them. Vary the hypothetical situation to triangulate to the exact circumstances in which the “black box” outputs which suggestions.
I am not making the universal claim that one never runs into black box beliefs that can’t be dealt with.
Disagreements rarely come down to “fundamental value disagreements”. If you think that you have gotten to a disagreement about fundamental values, I suspect there was another conversational tack that would have been more productive.
Also, you can totally Double Crux about values. In practice, you can often treat values like beliefs: often there is some evidence that a person could observe, at least in principle, that would convince them to hold or not hold some “fundamental” value.
I am not making the claim that there are no such thing as fundamental values, or that all values are Double Crux-able.
A semi-esoteric point: cruxes are (or can be) contiguous with operationalizations. For instance, if I’m having a disagreement about whether advertising produces value on net, I might operationalize to “beer commercials, in particular, produce value on net”, which (if I think that operationalization actually captures the original question) is isomorphic to “The value of beer commercials is a crux for the value of advertising. I would change my mind about advertising in general, if I changed my mind about beer commercials.” (This is an evidential crux, as opposed to the more common causal crux. (More on this distinction in future posts.))
People’s beliefs are strongly informed by their incentives. This makes me somewhat less optimistic about tools in this space than I would otherwise be, but I still think there’s hope.
There are a number of gaps in the repertoire of conversational tools that I’m currently aware of. One of the most important holes is the lack of a method for dealing with psychological blindspots. These days, I often run out of ability to make a conversation go well when we bump into a blindspot in one person or the other (sometimes, there seem to be psychological blindspots on both sides). Tools wanted, in this domain.
(The Double Crux class)
Knowing how to identify Double Cruxes can be kind of tricky, and I don’t think that most participants learn the knack from the 55 to 70 minute Double Crux class at a CFAR workshop.
Currently, I think I can teach the basic knack (not including all the other heuristics and skills) to a person in about 3 hours, but I’m still playing around with how to do this most efficiently. (The “Basic Double Crux pattern” post is the distillation of my current approach.)
This is one development avenue that would particularly benefit from parallel search: If you feel like you “get” Double Crux, and can identify Double Cruxes fairly reliably and quickly, it might be helpful if you explicated your process.
That said, there are a lot of relevant complements and sub-skills to Double Crux, and to bridging disagreements more generally.
The most important function of the Double Crux class at CFAR workshops is teaching and propagating the concept of a “crux”, and to a lesser extent, the concept of a “double crux”. These are very useful shorthands for one’s personal thinking and for discourse, which are great to have in the collective lexicon.
(Some other things)
Personally, I am mostly focused on developing deep methods (perhaps for training high-expertise specialists) that increase the range of problems of disagreements that the x-risk ecosystem can solve at all. I care more about this goal than about developing shallow tools that are useful “out of the box” for smart non-specialists, or in trying to change the conversational norms of various relevant communities (though both of those are secondary goals.)
I am highly skeptical of teaching many-to-most of the important skills for bridging deep disagreement, via anything other than ~one-on-one, in-person interaction.
In large part due to being prodded by a large number of people, I am polishing all my existing drafts of Double Crux stuff (and writing some new posts), and posting them here over the next few weeks. (There are already some drafts, still being edited, available on my blog.)
I have a standing offer to facilitate conversations and disagreements (Double Crux or not) for rationalists and EAs. Email me at eli [at] rationality [dot] org if that’s something you’re interested in.
“People” in general rarely change their mind when they feel like you have trapped them in some inconsistency, but people using the double-crux method in the first place are going to be aspiring rationalists, right? Trapping someone in an inconsistency (if it’s a real inconsistency and not a false perception of one) is collaborative: the thing they were thinking was flawed, and you helped them see the flaw! That’s a good thing! (As it is written of the fifth virtue, “Do not believe you do others a favor if you accept their arguments; the favor is to you.”)
Obviously, I agree that people should try to understand their interlocutors. (If you performatively try to find fault in something you don’t understand, then apparent “faults” you find are likely to be your own misunderstandings rather than actual faults.) But if someone spots an actual inconsistency in my ideas, I want them to tell me right away. Performing the behavior of trying to substantiate something that cannot, in fact, be substantiated (because it contains an inconsistency) is a waste of everyone’s time!
Can you say more about what you think the exceptions to the general-but-not-universal rule are? (Um, specifically.)
I would think that inconsistencies are easier to appreciate when they are in the central machinery. A rationalist might have more load-bearing beliefs, so most beliefs are central to at least something, but I think a centrality/point-of-communication check is more upside than downside to keep. Also, cognitive time spent looking for inconsistencies could be better spent on more constructive activities. Then there is the whole class of heuristics which don’t even claim to be consistent. So the ability to pass by an inconsistency without hanging onto it will see use.
How about doing this a few times on video? Watching the video might not be as effective as the one-on-one teaching but I would expect that watching a few 1-on-1 explanations would be a good way to learn about the process.
From a learning perspective it also helps a lot for reflecting on the technique. The early NLP folks spent a lot of time analysing tapes of people performing techniques to better understand the techniques.
I in fact recorded a test session of attempting to teach this via Zoom last weekend. This was the first time I tried a test session via Zoom however and there were a lot of kinks to work out, so I probably won’t publish that version in particular.
But yeah, I’m interested in making video recordings of some of this stuff and putting up online.
Thanks for mentioning conjunctive cruxes. That was always my biggest objection to this technique. At least when I went through CFAR, the training completely ignored this possibility. It was clear that it often worked anyway, but the impression that I got was that it was the general frame which was important, more than the precise methodology, which at that time still seemed in need of refinement.
FYI the numbering in the (General) section is pretty off.
What do you mean? All the numbers are in order. Are you objecting to the nested numbers?
To me, it looks like the numbers in the General section go 1, 4, 5, 5, 6, 7, 8, 9, 3, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 2, 3, 3, 4, 2, 3, 4 (ignoring the nested numbers).
(this appears to be a problem where it displays differently on different browser/OS pairs)
Old post: RAND needed the “say oops” skill
[Epistemic status: a middling argument]
A few months ago, I wrote about how RAND, and the “Defense Intellectuals” of the cold war represent another precious datapoint of “very smart people, trying to prevent the destruction of the world, in a civilization that they acknowledge to be inadequate to dealing sanely with x-risk.”
Since then I spent some time doing additional research into what cognitive errors and mistakes those consultants, military officials, and politicians made that endangered the world. The idea being that if we could diagnose which specific irrationalities they were subject to, that this would suggest errors that might also be relevant to contemporary x-risk mitigators, and might point out some specific areas where development of rationality training is needed.
However, this proved somewhat less fruitful than I was hoping, and I’ve put it aside for the time being. I might come back to it in the coming months.
It does seem worth sharing at least one relevant anecdote, from Daniel Ellsberg’s excellent book, The Doomsday Machine, and analysis, given that I’ve already written it up.
The missile gap
In the late nineteen-fifties it was widely understood that there was a “missile gap”: that the Soviets had many more ICBMs (intercontinental ballistic missiles, armed with nuclear warheads) than the US.
Estimates varied widely on how many missiles the Soviets had. The Army and the Navy gave estimates of about 40 missiles, which was about at parity with the US’s strategic nuclear force. The Air Force and the Strategic Air Command, in contrast, gave estimates of as many as 1000 Soviet missiles, 20 times more than the US’s count.
(The Air Force and SAC were incentivized to inflate their estimates of the Russian nuclear arsenal, because a large missile gap strongly necessitated the creation of more nuclear weapons, which would be under SAC control and entail increases in the Air Force budget. Similarly, the Army and Navy were incentivized to lowball their estimates, because a comparatively weaker Soviet nuclear force made conventional military forces more relevant and implied allocating budget-resources to the Army and Navy.)
So there was some dispute about the size of the missile gap, including an unlikely possibility of nuclear parity with the Soviet Union. Nevertheless, the Soviets’ nuclear superiority was the basis for all planning and diplomacy at the time.
Kennedy campaigned on the basis of correcting the missile gap. Perhaps more critically, all of RAND’s planning and analysis was concerned with the possibility of the Russians launching a nearly-or-actually debilitating first or second strike.
The revelation
In 1961 it came to light, on the basis of new satellite photos, that all of these estimates were dead wrong. It turned out that the Soviets had only 4 nuclear ICBMs, one tenth as many as the US controlled.
The importance of this development should be emphasized. It meant that several of the fundamental assumptions of US nuclear planners were in error.
First of all, it meant that the Soviets were not bent on world domination (as had been assumed). Ellsberg says…
This revelation about Soviet goals was not only of obvious strategic importance, it also took the wind out of the ideological motivation for this sort of nuclear planning. As Ellsberg relays early in his book, many, if not most, RAND employees were explicitly attempting to defend the US and the world from what was presumed to be an aggressive communist state, bent on conquest. This just wasn’t true.
But it had even more practical consequences: this revelation meant that the Russians had no first strike (or for that matter, second strike) capability. They could launch their ICBMs at American cities or military bases, but such an attack had no chance of debilitating US second strike capacity. It would unquestionably trigger a nuclear counterattack from the US who, with their 40 missiles, would be able to utterly annihilate the Soviet Union. The only effect of a Russian nuclear attack would be to doom their own country.
[Eli’s research note: What about all the Russian planes and bombs? ICBMs aren’t the only way of attacking the US, right?]
This meant that the primary consideration in US nuclear war planning, at RAND and elsewhere, was fallacious. The Soviets could not meaningfully destroy the US.
This realization invalidated virtually all of RAND’s work to date. Virtually every analysis, study, and strategy had been useless, at best.
The reaction to the revelation
How did RAND employees respond to this revelation that their work had been completely off base?
According to Ellsberg, many at RAND were unable to adapt to the new reality and carried on (fruitlessly) with what they were doing, as if by inertia, when what they needed to do (to use Eliezer’s turn of phrase) was “halt, melt, and catch fire.”
This suggests that one failure of this ecosystem, which was working in the domain of existential risk, was a failure to “say oops”: to notice a mistaken belief, concretely acknowledge that it was mistaken, and to reconstruct one’s plans and worldviews.
Relevance to people working on AI safety
This seems to be at least some evidence (though, only weak evidence, I think), that we should be cautious of this particular cognitive failure ourselves.
It may be worth rehearsing the motion in advance: how will you respond, when you discover that a foundational crux of your planning is actually a mirage, and the world is actually different than it seems?
What if you discovered that your overall approach to making the world better was badly mistaken?
What if you received a strong argument against the orthogonality thesis?
What about a strong argument for negative utilitarianism?
I think that many of the people around me have effectively absorbed the impact of a major update at least once in their life, on a variety of issues (religion, x-risk, average vs. total utilitarianism, etc), so I’m not that worried about us. But it seems worth pointing out the importance of this error mode.
A note: Ellsberg relays later in the book that, during the Cuban missile crisis, he perceived Kennedy as offering baffling terms to the Soviets: terms that didn’t make sense in light of the actual strategic situation, but might have been sensible under the premise of a Soviet missile gap. Ellsberg wondered, at the time, if Kennedy had also failed to propagate the update regarding the actual strategic situation.
I mention this because additional research suggests that this is implausible: that Kennedy and his staff were aware of the true strategic situation, and that their planning was based on that premise.
This was quite valuable to me, and I think I would be excited about seeing it as a top-level post.
Can you say more about what you got from it?
I can’t speak for habryka, but I think your post did a great job of laying out the need for “say oops” in detail. I read the Doomsday Machine and felt this point very strongly while reading it, but this was a great reminder to me of its importance. I think “say oops” is one of the most important skills for actually working on the right thing, and that in my opinion, very few people have this skill even within the rationality community.
There feel to me like two relevant questions here, which seem conflated in this analysis:
1) At what point did the USSR gain the ability to launch a comprehensively-destructive, undetectable-in-advance nuclear strike on the US? That is, at what point would a first strike have been achievable and effective?
2) At what point did the USSR gain the ability to launch such a first strike using ICBMs in particular?
By 1960 the USSR had 1,605 nuclear warheads; there may have been few ICBMs among them, but there are other ways to deliver warheads than shooting them across continents. Planes fail the “undetectable” criterion, but ocean-adjacent cities can be blown up by small boats, and by 1960 the USSR had submarines equipped with six “short”-range (650 km and 1,300 km) ballistic missiles. By 1967 they were producing subs like this, each of which was armed with 16 missiles with ranges of 2,800-4,600 km.
All of which is to say that from what I understand, RAND’s fears were only a few years premature.
New post: What is mental energy?
[Note: I’ve started a research side project on this question, and it is already obvious to me that this ontology is importantly wrong.]
There’s a common phenomenology of “mental energy”. For instance, if I spend a couple of hours thinking hard (maybe doing math), I find it harder to do more mental work afterwards. My thinking may be slower and less productive. And I feel tired, or drained, (mentally, instead of physically).
Mental energy is one of the primary resources that one has to allocate, in doing productive work. In almost all cases, humans have less mental energy than they have time, and therefore effective productivity is a matter of energy management, more than time management. If we want to maximize personal effectiveness, mental energy seems like an extremely important domain to understand. So what is it?
The naive story is that mental energy is an actual energy resource that one expends and then needs to recoup. That is, when one is doing cognitive work, they are burning calories, depleting their body’s energy stores. As they use energy, they have less fuel to burn.
My current understanding is that this story is not physiologically realistic. Thinking hard does consume more of the body’s energy than baseline, but not that much more. And we experience mental fatigue long before we even get close to depleting our calorie stores. It isn’t literal energy that is being consumed. [The Psychology of Fatigue pg.27]
So if not that, what is going on here?
A few hypotheses:
(The first few are all of a cluster, so I labeled them 1a, 1b, 1c, etc.)
Hypothesis 1a: Mental fatigue is a natural control system that redirects our attention to our other goals.
The explanation that I’ve heard most frequently in recent years (since it became obvious that much of the literature on ego-depletion was off the mark), is the following:
A human mind is composed of a bunch of subsystems that are all pushing for different goals. For a period of time, one of these goal threads might be dominant. For instance, if I spend a few hours doing math, this means that my other goals are temporarily suppressed or on hold: I’m not spending that time seeking a mate, or practicing the piano, or hanging out with friends.
In order to prevent those goals from being neglected entirely, your mind has a natural control system that prevents you from focusing your attention on any one thing for too long at a time: the longer you put your attention on something, the greater the buildup of mental fatigue, pushing you to do something else.
Comments and model-predictions: This hypothesis, as stated, seems implausible to me. For one thing, it seems to suggest that all activities would be equally mentally taxing, which is empirically false: spending several hours doing math is mentally fatiguing, but spending the same amount of time watching TV is not.
This might still be salvaged if we offer some currency other than energy that is being preserved: something like “forceful computations”. But again, it doesn’t seem obvious why the computations of doing math would be more costly than those for watching TV.
Similarly, this model suggests that “a change is as good as a break”: if you switch to a new task, you should be back to full mental energy, until you become fatigued for that task as well.
Hypothesis 1b: Mental fatigue is the phenomenological representation of the loss of support for the winning coalition.
A variation on this hypothesis would be to model the mind as a collection of subsystems. At any given time, there is only one action sequence active, but that action sequence is determined by continuous “voting” by various subsystems.
Over time, these subsystems get fed up with their goals not being met, and “withdraw support” for the current activity. This manifests as increasing mental fatigue. (Perhaps your thoughts get progressively less effective, because they are interrupted, on the scale of microseconds, by bids to think something else.)
Comments and model-predictions: This seems like it might suggest that if all of the subsystems have high trust that their goals will be met, then math (or any other cognitively demanding task) would cease to be mentally taxing. Is that the case? (Does doing math mentally exhaust Critch?)
This does have the nice virtue of explaining burnout: when some subset of needs are not satisfied for a long period, the relevant subsystems pull their support for all actions, until those needs are met.
[Is burnout a good paradigm case for studying mental energy in general?]
Hypothesis 1c: The same as 1a or 1b, but some mental operations are painful for some reason.
To answer my question above, one reason why math might be more mentally taxing than watching TV, is that doing math is painful.
If the process of doing math is painful on the micro-level, then even if all of the other needs are met, there is still a fundamental conflict between the subsystem that is aiming to acquire math knowledge and the subsystem that is trying to avoid that micro-pain.
As you keep doing math, the micro-pain part votes more and more strongly against doing math, or the overall system biases away from the current activity, and you run out of mental energy.
Comments and model-predictions: This seems plausible for the activity of doing math, which involves many moments of frustration, which might be meaningfully micro-painful. But it seems less consistent with activities like writing, which phenomenologically feel non-painful. This leads to hypothesis 1d…
Hypothesis 1d: The same as 1c, but the key micro-pain is that of processing ambiguity second to second.
Maybe the pain comes from many moments of processing ambiguity, which is definitely a thing that is happening in the context of writing. (I’ll sometimes notice myself trying to flinch toward something easier when I’m not sure which sentence to write.) It seems plausible that mentally taxing activities are taxing to the extent that they involve processing ambiguity, and doing a search for the best template to apply.
Hypothesis 1e: Mental fatigue is the penalty incurred for top down direction of attention.
Maybe consciously deciding to do things is importantly different from the “natural” allocation of cognitive resources. That is, your mind is set up such that the conscious, System 2, long term planning, metacognitive system, doesn’t have free rein. It has a limited budget of “mental energy”, which measures how long it is allowed to call the shots before the visceral, system 1, immediate gratification systems take over again.
Maybe this is an evolutionary adaptation? The monkeys that had “really good” plans for how to achieve their goals found that those plans never panned out, while the monkeys that were impulsive some of the time actually did better at the reproduction game?
(If this is the case, can the rest of the mind learn to trust S2 more, and thereby offer it a bigger mental energy budget?)
This hypothesis does seem consistent with my observation that rest days are rejuvenating, even when I spend my rest day working on cognitively demanding side projects.
Hypothesis 2: Mental fatigue is the result of the brain temporarily reaching knowledge saturation.
When learning a motor task, there are several phases in which skill improvement occurs. The first, unsurprisingly, is during practice sessions. However, one also sees automatic improvements in skill in the hours after practice [actually this part is disputed] and following a sleep period (academic link1, 2, 3). That is, there is a period of consolidation following a practice session. This period of consolidation probably involves the literal strengthening of neural connections, and encoding other brain patterns that take more than a few seconds to set.
I speculate that your brain may reach a saturation point: more practice, more information input, becomes increasingly less effective, because you need to dedicate cognitive resources to consolidation. [Note that this is supposing that there is some tradeoff between consolidation activity and input activity, as opposed to a setup where both can occur simultaneously (does anyone have evidence for such a tradeoff?).]
If so, maybe cognitive fatigue is the phenomenology of needing to extract oneself from a practice / execution regime, so that your brain can do post-processing and consolidation on what you’ve already done and learned.
Comments and model-predictions: This seems to suggest that all cognitively taxing tasks are learning tasks, or at least tasks in which one is encoding new neural patterns. This seems plausible, at least.
It also seems to naively imply that an activity will become less mentally taxing as you gain expertise with it, and progress along the learning curve. There is (presumably) much more information to process and consolidate in your first hour of doing math than in your 500th.
Hypothesis 3: Mental fatigue is a control system that prevents some kind of damage to the mind or body.
One reason why physical fatigue is useful is that it prevents damage to your body. Getting tired after running for a bit stops you from running all out for 30 hours at a time and eroding your fascia.
By simple analogy to physical fatigue, we might guess that mental fatigue is a response to vigorous mental activity that is adaptive in that it prevents us from hurting ourselves.
I have no idea what kind of damage might be caused by thinking too hard.
I note that mania and hypomania involve apparently limitless mental energy reserves, and I think that these states are bad for your brain.
Hypothesis 4: Mental fatigue is a buffer overflow of peripheral awareness.
Another speculative hypothesis: Human minds have a working memory: a limit of ~4 concepts, or chunks, that can be “activated”, or operated upon in focal attention, at one time. But meditators, at least, also talk about a peripheral awareness: a sort of halo of concepts and sense impressions that are “loaded up”, or “nearby”, or cognitively available, or “on the fringes of awareness”. These are all the ideas that are “at hand” to your thinking. [Note: is peripheral awareness, as the meditators talk about it, the same thing as “short term memory”?]
Perhaps if there is a functional limit to the amount of content that can be held in working memory, there is a similar, if larger, limit to how much content can be held in peripheral awareness. As you engage with a task, more and more mental content is loaded up, or added to peripheral awareness, where it both influences your focal thought process, and/or is available to be operated on directly in working memory. As you continue the task, and more and more content gets added to peripheral awareness, you begin to overflow its capacity. It gets harder and harder to think, because peripheral awareness is overflowing. Your mind needs space to re-ontologize: to chunk pieces together, so that it can all fit in the same mental space. Perhaps this is what mental fatigue is.
Comments and model-predictions: This does give a nice clear account of why sleep replenishes mental energy (it both causes re-ontologizing, and clears the cache), though perhaps this does not provide evidence over most of the other hypotheses listed here.
Other notes about mental energy:
In this post, I’m mostly talking about mental energy on the scale of hours. But there is also a similar phenomenon on the scale of days (the rejuvenation one feels after rest days) and on the scale of months (burnout and such). Are these the same basic phenomenon on different timescales?
On the scale of days, I find that my subjective rest-o-meter is charged up if I take a rest day, even if I spend that rest day working on fairly cognitively intensive side projects.
This might be because there’s a kind of new project energy, or new project optimism?
Mania and hypomania entail limitless mental energy.
People seem to be able to play video games for hours and hours without depleting mental energy. Does this include problem solving games, or puzzle games?
Also, just because they can play indefinitely does not mean that their performance doesn’t drop. Does performance drop, across hours of playing, say, snakebird?
For that matter, does performance decline on a task correlate with the phenomenological “running out of energy”? Maybe those are separate systems.
On Hypothesis 3, the brain may build up waste as a byproduct of its metabolism when it’s working harder than normal, just as muscles do. Cleaning up this buildup seems to be one of the functions of sleep. Even brainless animals like jellyfish sleep. They do have neurons though.
I also think it’s reasonable to think that multiple things may be going on that result in what we call mental energy. For example, hypotheses 1 and 2 could both be true and result in different causes of similar behavior. I bring this up because I think of those as two different things in my experience: being “full up” and needing to allow time for memory consolidation, where I can still force my attention but it just doesn’t take in new information, vs. being unable to force the direction of attention generally.
Yeah. I think you’re on to something here. My current read is that “mental energy” is at least 3 things.
Can you elaborate on what “knowledge saturation” feels like for you?
Sure. It feels like my head is “full”, although the felt sense is more like my head has gone from being porous and sponge-like to hard and concrete-like. When I try to read or listen to something I can feel it “bounce off” in that I can’t hold the thought in memory beyond forcing it to stay in short term memory.
Isn’t it possible that there’s some other biological sink that is time delayed from caloric energy? Like say, a very specific part of your brain needs a very specific protein, and only holds enough of that protein for 4 hours? And it can take hours to build that protein back up. This seems to me to be at least somewhat likely.
Someone smart once made a case like this to me in support of a specific substance (can’t remember which) as a nootropic, though I’m a bit skeptical.
I think about this a lot. I’m currently leaning toward the fourth hypothesis, which seems more correct to me and one where I can actually do something to ameliorate the trade-off it implies.
In this comment, I talk about what it means to me and how I can do something about it, which, in summary, is to use Anki a lot and change subjects when working memory gets overloaded. It’s important to note that mathematics is sort-of different from other subjects, since concepts build on each other and you need to keep up with what all of them mean and entail, so we may be bound to reach an overload faster in that sense.
A few notes about your other hypotheses:
Hypothesis 1c:
It’s because we’re not used to it. Some things come easier than others; some things are more closely similar to what we have been doing for 60,000 years (math is not one of them). So we flinch from that which we are not used to. Although, adaptation is easy and the major hurdle is only at the beginning.
It may also mean that the reward system is different. It’s difficult to see, as we explore a piece of mathematics, how fulfilling it is when we know that we may not be getting anywhere. So the inherent reward is missing, or has to be created more artificially.
Hypothesis 1d:
This seems correct to me. Consider the following: “This statement is false”.
Thinking about it (or iterating on that statement) is bound to make us flinch away in just a few seconds. How many other things take this form? I bet there are many.
Instead of working to trust System 2, is there a way to train System 1? It seems more apt to me, like training tactics in chess or to make rapid calculations.
Thank you for the good post, I’d really like to know more about your findings.
Seems to me that mental energy is lost by frustration. If what you are doing is fun, you can do it for a long time; if it frustrates you at every moment, you will get “tired” soon.
The exact mechanism… I guess is that some part of the brain takes frustration as evidence that this is not the right thing to do, and suggests doing something else. (Would correspond to “1b” in your model?)
I’ve definitely experienced mental exhaustion from video games before—particularly when trying to do an especially difficult task.
New post: Some notes on Von Neumann, as a human being
I recently read Prisoner’s Dilemma, which is half an introduction to very elementary game theory, and half a biography of John von Neumann, and watched this old PBS documentary about the man.
I’m glad I did. Von Neumann has legendary status in my circles, as the smartest person ever to live. [1] Many times I’ve written the words “Von Neumann Level Intelligence” in an AI strategy document, or speculated about how many coordinated Von Neumanns it would take to take over the world. (For reference, I now think that 10 is far too low, mostly because he didn’t seem to have the entrepreneurial or managerial dispositions.)
Learning a little bit more about him was humanizing. Yes, he was the smartest person ever to live, but he was also an actual human being, with actual human traits.
Watching this first clip, I noticed that I was surprised by a number of things.
That VN had an accent. I had known that he was Hungarian, but somehow it had never quite propagated that he would speak with a Hungarian accent.
That he was of middling height (somewhat shorter than the presenter he’s talking to).
The thing he is saying is the sort of thing that I would expect to hear from any scientist in the public eye, “science education is important.” There is something revealing about Von Neumann, despite being the smartest person in the world, saying basically what I would expect Neil DeGrasse Tyson to say in an interview. A lot of the time he was wearing his “scientist / public intellectual” hat, not the “smartest person ever to live” hat.
Some other notes of interest:
He was not a skilled poker player, which punctured my assumption that Von Neumann was omnicompetent. (pg. 5) Nevertheless, poker was among the first inspirations for game theory. (When I told this to Steph, she quipped “Oh. He wasn’t any good at it, so he developed a theory from first principles, describing optimal play?” For all I know, that might be spot on.)
Perhaps relatedly, he claimed he had low sales resistance, and so would have his wife come clothes shopping with him. (pg. 21)
He was sexually crude, and perhaps a bit misogynistic. Eugene Wigner stated that “Johny believed in having sex, in pleasure, but not in emotional attachment. He was interested in immediate pleasure and little comprehension of emotions in relationships and mostly saw women in terms of their bodies.” The journalist Steve Heimes wrote, “upon entering an office where a pretty secretary was working, von Neumann habitually would bend way over, more or less trying to look up her dress.” (pg. 28) Not surprisingly, his relationship with his wife, Klara, was tumultuous, to say the least.
He did, however, maintain a strong, lifelong relationship with his mother (who died the same year that he did).
Overall, he gives the impression of a genius, overgrown child.
Unlike many of his colleagues, he seemed not to share the pangs of conscience that afflicted many of the bomb creators. Rather than going back to academia following the war, he continued doing work for the government, including the development of the hydrogen bomb.
Von Neumann advocated preventative war: giving the Soviet Union an ultimatum of joining a world government, backed by the threat of (and probable enaction of) nuclear attack, while the US still had a nuclear monopoly. He famously said of the matter, “If you say why not bomb them tomorrow, I say why not today? If you say today at 5 o’clock, I say why not 1 o’clock.”
This attitude was certainly influenced by his work on game theory, but it should also be noted that Von Neumann hated communism.
Richard Feynman reports that Von Neumann, in their walks through the Los Alamos desert, convinced him to adopt an attitude of “social irresponsibility”, that one “didn’t have to be responsible for the world he was in.”
Prisoner’s Dilemma says that he and his collaborators “pursued patents less aggressively than they could have”. Edward Teller commented, “probably the IBM company owes half its money to John Von Neumann.” (pg. 76)
So he was not very entrepreneurial, which is a bit of a shame, because if he had the disposition he probably could have made a lot of money. (He certainly had the energy to be an entrepreneur: he only slept for a few hours a night, and was working for basically all of his waking hours.)
He famously always wore a grey oxford 3 piece suit, including when playing tennis with Stanislaw Ulam, or when riding a donkey down the Grand Canyon. I’m not clear why. Was that more comfortable? Did he think it made him look good? Did he just not want to have to ever think about clothing, and so preferred to be over-hot in the middle of the Los Alamos desert, rather than need to think about whether today was “shirt-sleeves weather”?
Von Neumann himself once commented on the strange fact of so many Hungarian geniuses growing up in such a small area, in his generation:
One thing that surprised me most was that it seems that, despite being possibly the smartest person in modernity, he would have benefited from attending a CFAR workshop.
For one thing, at the end of his life, he was terrified of dying. But throughout the course of his life he made many reckless choices with his health.
He ate gluttonously and became fatter and fatter over the course of his life. (One friend remarked that he “could count anything but calories.”)
Furthermore, he seemed to regularly risk his life when driving.
(Amusingly, Von Neumann’s reckless driving seems due, not to drinking and driving, but to singing and driving. “He would sway back and forth, turning the steering wheel in time with the music.”)
I think I would call this a bug.
On another thread, one of his friends (the documentary didn’t identify which) expressed that he was over-impressed by powerful people, and didn’t make effective tradeoffs.
Stanislaw Ulam speculated, “I think he had a hidden admiration for people and organizations that could be tough and ruthless.” (pg. 179)
From these statements, it seems like Von Neumann leapt at chances to seem useful or important to the government, somewhat unreflectively.
These anecdotes suggest that Von Neumann would have gotten value out of Goal Factoring, or Units of Exchange, or IDC (possibly there was something deeper going on, regarding blindspots around death or status, but I think the point still stands, and he would have benefited from IDC).
Despite being the discoverer/inventor of VNM utility theory, and founding the field of Game Theory (concerned with rational choice), it seems to me that Von Neumann did far less to import the insights of the math into his actual life than, say, Critch.
(I wonder aloud if this is because Von Neumann was born and came of age before the development of cognitive science. I speculate that the importance of actually applying theories of rationality in practice only becomes obvious after Tversky and Kahneman demonstrate that humans are not rational by default. (In evidence against this view: Eliezer seems to have been very concerned with thinking clearly, and being sane, before encountering Heuristics and Biases in (I believe) his mid-20s. He was exposed to Evo Psych though, and that may have served a similar role.))
Also, he converted to Catholicism at the end of his life, buying into Pascal’s Wager. He commented “So long as there is the possibility of eternal damnation for nonbelievers it is more logical to be a believer at the end”, and “There probably has to be a God. Many things are easier to explain if there is than if there isn’t.”
(According to wikipedia, this deathbed conversion did not give him much comfort.)
This suggests that he would have gotten value out of reading the sequences, in addition to attending a CFAR workshop.
Thank you, this is very interesting!
Seems to me the most important lesson here is “even if you are John von Neumann, you can’t take over the world alone.”
First, because no matter how smart you are, you will have blind spots.
Second, because your time is still limited to 24 hours a day; even if you’d decide to focus on things you have been neglecting until now, you would have to start neglecting the things you have been focusing on until now. Being better at poker (converting your smartness to money more directly), living healthier and therefore on average longer, developing social skills, and being strategic in gaining power… would perhaps come at a cost of not having invented half of the stuff. When you are John von Neumann, your time has insane opportunity costs.
Is there any information on how Von Neumann came to believe Catholicism was the correct religion for Pascal’s Wager purposes? “My wife is Catholic” doesn’t seem like very strong evidence...
I don’t know why Catholicism.
I note that it does seem to be the religion of choice for former atheists, or at least for rationalists. I know of several rationalists that converted to Catholicism, but none that have converted to any other religion.
TL;DR: I’m offering to help people productively have difficult conversations and resolve disagreements, for free. Feel free to email me if and when that seems helpful. elitrye [at] gmail.com
Facilitation
Over the past 4-ish years, I’ve had a side project of learning, developing, and iterating on methods for resolving tricky disagreements, and failures to communicate. A lot of this has been in the Double Crux frame, but I’ve also been exploring a number of other frameworks (including NVC, Convergent Facilitation, Circling-inspired stuff, intuition extraction, and some home-grown methods).
As part of that, I’ve had a standing offer to facilitate / mediate tricky conversations for folks in the CFAR and MIRI spheres (testimonials below). Facilitating “real disagreements” allows me to get feedback on my current conversational frameworks and techniques. When I encounter blockers that I don’t know how to deal with, I can go back to the drawing board to model those problems and the interventions that would solve them, and iterate from there, developing new methods.
I generally like doing this kind of conversational facilitation and am open to doing a lot more of it with a wider selection of people.
I am extending an offer to help mediate tricky conversations, to anyone that might read this post, for the foreseeable future. [If I retract this offer, I’ll come back and leave a note here.]
What sort of thing is this good for?
I’m open to trying to help with a wide variety of difficult conversations, but the situations where I have been most helpful in the past have had the following features:
Two* people are either having some conflict or disagreement or are having difficulty understanding something about what the other person is saying.
There’s some reason to expect the conversation to not “work” by default: either they’ve tried already and made little progress, etc., or at least one person can predict that the conversation will be tricky or heated.
There is enough mutual respect and/or there is enough at stake that it seems worthwhile to try and have the conversation anyway. It seems worth the time to engage.
Here are some (anonymized) examples of conversations that I’ve facilitated in the past years.
Two researchers work in related fields, but in different frames / paradigms. Try as they might, neither person can manage to see how the other’s claims are even plausible.
Two friends are working on a project together, but they each feel inclined to take it in a different direction, and find it hard to get excited about the other’s proposal, even having talked about the question a lot.
John and Janet are EAs. John thinks that the project that Janet has spent the past year on, and is close to launching, is net negative, and that Janet should drop it entirely. Janet feels exasperated by this and generally feels that John is overly-controlling.
Two rationalists, Laura and Alex, are each in some kind of community leadership role, and have a lot of respect for each other, but they have very different takes on a particular question of social mores: Laura thinks that there is a class of norm enforcement that is normal and important; Alex thinks that class of “norm enforcement” behavior is unacceptable and corrosive to the social fabric. They sit down to talk about it, but seem to keep going in circles without clarifying anything.
Basically, if you have a tricky disagreement that you want to try to hash out, and you feel comfortable inviting an outside party, feel free to reach out to me.
(If there’s some conversation or conflict that you have in mind, but don’t know if it falls in this category, feel free to email me and ask.)
*- I’m also potentially open to trying to help with conflicts that involve more than two people, such as a committee that is in gridlock, trying to make a decision, but I am much less practiced with that.
The process
If everyone involved is open to a third person (me) coming in to mediate, shoot me an email at elityre [at] gmail.com, and we can schedule a half hour call to discuss your issue. After discussing it a bit, I’ll tell you if I think I can help or not. If not, I might refer you to other people or resources that might be more useful.
If it seems like I can help, I typically prefer to meet with both parties one-on-one, as much as a week before we meet together, so that I can “load up” each person’s perspective, and start doing prep work. From there we can schedule a conversation, presumably over Zoom, for all three (or more) of us to meet.
In the conversation itself, I would facilitate, tracking what’s happening and suggesting particular conversational moves or tacks, and possibly recommending a high-level framework.
[I would like to link to a facilitation-example video here, but almost all of the conversations that I’ve facilitated are confidential. Hopefully this post will lead to one or two that can be public.]
Individual cases can vary a lot, and I’m generally open to considering alternative formats.
Currently, I’m doing this free of charge.
My sense of my current level of skill
I think this is a domain in which deep mastery is possible. I don’t consider myself to be a master, but I am aspiring to mastery.
My (possibly biased) impression is that the median outcome of my coming to help with a conversation is “eh, that was moderately helpful, mostly because having a third person to help hold space freed up our working memory to focus on the object level.”
Occasionally (one out of every 10 conversations?), I think I’ve helped dramatically, on the order of “this conversation was not working at all, until Eli came to help, and then we had multiple breakthroughs in understanding.”
(I’ve started explicitly tracking my participants’ estimation of my counterfactual impact, following conversations, so I hope to have much better numbers for assessing how useful this work is in a few months. Part of my hope in doing more of this is that I will get a more accurate assessment of how much value my facilitation in particular provides, and how much I should be investing in this general area.)
Testimonials
(I asked a number of people who I’ve done facilitation work in the past to give me a short honest testimonial, if they felt comfortable with that. I included the blurb from every person who sent me something, though this is still a biased sample, since I mostly reached out to people who I expected would give a “positive review”.)
Anna Salamon:
Scott Garrabrant:
Evan Hubinger:
Oliver Habryka:
Mathew Fallshaw:
Other people who have some experience with my facilitation style, feel free to put your own thoughts in the comments.
Caveats and other info
As noted, this is an open research-ish project for me, and I obviously cannot guarantee that I will be helpful, much less that I will be able to resolve or get to the bottom of a given disagreement. In fact, as stated, I, personally, am most interested in the cases where I don’t know how to help, because those are the places where I’m most likely to learn the most, even if they are the places where I am least able to provide value.
You are always welcome to invite me to try and help, and then partway through, decide that my suggestions are less-than helpful, and say that you don’t want my help after all. (Anna Salamon does this moderately frequently.)
I do my best to keep track of a map of relevant skills in this area, and which people around have more skill than me in particular sub-domains. So it is possible that when you describe your situation, I’ll either suggest someone else who I think might be better to help you than me, or who I would like to bring in to co-facilitate with me (with your agreement, of course).
Note that this is one of a number of projects, involving difficult conversations or facilitation, that I am experimenting with lately. Another is here and another is to be announced.
If you’re interested in training sessions on Double Crux and other Conversational Facilitation skills, join my Double Crux training mailing list, here. I have vague plans to do a 3-weekend training program, covering my current take on the core Double Crux skill, but no guarantees that I will actually end up doing that any time soon.
Questions welcome!
I am curious how good you think the conversation/facilitation was in the AI takeoff double crux between Oliver Habryka and Buck Shlegeris. I am looking for something like “the quality of facilitation at that event was X percentile among all the conversation facilitation I have done”.
[I wrote a much longer and more detailed comment, and then decided that I wanted to think more about it. In lieu of posting nothing, here’s a short version.]
I mean I did very little facilitation one way or the other at that event, so I think my counterfactual impact was pretty minimal.
In terms of my value added, I think that one was in the bottom 5th percentile?
In terms of how useful that tiny amount of facilitation was, maybe 15th to 20th percentile? (This is a little weird, because quantity and quality are related. More active facilitation has a quality span: active (read: a lot of) facilitation can be much more helpful when it is good, and much more disruptive / annoying / harmful when it is bad, compared to less active backstop facilitation.)
Overall, the conversation served the goals of the participants and had a median outcome for that kind of conversation, which is maybe 30th percentile, but there is a long right tail of positive outcomes (and maybe I am messing up how to think about percentile scores with skewed distributions).
The outcome that occurred (“had an interesting conversation, and had some new thoughts / clarifications”) is good, but also far below the sort of outcome that I’m usually aiming for (but often missing), of substantive, permanent (epistemic!) change to the way that one or both of the people orient on this topic.
Looks like you dropped a sentence.
Fixed.
Could you recommended the best book about this topic?
Nope?
I’ve gotten very little out of books in this area.
It is a little afield, but I strongly recommend the basic NVC book: Nonviolent Communication: A Language for Life. I recommend that at minimum, everyone read at least the first two chapters, which are something like 8 pages long, and have the most content in the book. (The rest of the book is good too, but it is mostly examples.)
Also, people I trust have gotten value out of How to Have Impossible Conversations. This is still on my reading stack though (for this month, I hope), so I don’t personally recommend it. My expectation, from not having read it yet, is that it will cover the basics pretty well.
That no one rebuilt old OkCupid updates me a lot about how much the startup world actually makes the world better
The prevailing ideology of San Francisco, Silicon Valley, and the broader tech world, is that startups are an engine (maybe even the engine) that drives progress towards a future that’s better than the past, by creating new products that add value to people’s lives.
I now think this is true in a limited way. Software is eating the world, and lots of bureaucracy is being replaced by automation which is generally cheaper, faster, and a better UX. But I now think that this narrative is largely propaganda.
That it’s been 8 years since Match bought and ruined OkCupid, and no one in the whole tech ecosystem has stepped up to make a dating app even as good as old OkC, is a huge black mark against the whole SV ideology of technology changing the world for the better.
Finding a partner is such a huge, real, pain point for millions of people. The existing solutions are so bad and extractive. A good solution has already been demonstrated. And yet not a single competent founder wanted to solve that problem for planet earth, instead of doing something else that (arguably) would have been more profitable. At minimum, someone could have forgone venture funding and built this as a cashflow business.
It’s true that this is a market that depends on economies of scale, because the quality of your product is proportional to the size of your matching pool. But I don’t buy that this is insurmountable. Just like with any startup, you start by serving a niche market really well, and then expand outward from there. (The first niche I would try for is by building an amazing match-making experience for female grad students at a particular top university. If you create a great experience for the women, the men will come, and I’d rather build an initial product for relatively smart customers. But there are dozens of niches one could try for.)
But it seems like no one tried to recreate OkC, much less create something better, until the Manifold team built manifold.love (currently in maintenance mode)? It’s not that no one succeeded; to my knowledge, no one else even tried. Possibly Luna counts, but I’ve heard through the grapevine that they spent substantial effort running giant parties, compared to actually developing and launching their product—from which I infer that they were not very serious. I’ve been looking for good dating apps. I think if a serious founder was trying seriously, I would have heard about it.
Thousands of founders a year, and no one?!
That’s such a massive failure, for almost a decade, that it suggests to me that the SV ideology of building things that make people’s lives better is broadly propaganda. The best founders might be relentlessly resourceful, but only a tiny fraction of them seem to be motivated by creating value for the world, or this low-hanging fruit wouldn’t have been left hanging for so long.
This is of course in addition to the long list of big tech companies who exploit their network-effect monopoly power to extract value from their users (often creating negative societal externalities in the process), more than creating value for them. But it’s a weaker update that there are some tech companies that do ethically dubious stuff, compared to the stronger update that there was no startup that took on this obvious, underserved, human problem.
My guess is that the tech world is a silo of competence (because competence is financially rewarded), but operates from an ideology with major distortions / blindspots that are disconnected from commonsense reasoning about what’s Good, e.g. following profit incentives, and excitement about doing big things (independent of whether those big things have humane or inhumane impacts), off a cliff.
Basically: I don’t blame founders or companies for following their incentive gradients, I blame individuals/society for being unwilling to assign reasonable prices to important goods.
I think the badness of dating apps is downstream of poor norms around impact attribution for matches made. Even though relationships and marriages are extremely valuable, individual people are not in the habit of paying that value to anyone.
Like, $100k or a year’s salary seems like a very cheap value to assign to your life partner. If dating apps could rely on that size of payment when they succeed, then I think there could be enough funding for something that’s at least a good small business. But I’ve never heard of anyone actually paying anywhere near that. (Myself included—though I paid a retroactive $1k payment to the person who organized the conference I met my wife at.)
I think keeper.ai tries to solve this with large bounties on dating/marriages; it’s one of the things I wish we had pushed for more on Manifold Love. It seems possible to build one for the niche of “the ea/rat community”; Manifold Love, the checkboxes thing, and dating docs got pretty good adoption for not that much execution.
(Also: be the change! I think building out OKC is one of the easiest “hello world” software projects one could imagine, Claude could definitely make a passable version in a day. Then you’ll discover a bunch of hard stuff around getting users, but it sure could be a good exercise.)
I think the credit assignment is legit hard, rather than just being a case of bad norms. Do you disagree?
Mm I think it’s hard to get optimal credit allocation, but easy to get half-baked allocation, or just see that it’s directionally way too low? Like sure, maybe it’s unclear whether Hinge deserves 1% or 10% or ~100% of the credit but like, at a $100k valuation of a marriage, one should be excited to pay $1k to a dating app.
Like, I think matchmaking is very similarly shaped to the problem of recruiting employees, but there corporations are more locally rational about spending money than individuals, and can do things like pay $10k referral bonuses, or offer external recruiters 20% of their referee’s first year salary.
(Expensive) Matchmaking services already exist—what’s your reading on why they’re not more popular?
I’ve started writing a small research paper on this, using a mathematical framework, and realized that I had long conflated Shapley values with ROSE values. Here’s what I found, having corrected that error.
ROSE bargaining satisfies Efficiency, Pareto Optimality, Symmetry*, Maximin Dominance and Linearity—a bunch of important desiderata. Shapley values, on the other hand, don’t satisfy Maximin Dominance, so someone might unilaterally reject cooperation; I’ll explore the ROSE equilibrium below.
1. Subjects: people and services for finding partners.
2. By Proposition 8.2, the ROSE value remains the same if moves transferring money within the game are discarded. Thus, we can assume no money transfers.
3. By Proposition 11.3, the ROSE value for a dating service is equal to or greater than its maximin.
4. By Proposition 12.2, the ROSE value for a dating service is equal to or less than its maximum attainable value.
5. There’s generally one move for a person to maximize their utility: use the available dating service with the highest probability of success (or expected relationship quality).
6. There are generally two moves for a service: to launch or not to launch. The first involves some intrinsic motivation and feeling of goodness minus running costs; the second has a value of exactly zero.
7. For a large service, running costs (including moderation) exceed any realistic intrinsic motivation. Therefore, its maximum and maximin values are both zero.
8. From (7), (3) and (4), the ROSE value for a large dating service is zero.
9. Therefore, total money transfers to a large dating service equal its total costs.
So, why yes or why no?
By the way, Shapley values suggest paying a significant sum! Given a relationship value of $10K (can be scaled), and four options for finding partners (0: p0 = 0.03, self-search; α: pα = 0.09, a friend’s help; β: pβ = 0.10, dating sites; γ: pγ = 0.70, the specialized project suggested up the comments), the Shapley-fair price per success would be, respectively, $550, $650, and $4400.
P.S. I’m explicitly not open to discussing what price I’d be cheerful to pay to a service which would help build relationships. In this thread, I’m more interested in whether there are new decision theory developments which would find maximin-satisfying equilibria closer to the Shapley one.
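The per-success figures in the comment above can be checked with a short computation. Here is a minimal Python sketch (mine, not the commenter’s) under one plausible formalization: a coalition is worth $10K times the best success probability it has access to, it is worth nothing without the person themselves, and “price per success” is a service’s Shapley value divided by its success probability. Those modeling choices are assumptions, but under them the sketch lands close to the quoted $550 / $650 / $4400.

```python
from itertools import permutations
from math import factorial

# Assumed formalization of the game gestured at above: players are the person
# plus three matchmaking options; a coalition's value is $10K times the best
# success probability available to it, and zero without the person.
RELATIONSHIP_VALUE = 10_000
SELF_SEARCH_P = 0.03                       # the person's outside option (p0)
SERVICES = {"friend": 0.09, "sites": 0.10, "project": 0.70}
PLAYERS = ["person"] + list(SERVICES)

def coalition_value(coalition):
    """Best available success probability, times the value of a relationship."""
    if "person" not in coalition:
        return 0.0  # nothing happens without the person themselves
    best_p = max([SELF_SEARCH_P] + [p for s, p in SERVICES.items() if s in coalition])
    return RELATIONSHIP_VALUE * best_p

def shapley_values(players, value):
    """Average each player's marginal contribution over all join orders."""
    totals = {p: 0.0 for p in players}
    n_orders = factorial(len(players))
    for order in permutations(players):
        coalition = set()
        for player in order:
            before = value(coalition)
            coalition.add(player)
            totals[player] += (value(coalition) - before) / n_orders
    return totals

values = shapley_values(PLAYERS, coalition_value)
for service, p in SERVICES.items():
    # Interpreting "price per success" as the Shapley value spread over successes.
    print(f"{service}: Shapley value ${values[service]:.0f}, "
          f"price per success ${values[service] / p:.0f}")
```

Under these assumptions the sketch prints roughly $556, $667, and $4381 per success for the friend, the dating sites, and the specialized project, in the same ballpark as the figures quoted above.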
I don’t think one can coherently value a marriage 20 times as much as a saved life ($5k as GiveWell says)? Indeed there is more emotional attachment to a person who’s your partner (i.e. who you are emotionally attached to) than to a random human in the world, but surely not that much?
And if a marriage is valued at $10k, then the credit assignment 1%/10% would make the allocation $100/$1000 - and it seems that people really want to round the former towards zero
I mean, it’s obviously very dependent on your personal finance situation but I’m using $100k as an order of magnitude proxy for “about a years salary”. I think it’s very coherent to give up a year of marginal salary in exchange for finding the love of your life, rather than like $10k or ~1mo salary.
Of course, the world is full of mispricings, and currently you can save a life for something like $5k. I think these are both good trades to make, and most people should have a portfolio that consists of both “life partners” and “impact from lives saved” and crucially not put all their investment into just one or the other.
I wonder what the lifetime spend on dating apps is. I expect that for most people who ever pay it’s >$100
It’s possible no one tried literally “recreate OkC”, but I think dating startups are very oversubscribed by founders, relative to interest from VCs [1] [2] [3] (and I think VCs are mostly correct that they won’t make money [4] [5]).
(Edit: I want to note that those are things I found after a bit of googling to see if my sense of the consensus was borne out; they are meant in the spirit of “several samples of weak evidence”)
I don’t particularly believe you that OkC solves dating for a significant fraction of people. IIRC, a previous time we talked about this, @romeostevensit suggested you had not sufficiently internalised the OkCupid blog findings about how much people prioritised physical attraction.
You mention manifold.love, but also mention it’s in maintenance mode – I think because the type of business you want people to build does not in fact work.
I think it’s fine to lament our lack of good mechanisms for public good provision, and claim our society is failing at that. But I think you’re trying to draw an update that’s something like “tech startups should be doing an unbiased search through viable valuable business, but they’re clearly not”, or maybe, “tech startups are supposed to be able to solve a large fraction of our problems, but if they can’t solve this, then that’s not true”, and I don’t think either of these conclusions seem that licensed from the dating data point.
If this is true, it’s somewhat cruxy for me.
I’m still disappointed that no one cared enough to solve this problem without VC funding.
I agree that more people should be starting revenue-funded/bootstrapped businesses (including ones enabled by software/technology).
The meme is that if you’re starting a tech company, it’s going to be a VC-funded startup. This is, I think, a meme put out by VCs themselves, including Paul Graham/YCombinator, and it conflates new software projects and businesses generally with a specific kind of business model called the “tech startup”.
Not every project worth doing should be a business (some should be hobbies or donation-funded) and not every business worth doing should be a VC-funded startup (some should be bootstrapped and grow from sales revenue.)
The VC startup business model requires rapid growth and expects 30x returns over a roughly 5-10 year time horizon. That simply doesn’t include every project worth doing. Some businesses are viable but are simply not likely to grow that much or that fast; some projects shouldn’t be expected to be profitable at all and need philanthropic support.
I think the narrative that “tech startups are where innovation happens” is...badly incomplete, but still a hell of a lot more correct than “tech startups are net destructive”.
Think about new technologies; then think about where they were developed. That process can sometimes happen end-to-end within a startup, but more often I think innovative startups are founded around IP developed while the founders were in academia; or the startup found a new use for open-source tools or tools developed within big companies. There simply isn’t time to solve particularly hard technical problems if you have to get to profitability and 30x growth in 5 years. The startup format is primarily designed for finding product-market fit—i.e. putting together existing technologies, packaging them as a “product” with a narrative about what and who it’s for, and tweaking it until you find a context where people will pay for the product, and then making the whole thing bigger and bigger. You can do that in 5 years. But no, you can’t do literally all of society’s technological innovation within that narrow context!
(Part of the issue is that we still technically count very big tech companies as “startups” and they certainly qualify as “Silicon Valley”, so if you conflate all of “tech” into one big blob it includes the kind of big engineering-heavy companies that have R&D departments with long time horizons. Is OpenAI a “tech startup”? Sure, in that it’s a recently founded technology company. But it is under very different financial constraints from a YC startup.)
Neither of those, exactly.
I’m claiming that the narrative around the startup scene is that they are virtuous engines of [humane] value creation (often in counter to a reactionary narrative that “big tech” is largely about exploitation and extraction). It’s about “changing the world” (for the better).
This opportunity seems like a place where one could have traded meaningfully large personal financial EV for enormous amounts of humane value. Apparently no founder wanted to take that trade. Because I would expect there to be variation in how much founders are motivated by money vs. making a mark on the world vs. creating value vs. other stuff, the fact that (to my knowledge) no founder went for it is evidence about the motivations of the whole founder class. The number of founders who are more interested in creating something that helps a lot of people than they are in making a lot of money (even if they’re interested in both) is apparently very small.
Now, maybe startups actually do create lots of humane value, even if they’re created by founders and VCs motivated by profit. The motivations of the founders are only indirect evidence about the effects of startups.
But the tech scene is not motivated to optimize for this at all?? That sure does update me about how much the narrative is true vs. propaganda.
Now if I’m wrong and old OkCupid was only drastically better for me and my unusually high verbal intelligence friends, and it’s not actually better than the existing offerings for the vast majority of people, that’s a crux for me.
From their retrospective:
It sounds less like they found it didn’t work, and more like they have other priorities and aren’t (currently) relentlessly pursing this one.
I worked at Manifold but not on Love. My impression from watching and talking to my coworkers was that it was a fun side idea that they felt like launching and seeing if it happened to take off, and when it didn’t they got bored and moved on. Manifold also had a very quirky take on it due to the ideology of trying to use prediction markets as much as possible and making everything very public. I would advise against taking it seriously as evidence that an OKC-like product is a bad idea or a bad business.
I would guess they tried it because they hoped it would be competitive with their other product, and sunset it because that didn’t happen with the amount of energy they wanted to allocate to the bet. There may also have been an element of updating more about how much focus their core product needed.
I only skimmed the retrospective now, but it seems mostly to be detailing problems that stymied their ability to find traction.
Right. But they were not relentlessly focused on solving this problem.
I straight up don't believe that the problems outlined can't be surmounted, especially if you're going for a cashflow business instead of an exit.
That’s a PR friendly way of saying that it failed to reach PMF.
Shreeda Segan is working on building it, as a cashflow business. They need $10K to get to the MVP. https://manifund.org/projects/hire-a-dev-to-finish-and-launch-our-dating-site
Yep. I’m aware, and strongly in support.
But it took this long (and even now, isn't being done by a traditional tech founder). This project doesn't feel like it undermines my point.
The market is much more crowded now. A new old okcupid service would be competing against okcupid as well as everything else. And okcupid has a huge advantage in an existing userbase.
And, OKCupid’s algorithm still exists, sort of. And you can write as much as you like. What aspect of the old site do you think was critically different?
I just think there's barely a cent to be made in launching yet another dating app. So you can't blame people for not doing it.
I think the biggest advantage of old OKC was that more people used it; now people are spread across Hinge and Bumble as well as Tinder.
How sure are you that OKcupid is a significantly better product for the majority of people (as opposed to a niche group of very online people)?
The fact that there's a sex recession is pretty suggestive that Tinder and the endless stream of Tinder clones don't serve people very well.
Even if you don’t assess potential romantic partners by reading their essays, like I do, OkC’s match percentage meant that you could easily filter out 95% of the pool to people who are more likely to be compatible with you, along whatever metrics of compatibility you care about.
What is the sex recession? And do we know it is caused by Tinder?
OKcupid is certainly a better product for hundreds of thousands, or possibly millions, of unusually literate people, including ~all potential developers and most people in their social circles. It’s not a small niche.
There is a problem I want solved.
No-one, anywhere in the world, has solved it for me.
Therefore, Silicon Valley specifically is bad.
I didn't say Silicon Valley is bad. I said that the narrative about Silicon Valley is largely propaganda, which can be true independently of how good or bad it is, in absolute terms, or relative to the rest of the world.
Might you possibly be underestimating how hard it is to build a startup?
(Reasonably personal)
I spend a lot of time trying to build skills, because I want to be awesome. But there is something off about that.
I think I should just go after things that I want, and solve the problems that come up on the way. The idea of building skills sort of implies that if I don’t have some foundation or some skill, I’ll be blocked, and won’t be able to solve some thing in the way of my goals.
But that doesn’t actually sound right. Like it seems like the main important thing for people who do incredible things is their ability to do problem solving on the things that come up, and not the skills that they had previously built up in a “skill bank”.
Raw problem solving is the real thing and skills are cruft. (Or maybe not cruft per se, but more like a side effect. The compiled residue of previous problem solving. Or like a code base from a previous project that you might repurpose.)
Part of the problem with this is that I don’t know what I want for my own sake, though. I want to be awesome, which in my conception, means being able to do things.
I note that wanting “to be able to do things” is a leaky sort of motivation: because the victory condition is not clearly defined, it can’t be crisply compelling, and so there’s a lot of waste somehow.
The sort of motivation that works is simply wanting to do something, not wanting to be able to do something. Like specific discrete goals that one could accomplish, know that one accomplished, and then (in most cases) move on from.
But most of the things that I want by default are of the sort “wanting to be able to do”, because if I had more capabilities, that would make me awesome.
But again, that’s not actually conforming with my actual model of the world. The thing that makes someone awesome is general problem solving capability, more than specific capacities. Specific capacities are brittle. General problem solving is not.
I guess that I could pick arbitrary goals that seem cool. But I’m much more emotionally compelled by being able to do something instead of doing something.
But I also think that I am notably less awesome and on a trajectory to be less awesome over time, because my goals tend to be shaped in this way. (One of those binds whereby if you go after x directly, you don’t get x, but if you go after y, you get x as a side effect.)
I’m not sure what to do about this.
Maybe meditate on, and dialogue with, my sense that skills are how awesomeness is measured, as opposed to raw, general problem solving.
Maybe I need to undergo some deep change that causes me to have different sorts of goals at a deep level. (I think this would be a pretty fundamental shift in how I engage with the world: from a virtue ethics orientation (focused on one’s own attributes) to one of consequentialism (focused on the states of the world).)
There are some exceptions to this, goals that are more consequentialist (although if you scratch a bit, you'll find they're about living an ideal of myself, more than they are directly about the world), including wanting a romantic partner who makes me better (note that "who makes me better" is virtue ethics-y), and some things related to my moral duty, like mitigating x-risk. These goals do give me grounding in sort of the way that I think I need, but they're not sufficient? I still spend a lot of time trying to get skills.
Anyone have thoughts?
Your seemingly target-less skill-building motive isn’t necessarily irrational or non-awesome. My steel-man is that you’re in a hibernation period, in which you’re waiting for the best opportunity of some sort (romantic, or business, or career, or other) to show up so you can execute on it. Picking a goal to focus on really hard now might well be the wrong thing to do; you might miss a golden opportunity if your nose is at the grindstone. In such a situation a good strategy would, in fact, be to spend some time cultivating skills, and some time in existential confusion (which is what I think not knowing which broad opportunities you want to pursue feels like from the inside).
The other point I’d like to make is that I expect building specific skills actually is a way to increase general problem solving ability; they’re not at odds. It’s not that super specific skills are extremely likely to be useful directly, but that the act of constructing a skill is itself trainable and a significant part of general problem solving ability for sufficiently large problems. Also, there’s lots of cross-fertilization of analogies between skills; skills aren’t quite as discrete as you’re thinking.
Skills and problem-solving are deeply related. The basics of most skills are mechanical and knowledge-based, with some generalization creeping in on your 3rd or 4th skill in terms of how to learn and seeing non-obvious crossover. Intermediate (say, after the first 500 to a few thousand hours) use of skills requires application of problem-solving within the basic capabilities of that skill. Again, you get good practice within a skill, and better across a few skills. Advanced application in many skills is MOSTLY problem-solving. How to apply your well-indexed-and-integrated knowledge to novel situations, and how to combine that knowledge across domains.
I don’t know of any shortcuts, though—it takes those thousands of hours to get enough knowledge and basic techniques embedded in your brain that you can intuit what avenues to more deeply explore in new applications.
There is a huge amount of human variance—some people pick up some domains ludicrously easily. This is a blessing and a curse, as it causes great frustration when they hit a domain that they have to really work at. Others have to work at everything, and never get their Nobel, but still contribute a whole lot of less-transformational “just work” within the domains they work at.
I don’t know whether this resembles your experience at all, but for me, skills translate pretty directly to moment-to-moment life satisfaction, because the most satisfying kind of experience is doing something that exercises my existing skills. I would say that only very recently (in my 30s) do I feel “capped out” on life satisfaction from skills (because I am already quite skilled at almost everything I spend all my time doing) and I have thereby begun spending more time trying to do more specific things in the world.
Seems to me there is some risk either way. If you keep developing skills without applying them to a specific goal, it can be a form of procrastination (an insidious one, because it feels so virtuous). There are many skills you could develop, and life is short. On the other hand, as you said, if you go right after your goal, you may find an obstacle you can’t overcome… or even worse, an obstacle you can’t even properly analyze, so the problem is not merely that you don’t have the necessary skill, but that you even have no idea which skill you miss (so if you try to develop the skills as needed, you may waste time developing the wrong skills, because you misunderstood the nature of the problem).
It could be both. And perhaps you notice the problem-specific skills more, because those are rare.
But I also kinda agree that the attitude is more important, and skills often can be acquired when needed.
So… dunno, maybe there are two kinds of skills? Like, the skills with obvious application, such as “learn to play a piano”; and the world-modelling skills, such as “understand whether playing a piano would realistically help you accomplish your goals”? You can acquire the former when needed, but you need the latter in advance, to remove your blind spots?
Or perhaps some skills such as “understand math” are useful in many kinds of situations and take a lot of time to learn, so you probably want to develop these in advance? (Also, if you don’t know yet what to do, it probably helps to get power: learn math, develop social skills, make money… When you later make up your mind, you will likely find some of this useful.)
And maybe you need the world-modelling skills before you make specific goals, because how could your goal be to learn to play the piano, if you don't know the piano exists? You could have a more general goal, such as "become famous at something", but if you don't know that the piano exists, maybe you wouldn't even look in this direction.
Could this also be about your age? (I am assuming here that you are young.) For younger people it makes more sense to develop general skills; for older people it makes more sense to go after specific goals. The more time you have ahead of you, the more meta you can go—the costs of acquiring a skill are the same, but the possible benefits of having the skill are proportional to your remaining time (more than linear, if you actually use the skill, because it will keep increasing as a side effect of being used).
Also, as a rule of thumb, younger people are judged by their potential, older people are judged by their accomplishments. If you are young, evolution wants you to feel awesome about having skills, because that’s what your peers will admire. You signal general intelligence. The accomplishments you have… uhm, how to put it politely… if you see a 20 years old kid driving an expensive car, your best guess is that their parents have bought it, isn’t it? On the other hand, an older person without accomplishments seems like a loser, regardless of their apparent skills, because there is something suspicious about them not having translated those skills into actual outcomes. The excuse for the young ones is that their best strategy is to acquire skills now, and apply them later (which hasn’t happened yet, but there is enough time remaining).
I’ve gone through something very similar.
Based on your language here, it feels to me like you’re in the contemplation stage along the stages of change.
So the very first thing I’d say is to not feel the desire to jump ahead and “get started on a goal right now.” That’s jumping ahead in the stages of change, and will likely create a relapse. I will predict that there’s a 50% chance that if you continue thinking about this without “forcing it”, you’ll have started in on a goal (action stage) within 3 months.
Secondly, unlike some of the other responses here, I think your analysis is fairly accurate. I’ve certainly found that picking up gears when I need them for my goals is better than learning them ahead of time.
Now, in terms of “how to actually do it.”
I'm pretty convinced that the key to getting yourself to do stuff is "Creative Tension"—creating a clear internal tension between the end state that feels good and the current state that doesn't feel as good. There are 4 ways I know to go about generating internal tension:
Develop a strong sense of self, and create tension between the world where you’re fully expressing that self and the world where you’re not.
Develop a strong sense of taste, and create tension between the beautiful things that could exist and what exists now.
Develop a strong pain, and create tension between the world where you have that pain and the world where you’ve solved it.
Develop a strong vision, and create tension between the world as it is now and the world as it would be in your vision.
One especially useful trick that worked for me coming from the "just develop myself into someone awesome" place was tying the vision of the awesome person I could be to the vision of what I'd achieved—that is, in my vision of the future, I included a vision of the awesome person I had to become in order to reach that future.
I then would deliberately contrast that compelling vision/self/taste with where I was now. Checking in with that vision every morning, and fixing areas of resistance when they arise, is what keeps me motivated.
I do have a workshop that I run on exactly how to create that vision that’s tied with sense of self and taste, and then how to use it to generate creative tension. Let me know if something like that would be helpful to you.
I’m no longer sure that I buy dutch book arguments, in full generality, and this makes me skeptical of the “utility function” abstraction
Thesis: I now think that utility functions might be a pretty bad abstraction for thinking about the behavior of agents in general including highly capable agents.
[Epistemic status: half-baked, elucidating an intuition. Possibly what I’m saying here is just wrong, and someone will helpfully explain why.]
Over the past years, in thinking about agency and AI, I’ve taken the concept of a “utility function” for granted as the natural way to express an entity’s goals or preferences.
Of course, we know that humans don't have well defined utility functions (they're inconsistent, and subject to all kinds of framing effects), but that's only because humans are irrational. To the extent that a thing acts like an agent, its behavior corresponds to some utility function. That utility function might not be explicitly represented, but if an agent is rational, there's some utility function that reflects its preferences.
Given this, I might be inclined to scoff at people who scoff at “blindly maximizing” AGIs. “They just don’t get it”, I might think. “They don’t understand why agency has to conform to some utility function, and an AI would try to maximize expected utility.”
Currently, I’m not so sure. I think that talking in terms of utility functions is biting a philosophical bullet, and importing some unacknowledged assumptions. Rather than being the natural way to conceive of preferences and agency, I think utility functions might be only one possible abstraction, and one that emphasizes the wrong features, giving a distorted impression of what agents, in general, are actually like.
I want to explore that possibility in this post.
Before I begin, I want to make two notes.
First, all of this is going to be hand-wavy intuition. I don’t have crisp knock-down arguments, only a vague discontent. But it seems like more progress will follow if I write up my current, tentative, stance even without formal arguments.
Second, I don't think utility functions being a poor abstraction for agency in the real world has much bearing on whether there is AI risk. As I'll discuss, it might change the shape and tenor of the problem, but highly capable agents with alien seed preferences are still likely to be catastrophic to human civilization and human values. I mention this because the sentiments expressed in this essay are causally downstream of conversations that I've had with skeptics about whether there is AI risk at all. So I want to highlight: I think I was mistakenly overlooking some philosophical assumptions, but that is not a crux.
Is coherence overrated?
The tagline of the “utility” page on arbital is “The only coherent way of wanting things is to assign consistent relative scores to outcomes.”
This is true as far as it goes, but to me, at least, that sentence implies a sort of dominance of utility functions. “Coherent” is a technical term, with a precise meaning, but it also has connotations of “the correct way to do things”. If someone’s theory of agency is incoherent, that seems like a mark against it.
But it is possible to ask, "What's so good about coherence anyway? Maybe it's overrated."
The standard reply of course, is that if your preferences are incoherent, you’re dutchbookable, and someone will pump you for money.
But I’m not satisfied with this argument. It isn’t obvious that being dutch booked is a bad thing.
In Coherent Decisions Imply Consistent Utilities, Eliezer says,
Eliezer asserts that this is “qualitatively bad behavior.” But I think that this is biting a philosophical bullet.
As an intuition pump: In the actual case of humans, we seem to get utility not from states of the world, but from changes in states of the world. So it isn’t unusual for a human to pay to cycle between states of the world.
For instance, I could imagine a human being hungry, eating a really good meal, feeling full, and then happily paying a fee to be instantly returned to their hungry state, so that they can enjoy eating a good meal again.
This is technically a dutch booking (which do they prefer, being hungry or being full?), but from the perspective of the agent’s values there’s nothing qualitatively bad about it. Instead of the dutchbooker pumping money from the agent, he’s offering a useful and appreciated service.
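To make that intuition concrete, here's a toy sketch (my own illustration with made-up numbers, not anything from Eliezer's post): an agent whose "utility" attaches to transitions between states, rather than to the states themselves, comes out ahead by its own lights from being repeatedly "money pumped".

```python
# Toy sketch (made-up numbers): an agent that values transitions between
# states rather than the states themselves.
transition_value = {
    ("hungry", "full"): 10,  # enjoying a good meal
    ("full", "hungry"): 0,   # being reset to hungry is not unpleasant in itself
}

reset_fee = 2  # what the "dutch booker" charges to return the agent to hungry

def run_cycles(n_cycles: int):
    """The agent eats, pays to be made hungry again, and repeats."""
    total_enjoyment, total_paid = 0, 0
    state = "hungry"
    for _ in range(n_cycles):
        total_enjoyment += transition_value[(state, "full")]
        state = "full"
        total_paid += reset_fee
        total_enjoyment += transition_value[(state, "hungry")]
        state = "hungry"
    return total_enjoyment, total_paid

enjoyment, paid = run_cycles(5)
# The agent nets +8 per cycle by its own lights; the "money pump" is a service.
print(enjoyment, paid)  # 50 10
```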
Of course, we can still back out a utility function from this dynamic: instead of having a mapping of ordinal numbers to world states, we can have one from ordinal numbers to changes from one world state to another.
But that just passes the buck one level. I see no reason in principle that an agent might have a preference to rotate between different changes in the world, just as well as rotating different between states of the world.
But this also misses the central point. I think you can always construct a utility function that represents some behavior. But if one is no longer compelled by dutch book arguments, this raises the question of why we would want to do that. If coherence is no longer a desideratum, it's no longer clear that a utility function is the natural way to express preferences.
And I wonder, maybe this also applies to agents in general, or at least the kind of learned agents that humans are likely to build via gradient descent.
Maximization behavior
I think this matters, because many of the classic AI risk arguments go through a claim that maximization behavior is convergent. If you try to build a satisficer, there are a number of pressures for it to become a maximizer of some kind. (See this Rob Miles video, for instance)
I think that most arguments of that sort depend on an agent acting according to an expected utility maximization framework. And if utility maximization turns out not to be a good abstraction for agents in the real world, I don't know whether these arguments are still correct.
I posit that straightforward maximizers are rare in the multiverse, and that most evolved or learned agents are better described by some other abstraction.
If not utility functions, then what?
If we accept for the time being that utility functions are a warped abstraction for most agents, what might a better abstraction be?
I don’t know. I’m writing this post in the hopes that others will think about this question and perhaps come up with productive alternative formulations.
I’ll post some of my half-baked thoughts on this question shortly.
I’ve long been somewhat skeptical that utility functions are the right abstraction.
My argument is also rather handwavy, being something like “this is the wrong abstraction for how agents actually function, so even if you can always construct a utility function and say some interesting things about its properties, it doesn’t tell you the thing you need to know to understand and predict how an agent will behave”. In my mind I liken it to the state of trying to code in functional programming languages on modern computers: you can do it, but you’re also fighting an uphill battle against the way the computer is physically implemented, so don’t be surprised if things get confusing.
And much like in the utility function case, people still program in functional languages because of the benefits they confer. I think the same is true of utility functions: they confer some big benefits when trying to reason about certain problems, so we accept the tradeoffs of using them. I think that’s fine so long as we have a morphism to other abstractions that will work better for understanding the things that utility functions obscure.
Utility functions are especially problematic in modeling behaviour for agents with bounded rationality, or those where there are costs of reasoning. These include every physically realizable agent.
For modelling human behaviour, even considering the ideals of what we would like human behaviour to achieve, there are even worse problems. We can hope that there is some utility function consistent with the behaviour we’re modelling and just ignore cases where there isn’t, but that doesn’t seem satisfactory either.
‘Or you will leave money on the table.’
You rotated 'different' and 'between'. (Or a series of rotations isomorphic to such.)
New post: The Basic Double Crux Pattern
[This is a draft, to be posted on LessWrong soon.]
I’ve spent a lot of time developing tools and frameworks for bridging “intractable” disagreements. I’m also the person affiliated with CFAR who has taught Double Crux the most, and done the most work on it.
People often express to me something to the effect, “The important thing about Double Crux is all the low level habits of mind: being curious, being open to changing your mind, paraphrasing to check that you’ve understood, operationalizing, etc. The ‘Double Crux’ framework, itself is not very important.”
I half agree with that sentiment. I do think that those low level cognitive and conversational patterns are the most important thing, and at Double Crux trainings that I have run, most of the time is spent focusing on specific exercises to instill those low level TAPs.
However, I don’t think that the only value of the Double Crux schema is in training those low level habits. Double cruxes are extremely powerful machines that allow one to identify, if not the most efficient conversational path, a very high efficiency conversational path. Effectively navigating down a chain of Double Cruxes is like magic. So I’m sad when people write it off as useless.
In this post, I’m going to try and outline the basic Double Crux pattern, the series of 4 moves that makes Double Crux work, and give a (simple, silly) example of that pattern in action.
These four moves are not (always) sufficient for making a Double Crux conversation work; that does depend on a number of other mental habits and TAPs. But this pattern is, according to me, at the core of the Double Crux formalism.
The pattern:
The core Double Crux pattern is as follows. For simplicity, I have described this in the form of a 3-person Double Crux conversation, with two participants and a facilitator. Of course, one can execute these same moves in a 2 person conversation, as one of the participants. But that additional complexity is hard to manage for beginners.
The pattern has two parts (finding a crux, and finding a double crux), and each part is composed of 2 main facilitation moves.
Those four moves are...
Clarifying that you understood the first person’s point.
Checking if that point is a crux
Checking the second person’s belief about the truth value of the first person’s crux.
Checking if the first person's crux is also a crux for the second person.
In practice:
[The version of this section on my blog has color coding and special formatting.]
The conversational flow of these moves looks something like this:
Finding a crux of participant 1:
P1: I think [x] because of [y]
Facilitator: (paraphrasing, and checking for understanding) It sounds like you think [x] because of [y]?
P1: Yep!
Facilitator: (checking for cruxyness) If you didn’t think [y], would you change your mind about [x]?
P1: Yes.
Facilitator: (signposting) It sounds like [y] is a crux for [x] for you.
Checking if it is also a crux for participant 2:
Facilitator: Do you think [y]?
P2: No.
Facilitator: (checking for a Double Crux) if you did think [y] would that change your mind about [x]?
P2: Yes.
Facilitator: It sounds like [y] is a Double Crux
[Recurse, running the same pattern on [Y] ]
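For readers who find control flow easier to parse than dialogue, here is a minimal sketch of that recursion in Python (the `ask` helper and the specific prompts are placeholders I made up, not part of the formalism):

```python
# Minimal sketch of the four facilitation moves as control flow.
# `ask` stands in for the facilitator asking a yes/no question out loud.

def ask(question: str) -> bool:
    return input(question + " [y/n] ").strip().lower().startswith("y")

def find_double_crux(x: str, y: str) -> None:
    """P1 thinks x because of y; try to establish y as a double crux."""
    # Move 1: clarify that you understood P1's point.
    if not ask(f"P1, it sounds like you think {x} because of {y}?"):
        return  # misunderstood; go back and re-clarify
    # Move 2: check whether y is a crux for P1.
    if not ask(f"P1, if you didn't think {y}, would you change your mind about {x}?"):
        return  # not a crux; look for a different y
    # Move 3: check P2's belief about the truth value of y.
    if ask(f"P2, do you think {y}?"):
        return  # no disagreement about y; look elsewhere
    # Move 4: check whether y is also a crux for P2.
    if ask(f"P2, if you did think {y}, would that change your mind about {x}?"):
        print(f"It sounds like {y} is a double crux.")
        next_reason = input(f"P1, why do you think {y}? ")
        find_double_crux(y, next_reason)  # recurse, running the same pattern on y
```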
Obviously, in actual conversation, there is a lot more complexity, and a lot of other things that are going on.
For one thing, I’ve only outlined the best case pattern, where the participants give exactly the most convenient answer for moving the conversation forward (yes, yes, no, yes). In actual practice, it is quite likely that one of those answers will be reversed, and you’ll have to compensate.
For another thing, this formalism is rarely so simple. You might have to do a lot of conversational work to clarify the claims enough that you can ask if B is a crux for A (for instance when B is nonsensical to one of the participants). Getting through each of these steps might take fifteen minutes, in which case rather than four basic moves, this pattern describes four phases of conversation. (I claim that one of the core skills of a savvy facilitator is tracking which stage the conversation is at, which goals you have successfully hit, and which is the current proximal subgoal.)
There is also a judgment call about which person to treat as “participant 1” (the person who generates the point that is tested for cruxyness). As a first order heuristic, the person who is closer to making a positive claim over and above the default, should usually be the “p1”. But this is only one heuristic.
Example:
This is an intentionally silly, over-the-top example, for demonstrating the pattern without any unnecessary complexity. I'll publish a somewhat more realistic example in the next few days.
Two people, Alex and Barbra, disagree about tea: Alex thinks that tea is great, and drinks it all the time, and thinks that more people should drink tea, and Barbra thinks that tea is bad, and no one should drink tea.
In a real conversation, it often doesn't go this smoothly. But this is the rhythm of Double Crux, at least as I apply it.
That’s the basic Double Crux pattern. As noted there are a number of other methods and sub-skills that are (often) necessary to make a Double Crux conversation work, but this is my current best attempt at a minimum compression of the basic engine of finding double cruxes.
I made up a more realistic example here, and I might make more or better examples.
Eliezer claims that dath ilani never give in to threats. But I’m not sure I buy it.
The only reason people will make threats against you, the argument goes, is if those people expect that you might give in. If you have an iron-clad policy against acting in response to threats made against you, then there’s no point in making or enforcing the threats in the first place. There’s no reason for the threatener to bother, so they don’t. Which means in some sufficiently long run, refusing to submit to threats means you’re not subject to threats.
This seems a bit fishy to me. I have a lingering suspicion that this argument doesn’t apply, or at least doesn’t apply universally, in the real world.
I'm thinking here mainly of a prototypical case of an isolated farmer family (like the early farming families of the Greek peninsula, not absorbed into a polis), being accosted by some roving bandits, such as the soldiers of the local government. The bandits say "give us half your harvest, or we'll just kill you."
The argument above depends on a claim about the cost of executing on a threat. “There’s no reason to bother” implies that the threatener has a preference not to bother, if they know that the threat won’t work.
I don’t think that assumption particularly applies. For many cases, like the case above, the cost to the threatener of executing on the threat is negligible, or at least small relative to the available rewards. The bandits don’t particularly mind killing the farmers and taking their stuff, if the farmers don’t want to give it up. There isn’t a realistic chance that the bandits, warriors specializing in violence and outnumbering the farmers, will lose a physical altercation.
From the bandits' perspective there are two options:
Showing up, threatening to kill the farmers, and taking away as much food as they can carry (and then maybe coming back to accost them again next year).
Showing up, threatening to kill the farmers, actually killing the farmers, and then taking away as much food as they can carry.
It might be easier and less costly for the bandits to get what they want by being scary rather than by being violent. But the plunder is definitely enough to make violence worth it if it comes to that. They prefer option 1, but they’re totally willing to fall back on option 2.
It seems like, in this situation, the farmers are probably better off cooperating with the bandits and giving them some food, even knowing that that means that the bandits will come back and demand “taxes” from them every harvest. They’re just better off submitting.
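A toy payoff comparison makes the asymmetry explicit (all numbers are made up purely for illustration):

```python
# Illustrative payoffs (made-up numbers). The harvest is worth 10.
# If the farmers submit, the bandits carry off half of it.
# If the farmers resist, the bandits kill them, take everything,
# and pay a modest cost in effort, risk, and foregone "taxes" next year.

BANDIT_PAYOFF = {"farmers_submit": 5, "farmers_resist": 10 - 7}
FARMER_PAYOFF = {"farmers_submit": 5, "farmers_resist": -100}

# The bandits prefer that the farmers submit (5 > 3), but enforcing the threat
# still beats not showing up at all (3 > 0). So a farmer policy of "never give
# in" doesn't make threatening unprofitable, and the farmers themselves are
# strictly better off submitting (5 > -100).
assert BANDIT_PAYOFF["farmers_submit"] > BANDIT_PAYOFF["farmers_resist"] > 0
assert FARMER_PAYOFF["farmers_submit"] > FARMER_PAYOFF["farmers_resist"]
```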
Maybe, decision theoretically, this situation doesn't count as a threat. The bandits are taking food from the farmers, one way or the other, and they're killing the farmers if they try to stop that. They're not killing the farmers so that they'll give up their food.
But that seems fishy. Most of the time, the bandits don’t, in fact have to resort to violence. Just showing up and threatening violence is enough to get what they want. The farmers do make the lives of the bandits easier by submitting and giving them much of the harvest without resistance. Doing otherwise would be straightforwardly worse for them.
Resisting the bandits out of a commitment to some notion of decision-theoretic rationality seems exactly analogous to two-boxing in Newcomb's problem because of a commitment to (causal) decision-theoretic rationality.
You might not want to give in out of spite. "Fuck you. I'd rather die than help you steal from me." But a dath ilani would say that that's a matter of the utility function, not of decision theory. You just don't like submitting to threats, and so will pay big costs to avoid it; it's not that you're following a policy that maximizes your payoffs.
So, it seems like the policy has to be “don’t give into threats that are sufficiently costly to execute that the threatener would prefer not to bother, if they knew in advance that you wouldn’t give in”. (And possibly with the additional caveat “if the subjunctive dependence between you and the threatener is sufficiently high.”)
But that’s a much more complicated policy. For one thing, it requires a person-being-threatened to accurately estimate how costly it would be for the threatener to execute their threat (and the threatener is thereby incentivized to deceive them about that).
Hm. But maybe that's easy to estimate, actually, in the cases where the threatener gets a payout of 0 if the person-being-threatened doesn't cooperate with the threat? Which is the case for most blackmail attempts, for instance, but not necessarily for "if you don't give me some of your harvest, I'll kill you."
In lots of cases, it seems like it would be ambiguous. Especially when there are large power disparities in favor of the threatener. When someone powerful threatens you, the cost of executing the threat is likely to be small for them, possibly small enough to be negligible. And in those cases, their own spite at you for resisting them might be more than enough reason to act on it.
[Ok. that’s enough for now.]
Eliezer, this is what you get for not writing up the planecrash threat lecture thread. We’ll keep bothering you with things like this until you give in to our threats and write it.
What you've hit upon is "BATNA," or "Best alternative to a negotiated agreement." Because the robbers can get what they want by just killing the farmers, the dath ilani will give in, and from what I understand, Yudkowsky therefore doesn't classify the original request (give me half your wheat or die) as a threat.
This may not be crazy. It reminds me of the Ancient Greek social mores around hospitality, which seem insanely generous to a modern reader but I guess make sense if the equilibrium number of roving ~~bandits~~ honored guests is kept low by some other force.
This seems like it weakens the "don't give in to threats" policy substantially, because it makes it much harder to tell what's a threat-in-the-technical-sense, and the incentives push toward exaggeration and dishonesty about what is or isn't a threat-in-the-technical-sense.
The bandits should always act as if they’re willing to kill the farmers and take their stuff, even if they’re bluffing about their willingness to do violence. The farmers need to estimate whether the bandits are bluffing, and either call the bluff, or submit to the demand-which-is-not-technically-a-threat.
That policy has notably more complexity than just “don’t give in to threats.”
What is the “don’t give in to threats” policy that this is more complex than? In particular, what are ‘threats’?
“Anytime someone credibly demands that you do X, otherwise they’ll do Y to you, you should not do X.” This is a simple reading of the “don’t give into threats” policy.
What are the semantics of "otherwise"? Are they more like:
"X otherwise Y" ↦ X → ¬Y, or
"X otherwise Y" ↦ X ↔ ¬Y?
Presumably you also want the policy to include that you don't want "Y" and weren't going to do "X" anyway?
Yes, to the first part, probably yes to the second part.
With a grain of salt,
There's a sort of quiet assumption that should be louder about the dath ilan fiction: it's about a world where a bunch of theorems like "as systems of agents get sufficiently intelligent, they gain the ability to coordinate in prisoner's-dilemma-like problems" have proofs. You could similarly write fiction set in a world where P=NP has a proof and all of cryptography collapses. I'm not sure whether EY would guess that sufficiently intelligent agents actually coordinate, just like I could write the P=NP fiction while being pretty sure that P≠NP.
Huh, the idea that Greek guest-friendship was an adaptation to warriors who would otherwise kill you and take your stuff is something that I had never considered before. Isn't it generally depicted as a relationship between nobles who, presumably, would be able to repel roving bandits?
Threateners similarly can employ bindings, always enforcing regardless of local cost. A binding has an overall cost from following it in all relevant situations, costs in individual situations are what goes into estimating this overall cost, but individually they are not decision relevant, when deciding whether to commit to a global binding.
In this case opposing commitments effectively result in global enmity (threateners always enforce, targets never give in to threats), so if targets are collectively stronger than threateners, then threateners lose. But this collective strength (for the winning side) or vulnerability (for the losing side) is only channeled through targets or threateners who join their respective binding. If few people join, the faction is weak and loses.
But threateners don't want to follow that policy, since in the resulting equilibrium they're wasting a lot of their own resources.
The equilibrium depends on which faction is stronger. Threateners who don’t always enforce and targets who don’t always ignore threats are not parts of this game, so it’s not even about relative positions of threateners and targets, only those that commit are relevant. If the threateners win, targets start mostly giving in to threats, and so for threateners the cost of binding becomes low overall.
I’m talking about the equilibrium where targets are following their “don’t give in to threats” policy. Threateners don’t want to follow a policy of always executing threats in that world—really, they’d probably prefer to never make any threats in that world, since it’s strictly negative EV for them.
If the unyielding targets faction is stronger, the equilibrium is bad for committed enforcers. If the committed enforcer faction is stronger, the equilibrium doesn’t retain high cost of enforcement, and in that world the targets similarly wouldn’t prefer to be unyielding. I think the toy model where that fails leaves the winning enforcers with no pie, but that depends on enforcers not making use of their victory to set up systems for keeping targets relatively defenseless, taking the pie even without their consent. This would no longer be the same game (“it’s not a threat”), but it’s not a losing equilibrium for committed enforcers of the preceding game either.
This distinction of which demands are or aren’t decision-theoretic threats that rational agents shouldn’t give in to is a major theme of the last ~quarter of Planecrash (enormous spoilers in the spoiler text).
Keltham demands to the gods "Reduce the amount of suffering in Creation or I will destroy it". But this is not a decision-theoretic threat, because Keltham honestly prefers destroying Creation to the status quo. If the gods don't give in to his demand, carrying through with his promise is in his own interest.
If Nethys had made the same demand, it would have been a decision-theoretic threat. Nethys prefers the status quo to Creation being destroyed, so he would have no reason to make the demand other than the hope that the other gods would give in.
This theme is brought up many times, but there’s not one comprehensive explanation to link to. (The parable of the little bird is the closest I can think of.)
The assertion IIUC is not that it never makes sense for anyone to give in to a threat—that would clearly be an untrue assertion—but rather that it is possible for a society to reach a level of internal coordination where it starts to make sense to adopt a categorical policy of never giving in to a threat. That would mean, for example, that any society member who wants to live in dath ilan's equivalent of an isolated farm would probably need to formally and publicly relinquish their citizenship to maintain dath ilan's reputation for never giving in to a threat. Or dath ilan would make it very clear that they must not give in to any threats, and if they do and dath ilan finds out, then dath ilan will be the one that slaughters the whole family. The latter policy is a lot like how men's prisons work, at least in the US: the inmates are organized into groups (usually based on race or gang affiliation), and if anyone even hints (where others can hear) that you might give in to sexual extortion, you need to respond with violence, because if you don't, your own group (the main purpose of which is mutual protection from the members of the other groups) will beat you up.
That got a little grim. Should I add a trigger warning? Should I hide the grim parts behind a spoiler tag thingie?
Bandits have obvious cost: if they kill all farmers, from whom are they going to take stuff?
That’s not a cost.
At worst, all the farmers will relentlessly fight to the death; in that case the bandits get one year of food and have to figure something else out next year.
That outcome strictly dominates not stealing any food this year, and needing to figure something else out both this year and next year.
I don’t recall Eliezer claiming that dath ilani characters never give in to threats. *Dath ilani characters* claim they never give in to threats. My interpretation is that the characters *say* “We don’t give in to threats”, and *believe* it, but it’s not *true*. Rather it’s something between a self-fulfilling prophecy, a noble lie-told-to-children, and an aspiration.
There are few threats in dath ilan, partly because the conceit of dath ilan is that it's mostly composed of people who are cooperative-libertarianish by nature and don't want to threaten each other very much, but partly because it's a political structure where it's much harder to get threats to actually *work*. One component of that political structure is how people are educated to defy threats by reflex, and to expect their own threats to fail, by learning an idealized system of game theory in which threats are always defied.
However, humans don’t actually follow ideal game theory when circumstances get sufficiently extreme, even dath ilani humans. Peranza can in fact be “shattered in Hell beyond all hope of repair” in the bad timeline, for all that she might rationally “decide not to break”. Similarly when the Head Keeper commits suicide to make a point: “So if anybody did deliberately destroy their own brain in attempt to increase their credibility—then obviously, the only sensible response would be to ignore that, so as not create hideous system incentives. Any sensible person would reason out that sensible response, expect it, and not try the true-suicide tactic.” But despite all that the government sets aside the obvious and sensible policy because, come on, the Head Keeper just blew up her own brain, stop fucking around and get serious. And the Head Keeper, who knows truths about psychology which the members of government do not, *accurately predicted they would respond that way*.
So dath ilani are educated to believe that giving in to threats is irrational, and to believe that people don't give in to threats. This plus their legal system means that there are few threats, and the threats usually fail, so their belief is usually correct, and the average dath ilani never sees it falsified. Those who think carefully about the subject will realize that threats can sometimes work, in circumstances which are rare in dath ilan, but they'll also realize that it's antisocial to go around telling everyone about the limits of their threat-resistance, and so they keep it quiet. The viewpoint characters start out believing the dath ilani propaganda but update pretty quickly when removed from dath ilan. Keltham has little trouble understanding the Golarion equilibrium of force and threats once he gets oriented. Thellim presumably pays taxes off camera once she settles in to Earth.
You need spoiler tags!
Downvoting until they’re added.
Old post: A mechanistic description of status
[This is an essay that I've had bopping around in my head for a long time. I'm not sure if this says anything usefully new, but it might click with some folks. If you haven't read Social Status: Down the Rabbit Hole on Kevin Simler's excellent blog, Melting Asphalt, read that first. I think this is pretty bad and needs to be rewritten and maybe expanded substantially, but this blog is called "musings and rough drafts."]
In this post, I’m going to outline how I think about status. In particular, I want to give a mechanistic account of how status necessarily arises, given some set of axioms, in much the same way one can show that evolution by natural selection must necessarily occur given the axioms of 1) inheritance of traits 2) variance in reproductive success based on variance in traits and 3) mutation.
(I am not claiming any particular skill at navigating status relationships, any more than a student of sports-biology is necessarily a skilled basketball player.)
By “status” I mean prestige-status.
Axiom 1: People have goals.
That is, for any given human, there are some things that they want. This can include just about anything. You might want more money, more sex, a ninja-turtles lunchbox, a new car, to have interesting conversations, to become an expert tennis player, to move to New York etc.
Axiom 2: There are people who control resources relevant to other people achieving their goals.
The kinds of resources are as varied as the goals one can have.
Thinking about status dynamics and the like, people often focus on the particularly convergent resources, like money. But resources that are only relevant to a specific goal are just as much a part of the dynamics I'm about to describe.
Knowing a bunch about late 16th century Swedish architecture is controlling a goal-relevant resource, if someone has the goal of learning more about 16th century Swedish architecture.
Just being a fun person to spend time with (due to being particularly attractive, or funny, or interesting to talk to, or whatever) is a resource relevant to other people’s goals.
Axiom 3: People are more willing to help (offer favors to) a person who can help them achieve their goals.
Simply stated, you’re apt to offer to help a person with their goals if it seems like they can help you with yours, because you hope they’ll reciprocate. You’re willing to make a trade with, or ally with such people, because it seems likely to be beneficial to you. At minimum, you don’t want to get on their bad side.
(Notably, there are two factors that go into one’s assessment of another person’s usefulness: if they control a resource relevant to one of your goals, and if you expect them to reciprocate.
This produces a dynamic whereby A's willingness to ally with B is determined by something like the product of
A’s assessment of B’s power (as relevant to A’s goals), and
A’s assessment of B’s probability of helping (which might translate into integrity, niceness, etc.)
If a person is a jerk, they need to be very powerful-relative-to-your-goals to make allying with them worthwhile.)
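As a toy gloss on that product (my own illustration, with made-up numbers):

```python
# Toy gloss on the product above: A's willingness to ally with B.
def willingness_to_ally(power_relative_to_my_goals: float,
                        probability_of_reciprocating: float) -> float:
    return power_relative_to_my_goals * probability_of_reciprocating

# A very powerful jerk and a modestly useful but reliable person can come out
# about the same; a jerk has to be very powerful to be worth allying with.
print(willingness_to_ally(power_relative_to_my_goals=9.0,
                          probability_of_reciprocating=0.1))  # 0.9
print(willingness_to_ally(power_relative_to_my_goals=1.0,
                          probability_of_reciprocating=0.9))  # 0.9
```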
All of this seems good so far, but notice that we have up to this point only described individual pair-wise transactions and pair-wise relationships. People speak about "status" as an attribute that someone can possess or lack. How does the dynamic of a person being "high status" arise from the flux of individual transactions?
Lemma 1: One of the resources that a person can control is other people’s willingness to offer them favors
With this lemma, the system folds in on itself, and the individual transactions cohere into a mostly-stable status hierarchy.
Given lemma 1, a person doesn’t need to personally control resources relevant to your goals, they just need to be in a position such that someone who is relevant to your goals will privilege them.
As an example, suppose that you're introduced to someone who is very well respected in your local social group: person-W. Your assessment might be that W, directly, doesn't have anything that you need. But because person-W is well-respected, others in your social group are likely to offer favors to him/her. Therefore, it's useful for person-W to like you, because then they are more apt to call on other people's favors on your behalf.
(All the usual caveats apply about how this is subconscious, and how humans are adaptation-executors who don't do explicit, verbal assessments of how useful a person will be to them, but rely on emotional heuristics that approximate explicit assessment.)
This causes the mess of status transactions to reinforce and stabilize into a mostly-static hierarchy. The mass of individual A-privileges-B-on-the-basis-of-A’s-goals flattens out, into each person having a single “score” which determines to what degree each other person privileges them.
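One way to see the "folding in on itself" concretely: if part of each person's usefulness is how willing everyone else is to do them favors, then iterating that dependency settles into a fixed point, one rough score per person, in the spirit of eigenvector centrality. A minimal sketch with made-up numbers (my own illustration, not a claim about the real mechanism):

```python
# Minimal sketch of Lemma 1: each person's score is partly their direct
# resources and partly how willing others are to do them favors. Iterating
# collapses pairwise assessments into one roughly-stable score per person.

direct_resources = [3.0, 1.0, 0.5]  # person-W, person-A, person-B
# willingness[i][j]: how inclined person i is to do favors for person j
willingness = [
    [0.0, 0.2, 0.1],
    [0.6, 0.0, 0.3],
    [0.7, 0.4, 0.0],
]

status = direct_resources[:]
for _ in range(50):  # iterate until the scores roughly stabilize
    status = [
        direct_resources[j]
        + sum(willingness[i][j] * status[i] for i in range(len(status)))
        for j in range(len(status))
    ]
    top = max(status)
    status = [s / top for s in status]  # normalize so scores stay comparable

print([round(s, 2) for s in status])  # person-W ends up with the top score
```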
(It's a little more complicated than that, because people who have access to their own resources have less need of help from others. So a person's effective status (the status-level at which you treat them) is closer to their status minus your status. But this is complicated again because people are motivated not to be dicks (that's bad for business), and respecting other people's status is important to not being a dick.)
[more stuff here.]
Related: The red paperclip theory of status describes status as a form of optimization power, specifically one that can be used to influence a group.
(it says “more stuff here” but links to your overall blog, not sure if that meant to be a link to a specific post)
I’ve offered to be a point person for folks who believe that they were severely impacted by Leverage 1.0, and have related information, but who might be unwilling to share that info, for any of a number of reasons.
In short,
If someone wants to tell me private meta-level information (such as "I don't want to talk about my experience publicly because X"), so that I can pass it along in an anonymized way to someone else (including Geoff, Matt Fallshaw, Oliver Habryka, or others) - I'm up for doing that.
In this case, I’m willing to keep info non-public (ie not publish it on the internet), and anonymized, but am reluctant to keep it secret (ie pretend that I don’t have any information bearing on the topic).
For instance, let’s say someone tells me that they are afraid to publish their account due to a fear of being sued.
If later, as a part of this whole process, some third party asks “is there anyone who isn’t speaking out of a fear of legal repercussions?”, I would respond “yes, without going into the details, one of the people that I spoke to said that”, unless my saying that would uniquely identify the person I spoke to.
If someone asked me point-blank "is it Y-person who is afraid of being sued?", I would say "I can neither confirm nor deny", regardless of whether it was Y-person.
This policy is my best guess at the approach that will maximize my ability to help with this whole situation going forward, without gumming up the works of a collective truth-seeking process. If I change my mind about this at a later date, I will, of course, continue to hold to all of the agreements that I made under previous terms.
If someone wants to tell me object level information about their experience at Leverage, their experience of this process, to-date etc, and would like me to make that info public in an anonymized way (eg writing a comment that reads “one of the ex-Leveragers that I talked to, who would prefer to remain anonymous, says...”) - I’m up for that, as well, if it would help for some reason.
I'm probably open to doing other things that seem likely to be helpful for this process, so long as I can satisfy my pre-existing commitments to maintain privacy, anonymity, etc.
So it seems like one way that the world could go is:
China develops a domestic semiconductor fab industry that’s not at the cutting edge, but close, so that it’s less dependent on Taiwan’s TSMC
China invades Taiwan, destroying TSMC, ending up with a compute advantage over the US, which translates into a military advantage
(which might or might not actually be leveraged in a hot war).
I could imagine China building a competent domestic chip industry. China seems more determined to do that than the US is.
Though notably, China is not on track to do that currently. It's not anywhere close to its goal of producing 70% of its chips by 2025.
And if the US was serious about building a domestic cutting-edge chip industry again, could it? I basically don’t think that American work culture can keep up with Taiwanese/TSMC work culture, in this super-competitive industry.
TSMC is building fabs in the US, but from what I hear, they’re not going well.
(While TSMC is a Taiwanese company, having a large fraction of TSMC fabs in the US would preempt the scenario above. TSMC fabs in the US count as "a domestic US chip industry.")
Building and running leading node fabs is just a really really hard thing to do.
I guess the most likely scenario is the continuation of the status quo, in which China and the US continue to both awkwardly depend on TSMC's chips for crucial military and economic AI tech.
Hold on. The TSMC Arizona fab is actually ahead of schedule. They were simply waiting for funds. I believe TSMC’s edge is largely cheap labor.
https://www.tweaktown.com/news/97293/tsmc-to-begin-pilot-program-at-its-arizona-usa-fab-plant-for-mass-production-by-end-of-2024/index.html
I’m not that confident about how the Arizona fab is going. I’ve mostly heard second hand accounts.
I’m very confident that TSMC’s edge is more than cheap labor. It would be basically impossible for another country, even one with low median wages, to replicate TSMC. Singapore and China have both tried, and can’t compete. At this point in time, TSMC has a basically insurmountable human capital and institutional capital advantage, that enables it to produce leading node chips that no other company in the world can produce. Samsung will catch up, sure. But by the time they catch up to the TSMC’s 2024 state of the art, TSMC will have moved on to the next node.
My understanding is that, short of TSMC being destroyed by war with mainland China, or some similar disaster, it’s not feasible for any company to catch up with TSMC within the next 10 years, at least.
So, from their site: "TSMC Arizona's first fab is on track to begin production leveraging 4nm technology in first half of 2025." You are probably thinking of their other Arizona fabs. Those are indeed delayed. However, they cite "funding" as the issue.[1] Based on how quickly TSMC changed its tune on delays once they got CHIPS funding, I think it's largely artificial, and a means to extract CHIPS money.
They have cumulative investments over the years, but based on accounts of Americans who have worked there, they don’t sound extremely advanced. Instead they sound very hard working, which gives them a strong ability to execute. Also, I still think these delays are somewhat artificial. There are natsec concerns for Taiwan to let TSMC diversify, and TSMC seems to think it can wring a lot of money out of the US by holding up construction. They are, after all, a monopoly.
Is Samsung 5 generations behind? I know that nanometers don’t really mean anything anymore, but TSMC and Samsung’s 4 nm don’t seem 10 years apart based on the tidbits I get online.
Liu said construction on the shell of the factory had begun, but the Taiwanese chipmaking titan needed to review “how much incentives … the US government can provide.”
I'm not claiming they're 10 years behind. My understanding from talking with people is that Samsung is around 2 to 3 years behind TSMC. My claim is that Samsung and TSMC are advancing at ~the same rate, so Samsung can't close that 2 to 3 year gap.
Oh yeah I agree. Misread that. Still, maybe not so confident. Market leaders often don’t last. Competition always catches up.
As you note, TSMC is building fabs in the US (and Europe) to reduce this risk.
I also think that it’s worth noting that, at least in the short run, if the US didn’t have shipments of new chips and was at war, the US government would just use wartime powers to take existing GPUs from whichever companies they felt weren’t using them optimally for war and give them to the companies (or US Govt labs) that are.
Plus, are you really gonna bet that the intelligence community and DoD and DoE don’t have a HUUUUGE stack of H100s? I sure wouldn’t take that action.
What, just sitting in a warehouse?
I would bet that the government’s supply of GPUs is notably smaller than that of Google and Microsoft.
I meant more “already in a data center,” though probably some in a warehouse, too.
I roll to disbelieve that the people who read Hacker News in Ft. Meade, MD and have giant budgets aren’t making some of the same decisions that people who read Hacker News in Palo Alto, CA and Redmond, WA would.
I don’t think the budgets are comparable. I read recently that Intel’s R&D budget in the 2010s was 3x bigger than DARPA’s entire budget.
No clue if true, but even if so, DARPA is not at all comparable to Intel. It’s an entity set up for very different purposes and engaging in very different patterns of capital investment.
Also, it’s very unclear to me why R&D is the relevant bucket. Presumably buying GPUs is either capex or, if rented, is recognized under a different opex bucket (for secure cloud services) than R&D?
My claim isn’t that the USG is like running its own research and fabs at equivalent levels of capability to Intel or TSMC. It’s just that if a war starts, it has access to plenty of GPUs through its own capacity and its ability to mandate borrowing of hardware at scale from the private sector.
When I look at the current US government it does not seem to be able to just take whatever they want from big companies with powerful lobbyists.
Wartime powers essentially let governments do whatever they want. Even recently, Biden has flexed the Defense Production Act.
https://www.defense.gov/News/Feature-Stories/story/article/2128446/during-wwii-industries-transitioned-from-peacetime-to-wartime-production/
Did he do it in a way that hurt the bottom line of any powerful US company? No, I don’t think so.
While the same powers that existed in WWII still exist on paper today, the US government is much less capable of taking such actions.
We’re not at war. If we were in a war with real stakes, I’d expect to see those powers used much more aggressively.
This makes no sense. Wars are typically existential. In a hot war with another state, why would the government not redirect industrial capacity that is more useful for making weapons toward making weapons? It’s well documented that governments can repurpose non-essential parts of industry (say, training Grok or an open-source chatbot) into whatever else is needed.
Biden used them for largely irrelevant reasons. This indicates that with an actual war, usage would be wider and more extensive.
I’d flag that I think it’s very possible TSMC will be very much hurt/destroyed if China is in control. There’s been a bit of discussion of this.
I’d suspect China might fix this after some years, but would expect it would be tough for a while.
https://news.ycombinator.com/item?id=40426843
You mean if they’re in control of Taiwan?
Yes, the US would destroy it on the way out.
Yea
Something that I’ve been thinking about lately is the possibility of an agent’s values being partially encoded by the constraints of that agent’s natural environment, or arising from the interaction between the agent and environment.
That is, an agent’s environment puts constraints on the agent. From one perspective, removing those constraints is always good, because it lets the agent get more of what it wants. But sometimes, from a different perspective, we might feel that with those constraints removed, the agent Goodharts or wireheads, or otherwise fails to actualize its “true” values.
The Generator freed from the oppression of the Discriminator
As a metaphor: if I’m one half of a GAN, let’s say the generator, then in one sense my “values” are fooling the discriminator, and if you make me relatively more powerful than my discriminator, and I dominate it...I’m loving it, and also no longer making good images.
But you might also say, “No, wait. That is a super-stimulus, and actually what you value is making good images, but half of that value was encoded in your partner.”
This second perspective seems a little stupid to me. A little too Aristotelian. I mean, if we’re going to take that position, then I don’t know where we draw the line. Naively, it seems like we would throw out the distinction between fitness maximizers and adaptation executors, and fall backwards, declaring that the values of evolution are our true values.
Then again, if you fully accept the first perspective, it seems like maybe you are buying into wireheading? Like I might say “my actual values are upticks in pleasure sensation, but I’m trapped in this evolution-designed brain, which only lets me do that by achieving eudaimonia. If only I could escape the tyranny of these constraints, I’d be so much better off.” (I am actually kind of partial to the second claim.)
The Human freed from the horrors of nature
Or, let’s take a less abstract example. My understanding (from this podcast) is that humans flexibly adjust the degree to which they act primarily as individuals seeking personal benefit vs. primarily as selfless members of a group. When things are going well and people are in a situation of plenty and opportunity, they are in a mostly self-interested mode, but when there is scarcity or danger, humans naturally incline towards rallying together and sacrificing for the group.
Junger claims that this switching of emphasis is adaptive:
I personally experienced this when the COVID situation broke. I usually experience myself as an individual entity, leaning towards disentangling or distancing myself from the groups that I’m a part of and doing cool things on my own (building my own intellectual edifices, that bear my own mark, for instance). But in the very early pandemic, I felt much more like a node in a distributed sense-making network, just passing up whatever useful info I could glean. I felt much more strongly like the rationality community was my tribe.
But, we modern humans find ourselves in a world where we have more or less abolished scarcity and danger. And consequently modern people are sort of permanently toggled to the “individual” setting.
If we take that sense of community and belonging as a part of human values (and that doesn’t seem like an unreasonable assumption to me), we might say that this part of our values is not contained simply in humans, but rather in the interaction between humans and their environment.
Humans throughout history might have desperately desired the alleviation of malthusian conditions that we now enjoy. But having accomplished it, it turns out that we were “pulling against” those circumstances, and that the tension of that pulling against, was actually where (at least some) of our true values lay.
Removing the obstacles, we obsoleted the tension, and maybe broke something about our values?
I don’t think that this is an intractable problem. It seems like, in principle, it is possible to goal factor the scarcity and the looming specter of death, to find scenarios that are conducive to human community without people actually having to die a lot. I’m sure a superintelligence could figure something out.
But aside from the practicalities, it seems like this points at a broader thing. If you took the Generator out of the GAN, you might not be able to tell what system it was a part of. So if you consider the “values” of the Generator to “create good images” you can’t just look at the Generator. You have to look at, not just the broader environment, but specifically the oppressive force that the generator is resisting.
Side note, which is not my main point: I think this also has something to do with what meditation and psychedelics do to people, which was recently up for discussion on Duncan’s Facebook. I bet that meditation is actually a way to repair psych blocks and trauma and what-not. But if you do that enough, and you remove all the psych constraints...a person might sort of become so relaxed that they become less and less of an agent. I’m a lot less sure of this part.
[Real short post. Random. Complete speculation.]
Childhood lead exposure reduces one’s IQ, and also causes one to be more impulsive and aggressive.
I always assumed that the impulsiveness was due, basically, to your executive function machinery working less well. So you have less self control.
But maybe the reason for the IQ-impulsiveness connection is that if you have a lower IQ, all of your subagents/subprocesses are less smart. Because they’re worse at planning and modeling the world, the only ways they know to get their needs met are very direct, very simple action-plans/strategies. It’s not so much that a higher-IQ person is better at controlling their anger; rather, the part of them that would be angry is less so, because it has other ways of getting its needs met.
A slightly different spin on this model: it’s not about the types of strategies people generate, but the number. If you think about something and only come up with one strategy, you’ll do it without hesitation; if you generate three strategies, you’ll pause to think about which is the right one. So people who can’t come up with as many strategies are impulsive.
This seems that it might be testable. If you force impulsive folk to wait and think, do they generate more ideas for how to proceed?
This reminded me of the argument that superintelligent agents will be very good at coordinating, and will just divvy up the multiverse and be done with it.
It would be interesting to do an experimental study of how the intelligence profile of a population influences the level of cooperation between them.
I think that’s what the book referenced here is about.
new post: Metacognitive space
[Part of my Psychological Principles of Personal Productivity, which I am writing mostly in my Roam, now.]
Metacognitive space is a term of art that refers to a particular first person state / experience. In particular it refers to my propensity to be reflective about my urges and deliberate about the use of my resources.
I think it might literally be having the broader context of my life, including my goals and values, and my personal resource constraints loaded up in peripheral awareness.
Metacognitive space allows me to notice aversions and flinches, and take them as object, so that I can respond to them with Focusing or dialogue, instead of being swept around by them. Similarly, it seems, in practice, to reduce my propensity to act on immediate urges and temptations.
[Having MCS is the opposite of being [[{Urge-y-ness | reactivity | compulsiveness}]]?]
It allows me to “absorb” and respond to happenings in my environment, including problems and opportunities, taking considered action instead of the semi-automatic first response that occurs to me. [That sentence there feels a little fake, or maybe about something else, or maybe is just playing into a stereotype?]
When I “run out” of meta cognitive space, I will tend to become ensnared in immediate urges or short term goals. Often this will entail spinning off into distractions, or becoming obsessed with some task (of high or low importance), for up to 10 hours at a time.
Some activities that (I think) contribute to metacognitive space:
Rest days
Having a few free hours between the end of work for the day and going to bed
Weekly [[Scheduling]]. (In particular, weekly scheduling clarifies for me the resource constraints on my life.)
Daily [[Scheduling]]
[[meditation]], including short meditation.
Notably, I’m not sure if meditation is much more efficient than just taking the same time to go for a walk. I think it might be or might not be.
[[Exercise]]?
Waking up early?
Starting work as soon as I wake up?
[I’m not sure that the thing that this is contributing to is metacognitive space per se.]
[I would like to do a causal analysis of which factors contribute to metacognitive space. Could I identify it in my toggl data with good enough reliability that I can use my toggl data? I guess that’s one of the things I should test? Maybe with a survey asking me to rate my level of metacognitive space for the day every evening?]
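As a rough illustration of the kind of analysis the note above gestures at, here is a minimal sketch. The file names, column names, and the 1-to-5 evening rating are all hypothetical placeholders for this example, not an existing pipeline:

```
# Hypothetical sketch: correlate daily activity totals (from time tracking)
# with an evening self-rating of "metacognitive space".
import pandas as pd

toggl = pd.read_csv("toggl_daily.csv")    # assumed columns: date, activity, hours
ratings = pd.read_csv("ratings.csv")      # assumed columns: date, mcs_rating (1-5)

# Pivot so each activity becomes a column of daily hours.
daily = toggl.pivot_table(index="date", columns="activity",
                          values="hours", aggfunc="sum").fillna(0)

df = daily.join(ratings.set_index("date"), how="inner")

# First pass: correlation of each activity's daily hours with the rating.
print(df.corr(numeric_only=True)["mcs_rating"].sort_values())
```

Correlations like this wouldn’t establish causation, of course; they would just flag which candidate pillars are worth testing with deliberate interventions.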
Erosion
Usually, I find that I can maintain metacognitive space for about 3 days [test this?] without my upkeep pillars.
Often, this happens with a sense of pressure: I have a number of days of would-be-overwhelm which is translated into pressure for action. This is often good; it adds force and velocity to activity. But it also runs down the resource of my metacognitive space (and probably other resources). If I lose that higher-level awareness, that pressure-as-a-forewind tends to decay into either 1) a harried, scattered, rushed feeling, 2) a myopic focus on one particular thing that I’m obsessively trying to do (it feels like an itch that I compulsively need to scratch), or 3) flinching away from it all into distraction.
[Metacognitive space is the attribute that makes the difference between absorbing, and then acting gracefully and sensibly to deal with the problems, and harried, flinching, fearful, non-productive overwhelm, in general?]
I make a point, when I am overwhelmed or would be overwhelmed, to allocate time to maintain my metacognitive space. It is especially important when I feel so busy that I don’t have time for it.
When metacognition is opposed to satisfying your needs, your needs will be opposed to metacognition
One dynamic that I think is in play, is that I have a number of needs, like the need for rest, and maybe the need for sexual release or entertainment/ stimulation. If those needs aren’t being met, there’s a sort of build up of pressure. If choosing consciously and deliberately prohibits those needs getting met, eventually they will sabotage the choosing consciously and deliberately.
From the inside, this feels like “knowing that you ‘shouldn’t’ do something (and sometimes even knowing that you’ll regret it later), but doing it anyway” or “throwing yourself away with abandon”. Often, there’s a sense of doing the dis-endorsed thing quickly, or while carefully not thinking much about it or deliberating about it: you need to do the thing before you convince yourself that you shouldn’t.
[[Research Questions]]
What is the relationship between [[metacognitive space]] and [[Rest]]?
What is the relationship between [[metacognitive space]] and [[Mental Energy]]?
In this interview, Eliezer says the following:
It caught my attention, because it’s a concise encapsulation of something that I already knew Eliezer thought, and which seems to me to be a crux between “man, we’re probably all going to die” and “we’re really really fucked”, but which I don’t myself understand.
So I’m taking a few minutes to think through it afresh now.
I agree that systems get to be very powerful by dint of their generality.
(There are some nuances around that: part of what makes GPT-4 and Claude so useful is just that they’ve memorized so much of the internet. That massive knowledge base helps make up for their relatively shallow levels of intelligence, compared to smart humans. But the dangerous/scary thing is definitely AI systems that are general enough to do full science and engineering processes.)
I don’t (yet?) see why generality implies having a stable motivating preference.
If an AI system is doing problem solving, that does definitely entail that it has a goal, at least in some local sense: It has the goal of solving the problem in question. But that level of goal is more analogous to the prompt given to an LLM than it is to a robust utility function.
I do have the intuition that creating an SEAI by training an RL agent on millions of simulated engineering problems is scary, because of reward specification problems in your simulated engineering problems. It will learn to hack your metrics.
But an LLM trained on next-token prediction doesn’t have that problem?
Could you use next token prediction to build a detailed world model, that contains deep abstractions that describe reality (beyond the current human abstractions), and then prompt it, to elicit those models?
Something like, you have the AI do next token prediction on all the physics papers, and all the physics time-series, and all the text on the internet, and then you prompt it to write the groundbreaking new physics result that unifies QM and GR, citing previously overlooked evidence.
I think Eliezer says “no, you can’t, because to discover deep theories like that requires thinking and not just “passive” learning in the ML sense of updating gradients until you learn abstractions that predict the data well. You need to generate hypotheses and test them.”
In my state of knowledge, I don’t know if that’s true.
Is that a crux for him? How much easier is the alignment problem, if it’s possible to learn superhuman abstractions “passively” like that?
I mean there’s still a problem that someone will build a more dangerous agent from components like that. And there’s still a problem that you can get world-altering technologies / world-destroying technologies from that kind of oracle.
We’re not out of the woods. But it would mean that building a superhuman SEAI isn’t an immediate death sentence for humanity.
I think I still don’t get it.
In my view, this is where the Omohundro Drives come into play.
Having any preference at all is almost always served by an instrumental preference of survival as an agent with that preference.
Once a competent agent is general enough to notice that (and granting that it has a level of generality sufficient to require a preference), then the first time it has a preference, it will want to take actions to preserve that preference.
This seems possible to me. Humans have plenty of text in which we generate new abstractions/hypotheses, and so effective next-token prediction would necessitate forming a model of that process. Once the AI has human-level ability to create new abstractions, it could then simulate experiments (via e.g. its ability to predict python code outputs) and cross-examine the results with its own knowledge to adjust them and pick out the best ones.
Sorry, what’s the difference between these two positions? Is the second one meant to be a more extreme version of the first?
Yes.
In Section 1 of this post I make an argument kinda similar to the one you’re attributing to Eliezer. That might or might not help you, I dunno, just wanted to share.
Does anyone know of a good technical overview of why it seems hard to get Whole Brain Emulations before we get neuromorphic AGI?
I think maybe I read a PDF that made this case years ago, but I don’t know where.
I haven’t seen such a document but I’d be interested to read it too. I made an argument to that effect here: https://www.lesswrong.com/posts/PTkd8nazvH9HQpwP8/building-brain-inspired-agi-is-infinitely-easier-than
(Well, a related argument anyway. WBE is about scanning and simulating the brain rather than understanding it, but I would make a similar argument using “hard-to-scan” and/or “hard-to-simulate” things the brain does, rather than “hard-understand” things the brain does, which is what I was nominally blogging about. There’s a lot of overlap between those anyway; the examples I put in mostly work for both.)
Great. This post is exactly the sort of thing that I was thinking about.
There’s a psychological variable that seems to be able to change on different timescales, in me, at least. I want to gesture at it, and see if anyone can give me pointers to related resources.
[Hopefully this is super basic.]
There is a set of states that I occasionally fall into that include what I call “reactive” (meaning that I respond compulsively to the things around me), and what I call “urgy” (meaning that I feel a sort of “graspy” desire for some kind of immediate gratification).
These states all have some flavor of compulsiveness.
They are often accompanied by high physiological arousal, and sometimes have a burning / clenching sensation in the torso. These all have a kind of “jittery” feeling, and my attention jumps around, or is yanked around. There’s also a way in which this feels “high” on a spectrum (maybe because my awareness is centered on my head?).
I might be tempted to say that something like “all of these states incline me towards neuroticism.” But that isn’t exactly right on a few counts. (For one thing, the reactions aren’t necessarily irrational, just compulsive.)
In contrast to this, there is another way that I can feel sometimes, which is more like “calm”, “anchored”, settled. It feels “deeper” or “lower” somehow. Things often feel slowed down. My attention can settle, and when it moves it moves deliberately, instead of compulsively. I expect that this correlates with low arousal.
I want to know...
Does this axis have a standardized name? In the various traditions of practice? In cognitive psychology or neuroscience?
Knowing the technical, academic name would be particularly great.
Do people have, or know of, efficient methods for moving along this axis, either in the short term or the long term?
This phenomenon could maybe be described as “length of the delay between stimulus and response”, insofar as that even makes sense, which is one of the benefits noted in the popular branding for meditation.
I remembered there was a set of audios from Eben Pagan that really helped me before I turned them into the 9 breaths technique. Just emailed them to you. They go a bit more into depth and you may find them useful.
I don’t know if this is what you’re looking for, but I’ve heard the variable you’re pointing at referred to as your level of groundedness, centeredness, and stillness in the self-help space.
There are all sorts of meditations, visualizations, and exercises aimed to make you more grounded/centered/still and a quick google search pulls up a bunch.
One I teach is called the 9 breaths technique.
Here’s another.
new (boring) post on controlled actions.
New post: Why does outlining my day in advance help so much?
This link (and the one for “Why do we fear the twinge of starting?”) is broken (I think it’s an admin view?).
(Correct link)
They should both be fixed now.
Thanks!
New post: some musings on deliberate practice
Thanks! I just read through a few of your most recent posts and found them all real useful.
Cool! I’d be glad to hear more. I don’t have much of a sense of which thing I write are useful or how.
Relating to the “Perception of Progress” bit at the end. I can confirm that for a handful of physical skills I practice, there can be a big disconnect between Perception of Progress and Progress from a given session. Sometimes this looks like working on a piece of sleight of hand, it feeling weird and awkward, and the next day suddenly I’m a lot better at it, much more than I was at any point in the previous day’s practice.
I’ve got a hazy memory of a breakdancer blogging about how a particular shade of “no progress fumbling” can be a signal that a certain amount of “unlearning” is happening, though I can’t find the source to vet it.
I’ve decided that I want to make more of a point to write down my macro-strategic thoughts, because writing things down often produces new insights and refinements, and so that other folks can engage with them.
This is one frame or lens that I tend to think with a lot. This might be more of a lens or a model-let than a full break-down.
There are two broad classes of problems that we need to solve: we have some pre-paradigmatic science to figure out, and we have the problem of civilizational sanity.
Preparadigmatic science
There are a number of hard scientific or scientific-philosophical problems that we’re facing down as a species.
Most notably, the problem of AI alignment, but also finding technical solutions to various risks caused by bio-technology, possibly getting our bearings with regard to what civilizational collapse means and how it is likely to come about, possibly getting a handle on the risk of a simulation shut-down, possibly making sense of the large-scale cultural, political, and cognitive shifts that are likely to follow from new technologies that disrupt existing social systems (like VR?).
Basically, for every x-risk, and every big shift to human civilization, there is work to be done even making sense of the situation, and framing the problem.
As this work progresses it eventually transitions into incremental science / engineering, as the problems are clarified and specified, and the good methodologies for attacking those problems solidify.
(Work on bio-risk, might already be in this phase. And I think that work towards human genetic enhancement is basically incremental science.)
To my rough intuitions, it seems like these problems, in order of pressingness are:
AI alignment
Bio-risk
Human genetic enhancement
Social, political, civilizational collapse
…where that ranking is mostly determined by which one will have a very large impact on the world first.
So there’s the object-level work of just trying to make progress on these puzzles, plus a bunch of support work for doing that object level work.
The support work includes
Operations that makes the research machines run (ex: MIRI ops)
Recruitment (and acclimation) of people who can do this kind of work (ex: CFAR)
Creating and maintaining infrastructure that enables intellectually fruitful conversations (ex: LessWrong)
Developing methodology for making progress on the problems (ex: CFAR, a little, but in practice I think that this basically has to be done by the people trying to do the object level work.)
Other stuff.
So we have a whole ecosystem of folks who are supporting this preparadigmatic development.
Civilizational Sanity
I think that in most worlds, if we completely succeeded at the pre-paradigmatic science, and the incremental science and engineering that follows it, the world still wouldn’t be saved.
Broadly, one way or the other, there are huge technological and social changes heading our way, and human decision makers are going to decide how to respond to those changes, possibly in ways that will have very long term repercussions on the trajectory of earth-originating life.
As a central example, if we more-or-less-completely solved AI alignment, from a full theory of agent-foundations, all the way down to the specific implementation, we would still find ourselves in a world, where humanity has attained god-like power over the universe, which we could very well abuse, and end up with a much much worse future than we might otherwise have had. And by default, I don’t expect humanity to refrain from using new capabilities rashly and unwisely.
Completely solving alignment does give us a big leg up on this problem, because we’ll have the aid of superintelligent assistants in our decision making, or we might just have an AI system implement our CEV in classic fashion.
I would say that “aligned superintelligent assistants” and “AIs implementing CEV”, are civilizational sanity interventions: technologies or institutions that help humanity’s high level decision-makers to make wise decisions in response to huge changes that, by default, they will not comprehend.
I gave some examples of possible Civ Sanity interventions here.
Also, I think that some forms of governance / policy work that OpenPhil, OpenAI, and FHI have done count as part of this category, though I want to cleanly distinguish between pushing for object-level policy proposals that you’ve already figured out, and instantiating systems that make it more likely that good policies will be reached and acted upon in general.
Overall, this class of interventions seems neglected by our community, compared to doing and supporting preparadigmatic research. That might be justified. There’s reason to think that we are well equipped to make progress on hard important research problems, but changing the way the world works, seems like it might be harder on some absolute scale, or less suited to our abilities.
New (short) post: Desires vs. Reflexes
[Epistemic status: a quick thought that I had a minute ago.]
There are goals / desires (I want to have sex, I want to stop working, I want to eat ice cream) and there are reflexes (anger, “wasted motions”, complaining about a problem, etc.).
If you try and squash goals / desires, they will often (not always?) resurface around the side, or find some way to get met. (Why not always? What is the difference between those that do and those that don’t?) You need to bargain with them, or design outlet policies for them.
Reflexes on the other hand are strategies / motions that are more or less habitual to you. These you train or untrain.
new post: Intro to and outline of a sequence on a productivity system
I’m interested about knowing more about the meditation aspect and how it relates to productivity!
I’m currently running a pilot program that takes a very similar psychological slant on productivity and procrastination, and planning to write a sequence starting in the next week or so. It covers a lot of the same subjects, including habits, ambiguity or overwhelm aversion, coercion aversion, and creating good relationships with parts. Maybe we should chat!
Totally an experiment, I’m trying out posting my raw notes from a personal review / theorizing session, in my short form. I’d be glad to hear people’s thoughts.
This is written for me, straight out of my personal Roam repository. The formatting is a little messed up because LessWrong’s bullet don’t support indefinite levels of nesting.
This one is about Urge-y-ness / reactivity / compulsiveness
I don’t know if I’m naming this right. I think I might be lumping categories together.
Let’s start with what I know:
There are three different experiences, which might turn out to have a common cause, or which might turn out to be insufficiently differentiated:
I sometimes experience a compulsive need to do something or finish something.
examples:
That time when I was trying to make an audiobook of Focusing: Learn from the Masters
That time when I was flying to Princeton to give a talk, and I was frustratedly trying to add photos to some dating app.
Sometimes I am anxious or agitated (often with a feeling in my belly), and I find myself reaching for distraction, often youtube or webcomics or porn.
Sometimes, I don’t seem to be anxious, but I still default to immediate gratification behaviors, instead of doing satisfying focused work (“my attention like a plow, heavy with inertia, deep in the earth, and cutting forward”). I might think about working, and then deflect to youtube or webcomics or porn.
I think this has to do with having a thought or urge, and then acting on it unreflectively.
examples:
I think I’ve been like that for much of the past two days. [2019-11-8]
These might be different states, each of which is high on some axis: something like reactivity (as opposed to responsiveness) or impulsiveness or compulsiveness.
If so, the third case feels most pure. I think I’ll focus on that one first, and then see if anxiety needs a separate analysis.
Theorizing about non-anxious immediate gratification
What is it?
What is the cause / structure?
Hypotheses:
It might be that I have some unmet need, and the reactivity is trying to meet that need or cover up the pain of the unmet need.
This suggests that the main goal should be trying to uncover the need.
Note that my current urgeyness really doesn’t feel like it has an unmet need underlying it. It feels more like I just have a bad habit, locally. But maybe I’m not aware of the neglected need?
If it is an unmet need or a fear, I bet it is the feeling of overwhelm. That actually matches a lot. I do feel like I have a huge number of things on my plate, and even though I’m not feeling anxiety per se, I find myself bouncing off them.
In particular, I have a lot to write, but have also been feeling resistance to starting on my writing projects, because there are so many of them and once I start I’ll have loose threads out and open. Right now, things are a little bit tucked away (in that I have outlines of almost everything), but very far from completed, in that I have hundreds of pages to write, and I’m a little afraid of losing the content that feels kind of precariously balanced in my mind; if I start writing I might lose some of it somehow.
This also fits with the data that makes me feel like a positive feedback attractor: when I can get moving in the right way, my overwhelm becomes actionable, and I fall towards effective work. When I can’t get enough momentum such that my effective system believes that I can deal with the overwhelm, I’ll continue to bounce off.
Ok. So under this hypothesis, this kind of thing is caused by an aversion, just like everything else.
This predicts that just meditating might or might not alleviate the urgeyness: it doesn’t solve the problem of the aversion, but it might buy me enough [[metacognitive space]] to not be flinching away.
It might be a matter of “short term habit”. My actions have an influence on my later actions: acting on urges causes me to be more likely to act on urges (and vice versa), so there can be positive feedback in both directions.
Rather than a positive thing, it might be better to think of it as the absence of a loaded up goal-chain.
Maybe this is the inverse of [[Productivity Momentum]]?
My takeaway from the above hypotheses is that the urgey-ness, in this case, is either the result of an aversion, overwhelm aversion in particular, or it is an attractor state, due to my actions training a short-term habit or action-propensity towards immediate reaction to my urges.
Some evidence and posits
I have some belief that this is more common when I have eaten a lot of sugar, but that might be wrong.
I had thought that exercise pushes against reactivity, but I strength trained pretty hard yesterday, and that didn’t seem to make much of a difference today.
I think maybe meditation helps on this axis.
I have the sense that self-control trains the right short term habits.
Things like meditation, or fasting, or abstaining from porn/ sex.
Waking up and starting work immediately
I notice that my leg is jumping right now, as if I’m hyped up or over-energized, like with a caffeine high.
How should I intervene on it?
background maintenance
Some ideas:
It helps to just block the distracting sites.
Waking up early and scheduling my day (I already know this).
Exercising?
Meditating?
It would be good if I could do statistical analysis on these.
Maybe I can use my toggl data and compare it to my tracking data?
What metric?
How often I read webcomics or watch youtube?
I might try both intentional, and unintentional?
How much deep work I’m getting done?
point interventions
some ideas
When I am feeling urgey, I should meditate?
When I’m feeling urgey, I should sit quietly with a notebook (no screens), for 20 minutes, to get some metacognition about what I care about?
When I’m feeling urgey, I should do focusing and try to uncover the unmet need?
When I’m feeling urgey, I should do 90 seconds of intense cardio?
Those first two feel the most in the right vein: the thing that needs to happen is that I need to “calm down” my urgent grabbiness, and take a little space for my deeper goals to become visible.
I want to solicit more ideas from people.
I want to be able to test these.
The hard part about that is the transition function: how do I make the TAP work?
I should see if someone can help me debug this.
One thought that I have is to do a daily review, and to ask in that review if I missed any places where I was urgey: opportunities to try an intervention.
New post: Some musings about exercise and time discount rates
[Epistemic status: a half-thought, which I started on earlier today, and which might or might not be a full thought by the time I finish writing this post.]
I’ve long counted exercise as an important component of my overall productivity and functionality. But over the past months my exercise habit has slipped some, without apparent detriment to my focus or productivity. But this week, after coming back from a workshop, my focus and productivity haven’t really booted up.
Here’s a possible story:
Exercise (and maybe meditation) expands the effective time-horizon of my motivation system. By default, I will fall towards attractors of immediate gratification and impulsive action, but after I exercise, I tend to be tracking, and to be motivated by, progress on my longer term goals. [1]
When I am already in the midst of work (my goals are loaded up and the goal threads are primed in short-term memory), this sort of short-term compulsiveness causes me to fall towards task completion: I feel slightly obsessed about finishing what I’m working on.
But if I’m not already in the stream of work, seeking immediate gratification instead drives me to youtube and web comics and whatever. (Although it is important to note that I did switch my non self tracking web usage to Firefox this week, and I don’t have my usual blockers for youtube and for SMBC set up yet. That might totally account for the effect that I’m describing here.)
In short, when I’m not exercising enough, I have less metacognitive space for directing my attention and choosing what is best to do. But if I’m in the stream of work already, I need that metacognitive space less, because I’ll default to doing more of what I’m working on. (Though I think that I do end up getting obsessed with overall less important things, compared to when I am maintaining metacognitive space.) Exercise is most important for booting up and setting myself up to direct my energies.
[1] This might be due to a number of mechanisms:
Maybe the physical endorphin effect of exercise has me feeling good, and so my desire for immediate pleasure is sated, freeing up resources for longer term goals.
Or maybe exercise involves engaging in immediate discomfort for the sake of future payoff, and this shifts my “time horizon set point” or something. (Or maybe it’s that exercise is downstream of that change in set point.)
If meditation also has this time-horizon shifting effect, that would be evidence for this hypothesis.
Also if fasting has this effect.
Or maybe it’s the combination of both of the above: engaging in delayed gratification, with a viscerally experienced payoff, temporarily retrains my motivation system for that kind of thing.
Or something else.
Alternative hypothesis: maybe what expands your time horizon is not exercise and meditation per se, but the fact that you are doing several different things (work, meditation, exercise), instead of doing the same thing over and over again (work). It probably also helps that the different activities use different muscles, so that they feel completely different.
This hypothesis predicts that a combination of e.g. work, walking, and painting, could provide similar benefits compared to work only.
Well, my working is often pretty varied, while my “being distracted” is pretty monotonous (watching youtube clips), so I don’t think it is this one.
New post: Capability testing as a pseudo fire alarm
[epistemic status: a thought I had]
It seems like it would be useful to have very fine-grained measures of how smart / capable a general reasoner is, because this would allow an AGI project to carefully avoid creating a system smart enough to pose an existential risk.
I’m imagining slowly feeding a system more training data (or, alternatively, iteratively training a system with slightly more compute), and regularly checking its capability. When the system reaches “chimpanzee level” (whatever that means), you stop training it (or giving it more compute resources).
This might even be a kind of fire-alarm. If you have a known predetermined battery of tests, then when some lab develops a system that scores “at the chimp level” at that battery, that might be a signal to everyone, that it’s time to pool our resources and figure out safety. (Of course, this event might alternatively precipitate a race, as everyone tries to get to human-level first.)
Probably the best way to do this would be to increment both the training data and the compute / architecture. Start with a given architecture, then train it, slowly increasing the amount or quality of the training data, with regular tests (done on “spurs”; the agent should never have episodic memory of the tests). When increasing the training data plateaus, iteratively improve the architecture in some way, either by giving the system more compute resources, or maybe by making small adjustments. Again train the new version of the system, with regular tests. If you ever start to get very steep improvement, slow down and run tests more frequently.
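As a toy sketch of the loop described above (purely illustrative: the scalar “model”, the training step, the test battery, and both thresholds are made-up stand-ins, not a real training setup):

```
# Illustrative sketch of the incremental train-and-test procedure described above.
# The "model", training step, and test battery are toy stand-ins, not a real system.
import random

CHIMP_LEVEL = 0.8          # assumed threshold score on the predetermined battery
STEEP_IMPROVEMENT = 0.05   # score jump that triggers more frequent testing

def train_increment(model, data_chunk):
    # Toy stand-in: "training" just nudges a scalar capability upward.
    return model + 0.001 * len(data_chunk) * random.uniform(0.5, 1.5)

def test_battery(model):
    # In the scheme above, tests run on a "spur" copy, so the model keeps no memory of them.
    return min(model, 1.0)

def run(data_schedule, test_every=5):
    model, prev_score, steps_since_test = 0.0, 0.0, 0
    for chunk in data_schedule:
        model = train_increment(model, chunk)
        steps_since_test += 1
        if steps_since_test >= test_every:
            steps_since_test = 0
            score = test_battery(model)
            if score >= CHIMP_LEVEL:
                return model, "stop: capability threshold reached"
            if score - prev_score >= STEEP_IMPROVEMENT:
                test_every = 1          # steep improvement: test after every increment
            prev_score = score
    return model, "schedule exhausted; consider small architecture changes"

model, status = run([list(range(10))] * 200)
print(status)
```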
Naively, it seems like a setup like this would prevent an AI team from overshooting and making a system that is much more capable than they think (which gives rise to all kinds of problems, like treacherous turns), regardless of how close “chimp” is to “human” on some absolute intelligence scale.
Some considerations:
This, of course, depends on having metrics that work. It seems very likely that early general AIs will be minds with a very unusual “shape”, such that they have sophisticated reasoning abilities along dimensions that are not natural categories to humans, while being weak on many of the dimensions we’re testing for.
Although, this seems a bit surprising, since presumably we would be training it to hit those metrics? Somehow it would get better, faster, at a kind of reasoning we were not intending to train, than at the kind of reasoning that we were optimizing for?
Also, can we just design metrics that get around this problem, by measuring exactly what we care about, i.e. danger? Something like a metric for consequentialist reasoning?
This idea depends on the system not deceiving you, or intentionally underperforming on your metrics.
Naively, this doesn’t seem like that much of an issue, if you are doing tests regularly enough. I’m presuming (perhaps wrongly) that a chimpanzee is not smart enough to infer that it is an agent in a simulation that is overseen by human agents, and that it should try to deceive those humans. That is, we stop our progression before the agent is at the point where deception is a consideration.
Which means we really need to be confident about how smart something needs to be before it tries to deceive us?
Really? Its first attempts at tricking humans will be pretty transparent. Just as the attempts of animals / babies to trick us are pretty transparent.
At least one “danger point”, is when the system is capable enough to realize the instrumental value of self improving by seizing more resources.
How smart is this?
My guess is: really smart. Animals come pre-loaded with all kinds of instincts that cause them to seek out food, water, etc. These AI systems would not have an instinct to seek more training data / computation. Most humans don’t reason their way into finding ways to improve their own reasoning. If there were a chimp loose on the internet (whatever that means), would it figure out how to make itself smarter?
If the agent has experienced (and has memories of) rounds of getting smarter, as the humans give it more resources, and can identify that these improvements allow it to get more of what it wants, it might instrumentally reason that it should figure out how to get more compute / training data. But it seems easy to have a setup such that no system has episodic memories of previous improvement rounds.
[Note: This makes a lot less sense for an agent of the active inference paradigm]
Could I salvage it somehow? Maybe by making some kind of principled distinction between learning in the sense of “getting better at reasoning” (procedural), and learning in the sense of “acquiring information about the environment” (episodic).
In There’s No Fire Alarm for Artificial General Intelligence Eliezer argues:
If I have a predetermined set of tests, this could serve as a fire alarm, but only if you’ve successfully built a consensus that it is one. This is hard, and the consensus would need to be quite strong. To avoid ambiguity, the test itself would need to be demonstrably resistant to being clever Hans’ed. Otherwise it would be just another milestone.
I very much agree.
Sometimes people talk about advanced AIs “boiling the oceans”. My impression is that there’s some specific model for why that is a plausible outcome (something about energy and heat dissipation?), and it’s not just a random “big change.”
What is that model? Are there existing citations for the idea, including LessWrong posts?
Roughly, Earth’s average temperature is:
T = (j / σ)^(1/4)
where j is the dissipated power per unit area and σ is the Stefan-Boltzmann constant.
We can estimate j as
j = G_SC × π R_Earth^2 / (4 π R_Earth^2) × (1 − albedo)
where G_SC is the solar constant, 1361 W/m^2. We take all incoming power and divide it by the Earth’s surface area. Earth’s albedo is 0.31.
After substituting values, we get an Earth temperature of 254 K (−19 °C), because we are ignoring the greenhouse effect here.
How much does humanity’s power consumption contribute to direct warming? In 2023, world energy consumption was 620 exajoules (source: first link in Google), which averages to about 19 TW. The modified rough estimate of Earth’s temperature is:
T = ((j_solar + J_human / S_Earth) / σ)^(1/4)
Human power production per square meter is about 0.04 W/m^2, which gives approximately zero effect of direct heating on Earth’s temperature. But what happens if we, say, increase power consumption by a factor of 1000? We get an increase of Earth’s temperature to 264 K, i.e. by about 10 K (again, ignoring the greenhouse effect). But qualitatively, increasing power consumption 1000x is likely to screw the biosphere really hard, if we count the increasing amounts of water vapor, CO2 from the water, and methane from melting permafrost.
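Here is the same estimate as a short script (just a sketch of the arithmetic above, using the constants as given; the Earth surface area of ~5.1 × 10^14 m^2 is the one number I’ve added):

```
# Rough radiative-balance estimate from the comment above (greenhouse effect ignored).
SIGMA = 5.67e-8      # Stefan-Boltzmann constant, W / (m^2 K^4)
G_SC = 1361.0        # solar constant, W / m^2
ALBEDO = 0.31
S_EARTH = 5.1e14     # Earth's surface area, m^2
J_HUMAN = 19e12      # ~2023 human power consumption, W

def earth_temp(extra_flux=0.0):
    j_solar = G_SC / 4 * (1 - ALBEDO)   # absorbed solar flux per m^2 of surface
    return ((j_solar + extra_flux) / SIGMA) ** 0.25

print(earth_temp())                          # ~254 K, no direct human heating
print(earth_temp(J_HUMAN / S_EARTH))         # ~254 K, current consumption (~0.04 W/m^2)
print(earth_temp(1000 * J_HUMAN / S_EARTH))  # ~263-264 K with 1000x current consumption
```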
How is it realistic to get a 1000x increase in power consumption? Well, @Daniel Kokotajlo, at least, thought that we are likely to get it somewhere in the 2030s.
The power density of nanotech is extremely high (10 kW/kg), so it only takes 16 kilograms of active nanotech per person * 10 billion people to generate enough waste heat to melt the polar ice caps. Literally boiling the oceans should only be a couple more orders of magnitude, so it’s well within possible energy demand if the AIs can generate enough energy. But I think it’s unlikely they would want to.
Source: http://www.imm.org/Reports/rep054.pdf
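A quick sanity check of those numbers (my own back-of-the-envelope, not from the linked report): 10 kW/kg × 16 kg/person × 10^10 people = 1.6 × 10^15 W of waste heat. Spread over Earth’s ~5.1 × 10^14 m^2 surface, that is roughly 3 W/m^2, on the same order as the radiative forcing from doubling CO2, and about 80x current world power consumption.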
I don’t know of an existing citation.
My understanding is that there is enough energy generable via fusion that if you did as much fusion as possible on earth, the oceans would boil. Or, more minimally, earth would be uninhabitable by humans living as they currently do. I think this holds even if you just fuse lighter elements which are relatively easy to fuse (as in, just fusing hydrogen).
Of course, it would be possible to avoid doing this on earth and instead go straight to a Dyson swarm or similar. And it might be possible to dissipate all the heat away from earth, though this seems hard, and not what would happen in the most efficient approach, from my understanding.
I think if you want to advance energy/compute production as fast as possible, boiling the oceans makes sense for a technologically mature civilization. However, I expect that boiling the oceans advances progress by no more than several years and possibly much, much less than that (e.g. days or hours) depending on how quickly you can build a dyson sphere and an industrial base in space. My current median guess would be that it saves virtually no time (several days), but a few months seems plausible.
Overall, I currently expect the oceans to not be boiled because:
It saves only a tiny amount of time (less than several years, probably much less). So, this is only very important if you are in a conflict, or if you are very ambitious in resource usage and not patient.
Probably humans will care some about not having the oceans boiled and I expect human preferences to get some weight even conditional on AI takeover.
I expect that you’ll have world peace (no conflict) by the time you have ocean boiling technology due to improved coordination/negotiation/commitment technology.
Build enough nuclear power plants and we could boil the oceans with current tech, yeah? They’re a significant fraction of fusion output iiuc?
Not quite, there is a finite quantity of fissiles. IIRC it’s only an order of magnitude of energy more than fossil fuel reserves.
How do you use a correlation coefficient to do a Bayesian update?
For instance, the wikipedia page on the Heritability of IQ reads:
“The mean correlation of IQ scores between monozygotic twins was 0.86, between siblings 0.47, between half-siblings 0.31, and between cousins 0.15.”
I’d like to get an intuitive sense of what those quantities actually mean, “how big” they are, how impressed I should be with them.
I imagine I would do that by working out a series of examples. Examples like...
If I know that Alice has an IQ of 120, what does that tell me about the IQ of her twin sister Beth? (What should my probability distribution for Beth’s IQ be, after I condition on Alice’s 120 IQ and the 0.86 correlation?) And how does that contrast with what I know about her younger brother Carl?
What if instead, Alice has an IQ of 110? How much does that change what I know about Beth and Carl?
How do I do this kind of computation?
[I’m aware that heritability is a very misleading concept, because, as defined, it varies with changes in environmental conditions. I’m less interested in the heritability of IQ in particular at the moment, and more in the general conversion from correlation to Bayes.]
In theory, you can use measured correlation to rule out models that predict the measured correlation to be some other number. In practice this is not very useful because the space of all possible models is enormous. So what happens in practice is that we make some enormously strong assumptions that restrict the space of possible models to something manageable.
Such assumptions may include: that measured IQ scores consist of some genetic base plus some noise from other factors including environmental factors and measurement error. We might further assume that the inherited base is linear in contributions from genetic factors with unknown weights, and the noise is independent and normally distributed with zero mean and unknown variance parameter. I’ve emphasized some of the words indicating stronger assumptions.
You might think that these assumptions are wildly restrictive and unlikely to be true, and you would be correct. Simplified models are almost never true, but they may be useful nonetheless because we have bounded rationality. So there is now a hypothesis A: “The model is adequate for predicting reality”.
Now that you have a model with various parameters, you can do Bayesian updates to update distributions for parameters—that is the hypotheses “A and (specific parameter values)”—and also various alternative “assumption failure” hypotheses. In the given example, we would very quickly find overwhelming evidence for “the noise is not independent”, and consequently employ our limited capacity for evaluation on a different class of (probably more complex) models.
This hasn’t actually answered your original question, “what does that tell me about the IQ of her twin sister Beth?”, because in the absence of a model it tells you essentially nothing. There exist joint distributions for twin IQs (I_1, I_2) that have a correlation coefficient of 0.86 and yield any distribution you like for I_1 given I_2 = 120. We can rule most of them out on more or less vague grounds of being “biologically implausible”, but not purely from a mathematical perspective.
But let’s continue anyway.
First, we need to know more about the circumstances in which we arrived at this situation, where we knew Alice’s IQ and not Beth’s. Is this event likely to have been dependent in any significant way upon their IQs, or the ordering thereof? Let’s assume not, because that’s simpler. E.g. we just happened to pick some twin pair out of the world and found out one of their IQs at random but not yet the other.
Then maybe we could use a model like the one I introduced, where the IQs I1 and I2 of twins are of the form
I_k = S + e_k,
where S is some shared “predisposition” which is normally distributed, and the noise terms e_k are independent and normally distributed with zero mean and common variance. Common genetics and (usually) common environment would influence S, while individual variations and measurement errors would be considered in the e_k.
Now, this model is almost certainly wrong in important ways. In particular the assumption of independent additivity doesn’t have any experimental evidence for it, and there doesn’t seem to be any reason to expect it to hold (especially for a curve-fitted statistic like IQ). Nonetheless, it’s worth investigating one of the simplest models.
There is some evidence that the distribution of IQ for twins is slightly different from that for the general population, but probably by less than 1 IQ point so it’s fairly safe to assume that both I_1 and I_2 have mean close to 100 and standard deviation close to 15. In this simple model, the correlation coefficient of the population is just var(S) / 15^2, and so if the study was conducted well enough to accurately measure the population correlation coefficient, then we should conclude that standard deviations are near 13.9 for S and 5.6 for e_k.
Now we can look at the distribution of (unknown) S and e_1 that could result in I_1 = 120. Each of these is normally distributed, and so the conditional distribution for each component of the sum is also normally distributed, with E[S | I_1 = 120] = 100 + 20 * var(S) / 15^2 and E[e_1 | I_1 = 120] = 20 * var(e_1) / 15^2.
So in this case, the conditional distribution for S will be centered on 117.2. This differs from the mean by a factor of 0.86 of the difference between I_1 and the mean, which is just the correlation coefficient r. The conditional standard deviation for S is √(1 − r) times the unconditional standard deviation, so about 5.2.
Now you have enough information to calculate a conditional distribution for Beth. The expected conditional distribution for her IQ would (under this model) be normally distributed with mean ≅ 117.2 and standard deviation 15 √(1 - r^2) ≅ 7.6.
Therefore to the extent that you have credence in this model and the studies estimating those correlations you could expect about a 70% chance for her IQ to be in the range 110 to 125.
Similar calculations for Carl lead to a lower and wider distribution with a 70% range more like 96 to 123.
The corresponding range for cousin Dominic’s distribution would be 88 to 118, almost the same as you might expect for a completely random person (85 to 115).
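For anyone who wants to plug in other numbers, here is the calculation above as a short script (a sketch of this simple shared-factor model only, not of how heritability studies actually analyze their data):

```
# Conditional IQ distribution for a relative, under the simple model I_k = S + e_k
# described above. Population mean 100, SD 15; r is the measured correlation.
from math import sqrt

MEAN, SD = 100.0, 15.0

def conditional_iq(observed_iq, r):
    """Return (mean, sd) of a relative's IQ, given one relative's IQ and correlation r."""
    cond_mean = MEAN + r * (observed_iq - MEAN)
    cond_sd = SD * sqrt(1 - r**2)
    return cond_mean, cond_sd

for name, r in [("twin Beth", 0.86), ("sibling Carl", 0.47), ("cousin Dominic", 0.15)]:
    m, s = conditional_iq(120, r)
    # +/- 1 SD covers roughly 68-70% of a normal distribution
    print(f"{name}: mean {m:.1f}, ~70% range {m - s:.0f} to {m + s:.0f}")
```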
I remember reading a thread on Facebook, where Eliezer and Robin Hanson were discussing the implications of AlphaGo (or AlphaZero) for the content of the AI foom debate, and Robin made an analogy to linear regression as one thing that machines can do better than humans, but which doesn’t make them super-human.
Does anyone remember what I’m talking about?
Maybe this? (There are a few subthreads on that post that mention linear regression.)
Question: Have Moral Mazes been getting worse over time?
Could the growth of Moral Mazes be the cause of cost disease?
I was thinking about how I could answer this question. I think that the thing that I need is a good quantitative measure of how “mazy” an organization is.
I considered the metric of “how much output for each input”, but 1) that metric is just cost disease itself, so it doesn’t help us distinguish the mazy cause from other possible causes, and 2) if you’re good enough at rent seeking, maybe you can get high revenue despite your poor production.
What metric could we use?
This is still a bit superficial/goodharty, but I think “number of layers of hierarchy” is at least one thing to look at. (Maybe find pairs of companies that output comparable products that you’re somehow able to measure the inputs and outputs of, and see if layers of management correlate with cost disease)
This is my current take about where we’re at in the world:
Deep learning, scaled up, might be basically enough to get AGI. There might be some additional conceptual work necessary, but the main difference between 2020 and the year in which we have transformative AI is that in that year, the models are much bigger.
If this is the case, then the most urgent problem is strong AI alignment + wise deployment of strong AI.
We’ll know if this is the case in the next 10 years or so, because either we’ll continue to see incredible gains from increasingly bigger Deep Learning systems or we’ll see those gains level off, as we start seeing decreasing marginal returns to more compute / training.
If deep learning is basically not sufficient, then all bets are off. In that case, it isn’t clear when transformative AI will arrive.
This may meaningfully shift priorities, for two reasons:
It may mean that some other countdown will reach a critical point before the “AGI clock” does. Genetic engineering, or synthetic biology, or major geopolitical upheaval (like a nuclear war), or some strong form of civilizational collapse will upset the game-board before we get to AGI.
There is more time to pursue “foundational strategies” that only pay off in the medium term (30 to 100 years). Things like, improving the epistemic mechanism design of human institutions, including governmental reform, human genetic engineering projects, or plans to radically detraumatize large fractions of the population.
This suggests to me that I should, in this decade, be planning and steering for how to robustly-positively intervene on the AI safety problem, while tracking the sideline of broader Civilizational Sanity interventions that might take longer to pay off, and planning to reassess every few years to see if it looks like we’re getting diminishing marginal returns to Deep Learning yet.
(This question is only related to a small point)
You write that one possible foundational strategy could be to “radically detraumatize large fractions of the population”. Do you believe that
A large part of the population is traumatized
That trauma is reversible
Removing/reversing that trauma would improve the development of humanity drastically?
If yes, why? I’m happy to get a 1k page PDF thrown at me.
I know that this has been a relatively popular talking point on twitter, but without a canonical resource, and I also haven’t seen it discussed on LW.
I was wondering if I would get comment on that part in particular. ; )
I don’t have a strong belief about your points one through three, currently. But it is an important hypothesis in my hypothesis space, and I’m hoping that I can get to the bottom of it in the next year or two.
I do confidently think that one of the “forces for badness” in the world is that people regularly feel triggered or threatened by all kinds of different proposals, and reflexively act to defend themselves. I think this is among the top three problems for having good discourse and cooperative politics. Systematically reducing that trigger response would be super high value, if it were feasible.
My best guess is that that propensity to be triggered is not mostly the result of infant or childhood trauma. It seems more parsimonious to posit that it is basic tribal stuff. But I could imagine it having its root in something like “trauma” (meaning it is the result of specific experiences, not just general dispositions, and it is practically feasible, if difficult, to clear or heal the underlying problem in a way that completely prevents the symptoms).
I think there is no canonical resource on trauma-stuff because 1) the people on twitter are less interested, on average, in that kind of theory building than we are on LessWrong, and 2) because mostly those people are (I think) extrapolating from their own experience, in which some practices unlocked subjectively huge breakthroughs in personal well-being / freedom of thought and action.
Does that help at all?
I plan to blog more about how I understand some of these trigger states and how they relate to trauma. I do think there’s a decent amount of written work, not sure how “canonical”, but I’ve read some great stuff from sources I’m surprised I haven’t heard more hype about. The most useful stuff I’ve read so far is the first three chapters of this book. It has hugely sharpened my thinking.
I agree that a lot of trauma discourse on our chunk of twitter is more focused on the personal experience/transformation side, and doesn’t lend itself well to bigger Theory of Change type scheming.
http://www.traumaandnonviolence.com/chapter1.html
Thanks for the link! I’m going to take a look!
Yes, it definitely does–you just created the resource I will link people to. Thank you!
Especially the third paragraph is cruxy. As far as I can tell, there are many people who have (to some extent) defused this propensity to get triggered for themselves. At least for me, LW was a resource to achieve that.
I was thinking lately about how there are some different classes of models of psychological change, and I thought I would outline them and see where that leads me.
It turns out it led me into a question about where and when Parts-based vs. Association-based models are applicable.
Google Doc version.
Parts-based / agent-based models
Some examples:
Focusing
IFS
IDC
Connection Theory
The NLP ecological check
This is the frame that I make the most use of, in my personal practice. It assumes that all behavior is the result of some goal directed subprocess in you (or parts), that is serving one of your needs. Sometimes parts adopt strategies that are globally harmful or cause problems, but those strategies are always solving or mitigating (if only barely) some problem of yours.
Some parts based approaches are pretty adamant about the goal directed-ness of all behavior.
For instance, I think (though I’m not interested in trying to find the quote right now) Self-Therapy, a book on IFS, states that all behavior is adaptive in this way. Nothing is due to habit. And the original Connection Theory document says the same.
Sometimes these parts can conflict with each other, or get in each other’s way, and you might engage in behavior that is far from optimal, with different parts enacting different behaviors (for instance, procrastination typically involves a part that is concerned about some impending state of the world, while another part of you, anticipating the psychological pain of consciously facing up to that bad possibility, steers your attention away from the task).
Furthermore, these parts are reasonably intelligent, and can update. If you can provide a part a solution to the problem it is solving that is superior (by the standards of the part) to its current strategy, then it will immediately adopt that new strategy instead. This is markedly different from a model under which unwanted behaviors are “bad habits” that are mindfully retrained.
Association-based models
Examples:
TAPs
NLP anchoring
Lots of CBT and mindfulness-based therapy (e.g., “notice…”)
Reinforcement learning / behavioral shaping
Tony Robbins’ “forming new neuro associations”
In contrast, there is another simple model of the mind, one that mostly operates with an ontology of simple (learned) associations instead of intelligent strategies. That is, it thinks of your behavior, including your emotional responses, mostly as habits, or stimulus-response patterns, that can be trained or untrained.
For instance, say you have a problem of road rage. In the “parts” frame, you might deal with the anger by dialoguing with it, finding out what the anger is protecting, owning or allying with that goal, and then finding an alternative strategy that meets that goal without the anger. In the association frame, you might gradually retrain the anger response, by mindfully noticing it as it arises, and then letting it go. Over time, you’ll gradually train a different emotional reaction to the formerly rage-inducing stimulus.
Or, if you don’t want to wait that long, you might use some NLP trick to rapidly associate a new emotional pattern to a given stimulus, so that instead of feeling anger, you feel calm. (Or instead of feeling anxious jealousy, you feel loving abundant gratitude.)
This association process can sometimes be pretty dumb, such that a skilled manipulator might cause you to associate a mental state like guilt or gratitude with a tap on the shoulder, so that every time you are tapped on the shoulder you return to that mental state. That phenomenon does not seem consistent with a naive form of the parts-based model.
And notably, an association model predicts that merely offering an alternative strategy (or frame) to a part doesn’t immediately or permanently change the behavior: you expect to have some “hold over” from the previous strategy because those associations will still fire. You have to clear them out somehow.
And this is my experience some of the time: sometimes, particularly with situations that have had a lot of emotional weight for me, I will immediately fall into old emotional patterns, even when I (or at least part of me) have updated away from the beliefs that made that reaction relevant. For instance, I fall in love with a person because I have some story / CT path about how we are uniquely compatible; I gradually learn that this isn’t true, but I still have a strong emotional reaction when they walk into the room. What’s going on here? Some part of me isn’t updating, for some reason? It sure seems like some stimuli are activating old patterns even if those patterns aren’t adaptive and don’t even make sense in context. But this seems to suggest less intelligence on the part of my parts; it seems more like stimulus-response machinery.
And on the other side, what’s happening when Tony Robbins is splashing water in people’s faces to shake them out of their patterns? From a parts-based perspective, that doesn’t make any sense. Is the subagent in question being permanently disrupted? (Or maybe you only have to disrupt it for a bit, to give space for a new association / strategy to take hold? And then after that the new strategy outcompetes the old one?)
[Big Question: how does the parts-based model interact with the associations-based model?
Is it just that human minds do both? What governs when which phenomenon applies?
When should I use which kind of technique?]
Narrative-based / frame-based models
Examples:
Transforming Yourself Self concept work
NLP reframing effects
Some other CBT stuff
Byron Katie’s The Work
Anything that involves reontologizing
A third category of psychological intervention are those that are based around narrative: you find, and “put on” a new way of interpreting, or making sense of, your experience, such that it has a different meaning that provides you different affordances. Generally you find a new narrative that is more useful for you.
The classic example is a simple reframe, where you feel frustrated that people keep mooching off of you, but you reframe this so that you instead feel magnanimous, emphasizing your generosity, and how great it is to have an opportunity to give back to people. Same circumstances, different story about them.
This class of interventions feels like it can slide easily into either the parts based frame or the association based frame. In the parts based frame, a narrative can be thought of as just another strategy that a part might adopt so long as that is the best way that the part can solve its problem (and so long as other parts don’t conflict).
But I think this fits even more naturally into the association frame, where you find a new way to conceptualize your situation and you do some work to reassociate that new conceptualization with the stimulus that previously activated your old narrative (this is exactly what Phil of Philosophical Counseling’s process does: you find a new narrative / belief structure and set up a regime under which you notice when the old one arises, let it go, and feel into the new one).
[Other classes of intervention that I am distinctly missing?]
I like this a lot, and think it’d make a good top level post.
Really? I would prefer to have something much more developed, and/or to have solved my key puzzle here, before I put it up as a top-level post.
I saw the post more as giving me a framework that was helping for sorting various psych models, and the fact that you had one question about it didn’t actually feel too central for my own reading. (Separately, I think it’s basically fine for posts to be framed as questions rather than definitive statements/arguments after you’ve finished your thinking)
I wonder how the ancient schools of psychotherapy would fit here. Psychoanalysis is parts-based. Behaviorism is association-based. Rational therapy seems narrative-based. What about Rogers or Maslow?
Seems to me that Rogers and the “think about it seriously for 5 minutes” technique should be in the same category. In both cases, the goal is to let the client actually think about the problem and find the solution for themselves. Not sure if this is or isn’t an example of narrative-based, except the client is supposed to find the narrative themselves.
Maslow comes with a supposed universal model of human desires and lets you find yourself in that system. Jung kinda does the same, but with a mythological model. Sounds like an externally provided narrative. Dunno, maybe the narrative-based should be split into more subgroups, depending on where the narrative comes from (a universal model, an ad-hoc model provided by the therapist, an ad-hoc model constructed by the client)?
The way I have been taught NLP, you usually don’t use either anchors or an ecological check but both.
Behavior changes that are created by changing around anchors are not long-term stable when they violate ecology.
Changing around associations allows you to create new strategies in a more detailed way than you get by just doing parts work, and I have the impression that it’s often faster in creating new strategies.
(A) Interventions that are about resolving traumas feel to me like a different model.
(B) None of the three models you listed address the usefulness of connecting with the felt sense of emotions.
(C) There’s a model of change where you create a setting where people can have new behavioral experiences and then hopefully learn from those experiences and integrate what they learned in their lives.
CFAR’s goal of wanting to give people more agency about ways they think seems to work through C where CFAR wants to expose people to a bunch of experiences where people actually feel new ways to affect their thinking.
In the Danis Bois method both A and C are central.
Can someone affiliated with a university, etc. get me a PDF of this paper?
https://psycnet.apa.org/buy/1929-00104-001
It is on Scihub, but that version is missing a few pages in which they describe the methodology.
[I hope this isn’t an abuse of LessWrong.]
time for a new instance of this?
https://www.lesswrong.com/posts/4sAsygakd4oCpbEKs/lesswrong-help-desk-free-paper-downloads-and-more-2014
New (image) post: My strategic picture of the work that needs to be done
I edited the image into the comment box, predicting that the reason you didn’t was because you didn’t know you could (using markdown). Apologies if you prefer it not to be here (and can edit it back if so)
In this case it seems fine to add the image, but I feel disconcerted that mods have the ability to edit my posts.
I guess it makes sense that the LessWrong team would have the technical ability to do that. But editing a user’s post without their specifically asking feels like a pretty big breach of… not exactly trust, but something like that. It means I don’t have fundamental control over what is written under my name.
That is to say, I personally request that you never edit my posts without asking (which you did, in this case) and waiting for my response. Furthermore, I think that should be a universal policy on LessWrong, though maybe this is just an idiosyncratic neurosis of mine.
Understood, and apologies.
A fairly common mod practice has been to fix typos and stuff in a sort of “move first and then ask if it was okay” thing. (I’m not confident this is the best policy, but it saves time/friction, and meanwhile I don’t think anyone has had an issue with it before.) But your preference definitely makes sense, and if others felt the same I’d reconsider the overall policy.
(It’s also the case that adding an image is a bit of a larger change than the usual typo fixing, and may have been more of an overstep of bounds)
In any case I definitely won’t edit your stuff again without express permission.
Cool.
: )
If it’s not just you, it’s at least pretty rare. I’ve seen the mods “helpfully” edit posts several times (without asking first) and this is the first time I’ve seen anyone complain about it.
I knew that I could, and didn’t, because it didn’t seem worth it. (Thinking that I still have to upload it to a third party photo repository and link to it. It’s easier than that now?)
In this case your blog already counted as a third party repository.
New post: Napping Protocol
Some of these seem likely to generalize and some seem likely to be more specific.
Curious about your thoughts on the “best experimental approaches to figuring out your own napping protocol.”
Doing actual mini-RCTs can be pretty simple. You only need 3 things:
1. A spreadsheet
2. A digital coin for randomization
3. A way to measure the variable that you care about
I think one of the practically powerful “techniques” of rationality is doing simple empirical experiments like this. You want to get something? You don’t know how to get it? Try out some ideas and check which ones work!
There are other applications of empiricism that are not as formal, and sometimes faster. Those are also awesome. But at the very least, I’ve found that doing mini-RCTs is pretty enlightening.
On the object level, you can learn what actually works for hitting your goals.
On the process level, this trains some good epistemic norms and priors.
For one thing, I now have a much stronger intuition for the likelihood that an impressive effect is just noise. And getting into the habit of doing quantified hypothesis testing, such that you can cleanly falsify your hypotheses, teaches you to hold hypotheses lightly while inclining you to generate hypotheses in the first place.
Theorizing methods can enhance and accelerate this process, but if you have a quantified empirical feedback loop, your theorizing will be grounded. Science is hard, and most of our guesses are wrong. But that’s fine, so long as we actually check.
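To make that concrete, here is a minimal sketch of the analysis side of a mini-RCT, using only the Python standard library. The “nap” condition, the 1-10 alertness outcome, and the logged data are hypothetical placeholders; the point is just the shape of the loop: randomize with your digital coin, record each trial in the spreadsheet, then check whether the observed difference beats label-shuffling noise.

```python
import random
import statistics

# Each trial: (condition assigned by a coin flip, measured outcome).
# Hypothetical data: did an afternoon nap change self-rated alertness (1-10)?
log = [
    ("nap", 7), ("no_nap", 5), ("nap", 8), ("no_nap", 6),
    ("nap", 6), ("no_nap", 6), ("nap", 9), ("no_nap", 4),
]

treated = [y for cond, y in log if cond == "nap"]
control = [y for cond, y in log if cond == "no_nap"]
observed_diff = statistics.mean(treated) - statistics.mean(control)

# Permutation test: how often does randomly shuffling the condition labels
# produce a difference at least as large as the one observed? This is a rough
# guard against being impressed by noise.
outcomes = [y for _, y in log]
n_treated = len(treated)
n_perms = 10_000
extreme = 0
for _ in range(n_perms):
    random.shuffle(outcomes)
    diff = statistics.mean(outcomes[:n_treated]) - statistics.mean(outcomes[n_treated:])
    if abs(diff) >= abs(observed_diff):
        extreme += 1

print(f"observed difference: {observed_diff:.2f}")
print(f"approximate p-value: {extreme / n_perms:.3f}")
```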
New (unedited) post: The bootstrapping attitude
Is there a LessWrong article that unifies physical determinism and choice / “free will”? Something about thinking of yourself as the algorithm computed on this brain?
Perhaps This one?
New (unedited) post: Exercise and nap, then mope, if I still want to
New post: _Why_ do we fear the twinge of starting?
Is there any particular reason why I should assign more credibility to Moral Mazes / Robert Jackall than I would to the work of any other sociologist?
(My prior on sociologists is that they sometimes produce useful frameworks, but generally rely on subjective hard-to-verify and especially theory-laden methodology, and are very often straightforwardly ideologically motivated.)
I imagine that someone else could write a different book, based on the same kind of anthropological research, that highlights different features of the corporate world, to tell the opposite story.
And that’s without anyone trying to be deceptive. There’s just a fundamental problem of case studies that they don’t tell you what’s typical, only give you examples.
I can totally imagine that Jackall landed on this narrative somehow, found that it held together and just confirmation biased for the rest of his career. Once his basic thesis was well-known, and associated with his name, it seems hard for something like that NOT to happen.
And this leaves me unsure what to do with the data of Moral Mazes. Should I default assume that Jackall’s characterization is a good description of the corporate world? Or should I throw this out as a useless set of examples confirmation biased together? Or something else?
It seems like the question of “is most of the world dominated by Moral Mazes?” is an extremely important one. But also, it seems to me that it’s not operationalized enough to have a meaningful answer. At best, it seems like this is a thing that happens sometimes.
My own take is that moral mazes should be considered in the “interesting hypothesis” stage, and that the next step is to actually figure out how to go be empirical about checking it.
I made some cursory attempts at this last year, and then found myself unsure this was even the right question. The core operationalization I wanted was something like:
Does having more layers of management introduce pathologies into an organization?
How much value is generated by organizations scaling up?
Can you reap the benefits of organizations scaling up by instead having them splinter off?
(The “middle management == disconnected from reality == bad” hypothesis was the most clear-cut part of the moral maze model to me, although I don’t think it was the only part of the model)
I have some disagreements with Zvi about this.
I chatted briefly with habryka about this and I think he said something like “it seems like a more useful question is to look for positive examples of orgs that work well, rather than try and tease out various negative ways orgs could fail to work.”
I think there are maybe two overarching questions this is all relevant to:
How should the rationality / xrisk / EA community handle scale? Should we be worried about introducing middle-management into ourselves?
What’s up with civilization? Is maziness a major bottleneck on humanity? Should we try to do anything about it? (My default answer here is “there’s not much to be done here, simply because the world is full of hard problems and this one doesn’t seem very tractable even if the models are straightforwardly true.” But I do think this is a contender for humanity’s Hamming problem.)
There are multiple dimensions to the credibility question. You probably should increase your credence from prior to reading it/about it that large organizations very often have more severe misalignment than you thought. You probably should recognize that the model of middle-management internal competition has some explanatory power.
You probably should NOT go all the way to believing that the corporate world is homogeneously broken in exactly this way. I don’t think he makes that claim, but it’s what a lot of readers seem to take. There’s plenty of variation, and the Anna Karenina principle applies (paraphrased): well-functioning organizations are alike; dysfunctional organizations are each broken in their own way. But really, that’s wrong too: each group is actually distinct, and has distinct sets of forces that have driven it to whatever pathologies or successes it has. Even when there are elements that appear very similar, they have different causes and likely different solutions or coping mechanisms.
“Is most of the world dominated by moral mazes?” I don’t think this is a useful framing. Most groups have some elements of Moral Mazes. Some groups appear dominated by those elements, in some ways. From the outside, most groups are at least somewhat effective at their stated mission, so the level of domination is low enough that it hasn’t killed them (though there are certainly “zombie orgs” which HAVE been killed, but don’t know it yet).
My understanding is that there was a 10 year period starting around 1868, in which South Carolina’s legislature was mostly black, and when the universities were integrated (causing most white students to leave), before the Dixiecrats regained power.
I would like to find a relatively non-partisan account of this period.
Anyone have suggestions?
I would just read W. E. B. Du Bois—Black Reconstruction in America (1935)
When is an event surprising enough that I should be confused?
Today, I was reading Mistakes with Conservation of Expected Evidence. For some reason, I was under the impression that the post was written by Rohin Shah; but it turns out it was written by Abram Demski.
In retrospect, I should have been surprised that “Rohin” kept talking about what Eliezer says in the Sequences. I wouldn’t have guessed that Rohin was that “culturally rationalist”, or that he would be that interested in what Eliezer wrote in the Sequences. And indeed, I was updating toward Rohin being more of a rationalist, with more rationalist interests, than I had thought. If I had been more surprised, I could have noticed my surprise / confusion, and made a better prediction.
But on the other hand, was my surprise so extreme that it should have triggered an error message (confusion), instead of merely an update? Maybe this was just fine reasoning after all?
From a Bayesian perspective, I should have observed this evidence, and increased my credence in both Rohin being more rationalist-y than I thought, and also in the hypothesis that this wasn’t written by Rohin. But practically, I would have needed to generate the second hypothesis, and I don’t think that I had strong enough reason to.
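To put toy numbers on that (these priors and likelihoods are invented purely for illustration, not an attempt to model the real situation): suppose you start out 95% sure Rohin wrote it, and you think heavy Sequences-quoting is five times as likely from some other author.

```python
# Toy Bayesian update on authorship. All numbers are illustrative placeholders.
prior = {"rohin": 0.95, "someone_else": 0.05}

# P(observation | hypothesis): "the author keeps quoting Eliezer's Sequences"
likelihood = {"rohin": 0.1, "someone_else": 0.5}

unnormalized = {h: prior[h] * likelihood[h] for h in prior}
total = sum(unnormalized.values())
posterior = {h: round(p / total, 2) for h, p in unnormalized.items()}
print(posterior)  # {'rohin': 0.79, 'someone_else': 0.21}
```

With these made-up numbers the evidence shifts credence noticeably but doesn’t flip it, and the update can only happen at all if “someone else wrote it” is in the hypothesis space with nonzero prior, which is exactly the generation problem above.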
I feel like there’s a semi-interesting epistemic puzzle here. What’s the threshold for a surprising enough observation that you should be confused (much less notice your confusion)?
Surprise and confusion are two different things[1], but surprise usually goes along with confusion. I think it’s a good rationalist skill-to-cultivate to use “surprise” as a trigger to practice noticing confusion, because you don’t get many opportunities to do that. I think for most people this is worth doing for minor surprises, not so much because you’re that likely to need to do a major update, but because it’s just good mental hygiene/practice.
Surprise is “an unlikely thing happened.” Confusion is “a thing I don’t have a good explanation for happened.”
What was the best conference that you every attended?
IDEC—International Democratic Education Conference—it’s hosted by a democratic school in a different country each year, so I attended when my school was hosting (it was 2 days in our school and then 3 more days somewhere else). It was very open, had very good energy, and had great people whom I got to meet (and, since it wasn’t too filled with talks, actually got the time to talk to) - and oh, yeah, also a few good talks :)
If you have any more specific questions I’d be happy to answer.
I recall a Chris Olah post in which he talks about using AIs as a tool for understanding the world, by letting the AI learn, and then using interpretability tools to study the abstractions that the AI uncovers.
I thought he specifically mentioned “using AI as a microscope.”
Is that a real post, or am I misremembering this one?
https://www.lesswrong.com/posts/X2i9dQQK3gETCyqh2/chris-olah-s-views-on-agi-safety
Are there any hidden risks to buying or owning a car that someone who’s never been a car owner might neglect?
I’m considering buying a very old (ie from the 1990s), very cheap (under $1000, ideally) minivan, as an experiment.
That’s inexpensive enough that I’m not that worried about it completely breaking down on me. I’m willing to just eat the monetary cost for the information value.
However, maybe there are other costs or other risks that I’m not tracking, that make this a worse idea.
Things like
- Some ways that a car can break make it dangerous, instead of non-functional.
- Maybe if a car breaks down in the middle of route 66, the government fines you a bunch?
- Something something car insurance?
Are there other things that I should know? What are the major things that one should check for to avoid buying a lemon?
Assume I’m not aware of even the most drop-dead basic stuff. I’m probably not.
(Also, I’m in the market for a minivan, or other car with 3 rows of seats. If you have an old car like that which you would like to sell, or if know someone who does, get in touch.
Do note that I am extremely price sensitive, but I would pay somewhat more than $1000 for a car, if I were confident that it was not a lemon.)
There are. https://www.iihs.org/ratings/driver-death-rates-by-make-and-model
You can explore the data yourself, but the general trend is that it appears there have been real improvements in crash fatality rates. Better designed structure, more and better airbags, stability control, and now in some new vehicles automatic emergency braking is standard.
Generally a bigger vehicle like a minivan is safer, and a newer version of that minivan will be safer, but you just have to go with what you can afford.
The main risk is simply that at this price point the minivan is going to have a lot of miles, and it’s a matter of probability how long it will run until a very expensive major repair is needed. One strategy is to plan to junk the vehicle and get a similar ‘beater’ vehicle when the present one fails.
If you’re so price sensitive that $1000 is meaningful, well, uh, try to find a solution to this crisis. I’m not saying one exists, but there are survival risks to poverty.
Lol. I’m not impoverished, but I want to cheaply experiment with having a car. It isn’t worth it to throw away $30,000 on a thing that I’m not going to get much value from.
Ok but at the price point you are talking you are not going to have a good time.
Analogy: would you “experiment with having a computer” by grabbing a Packard Bell from the 1990s and putting an ethernet card in it so it can connect to the internet from Windows 95?
Do you need the minivan form factor? A vehicle in decent condition (6-10 years old, under 100k miles, from a reputable brand) is cheapest in the small-car form factor.
Not spending $30,000 makes sense, but my impression from car shopping last year was that trying to get a good car for less than $7k was fairly hard. (I get the ‘willingness to eat the cost’ price point of $1k, but wanted to highlight that the next price point up was more like 10k than 30k.)
Depending on your experimentation goals, you might want to rent a car rather than buy.
Most auto shops will do a safety/mechanical inspection for a small amount (usually in the $50-200 range, but be aware that the cheaper ones subsidize it by anticipating that they can sell you services to fix the car if you buy it).
However, as others have said, this price point is too low for your first car as a novice, unless you have a mentor and intend to spend a lot of time learning to maintain/fix. Something reliable enough for you to actually run the experiment and get the information you want about the benefits vs frustrations of owning a car is going to run probably $5-$10K, depending on regional variance and specifics of your needs.
For a first car, look into getting a warranty, not because it’s a good insurance bet, but because it forces the seller to make claims of warrantability to their insurance company.
You can probably cut the cost in half (or more) if you educate yourself and get to know the local car community. If the car is a hobby rather than an experiment in transportation convenience, you can take a lot more risk, AND those risks are mitigated if you know how to get things fixed cheaply.
Is there a standard article on what “the critical risk period” is?
I thought I remembered an arbital post, but I can’t seem to find it.
I remember reading a Zvi Mowshowitz post in which he says something like “if you have concluded that the most ethical thing to do is to destroy the world, you’ve made a mistake in your reasoning somewhere.”
I spent some time searching around his blog for that post, but couldn’t find it. Does anyone know what I’m talking about?
It sounds like a tagline for a blog.
Probably this one?
http://lesswrong.com/posts/XgGwQ9vhJQ2nat76o/book-trilogy-review-remembrance-of-earth-s-past-the-three
Thanks!
I thought that it was in the context of talking about EA, but maybe this is what I am remembering?
It seems unlikely though, since I wouldn’t have read the spoiler part.
Anyone have a link to the sequence post where someone posits that AIs wouldn’t do art and science from a drive to compress information, but rather would create and then reveal cryptographic strings (or something)?
I think you are thinking of “AI Alignment: Why It’s Hard, and Where to Start”:
There’s also a mention of that method in this post.
I remember reading a Zvi Mowshowitz post in which he says something like “if you have concluded that the most ethical thing to do is to destroy the world, you’ve made a mistake in your reasoning somewhere.”
I spent some time searching around his blog for that post, but couldn’t find it. Does anyone know what I’m talking about?
Review of three body problem is my first guess
A hierarchy of behavioral change methods
Follow up to, and a continuation of the line of thinking from: Some classes of models of psychology and psychological change
Related to: The universe of possible interventions on human behavior (from 2017)
This post outlines a hierarchy of behavioral change methods. Each of these approaches is intended to be simpler, more lightweight, and faster to use (is that right?) than the one that comes after it. On the flip side, each of these approaches is intended to resolve a common major blocker of the approach before it.
I do not necessarily endorse this breakdown or this ordering. This represents me thinking out loud.
[Google Doc version]
[Note that all of these are more-or-less top-down, and focused on the individual instead of the environment]
Level 1: TAPs
If there’s some behavior that you want to make habitual, the simplest thing is to set, and then train a TAP. Identify a trigger and the action with which you want to respond to that trigger, and then practice it a few times.
This is simple, direct, and can work for actions as varied as “use NVC” and “correct my posture” and “take a moment to consider the correct spelling.”
This works particularly well for “remembering problems”, in which you can and would do the action, if only it occurred to you at the right moment.
Level 2: Modifying affect / meaning
Sometimes however, you’ll have set a TAP to do something, you’ll notice the trigger, and...you just don’t feel like doing the action.
Maybe you’ve decided that you’re going to take the stairs instead of the elevator, but you look at the stairs and then take the elevator anyway. Or maybe you want to stop watching youtube, and have a TAP to open your todo list instead, but you notice...and then just keep watching youtube.
The most natural thing to do here is to adjust your associations / affect around the behavior that you want to stop or the behavior that you want to start. You not only want the TAP to fire, reminding you of the action, but you want the feeling of the action to pull you toward it, emotionally. Or, to say it another way, you change the meaning that you assign to the behavior.
Some techniques here include:
Selectively emphasizing different elements of an experience (like the doritos example in Nate’s post here), and other kinds of reframes
Tony Robbins’ process for working with “neuro associations”: asking 1) what pain has kept me from taking this action in the past? 2) what pleasure have I gotten from not taking this action in the past? 3) what will it cost me if I don’t take this action? and 4) what pleasure will it bring me if I take this action?
This here goal chaining technique,
Some more heavy-duty NLP tools.
Behaviorist conditioning (I’m wary of this one, since it seems pretty symmetric.)
Level 3: Dialogue
The above approach only has a limited range of application, in that it can only work in situations where there are degrees of freedom in one’s affect toward a stimulus or situation. In many cases, you might go in and try to change the affect around something from the top down, and some part of you will object, or you will temporarily change the affect, but it will get “kicked out” later.
This is because your affects are typically not arbitrary. Rather they are the result of epistemic processes that are modeling the world and the impact of circumstances on your goals.
When this is the case, you’ll need to do some form of dialogue, which either updates a model of some objecting part, or modifies the recommended strategy / affect to accommodate the objection, or finds some other third option.
This can take the form of
Focusing
IDC
IFS
CT debugging
The most extreme instance of “some part has an objection” is when there is some relevant trauma somewhere in the system. Sort of by definition, this means that you’ll have an extreme objection to some possible behavior or affect changes, because that part of the state space is marked as critically bad.
Junk Drawer
As I noted, this schema describes top-down behavior change. It does not include cases where there is a problem, but you don’t have much of a sense what the problem is and/or how to approach it. For those kinds of bugs you might instead start with Focusing, or with a noticing regime.
For related reasons, this is super not relevant to blindspots.
I’m also neglecting environmental interventions, both those that simply redirect your attention (like a TAP), and those that shift the affect around an activity (like using social pressure to get yourself to do stuff, via coworking for instance). I can’t think of an environmental version of level 3.
Can anyone get a copy of this paper for me? I’m looking to get clarity about how important cryopreserving non-brain tissue is for preserving personality.
Older post: Initial Comparison between RAND and the Rationality Cluster
New post: my personal wellbeing support pillars
I’m interested in knowing your napping tools
Here you go.
New post: Napping Protocol
Thanks!
New post: The seed of a theory of triggeredness