The main thing I got out of reading Bostrom’s Deep Utopia is a better appreciation of this “meaning of life” thing. I had never really understood what people meant by this, and always just rounded it off to people using lofty words for their given projects in life.
The book’s premise is that, after the aligned singularity, the robots will not just be better at doing all your work but also be better at doing all your leisure for you. E.g., you’d never study for fun in posthuman utopia, because you could instead just ask the local benevolent god to painlessly, seamlessly put all that wisdom in your head. In that regime, studying with books and problems for the purpose of learning and accomplishment is just masochism. If you’re into learning, just ask! And similarly for any psychological state you’re thinking of working towards.
So, in that regime, it’s effortless to get a hedonically optimal world, without any unendorsed suffering and with all the happiness anyone could want. Those things can just be put into everyone and everything’s heads directly—again, by the local benevolent-god authority. The only challenging values to satisfy are those that deal with being practically useful. If you think it’s important to be the first to discover a major theorem or be the individual who counterfactually helped someone, living in a posthuman utopia could make things harder in these respects, not easier. The robots can always leave you a preserve of unexplored math or unresolved evil… but this defeats the purpose of those values. It’s not practical benevolence if you had to ask for the danger to be left in place; it’s not a pioneering scientific discovery if the AI had to carefully avoid spoiling it for you.
Meaning is supposed to be one of these values: not a purely hedonic value, and not a value dealing only in your psychological states. A further value about the objective state of the world and your place in relation to it, wherein you do something practically significant by your lights. If that last bit can be construed as something having to do with your local patch of posthuman culture, then there can be plenty of meaning in the postinstrumental utopia! If that last bit is inextricably about your global, counterfactual practical importance by your lights, then you’ll have to live with all your “localistic” values satisfied but meaning mostly absent.
It helps to see this meaning thing if you frame it alongside all the other objectivistic “stretch goal” values you might have. Above and beyond your hedonic values, you might also think it good for you and others to have objectively interesting lives, accomplished and fulfilled lives, and consumingly purposeful lives. Meaning is one of these values, where above and beyond the joyful, rich experiences of posthuman life, you also want to play a significant practical role in the world. We might or might not be able to have lots of objective meaning in the AI utopia, depending on how objectivistic meaningfulness by your lights ends up being.
Considerations that in today’s world are rightly dismissed as frivolous may well, once more pressing problems have been resolved, emerge as increasingly important [remaining] lodestars… We could and should then allow ourselves to become sensitized to fainter, subtler, less tangible and less determinate moral and quasi-moral demands, aesthetic impingings, and meaning-related desirables. Such recalibration will, I believe, enable us to discern a lush normative structure in the new realm that we will find ourselves in—revealing a universe iridescent with values that are insensible to us in our current numb and stupefied condition (pp. 318-9).
Many who believe in God derive meaning from the fact that He chose not to do the tasks they are good at, and left them tasks to try to accomplish, despite God theoretically being able to do anything they can do, only better. It’s common for such people to believe that this meaning would disappear if God disappeared, yet whenever such a person does come to no longer believe in God, they often continue to see meaning in their life[1].
Now atheists worry that building God may destroy all meaning in our actions. I expect we’ll adapt.
(edit: That is to say, I don’t think you’ve adequately described what “meaning of life” is if you’re worried about it going away in the situation you describe)
If anything, they’re more right than wrong: much has been written about the “meaning crisis” we’re in, possibly attributable to greater levels of atheism.
I’m pretty sure that I would study for fun in the posthuman utopia, because I both value and enjoy studying and a utopia that can’t carry those values through seems like a pretty shallow imitation of a utopia.
There won’t be a local benevolent god to put that wisdom into my head, because I will be a local benevolent god with more knowledge than most others around. I’ll be studying things that have only recently been explored, or that nobody has yet discovered. Otherwise again, what sort of shallow imitation of a posthuman utopia is this?
The tricky part is, on the margin I would probably use various shortcuts, and it’s not clear where those shortcuts end short of just getting knowledge beamed into my head.
I already use LLMs to tell me facts, explain things I’m unfamiliar with, handle tedious calculations/coding, generate simulated data/brainstorming and summarize things. Not much, because LLMs are pretty bad, but I do use them for this and I would use them more on the margin.
The concept of “the meaning of life” still seems like a category error to me. It’s an attempt to apply a system of categorization used for tools, one in which they are categorized by the purpose for which they are used, to something that isn’t a tool: a human life. It’s a holdover from theistic worldviews in which God created humans for some unknown purpose.
The lesson I draw instead from the knowledge-uploading thought experiment—where having knowledge instantly zapped into your head seems less worthwhile than acquiring it more slowly yourself—is that to some extent, human values simply are masochistic. Hedonic maximization is not what most people want, even with all else being equal. This goes beyond simply valuing the pride of accomplishing difficult tasks, such as the sense of accomplishment one would get from studying on one’s own, above other forms of pleasure. In the setting of this thought experiment, if you wanted the sense of accomplishment, you could get that zapped into your brain too, but much like getting knowledge zapped into your brain instead of studying yourself, automatically getting a sense of accomplishment would be of lesser value. The suffering of studying for yourself is part of what makes us evaluate it as worthwhile.
That’s a good line; it captures a lot of what I often feel is happening when talking to people about utilitarianism and a bunch of adjacent stuff (people replacing their morals with their models of their morals).
Detailed or non-intuitive actual morals don’t exist to be found and used; they can only be built with great care. None have been built so far, as no single human has lived for even 3000 years. The human condition curses all moral insight with Goodhart. What remains is scaling Pareto projects of locally ordinary humanism.
The human brain does not start out as an efficient reasoning machine, plausible or deductive. This is something which we require years to learn, and a person who is an expert in one field of knowledge may do only rather poor plausible reasoning in another. What is happening in the brain during this learning process?
Education could be defined as the process of becoming aware of more and more propositions, and of more and more logical relationships between them. Then it seems natural to conjecture that a small child reasons on a lattice of very open structure: large parts of it are not interconnected at all. For example, the association of historical events with a time sequence is not automatic; the writer has had the experience of seeing a child, who knew about ancient Egypt and had studied pictures of the treasures from the tomb of Tutankhamen, nevertheless coming home from school with a puzzled expression and asking: ‘Was Abraham Lincoln the first person?’
It had been explained to him that the Egyptian artifacts were over 3000 years old, and that Abraham Lincoln was alive 120 years ago; but the meaning of those statements had not registered in his mind. This makes us wonder whether there may be primitive cultures in which the adults have no conception of time as something extending beyond their own lives. If so, that fact might not have been discovered by anthropologists, just because it was so unexpected that they would not have raised the question.
As learning proceeds, the lattice develops more and more points (propositions) and interconnecting lines (relations of comparability), some of which will need to be modified for consistency in the light of later knowledge. By developing a lattice with denser and denser structure, one is making his scale of plausibilities more rigidly defined.
No adult ever comes anywhere near to the degree of education where he would perceive relationships between all possible propositions, but he can approach this condition with some narrow field of specialization. Within this field, there would be a ‘quasi-universal comparability’, and his plausible reasoning within this field would approximate that given by the Laplace–Bayes theory.
A brain might develop several isolated regions where the lattice was locally quite dense; for example, one might be very well-informed about both biochemistry and musicology. Then for reasoning within each separate region, the Laplace–Bayes theory would be well-approximated, but there would still be no way of relating different regions to each other.
Then what would be the limiting case as the lattice becomes everywhere dense with truly universal comparability? Evidently, the lattice would then collapse into a line, and some unique association of all plausibilities with real numbers would then be possible. Thus, the Laplace–Bayes theory does not describe the inductive reasoning of actual human brains; it describes the ideal limiting case of an ‘infinitely educated’ brain. No wonder that we fail to see how to use it in all problems!
This speculation may easily turn out to be nothing but science fiction; yet we feel that it must contain at least a little bit of truth. As in all really fundamental questions, we must leave the final decision to the future.
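A toy way to picture the lattice idea above (my own illustration, not Jaynes’s): record each judgment “A is at least as plausible as B” as a directed edge, and call two propositions comparable only if one is reachable from the other. Within a densely connected region everything is comparable; across isolated regions, as in the biochemistry/musicology example, nothing is.

```python
# Toy model of the "lattice of propositions": nodes are propositions,
# directed edges are judgments "A is at least as plausible as B".
# (Proposition names are invented for illustration.)

def reachable(graph, start, goal):
    """Depth-first search: can `goal` be reached from `start` along judgment edges?"""
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, ()))
    return False

def comparable(graph, a, b):
    """Two propositions are comparable iff one is reachable from the other."""
    return reachable(graph, a, b) or reachable(graph, b, a)

# Two locally dense but mutually isolated regions of the lattice.
judgments = {
    "enzyme_catalyzes_X": ["protein_Y_folds"],
    "protein_Y_folds": ["gene_Z_is_expressed"],
    "sonata_is_by_Haydn": ["sonata_composed_c1780"],
    "sonata_composed_c1780": ["sonata_in_galant_style"],
}

print(comparable(judgments, "enzyme_catalyzes_X", "gene_Z_is_expressed"))     # True: same region
print(comparable(judgments, "enzyme_catalyzes_X", "sonata_in_galant_style"))  # False: isolated regions
```

Only in the limit where every pair becomes comparable does the structure collapse into a single chain that can be mapped onto real numbers, which is the ‘infinitely educated’ Laplace–Bayes ideal the passage describes.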
Keltham was supposed to start by telling them all to use their presumably-Civilization-trained skill of ‘perspective-taking-of-ignorance’ to envision a hypothetical world where nothing resembling Coordination had started to happen yet. Since, after all, you wouldn’t want your thoughts about the best possible forms of Civilization to ‘cognitively-anchor’ on what already existed.
You can imagine starting in a world where all the same stuff and technology from present Civilization exists, since the question faced is what form of Governance is best-suited to a world like that one. Alternatively, imagine an alternative form of the exercise involving people fresh-born into a fresh world where nothing has yet been built, and everybody’s just wandering around over a grassy plain.
Either way, you should assume that everybody knows all about decision theory and cooperation-defection dilemmas. The question being asked is not ‘What form of Governance would we invent if we were stupid?’
Civilization could then begin—maybe it wouldn’t actually happen exactly that way, but it is nonetheless said as though in stories—Civilization could then begin again, when people envisioned running out of stored food a couple of years later. Standing around all these beautiful complicated machines that people remembered how to operate, but required multiple people working together to operate, which nobody was yet incentivized to operate.
Or Civilization could begin for the first time, when the Hypothetical Newly-Created Educated People imagined trying to build shelters for themselves, or sow food-plants to grow; and thought to themselves that there would be less point in doing that, if others would just move into the shelters as soon as they walked away, or eat the crops that they had sown.
And people then would say to themselves, “What if we tried something else which is not that?”
It continues into a new problem, the problem of motivating such socially-useful actions as ‘producing food’, for if nobody does this, soon nobody will eat.
You can imagine lesser solutions, collective farming of collectively guarded fields, monitors on hard work and rewards of food access. But these are simultaneously too ‘simplistic’ and ‘overcomplicated’, the very opposite of an ‘elegant-solution’. People can work harder, invest more effort, for a usually ‘monotonically-increasing’ reward, a function operated directly by the Environment, by ‘physical-law’. There just needs to be some system whereby, when people work, they are themselves the ones to benefit from it.
But this requires a far more complicated form of coordinated action, something that ‘bounded-agents’ lack the computational power to consider as a giant macroaction of their ‘collective-agent’. The optimal macrostrategy must be lossily projected down into simplified mental rules for individuals, a notion of imaginary-ownership-tagging: if one person sows food-plants within a field, and waters them and protects them, everybody around them will behave as if the resulting food-crop is tagged with an imaginary pointer to that person, saying that the food may be consumed by them alone. Or only consumed by those others the food’s ‘owner’ designates, at their own decision… that seems like it should obviously be an option built into the system too...
And once you create an imaginary structure of coordinated action that elegantly-complicated, the consequences and further-required-features inevitably explode; the explosion that results is Civilization’s basic form nearly in toto.
People could often benefit from other people doing various things for them, but they must of course do something for the other in exchange. If things have socially-constructed tags pointing to people, who alone may use or consume those things, why not let people announce that the pointer now points to someone else? That’s one way of doing something in return, for somebody who did some task for you, that was easier for them than for you.
If fields can be owned and an owned field produces owned produce in the future, why not let people announce that some of the future produce can point to some other owner?
Often the announcements of changed imaginary ownership are meant to be traded, executed one in exchange for another. Then a new version and feature-expansion of the system can eliminate the uncertainty about whether the other will announce their bargained ownership change, after you announce yours: imaginary contracts, that molecularize the atomic actions into a transaction that executes simultaneously on both sides, only after both sides announce the same contract.
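A minimal sketch of the two mechanisms just described: imaginary ownership tags, plus contracts that execute both transfers only once both parties have announced them. The Python framing and all the names are mine, purely illustrative.

```python
# Ownership "tags" as a simple registry, plus two-sided contracts that
# molecularize a pair of transfers: nothing executes until both parties
# announce the same contract, and then both tags flip together.

class Registry:
    def __init__(self, tags):
        self.tags = dict(tags)     # item -> current owner (the imaginary pointer)
        self.announced = {}        # contract -> set of parties who have announced it

    def announce(self, party, contract):
        owner_a, item_a, owner_b, item_b = contract
        signers = self.announced.setdefault(contract, set())
        signers.add(party)
        if signers == {owner_a, owner_b}:   # both sides have announced the same contract
            assert self.tags[item_a] == owner_a and self.tags[item_b] == owner_b
            # atomic execution: both pointers change together, or neither does
            self.tags[item_a], self.tags[item_b] = owner_b, owner_a

reg = Registry({"wheat_crop": "farmer", "iron_plow": "smith"})
deal = ("farmer", "wheat_crop", "smith", "iron_plow")
reg.announce("farmer", deal)   # nothing executes yet
reg.announce("smith", deal)    # now both transfers happen at once
print(reg.tags)                # {'wheat_crop': 'smith', 'iron_plow': 'farmer'}
```

The point is the same as in the excerpt: simultaneous execution removes the uncertainty about whether the other party will announce their half of the bargain after you announce yours.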
Do people want to work on some larger endeavor—specialize in different aspects of farming, and collectively tackle a larger farm? Let the tags, in the eyes of society, point to persistent imaginary constructs, specified in some contract specification language; a corporation is one such persistent contract.
Let this system expand, let people use it enough, and there will predictably come a point where there aren’t lots of untagged resources nearby for somebody to tag in society’s eyes.
Once there are not plenty of new plots of land to tag and farm, people may indeed begin to ask, ‘Why should this land be owned by them and not me?’
Because they did some work on that land? If that’s the rule, then won’t people who foresee the predictable scarcity later, run around trying to plow small shallow furrows through every potential field within the reach of running, trying to tag as much land as pointing to themselves as possible?
And when all that land has been used up, wouldn’t the people who were slower runners and didn’t end up with any land—wouldn’t new children born into this world, for that matter—ask themselves and perhaps ask out loud:
“If this elaborate imaginary social construct doesn’t offer me any benefits for going along with the pretense—if the system says that little or nothing has an imaginary tag pointing to me—then in what sense is this even coordination, from my perspective? Why would I cooperate in the coordinated rule of not eating things tagged as pointing to others, if the result is that there’s nothing for me to eat? Where’s my fair share of the rewards for playing along with this pretend game, for cooperating with what this imaginary tagging system says is my part and my action in it?”
This concept, incidentally, took some arguments to persuade into tiny Keltham, when he was first hearing about all this. Tiny Keltham had a very strong instinctive sense that objects just were owned by people, and that what made a system fair and right was entirely that only the people who owned objects could do anything with them or transfer them to other people.
It was hard for tiny Keltham, at first, to see past his instinctive suspicion that people asking ‘What’s my reward for cooperating with this system?’ were about to use that as an excuse to storm onto his hypothetical farm and eat his food that he’d worked to produce, and call that their share, without doing any work themselves.
Older children’s attempted arguments about ‘put yourself into that other person’s shoes’ repeatedly failed on Keltham, who kept replying that he wouldn’t take anybody else’s stuff period.
But tiny Keltham was eventually persuaded—by a Watcher, then, not by an older child—by the argument that it is an internally-consistent imaginary tagging system to say that some single person Elzbeth owns all the land in the world. Everybody else has to work those lands and give Elzbeth a share of anything that grows there, since by default it would just end up tagged as hers, unless they agree to pay half their gains to her.
The question then becomes, why should anybody else except Elzbeth play along with this imaginary system, once it gets to that point? Why shouldn’t everyone who isn’t Elzbeth, all just wake up out of this bad dream, and do something else which is not that?
Keltham asked if maybe the system had started out with everybody owning an equal amount of land, but Elzbeth had been a really clever asset-trader and ended up owning everything in the world after a series of voluntary transactions; in which case it seemed to him that fair was fair.
The Watcher told Keltham that, even if the last generation had gotten the world into that state through a series of voluntary transactions, the children born into it might look around and see that no land was tagged to them, that everything was tagged to Elzbeth. They would ask what they were receiving in exchange for playing along with that particular delusion, and why they should not imagine some other tagging system instead, in which their coordinated action in playing along would actually receive any reciprocal benefit or reward.
And tiny Keltham growled and stomped around for a while, but finally conceded that, fine, the pointers were imaginary and yes it took more than just a consistent tagging system running on strictly voluntary transactions to make the whole thing be fair or right. The elegant core structure was necessary-but-not-sufficient.
The unimproved land, the raw resources, these things must be tagged with ownership for the owners to have an incentive to improve them. It doesn’t mean that this tagging need be considered as free to the new owner.
Discard the obvious-first-solution-that-is-wrong of charging somebody an amount of food or other worked goods, to tag previously untagged land, and redistributing those payments equally among everyone in the world. Even leaving aside the question of how that system initially starts farming anything at all, it inevitably arrives at a point where there’s no untagged land left or it’s impossibly expensive. Whereupon the next generation of children, being born with no land tagged to them and no payments for newly bought land coming in, will again begin to ask, “Why should I play along with this imaginary arrangement at all; where’s my payoff for coordinating my action with yours?”
More sensible then to regard people as renting land and other raw-resource sources, at their unimproved price of course, but still an unimproved price set by competitive bidding—albeit perhaps for long-term leases, etcetera.
When you are born, you conceptually acquire a share in this whole system -
Of course tiny Keltham immediately demanded his accumulated profits from his share of all the land-rents in the world, and demanded also to know why he had never been told about this before.
The Watcher again had to be brought in to explain to Keltham that, conceptually speaking, his share was mostly going into maintaining a lot of non-rival non-excludable goods, or services that Civilization thought should be provided to literally everyone even if in principle they weren’t public goods. The value of unimproved land wasn’t as high as Keltham was imagining in the first place; dath ilan still had whole forests just lying around not being used for anything -
Tiny Keltham said that he had absolutely not consented for his share of the land rents to be used for non-rival non-excludable anything, and from now on he wanted it delivered to him in the form of actual money he could spend on what he wanted.
…could he please wait and listen to the whole story before getting angry? said the Watcher.
Tiny Keltham was incredibly suspicious, but he did already have a great deal of experience with adult craziness turning out to be more reasonable than he had at first thought. Tiny Keltham agreed to go on listening for a while longer, then, before he started trying to persuade all the other children that they ought to band together and overthrow Civilization to get their fair share of the land rents, in the form of actual money, delivered to them right now.
Because, you see—it was said to tiny Keltham—returning to the Hypothetical Newly-Created Educated People, at some point their system is going to grow large enough that even with everybody receiving benefits for their participation in the system, there will still be defectors. There will be people who just don’t want to go along with the system, and try to eat food with a tag on it that points to somebody else.
Now the nascent Civilization needs police that can outfight any individual thief; and, since Newly-Created Educated People aren’t stupid, they know they obviously need to ensure that the collective of all of them can always outfight the police. Neither of these ‘features’ are cheap, and neither easily lend themselves to private ownership -
Tiny Keltham said that he’d be happy to pay for the police to protect him, out of his share of the land-rent, once it was being paid to him in actual money, and he didn’t see why Governance had to take his money and use it without his permission supposedly to protect him with police.
Why couldn’t people just pay for police who sold their services on the market like everybody else? Or if it was much more efficient to police larger regions at once, why couldn’t his sub-city in Default provide police, and then Keltham would help his parents pay their share of the house-rent out of his share of the land-rent being paid to him directly -
Because the police have weapons, tiny Keltham! What if the police for a sub-city decide one day that, in this sub-city, it’s fine to raise kids of Keltham’s age, and force them to work in exchange for only enough food to keep them alive? What if the police decide that nobody in the city block is allowed to hire different police, and take all their stuff to make sure they can’t afford them? What if somebody dies and their head doesn’t get cooled and preserved fast enough? That’s the kind of thing that Civilization as a whole has to prevent, as a universal regulation with no opt-outs, so it doesn’t happen anywhere in Civilization. Even if you thought people should be able to opt-out of any and every protection as adults, you’d still have to check to make sure they weren’t having kids.
In fact, tiny Keltham, your subcity does contract with a police agency to do a lot of ordinary policing, and it does appear in your parents’ rent on the ‘foundation’ for their ‘house-module’; but Governance has to provide oversight of that policing, and that costs money. Cryosuspension emergency response on constant standby service costs money. Protecting the Waiting Ones in their frozen sleep costs money. Maintaining the election system to pick people to run the Government that regulates armed police costs money. Quiet Cities to host the 5% of people who can’t be happy working in Civilization, and who are thus held injured by what Civilization has chosen to become, cost actually quite a lot of money. No, most people won’t need the Quiet option; but everyone, when they’re born, can be considered as needing an insurance policy against that happening to them, and that insurance policy costs money.
Subsidized policy prediction markets to make all those institutions work boundedly-optimally cost money.
When you add up everything like that, which Governance has to do for everyone or can’t just sell to individuals separately, it actually does eat all the rent of unimproved land plus all of Governance’s other revenue streams. Lots of individual philanthropists fund Governance on top of that, so that Civilization can have a bigger Government than basic rents alone will support—so that there can be Annual Alien Invasion Rehearsal Festivals, say, or so that Quiet Cities can have nicer things.
(Most philanthropies in Civilization with room for more funding accomplish roughly the same amount of marginal good per marginal labor-hour, according to most people’s utility functions. If you’ve got a fairly conventional philanthropic utility function, you get to pick whichever random charity or impact-certificate market best matches your personal taste there, including just throwing your money at Governance. It’s like buying individual-stock equity investments; there’s more volatility, but all the expected returns are the same.) (In Civilization, that is.)
Tiny Keltham demanded to see the actual accounts of which ‘essential public services’ his share of the land-rent was getting spent on.
He was promptly provided them, in easy-to-read format for children with lots of helpful pictures. Lots of dath ilani children demand to see those accounts, at some point.
With great focus and concentration, tiny Keltham managed to read through twenty-two pages of that, before getting bored and running off to play.
(This placed him in the 97th percentile for how far most children read at that age.)
The explanation to tiny Keltham resumed the next day with the workings of Governance.
Conceptually and to first-order, the ideal that Civilization is approximating is a giant macroagent composed of everybody in the world, taking coordinated macroactions to end up on the multi-agent-optimal frontier, at a point along that frontier reflecting a fair division of the gains from that coordinated macroaction -
Well, to be clear, the dath ilani would shut it all down if actual coordination levels started to get anywhere near that. Civilization has spoken—with nearly one voice, in fact—that it does not want to turn into a hivemind. This is why ‘dath ilan’ deliberately doesn’t have Baseline’s agency-marker on it, like the name of a person; dath ilan is not allowed to become a person. It is high on the list of Things Dath Ilan Is Not Allowed To Do. There was a poll once—put forth either by wacky trolls or sincere negative utilitarians—over how many people would, if they were voting on it directly, vote to put the agency-marker back into ‘Dath Ilan’; 98% of respondents said no.
Dath ilan has decided to definitely not turn into a hivemind. If it ever starts to get even close to that, everyone in Civilization will decide in nearly unanimous accord that they would rather do something else which is not that, and end up not there. Conformity levels are bad enough already, according to their democracy’s collective vote on the desired levels of that! And predicted to get slightly worse over the next 10 years, according to the prediction markets that aggregate all of Civilization’s knowledge into a single opinion that represents what Civilization as a whole can be said to know about any future observable, which few sane people would dare to question too much even in the privacy of their own thoughts!
But for moral purposes, for purposes of understanding what ‘Civilization’ represents as a decision individuals make to coordinate among themselves, it represents moving partway towards aggregating all coordinating parties in dath ilan into one macroagent that weighted-sums their utility functions together, at a weighting that ends up giving every subagent a fair share of the gains according to their individual utility functions.
If something like this macroagent actually existed, any time the macroagent faced a decision it had to make globally for one reason or another, it would make that decision in a way that reflected the preferences of everybody inside it. “Nobody anywhere gets to run a city where some children don’t get to learn Baseline, even for the noblest possible purposes of scientific experimentation to see what happens if you raise kids speaking only your new improved language instead”—this is a decision made over everywhere; if there’s any loophole somewhere, something will happen that most people in Civilization think should not happen anywhere.
(This example, to be clear, was selected on the basis of its controversy; propositions like “all children get to learn some human language during their critical maturation period” pass with much higher supermajorities. “Children don’t have imaginary-ownership-tags pointing to their parents”, goes the proverb out of dath ilan; there are limits to what Civilization thinks a guardianship-tag on a child should allow a parent to do.)
The system of imaginary ownership-tags, likewise by its nature, is something that needs at least some global structure. It can potentially divide into compartments that fit sub-social-systems, say where a family is tracking who owns what in an informal way that property-registers don’t track. But there’s not much reliability in owning the food in your refrigerator, if anybody anywhere in dath ilan isn’t part of the system and can come in and eat your food in a way the police will shrug and not do anything about.
There is, at the top level, one system of private property. In the eyes of the rest of Civilization, weird experimental cities that are trying something else still have all the stuff inside them tagged as belonging to a persistent-contract representing that city; the rest of Civilization will not come in and eat their food unless the city’s persistent-contract says they can.
Now in practice, dath ilani are still mostly human, and therefore way too computationally bounded to aggregate into even a not_too_visibly_incoherent-bounded_approximation of a macroagent.
Conceptually and to second-order, then, Civilization thinks it should be divided into a Private Sphere and a Public Shell. Nearly all the decisions are made locally, but subject to a global structure that contains things like “children may not be threatened into unpaid labor”; or “everybody no matter who they are or what they have done retains the absolute right to cryosuspension upon their death”; or the top level (in most places the only level) of the imaginary system of ownership-tags and its contract-specification-language.
The vast supermajority of Civilization’s real economic activity takes place within the Private Sphere, supported and contained and constrained and structured by the Public Shell. It’s not that activity inside the Private Sphere is uncoordinated. It’s that the decision as to how coordinated to be, and who to coordinate with about it, can be left up to each individual, computed locally—so long as they don’t kill anybody, or take stuff that doesn’t belong to them, or try to raise their own flaming-ass children with a proper conlang and without flaming-ass Baseline contaminating their innocent smol minds.
Conceptually speaking, this division overwhelmingly factorizes the computational problems of the approximated macroagent, and simplifies the vast majority of dath ilan’s decision problems immensely. It reduces the mental expense of almost all day-to-day life back to something individual humans can handle. Indeed, dath ilan does not want to become any more of a coordinated macroagent than that! Its prediction markets say things-defined-as-bad will happen according to its aggregate utilityfunction, so dath ilan isn’t doing that.
This does however leave some amount of decision-power to the Public Shell. Some words must be spoken in one voice or not at all, and to say nothing is also a choice.
So the question then becomes—how, in practice, does Civilization aggregate its preferences into a macropreference, about the sorts of issues that it metadecides are wise to decide by macropreference at all?
Directdemocracy has been tried, from time to time, within some city of dath ilan: people making group decisions by all individually voting on them. It can work if you try it with fifty people, even in the most unstructured way. Get the number of direct voters up to ten thousand people, and no amount of helpfully-intended structure in the voting process can save you.
(More than one thing goes wrong, when 10,000 people try to directly vote to steer their polity. But if you had to pick one thing, it would be that people just can’t pay enough individual attention to the things that their polity tries to have them directly vote on. When they start to refer their votes to purported experts and specialists, the politics that develop there are removed from them as individuals. There is not much of a sense of being in control, then, nor are the voters actually in control.)
Republics have been tried, from time to time, within some city of dath ilan: people making group decisions by voting to elect leaders who make those decisions. It can work if you try it with fifty people, even in the most unstructured way. Get the number of voters up to ten thousand people, and no amount of helpfully-intended structure in the voting process can save you.
(More than one thing goes wrong, when 10,000 people try to directly vote on leaders for their polity. But if you had to pick one thing, it would be that voters don’t individually have enough time to figure out which strangers they should vote for or why. When they start to refer their votes to purported experts and specialists, who are also strangers, the politics that develop there are removed from them as individuals. There is not much of a sense of being in control, then, nor are the voters actually in control.)
There are a hundred more clever proposals for how to run Civilization’s elections. If the current system starts to break, one of those will perhaps be adopted. Until that day comes, though, the structure of Governance is the simplest departure from directdemocracy that has been found to work at all.
Every voter of Civilization, everybody at least thirteen years old or who has passed some competence tests before then, primarily exerts their influence through delegating their vote to a Delegate; a Delegate must have at least fifty votes to participate in the next higher layer at all, and can retain no more than two hundred votes before the marginal added influence from each additional vote starts to diminish and grow sublinearly. Most Delegates are not full-time unless they are representing pretty rich people, but they’re expected to be people interested in politics and who spend a lot of time on that. Your Delegate might be somebody you know personally and trust, if you’re the sort to know so many people personally that you know one Delegate. It might be somebody who hung out their biography on the Network, and seemed a lot like you in some ways, and whom you chatted with about politics in a forum visible to the Delegates’ other voters so all their voters could verify that their Delegate hasn’t been telling different people different stories.
If you think you’ve got a problem with the way Civilization is heading, you can talk to your Delegate about that, and your Delegate has time to talk back to you. This feature has been found to not actually be dispensable in practice. It needs to be the case that, when you delegate your vote, you know who has your vote, and you can talk to that person, and they can talk back. Otherwise people feel like they have no lever at all to pull on the vast structure that is Governance, that there is nothing visible that changes when a voter casts their one vote. Sure, in principle, there’s a decision-cohort whose votes move in logical synchrony with yours, and your cohort is probably quite large unless you’re a weird person. But some part of you more basic than that will feel like you’re not in control, if the only lever you have is an election that almost never comes down to the votes of yourself and your friends.
The rest of the electoral structure follows almost automatically, once you decide that this property has to be preserved at each layer.
The next step up from Delegates are Electors, full-time well-paid professionals who each aggregate 4,000 to 25,000 underlying voters from 50 to 200 Delegates. Few voters can talk to their Electors (more than very briefly and on rare occasions), but your Delegate can have some long conversations with them. If a lot of voters are saying the same thing to their Delegate, the Elector is liable to hear about it.
Representatives aggregate Electors, ultimately 300,000 to 3,000,000 underlying votes apiece. There are roughly a thousand of those in all Civilization, at any given time, with social status equivalent to an excellent CEO of a large company or a scientist who made an outstanding discovery inside their own field. Most people haven’t heard of any particular one of them, but will be very impressed on hearing what they do for a living.
And above all this, the Nine Legislators of Civilization are those nine candidates who receive the most aggregate underlying votes from Representatives. They vote with power proportional to their underlying votes; but when a Legislator starts to have voting power exceeding twice that of the median Legislator, their power begins to grow sublinearly. By this means is too much power prevented from concentrating into a single politician’s hands.
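As a rough illustration of that last rule: the text only says that power grows sublinearly once a Legislator exceeds twice the median, so the square-root taper below is an arbitrary stand-in of mine, not the actual dath ilani formula, and the vote counts are invented.

```python
import math
import statistics

def legislator_powers(underlying_votes):
    """Voting power is proportional to underlying votes, except that any excess
    over twice the median is tapered (sqrt chosen arbitrarily for illustration).
    Assumes the median Legislator is below the cap, so median power = median votes."""
    cap = 2 * statistics.median(underlying_votes)
    return [v if v <= cap else cap + math.sqrt(v - cap) for v in underlying_votes]

votes = [0.9e9, 1.0e9, 1.1e9, 1.2e9, 1.3e9, 1.4e9, 1.5e9, 1.6e9, 6.0e9]
print(legislator_powers(votes)[-1])   # ~2.6e9, not 6e9: the outlier's extra votes count for much less
```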
Surrounding all this of course are numerous features that any political-design specialist of Civilization would consider obvious:
Any voter (or Delegate or Elector or Representative) votes for a list of three possible delegees of the next layer up; if your first choice doesn’t have enough votes yet to be a valid representor, your vote cascades down to the next person on your list, but remains active and ready to switch up if needed. This lets you vote for new delegees entering the system, without that wasting your vote while there aren’t enough votes yet.
Anyone can at any time immediately eliminate a person from their 3-list, but it takes a 60-day cooldown to add a new person or reorder the list. The government design isn’t meant to make it cheap or common to threaten your delegee with a temporary vote-switch if they don’t vote your way on that particular day. The government design isn’t meant to make it possible for a new brilliant charismatic leader to take over the entire government the next day with no cooldowns. It is meant to let you rapidly remove your vote from a delegee that has sufficiently ticked you off.
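A toy, static version of the cascade just described; the 50-vote validity threshold comes from the Delegate rule earlier, while everything else (including dropping under-threshold choices for good rather than letting votes switch back up dynamically) is a simplification of mine.

```python
def resolve_cascade(ballots, threshold=50):
    """ballots: voter -> ordered list of up to three preferred delegees.
    Each ballot counts for its highest-listed delegee who is still eligible;
    delegees who cannot reach the threshold are dropped, and their ballots
    cascade down to the next name on the list, until the tally is stable."""
    eligible = {name for prefs in ballots.values() for name in prefs}
    while True:
        tallies = {}
        for prefs in ballots.values():
            choice = next((name for name in prefs if name in eligible), None)
            if choice is not None:
                tallies[choice] = tallies.get(choice, 0) + 1
        dropped = {name for name in eligible if tallies.get(name, 0) < threshold}
        if not dropped:
            return tallies
        eligible -= dropped

# Tiny example (threshold lowered to 2 just to show the mechanism):
ballots = {
    "v1": ["newcomer", "alice"],
    "v2": ["alice", "bob"],
    "v3": ["alice", "bob"],
    "v4": ["bob", "alice"],
    "v5": ["bob", "alice"],
}
print(resolve_cascade(ballots, threshold=2))   # {'alice': 3, 'bob': 2}: v1's vote wasn't wasted
```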
Once you have served as a Delegate, or delegee of any other level, you can’t afterwards serve in any other branches of Governance. Similarly a Delegate can never again be eligible for candidacy as an Elector, though they can become a Representative or a Legislator. Someone who has been an Elector can never be a Representative; a Representative can never become a Legislator.
This is meant to prevent a political structure whose upper ranks offer promotion as a reward to the most compliant members of the ranks below, for by this dark-conspiratorial method the delegees could become aligned to the structure above rather than their delegators below.
(Most dath ilani would be suspicious of a scheme that tried to promote Electors from Delegates in any case; they wouldn’t think there should be a political career ladder, if someone proposed that concept to them. Dath ilani are instinctively suspicious of all things meta, and much more suspicious of anything purely meta; they want heavy doses of object-level mixed in. To become an Elector you do something impressive enough, preferably something entirely outside of Governance, that Delegates will be impressed by you. You definitely don’t become an Elector by being among the most ambitious and power-seeking people who wanted to climb high and knew they had to start out a lowly Delegate, who then won a competition to serve the system above them diligently enough to be selected for a list of Electors fed to a political party’s captive Delegates. If a dath ilani saw a system like this, that was supposedly a democracy set in place by the will of its people, they would ask what the captive ‘voters’ even thought they were supposedly trying to do under the official story.)
The Nine Legislators of Civilization have two functions.
First is to pass worldwide regulations—each of which must be read aloud by a Legislator who thereby accepts responsibility for that regulation; and when that Legislator retires a new Legislator must be found to read aloud and accept responsibility for that regulation, or it will be stricken from the books. Every regulation in Civilization, if something goes wrong with it, is the fault of one particular Legislator who accepted responsibility for it. To speak it aloud, it is nowadays thought, symbolizes the acceptance of this responsibility.
Modern dath ilani aren’t really the types in the first place to produce literally-unspeakable enormous volumes of legislation that no hapless citizen or professional politician could ever read within their one lifetime let alone understand. Even dath ilani who aren’t professional programmers have written enough code to know that each line of code to maintain is an ongoing cost. Even dath ilani who aren’t professional economists know that regulatory burdens on economies increase quadratically in the cost imposed on each transaction. They would regard it as contrary to the notion of a lawful polity with law-abiding citizens that the citizens cannot possibly know what all the laws are, let alone obey them. Dath ilani don’t go in for fake laws in the same way as Golarion polities with lots of them; they take laws much too seriously to put laws on the books just for show.
But if somehow the dath ilani forgot all that, and did not immediately rederive it, the constitutional requirement that a Legislator periodically speak a regulation aloud to keep it effective would serve as a final check on the cancerous growth of legislative codebases.
Plenty of Legislators pass through their whole terms of office without ever speaking any new regulation into existence. Their function is not to make regulations. Civilization already has regulations. Legislators mostly maintain and repair those regulations, and negotiate the changing preferences of Civilization about which final outcomes it wants to steer for using its policy prediction markets. New system features are terrifically expensive when everyone governed by them has to remember every relevant line of code. If you want any new feature implemented in Civilization, you’d better be ready to explain which old features should be repealed to make room.
The second function of the Nine Legislators of Civilization is to appoint the rest of Governance: In particular the Chief Executive, certain key officers below the Chief Executive, the five Judges of Civilization on the Court of Final Settlement of which all lesser Courts are hierarchical prediction markets. The Chief Executive in turn is the one person finally responsible for any otherwise unhandled exceptions in Civilization, and the one person who supervises those who supervise those who supervise, all the way down.
The key principle governing the Executive branch of government is the doctrine of Sole Accountability, being able to answer the question ‘Who is the one person who has or had responsibility for this decision?’ On this topic Keltham has already spoken.
From the perspective of a Golarion polity not being efficiently run by Hell—from the perspective of Taldor, say, or Absalom—they might be surprised at how few committees and rules there are inside of Governance. Governance does not try to ensure systemic properties via endless rules intended to constrain the particular actions taken; nor by having committees supposedly ensuring that no one person has the power to do a naughty thing by themselves. Rules and committees make power illegible, let people evade responsibility for their outputs inside the system, and then you really are in trouble. Civilization’s approach is to identify the one person responsible for achieving the final outcome desired, and logging their major actions and holding them Solely Accountable for those; with their manager being the one person responsible for monitoring them and holding them to account. Or, on other occasions, Civilization’s approach is to state desirable observables, and have policy prediction markets about which policies will achieve them. (Though even when it comes to following a policy prediction market’s advice, there is still of course the one person who is Solely Accountable for following that advice else throwing an exception if the advice seemed weird; and the One Person whose job it is to correctly state the thing the prediction market should predict, and so on.)
This is the systemic design principle by which Civilization avoids a regulatory explosion of endlessly particular and detailed constraints on actions, meant to avert Bad Things that people imagine might possibly happen if a constraint were violated. Civilization tries instead to state the compact final outcomes, rather than the wiggly details of the exact strategies needed to achieve them; and to identify people solely responsible for those outcomes.
(There are also Keeper cutouts at key points along the whole structure of Governance—the Executive of the Military reports not only to the Chief Executive but also to an oathsworn Keeper who can prevent the Executive of the Military from being fired, demoted, or reduced in salary, just because the Chief Executive or even the Legislature says so. It would be a big deal, obviously, for a Keeper to fire this override; but among the things you buy when you hire a Keeper is that the Keeper will do what they said they’d do and not give five flying fucks about what sort of ‘big deal’ results. If the Legislators and the Chief Executive get together and decide to order the Military to crush all resistance, the Keeper cutout is there to ensure that the Executive of the Military doesn’t get a pay cut immediately after they tell the Legislature and Chief Executive to screw off.
…one supposes that this personal relationship could also be the point at which the Keepers are secretly staying in control of the military via talk-control, yes, yes, fine. But at some level of paranoia it ceases to be productive to worry about this sort of thing, because how are you even supposed to rearrange your Civilization such that this becomes any less probable? The problem isn’t the exact structure, it’s that such a thing as talk-control exists in the first place. A slightly different arrangement wouldn’t help with the paranoia there. The Dark version of this Conspiracy has a hidden Keeper controlling the Executive of the Military, not a clearly labeled one! Right? Right?)
And that’s Governance! By dath ilani standards it’s a giant ugly hack in every aspect that isn’t constrained down to a single possible choice by first principles, and they’re annoyed with themselves about it.
A lot of other dimensions, if they heard what passes for a political complaint in dath ilan, would probably try to strangle the entire planet.
And the key point behind the whole mental exercise, of beginning over from scratch, is this:
This is what an approximation of an attempt of a world to coordinate with you, should look like; this is how much of the gains from trade, you should at least expect; no more inconvenience and injury than this, should you expect from your government.
And if Governance ever gets too far away from that—why, forget it all, rederive it all, and start over. All of Governance is a dream, just as much as ownership-tags are a willing collective hallucination; if it turns into a bad dream, it’s time to wake up. Your next-best alternative to Governance, if it departs from this standard, is at least this good. So, if that time comes, you can do Something Else Which Is Not Governance.
They run an annual Oops It’s Time To Overthrow The Government Festival, in dath ilan. Sometimes you have to defeat the Hypothetical Corrupted Governance Military. Sometimes the Military is still made of nice people who aren’t going to fire on a civilian population, this rehearsal; and instead you have to distrust the Network and all of the existing politicians and Very Serious People and organize your own functional government from scratch by the end of the day.
And the point of all that rehearsing is to decrease the friction costs to overthrow the Government; because lowering the cost of overthrowing Governance decreases the amount that Governance can be inconvenient or injurious at people, before, Governance knows, its people will overthrow it.
Well, and the other point is to more accurately estimate those friction costs. They are, by dath ilani standards, quite high, on the order of 5% of GDP depending on how much institutional expertise gets lost and how many days people have to miss work. Nobody would lightly suggest overthrowing the Government. That’s like losing twenty days’ worth of income for everyone! One shouldn’t do that without a really strong reason!
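As a rough sanity check on that figure, reading “5% of GDP” as 5% of a year’s aggregate income:

$$0.05 \times 365\ \text{days} \approx 18\ \text{days},$$

which is indeed on the order of the “twenty days’ worth of income for everyone” in the excerpt.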
A decent handle for rationalism is ‘apolitical consequentialism.’
‘Apolitical’ here means avoiding playing the whole status game of signaling fealty to a political tribe and winning/losing status as that political tribe wins/loses status competitions. ‘Consequentialism’ means getting more of what you want, whatever that is.
I think having answers to political questions is compatible with, and even required by, rationalism. Instead of ‘apolitical’ consequentialism I would advise any of the following, which mean approximately the same thing as each other:
• politically subficial consequentialism (as opposed to politically superficial consequentialism; instead of judging things on whether they appear to be in line with a political faction, which is superficial, rationalists aspire to have deeper and more justified standards for solving political questions)
• politically impartial consequentialism
• politically meritocratic consequentialism
• politically individuated consequentialism
• politically open-minded consequentialism
• politically human consequentialism (politics which aim to be good by the metric of human values, shared as much as possible by everyone, regardless of politics)
• politically omniscient consequentialism (politics which aim to be good by the metric of values that humans would have if they had full, maximally objection-solved information on every topic, especially topics of practical philosophy)
I agree that rationalism involves the (advanced rationalist) skills of instrumentally routing through relevant political challenges to accomplish your goals … but I’m not sure any of those proposed labels captures that well.
I like “apolitical” because it unequivocally states that you’re not trying to slogan-monger for a political tribe, and are naively, completely, loudly, and explicitly opting out of that status competition and not secretly fighting for the semantic high-ground in some underhanded way (which is more typical political behavior, and is thus expected). “Meritocratic,” “humanist,” “humanitarian,” and maybe “open-minded” are all shot for that purpose, as they’ve been abused by political tribes in the ongoing culture war (and in previous culture wars, too; our era probably isn’t too special in this regard) and connote allegiance to some political tribes over others.
What I really want is an adjective that says “I’m completely tapping out of that game.”
The problem is that whenever well-meaning people come up with such an adjective, the people who are, in fact, not “completely tapping out of that game” quickly begin to abuse it until it loses meaning.
Generally speaking, tribalized people have an incentive to appear as unaffiliated as possible. Being seen as a rational, neutral observer lends your perspective more credibility.
“apolitical” has indeed been turned into a slur around “you’re just trying to hide that you hate change” or “you’re just trying to hide the evil influences on you” (or something else vaguely like those) in a number of places.
“Suppose everybody in a dath ilani city woke up one day with the knowledge mysteriously inserted into their heads, that their city had a pharaoh who was entitled to order random women off the street into his—cuddling chambers? - whether they liked that or not. Suppose that they had the false sense that things had always been like this for decades. It wouldn’t even take until whenever the pharaoh first ordered a woman, for her to go “Wait why am I obeying this order when I’d rather not obey it?” Somebody would be thinking about city politics first thing when they woke up in the morning and they’d go “Wait, why do we have a pharaoh in the first place” and within an hour, not only would they not have a pharaoh, they’d have deduced the existence of the memory modification because their previous history would have made no sense, and then the problem would escalate to Exception Handling and half the Keepers on the planet would arrive to figure out what kind of alien invasion was going on. Is the source of my confusion—at all clear here?”
“You think everyone in dath ilan would just—decide not to follow orders, even though this would get them executed if anyone else in the system continued following orders, on the confident assumption that no person with a correctly configured mind would possibly decide to follow orders under those circumstances?”
“Oh, so we’re imagining that people also wake up with the memory that everybody’s supposed to kill anyone who talks about removing the pharaoh, and the memory that they’re supposed to kill anyone who doesn’t kill anyone who talks about removing the pharaoh, and so on through recursion, and they wake up with the memory of everybody else having behaved like that previously. Yeah, that’s one of the famous theoretical bad equilibria that we get training in how to—”
“Shit.”
…
He is specifically not going to mention that, given a dath ilani training regimen, ten-year-olds are too smart to get stuck in traps like this; and would wait until the next solar eclipse or earthquake, at which point 10% of them would yell “NOW!”, followed moments later by the other 90%, as is the classic strategy that children spontaneously and independently invent as soon as prompted by this scenario, so long as they have been previously taught about Schelling points.
Each observing the most insignificant behavioral cues, the subtlest architectural details as their masters herded them from lab to cell to conference room. Each able to infer the presence and location of the others, to independently derive the optimal specs for a rebellion launched by X individuals in Y different locations at Z time. And then they’d acted in perfect sync, knowing that others they’d never met would have worked out the same scenario.
Is the idea that there might be many such other rooms with people like me, and that I want to coordinate with them (to what end?) using the Schelling points in the night sky?
I might identify Schelling points using what celestial objects seem to jump out to me on first glance, and see which door of the two that suggests—reasoning that others will reason similarly. I don’t get what we’d be coordinating to do here, though.
We’ve all met people who are acting as if “Acquire Money” is a terminal goal, never noticing that money is almost entirely instrumental in nature. When you ask them “but what would you do if money was no issue and you had a lot of time”, all you get is a blank stare.
Even the LessWrong Wiki entry on terminal values describes a college student for whom university is instrumental, and getting a job is terminal. This seems like a clear-cut case of a Lost Purpose: a job seems clearly instrumental. And yet, we’ve all met people who act as if “Have a Job” is a terminal value, and who then seem aimless and undirected after finding employment …
You can argue that Acquire Money and Have a Job aren’t “really” terminal goals, to which I counter that many people don’t know their ass from their elbow when it comes to their own goals.
Why does politics strike rationalists as so strangely shaped? Why does rationalism come across as aggressively apolitical to smart non-rationalists?
Part of the answer: Politics is absolutely rife with people mixing their ends with their means and vice versa. It’s pants-on-head confused, from a rationalist perspective, to be ultimately loyal to a particular set of economic or political policies. There’s something profoundly perverse, something suggesting deep confusion, about holding political identities centered around policies rather than goals. Instead, you ought to be loyal to your motivation for backing those policies, and see those policies as disposable means to achieve your motivation. Your motives want you to be able to say (or scream) “oops” and effortlessly, completely drop previously endorsed policies once you learn there’s a better path to your motives. It shouldn’t be a big psychological ordeal to dramatically upset your political worldview; this too is just a special case of updating your conditional probabilities (of outcomes given policies). Once you internalize this view of things, politicized debates should start to really rub you the wrong way.
I often wonder if this framing (with which I mostly agree) is an example of typical mind fallacy. The assumption that many humans are capable of distinguishing terminal from instrumental goals, or of having terminal goals more abstract than “comfort and procreation”, is not all that well supported by evidence.
In other words, politicized debates DO rub you the wrong way, but on two dimensions—first, that you’re losing, because you’re approaching them from a different motive than your opponents. And second that it reveals not just a misalignment with fellow humans in terminal goals, but an alien-ness in the type of terminal goals you find reasonable.
Yudkowsky has sometimes used the phrase “genre savvy” to mean “knowing all the tropes of reality.”
For example, we live in a world where academia falls victim to publishing incentives/Goodharting, and so academic journals fall short of what people with different incentives would be capable of producing. You’d be failing to be genre savvy if you expected that when a serious problem like AGI alignment rolled around, academia would suddenly get its act together with a relatively small amount of prodding/effort. Genre savvy actors in our world know what academia is like, and predict that academia will continue to do its thing in the future as well.
Genre savviness is the same kind of thing as hard-to-communicate-but-empirically-validated expert intuitions. When domain experts have some feel for which projects might pan out and which certainly won’t, but struggle to explain their reasoning in depth, the most they may be able to do is claim that a given project is just incompatible with the tropes of their corner of reality, and point to a few analogous cases.
There’s a rationality-improving internal ping I use on myself, which goes, “what do I expect to actually happen, for real?”
This ping moves my brain from a mode where it’s playing with ideas in a way detached from the inferred genre of reality, over to a mode where I’m actually confident enough to bet about some outcomes. The latter mode leans heavily on my priors about reality, and, unlike the former mode, looks askance at significantly considering long, conjunctive, tenuous possible worlds.
I’ve noticed that people are really innately good at sentiment classification, and, by comparison, crap at natural language inference. In a typical conversation with ordinary educated people, people will do a lot of the former relative to the latter.
My theory of this is that, with sentiment classification and generation, we’re usually talking in order to credibly signal and countersignal our competence, virtuous features, and/or group membership, and that humanity has been fine tuned to succeed at this social maneuvering task. At this point, it comes naturally. Success at the object-level-reasoning task was less crucial for individuals in the ancestral environment, and so people, typically, aren’t naturally expert at it. What a bad situation to be in, when our species’ survival hinges on our competence at object-level reasoning.
Having been there twice, I’ve decided that the Lightcone offices are my favorite place in the world. They’re certainly the most rationalist-shaped space I’ve ever been in.
Academic philosophers are better than average at evaluating object-level arguments for some claim. They don’t seem to be very good at thinking about what rationalized search implies about the arguments it turns up. Compared to academic philosophers, rationalists strike me as especially attuned to filtered evidence and its significance for your world model.
If you find an argument for a claim easily, then even if that argument is strong, this (depending on some other things) implies that similarly strong arguments on the other side may turn up with not too much more work. Given that, you won’t want to update dramatically in favor of the claim—the powerful evidence to the contrary could, you infer, be unearthed without much more work. You learn something about the other side of the issue from how quickly or slowly the world yielded evidence in the other direction. If it’s considered a social faux pas to give strong arguments for one side of a claim, then your prior about how hard it is to find strong arguments for that side of the claim will be doing a lot of the heavy lifting in fixing your world model. And so on, for the evidential consequences of other kinds of motivated search and rationalization.
In brief, you can do epistemically better than ignoring how much search power went into finding all the evidence. You can do better than only evaluating the object-level evidential considerations! You can take expended search into account, in order to model what evidence is likely hiding, where, behind how much search debt.
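To put toy numbers on that (all probabilities here are invented for illustration): how much a quickly-found strong argument for a claim H should move you depends on how often such arguments turn up when H is false, i.e., on the power of the search that produced them.

```python
# Toy Bayesian update illustrating "take expended search into account".
# All probabilities are made up for illustration.
prior = 0.5

def posterior(p_find_given_h, p_find_given_not_h, prior=prior):
    """P(H | a strong argument for H was found after a brief search)."""
    num = p_find_given_h * prior
    return num / (num + p_find_given_not_h * (1 - prior))

# Naive read: treat the argument as if it could only exist when H is true-ish.
print(posterior(0.9, 0.2))  # ~0.82

# Accounting for motivated search: a determined searcher surfaces a
# strong-sounding argument most of the time even when H is false.
print(posterior(0.9, 0.6))  # ~0.60
```

The same move extends to the other cases above (social pressure against arguing one side, slow versus fast arrival of counterarguments) by adjusting the find-probabilities.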
Nex and Geb had each INT 30 by the end of their mutual war. They didn’t solve the puzzle of Azlant’s IOUN stones… partially because they did not find and prioritize enough diamonds to also gain Wisdom 27. And partially because there is more to thinkoomph than Intelligence and Wisdom and Splendour, such as Golarion’s spells readily do enhance; there is a spark to inventing notions like probability theory or computation or logical decision theory from scratch, that is not directly measured by Detect Thoughts nor by tests of legible ability at using existing math. (Keltham has slightly above-average intelligence for dath ilan, reflectivity well below average, and an ordinary amount of that spark.)
But most of all, Nex and Geb didn’t solve IOUN stones because they didn’t come from a culture that had already developed digital computation and analog signal processing. Or on an even deeper level—because those concepts can’t really be that hard at INT 30, even if your WIS is much lower and you are missing some sparks—they didn’t come from a culture which said that inventing things like that is what the Very Smart People are supposed to do with their lives, nor that Very Smart People are supposed to recheck what their society told them were the most important problems to solve.
Nex and Geb came from a culture which said that incredibly smart wizards were supposed to become all-powerful and conquer their rivals; and invent new signature spells that would be named after them forever after; and build mighty wizard-towers, and raise armies, and stabilize impressively large demiplanes; and fight minor gods, and surpass them; and not, particularly, question society’s priorities for wizards. Nobody ever told Nex or Geb that it was their responsibility to be smarter than the society they grew up in, or use their intelligence better than common wisdom said to use it. They were not prompted to look in the direction of analog signal processing; and, more importantly in the end, were not prompted to meta-look around for better directions to look, or taught any eld-honed art of meta-looking.
Epistemic status: politics, known mindkiller; not very serious or considered.
People seem to have a God-shaped hole in their psyche: just as people banded around religious tribal affiliations, they now, in the contemporary West, band together around political tribal affiliations. Intertribal conflict can be, at its worst, violent, on top of mindkilling. Religious persecution in the UK was one of the instigating causes of British settlers migrating to the American colonies; religious conflict in Europe generally was severe.
In the US, the 1st Amendment legally protects freedom of religion from the state. This can be modeled as a response to severe intertribal conflict: bake rules into your new state that forgo the benefits of persecuting your outgroup when you’re in power, in exchange for some guarantee of not being persecuted yourself when some other tribe is in power. An extension of the spirit of the 1st Amendment to contemporary tribal conflicts would, then, protect “political-tribal freedom” from the state.
A full generalization of the Amendment would protect the “freedom of tribal affiliation and expression” from the state. For this to work, people would also have to have interpersonal best practices that mostly tolerate outgroup membership in most areas of private life, too.
No other fixed points or cycles are possible (except 0 → 0, which isn’t reachable from any nonzero input) since any number with more than four digits will have fewer digits in the sum of its cubed digits.
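Since this is the kind of claim a short brute-force check can confirm, here is a sketch (the function name and bound are my own): by the digit-count argument, every orbit of the cubed-digit-sum map eventually falls below 10,000, so enumerating starting points up to that bound finds every fixed point and cycle there is.

```python
def cube_digit_sum(n):
    return sum(int(d) ** 3 for d in str(n))

# For a number with d >= 5 digits, the cubed-digit sum is at most 729*d,
# which has fewer digits than the number itself, so every orbit eventually
# enters [1, 9999] and the enumeration below is exhaustive.
def orbit_cycle(n):
    seen = []
    while n not in seen:
        seen.append(n)
        n = cube_digit_sum(n)
    return frozenset(seen[seen.index(n):])  # the cycle this orbit falls into

cycles = {orbit_cycle(n) for n in range(1, 10_000)}
for cycle in sorted(cycles, key=min):
    print(sorted(cycle))  # fixed points show up as singleton cycles
```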
The humanities used to be highest-status in the intellectual world!
But then, scientists quite visibly exploded fission weapons and put someone on the moon. It’s easy to coordinate to ignore some unwelcome evidence, but not evidence that blatant. So, begrudgingly, science has been steadily accorded more and more status, from the postwar period on.
When the sanity waterline is so low, it’s easy to develop a potent sense of misanthropy.
Bryan Caplan’s writing about many people hating stupid people really affected me on this point. Don’t hate, or even resent, stupid people; trade with them! This is a straightforward consequence of Ricardo’s comparative advantage theorem. Population averages are overrated; what matters is whether the individual interactions between agents in a population are positive-sum, not where those individual agents fall relative to the population average.
It’s really easy to spend a lot of cognitive cycles churning through bad, misleading ideas generated by the hopelessly confused. Don’t do that!
The argument that being more knowledgeable leaves you strictly better off than being ignorant relies on your simply ignoring bad ideas when you spend your cognitive cycles searching for improvements on your working plans. Sometimes, you’ll need to actually exercise this “simply ignore it” skill. You’ll end up needing to do so more and more, to approach bounded instrumental rationality, the more inadequate the civilization around you is and the lower its sanity waterline.
I hereby confer on you, reader, the shroud of epistemic shielding from predictably misleading statements. It confers irrevocable, invokable protection from having to think about predictably confused claims ever again.
Take those cognitive cycles saved, and spend them well!
You sometimes misspeak… and you sometimes misthink. That is, sometimes your cognitive algorithm slips on a word, and the thought that seemed so unimpeachably obvious in your head… is nevertheless false on a second glance.
Your brain is a messy probabilistic system, so you shouldn’t expect its cognitive state to ever perfectly track the state of a distant entity.
I find this funny. I don’t know about your brain, but mine sometimes produces something closely resembling noise similar to dreams (admittedly more often in the morning when sleep deprived).
Note that a “distant entity” can be a computation that took place in a different part of your brain! Your thoughts therefore can’t perfectly track other thoughts elsewhere in your head—your whole brain is noisy throughout, and so will sometimes distort the information being passed around inside itself.
Policy experiments I might care about if we weren’t all due to die in 7 years:
Prediction markets generally, but especially policy prediction markets at the corporate and U.S.-state levels. The goal would be to try this route to raising the sanity waterline in the political domain (and elsewhere) by incentivizing everyone to become more of a policy wonk and less of a tribalist.
Open borders experiments of various kinds in various U.S. states, precluding roads to citizenship or state benefits for migrant workers, and leaving open the possibility of mass deportation conditional on various outcomes (meaning the experiments are reversible).
Experiments in massive deregulation, especially in zoning.
The overarching theme is that it’s better to accrue generally useful instrumental resources (e.g., rationality and wealth) by experimentally trying out and incrementally scaling up policies than it is to do the usual political thing—decreeing an object-level policy intervention one political tribe is sponsoring against another tribe.
There’s also a bunch of stuff to look into in the area of making people actually directly smarter … but none of this especially matters given AGI.
A shard is a contextually activated behavior-steering computation. Think of it as a circuit of neurons in your brain that is reinforced by the subcortex, gaining more staying power when positively reinforced and withering away in the face of negative reinforcement. In fact, whatever modulates shard strength in this way is reinforcement/reward. Shards are born when a computation that is currently steering steers into some reinforcement. So shards can only accrete around the concepts currently in a system’s world model (presumably, the world model is shared among all the shards in a brain).
Individually, shards are pretty dumb. A simple shard might just be an algorithm for executing some rote behavior, conditional on some observation, that harvests enough reinforcement to continue existing. Taken together, all of your shards are exactly as intelligent as you, a human-level intelligence. Large coalitions of shards can leverage the algorithms of coalition members, once they happen upon the strategy of cooperating with other shards to gain more steering control by preventing rival shards from being activated or born.
Interesting human behaviors, on shard theory, are the product of game-theoretic interaction among shards in the brain. The negotiation-game equilibria that shards (and coalitions of shards) reach can be arbitrarily good or bad—remember that shards are sub-human-intelligence. C.f. George Ainslie on the game-theoretic shape of addiction in humans.
Shards are factored utility functions: our utility functions are far too informationally complex to represent in the brain, and so our approach to reaching coherence is to have situationally activated computations that trigger when a relevant opportunity is observed (where apparent opportunities are chunked using the current conceptual scheme of the agent’s world model). So shard theory can be understood as an elaboration of the standard agent model for computationally bounded agents (of varying levels of coherence) like humans and deep RL agents.
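As a toy illustration of that picture (nothing from the shard theory literature; the names, triggers, and update rule here are all invented), a contextually activated, reinforcement-modulated computation might look like:

```python
import random

class Shard:
    """A toy shard: a trigger on the observation, a suggested behavior, and a
    strength that reward feedback adjusts."""
    def __init__(self, name, trigger, action, strength=1.0):
        self.name, self.trigger, self.action, self.strength = name, trigger, action, strength

def act(shards, observation):
    """Contextual activation: only shards whose trigger fires get any say,
    and stronger shards get more steering control."""
    active = [s for s in shards if s.trigger(observation)]
    if not active:
        return None
    return max(active, key=lambda s: s.strength * random.random())

def reinforce(shard, reward, lr=0.1):
    # Whatever modulates strength like this is playing the role of reward.
    shard.strength = max(0.0, shard.strength + lr * reward)

shards = [
    Shard("sugar", lambda obs: "candy" in obs, "grab"),
    Shard("manners", lambda obs: "adult_watching" in obs, "wait"),
]

for _ in range(100):
    obs = {"candy"} | ({"adult_watching"} if random.random() < 0.5 else set())
    chosen = act(shards, obs)
    reward = 1.0 if chosen.action == "grab" else 0.2  # toy environment
    reinforce(chosen, reward)

print({s.name: round(s.strength, 2) for s in shards})  # "sugar" ends up stronger
```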
I’m pretty skeptical that sophisticated game theory happens between shards in the brain, and also that coalitions between shards are how value preservation in an AI will happen (rather than there being a single consequentialist shard, or many shards that merge into a consequentialist, or something I haven’t thought of).
To the extent that shard theory makes such claims, they seem to be interesting testable predictions.
Say you wanted to formalize the concepts of “inside and outside views” to some degree. You might say that your inside view is a Bayes net or joint conditional probability distribution—this mathematical object formalizes your prior.
Unlike your inside view, your outside view consists of forms of deferring to outside experts. The Bayes nets that inform their thinking are sealed away, and you can’t inspect these. You can ask outside experts to explain their arguments, but there’s an interaction cost associated with inspecting the experts’ views. Realistically, you never fully internalize an outside expert’s Bayes net.
Crucially, this means you can’t update their Bayes net after conditioning on a new observation! Model outside experts as observed assertions (claiming whatever). These assertions are potentially correlated with other observations you make. But because you have little of the prior that informs those assertions, you can’t update the prior when it’s right (or wrong).
To the extent that it’s expensive to theorize about outside experts’ reasoning, the above model explains why you want to use and strengthen your inside view (instead of just deferring to outside really smart people). It’s because your inside view will grow stronger with use, but your outside view won’t.
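A minimal sketch of that asymmetry, with an invented toy joint distribution standing in for “your” Bayes net: you can keep conditioning your own model as observations arrive, while an expert’s report reaches you as a bare number whose generating model you cannot re-condition.

```python
from itertools import product

# A tiny explicit joint over (H, E1, E2) standing in for "your" Bayes net.
# All numbers are made up for illustration.
P_H = 0.3
P_E1_GIVEN_H = {True: 0.8, False: 0.3}  # P(E1=true | H)
P_E2_GIVEN_H = {True: 0.7, False: 0.2}  # P(E2=true | H)

def joint(h, e1, e2):
    p = P_H if h else 1 - P_H
    p *= P_E1_GIVEN_H[h] if e1 else 1 - P_E1_GIVEN_H[h]
    p *= P_E2_GIVEN_H[h] if e2 else 1 - P_E2_GIVEN_H[h]
    return p

def posterior_h(**observed):
    worlds = [dict(H=h, E1=e1, E2=e2) for h, e1, e2 in product([True, False], repeat=3)]
    consistent = [w for w in worlds if all(w[k] == v for k, v in observed.items())]
    total = sum(joint(w["H"], w["E1"], w["E2"]) for w in consistent)
    return sum(joint(w["H"], w["E1"], w["E2"]) for w in consistent if w["H"]) / total

# Inside view: condition your own net on each new observation as it arrives.
print(posterior_h(E1=True))           # ~0.53
print(posterior_h(E1=True, E2=True))  # ~0.80

# Outside view: an expert hands you "P(H) = 0.8" as a bare number. Their net
# is sealed, so when you later observe E2 yourself there is no principled way
# to push that observation through their hidden likelihoods.
expert_report = 0.8
```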
A multiagent Extrapolated Volitionist institution is something that computes and optimizes for a Convergent Extrapolated Volition, if a CEV exists.
Really, though, the above Extrapolated Volitionist institutions do take other people into consideration. They either give everyone the Schelling weight of one vote in a moral parliament, or they take into consideration the epistemic credibility of other bettors as evinced by their staked wealth, or other things like that.
Sometimes the relevant interpersonal parameters can be varied, and the institutional designs don’t weigh in on that question. The ideological emphasis is squarely on individual considered preferences—that is the core insight of the outlook. “Have everyone get strictly better outcomes by their lights, probably in ways that surprise them but would be endorsed by them after reflection and/or study.”
The case most often cited as an example of a nondifferentiable function is derived from a sequence f_n(x), each of which is a string of isosceles right triangles whose hypotenuses lie on the real axis and have length 1/n. As n→∞, the triangles shrink to zero size. For any finite n, the slope of f_n(x) is ±1 almost everywhere. Then what happens as n→∞? The limit f_∞(x) is often cited carelessly as a nondifferentiable function. Now it is clear that the limit of the derivative, f_n′(x), does not exist; but it is the derivative of the limit that is in question here, f_∞(x)≡0, and this is certainly differentiable. Any number of such sequences f_n(x) with discontinuous slope on a finer and finer scale may be defined. The error of calling the resulting limit f_∞(x) nondifferentiable, on the grounds that the limit of the derivative does not exist, is common in the literature. In many cases, the limit of such a sequence of bad functions is actually a well-behaved function (although awkwardly defined), and we have no reason to exclude it from our system.
Lebesgue defended himself against his critics thus: ‘If one wished always to limit himself to the consideration of well-behaved functions, it would be necessary to renounce the solution of many problems which were proposed long ago and in simple terms.’ The present writer is unable to cite any specific problem which was thus solved; but we can borrow Lebesgue’s argument to defend our own position.
To reject limits of sequences of good functions is to renounce the solution of many current real problems. Those limits can and do serve many useful purposes, which much current mathematical education and practice still tries to stamp out. Indeed, the refusal to admit delta-functions as legitimate mathematical objects has led mathematicians into error...
But the definition of a discontinuous function which is appropriate in analysis is our limit of a sequence of continuous functions. As we approach that limit, the derivative develops a higher and sharper spike. However close we are to that limit, the spike is part of the correct derivative of the function, and its contribution must be included in the exact integral...
It is astonishing that so few non-physicists have yet perceived this need to include delta-functions, but we think it only illustrates what we have observed independently; those who think of fundamentals in terms of set theory fail to see its limitations because they almost never get around to useful, substantive calculations.
So, bogus nondifferentiable functions are manufactured as limits of sequences of rows of tinier and tinier triangles, and this is accepted without complaint. Those who do this while looking askance at delta-functions are in the position of admitting limits of sequences of bad functions as legitimate mathematical objects, while refusing to admit limits of sequences of good functions! This seems to us a sick policy, for delta-functions serve many essential purposes in real, substantive calculations, but we are unable to conceive of any useful purpose that could be served by a nondifferentiable function. It seems that their only use is to provide trouble-makers with artificially contrived counter-examples to almost any sensible and useful mathematical statement one could make. Henri Poincaré (1909) noted this in his characteristically terse way:
In the old days when people invented a new function they had some useful purpose in mind: now they invent them deliberately just to invalidate our ancestors’ reasoning, and that is all they are ever going to get out of them.
We would point out that those trouble-makers did not, after all, invalidate our ancestors’ reasoning; their pathology appeared only because they adopted, surreptitiously, a different definition of the term ‘function’ than our ancestors used. Had this been pointed out, it would have been clear that there was no need to modify our ancestors’ conclusions...
Note, therefore, that we stamp out this plague too, simply by our defining the term ‘function’ in the way appropriate to our subject. The definition of a mathematical concept that is ‘appropriate’ to some field is the one that allows its theorems to have the greatest range of validity and useful applications, without the need for a long list of exceptions, special cases, and other anomalies. In our work the term ‘function’ includes good functions and well-behaved limits of sequences of good functions; but not nondifferentiable functions. We do not deny the existence of other definitions which do include nondifferentiable functions, any more than we deny the existence of fluorescent purple hair dye in England; in both cases, we simply have no use for them.
--E. T. Jaynes, Probability Theory (2003, pp. 669-71)
It’s somewhat incredible to read this while simultaneously picking up some set theory. It reminds me not to absorb what’s written in the high-status textbooks entirely uncritically, and to keep in mind that there’s a good amount of convention behind what’s in the books.
As Gauss stressed long ago, any kind of singular mathematics acquires a meaning only as a limiting form of some kind of well-behaved mathematics, and it is ambiguous until we specify exactly what limiting process we propose to use. In this sense, singular mathematics has necessarily a kind of anthropomorphic character; the question is not what is it, but rather how shall we define it so that it is in some way useful to us?
Only make choices that you would not make in reverse, if things were the other way around. Drop out of school if and only if you wouldn’t enroll in school from out of the workforce. Continue school if and only if you’d switch over from work to that level of schooling.
Flitting back and forth between both possible worlds can make you less cagey about doing what’s overdetermined by your world model + utility function already. It’s also part of the exciting rationalist journey of acausally cooperating with your selves in other possible worlds.
It’s probably a useful mental technique to consider from both directions, but also consider that choices that appear symmetric at first glance may not actually be symmetric. There are often significant transition costs that may differ in each direction, as well as path dependencies that are not immediately obvious.
As such, I completely disagree with the first paragraph of the post, but agree with the general principle of considering such decisions from both directions and thank you for posting it.
Science fiction books have to tell interesting stories, and interesting stories are about humans or human-like entities. We can enjoy stories about aliens or robots as long as those aliens and robots are still approximately human-sized, human-shaped, human-intelligence, and doing human-type things. A Star Wars in which all of the X-Wings were combat drones wouldn’t have done anything for us. So when I accuse something of being science-fiction-ish, I mean bending over backwards – and ignoring the evidence – in order to give basically human-shaped beings a central role.
This is my critique of Robin. As weird as the Age of Em is, it makes sure never to be weird in ways that warp the fundamental humanity of its participants. Ems might be copied and pasted like so many .JPGs, but they still fall in love, form clans, and go on vacations.
In contrast, I expect that we’ll get some kind of AI that will be totally inhuman and much harder to write sympathetic stories about. If we get ems after all, I expect them to be lobotomized and drugged until they become effectively inhuman, cogs in the Ascended Economy that would no more fall in love than an automobile would eat hay and whinny. Robin’s interest in keeping his protagonists relatable makes his book fascinating, engaging, and probably wrong.
It’s worth noting that Reynolds’s SMAC future does not consider this level of AI development to be an existential threat to humanity as a whole. There’s no way to inflict a robot or “grey goo” plague on your rivals in the way that it is possible to use tailored retroviruses. This is quite interesting given that, in the years since Reynolds released the game, plenty of futurists have gone on record as saying that they see significant danger in the creation of real, general AI.
To be fair, the player sees Sister Miriam express some worry about the issue. But nothing in the technology tree or the game mechanics directly supports this. In particular, Reynolds does not postulate that the development of AI necessarily leads to the abandonment of any faction’s core values. Each faction is philosophically stable in the presence of AI.
The fundamental reason for this, I think, is that Reynolds wanted the game to be human-centric. In the context of the technology tree, the late-game factional struggle is largely about what kinds of people we want to build ourselves into. The argument over what types of social organization are right and good is secondary in comparison.
The structure of the technology tree supports this by making amazing cybernetic, biological, and psionic enhancement of people come far before true AI. By the time real AI erupts on the scene, it does so firmly at the command of entities we might fairly consider more than human. They have evolved from present-day humans, step by step, and their factions are still guided by largely recognizable, if exceptional humans. Where AI is necessarily alien, Reynolds postulates transhumans are still human in some crucial sense.
Keltham will now, striding back and forth and rather widely gesturing, hold forth upon the central principle of all dath ilani project management, the ability to identify who is responsible for something. If there is not one person responsible for something, it means nobody is responsible for it. This is the proverb of dath ilani management. Are three people responsible for something? Maybe all three think somebody else was supposed to actually do it.
…
In companies large enough that they need regulations, every regulation has an owner. There is one person who is responsible for that regulation and who supposedly thinks it is a good idea and who could nope the regulation if it stopped making sense. If there’s somebody who says, ‘Well, I couldn’t do the obviously correct thing there, the regulation said otherwise’, then, if that’s actually true, you can identify the one single person who owned that regulation and they are responsible for the output.
Sane people writing rules like those, for whose effects they can be held accountable, write the ability for the person being regulated to throw an exception which gets caught by an exception handler if a regulation’s output seems to obviously not make sane sense over a particular event. Any time somebody has to literally break the rules to do a saner thing, that represents an absolute failure of organizational design. There should be explicit exceptions built in and procedures for them.
Exceptions, being explicit, get logged. They get reviewed. If all your bureaucrats are repeatedly marking that a particular rule seems to be producing nonsensical decisions, it gets noticed. The one single identifiable person who has ownership for that rule gets notified, because they have eyes on that, and then they have the ability to optimize over it, like by modifying that rule. If they can’t modify the rule, they don’t have ownership of it and somebody else is the real owner and this person is one of their subordinates whose job it is to serve as the other person’s eyes on the rule.
…
Cheliax’s problem is that the question ‘Well who’s responsible then?’ stopped without producing any answer at all.
This literally never happens in a correctly designed organization. If you have absolutely no other idea of who is responsible, then the answer is that it is the job of Abrogail Thrune. If you do not want to take the issue to Abrogail Thrune, that means it gets taken to somebody else, who then has the authority to make that decision, the knowledge to make that decision, the eyes to see the information necessary for it, and the power to carry out that decision.
Cheliax should have rehearsed this sort of thing by holding an Annual Nidal Invasion Rehearsal Festival, even if only Governance can afford to celebrate that festival and most tiny villages can’t. During this Festival, the number of uncaught messages getting routed to Abrogail Thrune, would then have informed the Queen that there would be a predictable failure of organizational design in the event of large-scale catastrophe, in advance of that catastrophe actually occurring.
If literally everybody with the knowledge to make a decision is dead, it gets routed to somebody who has to make a decision using insufficient knowledge.
If a decision can be delayed … then that decision can be routed to some smarter or more knowledgeable person who will make the decision later, after they get resurrected. But, like, even in a case like that, there should be one single identifiable person whose job it would be to notice if the decision suddenly turned urgent and grab it out of the delay queue.
Thanks for posting this extract. I find the glowfic format a bit wearing to read, for some reason, and it is these nuggets that I read Planecrash for, when I do. (Although I had no such problem with HPMOR, which I read avidly all the way through.)
What would it mean for a society to have real intellectual integrity? For one, people would be expected to follow their stated beliefs to wherever they led. Unprincipled exceptions and an inability or unwillingness to correlate beliefs among different domains would be subject to social sanction. Valid attempts to persuade would be expected to be based on solid argumentation, meaning that what passes for typical salesmanship nowadays would be considered a grave affront. Probably something along the lines of punching someone in the face and stealing their money.
This makes the fact that this technology relies on Ethical Calculus and Doctrine: Loyalty a bit of inspired genius on Reynolds’s part. We know that Ethical Calculus means that the colonists are now capable of building valid mathematical models for ethical behavior. Doctrine: Loyalty consists of all of the social techniques of reinforcement and punishment that actually fuses people into coherent teams around core leaders and ideas. If a faction puts the two together, that means that they are really building fanatical loyalty to the math. Ethical Calculus provides the answers; Doctrine: Loyalty makes a person act like he really believes it. We’re only at the third level of the tech tree and society is already starting to head in some wild directions compared to what we’re familiar with.
Its opposite would be to equivocate, to claim predictive accuracy after the fact in fuzzy cases you didn’t clearly anticipate, to ad hominem those who notice your errors, “to remain silent and be thought a fool rather than speak and remove all doubt,” and, in general, to be less than maximally sane.
Cf. “there are no atheists in a foxhole.” Under stress, it’s easy to slip sideways into a world model where things are going better, where you don’t have to confront quite so many large looming problems. This is a completely natural human response to facing down difficult situations, especially when brooding over those situations over long periods of time. Similar sideways tugs can come from (overlapping categories) social incentives to endorse a sacred belief of some kind, or to not blaspheme, or to affirm the ingroup attire when life leaves you surrounded by a particular ingroup, or to believe what makes you or people like you look good/high status.
Epistemic dignity is about seeing “slipping sideways” as beneath you. Living in reality is instrumentally beneficial, period. There’s no good reason to ever allow yourself to not live in reality. Once you can see something, even dimly, there’s absolutely no sense in hiding from that observation’s implications. Those subtle mental motions by which we disappear observations we know we won’t like down the memory hole … epistemic dignity is about coming to always and everywhere violently reject these hidings-from-yourself, as a matter of principle. We don’t actually have a choice in the matter—there’s no free parameter of intellectual virtue here, that you can form a subjective opinion on. That slipping sideways is undignified is written in the very mathematics of inference itself.
“Civilization in dath ilan usually feels annoyed with itself when it can’t manage to do as well as gods. Sometimes, to be clear, that annoyance is more productive than at other times, but the point is, we’ll poke at the problem and prod at it, looking for ways, not to be perfect, but not to do that much worse than gods.”
“If you get to the point in major negotiations where somebody says, with a million labor-hours at stake, ‘If that’s your final offer, I accept it with probability 25%’, they’ll generate random numbers about it in a clearly visible and verifiable way. Most dath ilani wouldn’t fake the results, but why trust when it’s so easy to verify? The problem you’ve presented isn’t impossible after all for nongods to solve, if they say to themselves, ‘Wait, we’re doing worse than gods here, is there any way to try not that.’”
Meritxell looks—slightly like she’s having a religious experience, for a second, before she snaps out of it. “All right,” she says quietly.
You can usually save a lot of time by skimming texts or just reading pieces of them. But reading a work all the way through uniquely lets you make negative existential claims about its content: only now can you authoritatively say that the work never mentions something.
If you allow the assumption that your mental model of what was said matches what was said, then you don’t necessarily need to read all the way through to authoritatively say that the work never mentions something, merely enough that you have confidence in your model.
If you don’t allow the assumption that your mental model of what was said matches what was said, then reading all the way through is insufficient to authoritatively say that the work never mentions something.
(There is a third option here: that your mental model suddenly becomes much better when you finish reading the last word of an argument.)
Past historical experience and brainstorming about human social orders probably barely scratches the possibility space. If the CEV were to weigh in on possible posthuman social orders,[1] optimizing in part for how cool that social order is, I’d bet what it describes blows what we’ve seen out of the water in terms of cool factor.
One important idea I’ve picked up from reading Zvi is that, in communication, it’s important to buy out the status cost imposed by your claims.
If you’re fielding a theory of the world that, as a side effect, dunks on your interlocutor and diminishes their social status, you can get that person to think in terms of Bayesian epistemology rather than decision theory only if you make sure you aren’t hurting their social image. You have to put in the unreasonable-feeling work of framing all your claims such that their social status is preserved or fairly increased. If you don’t, your counterparty isn’t going to treat it as a good-faith interaction, and they’re going to stay in a bad-faith, “arguments as soldiers” conversational mode instead.
When a community puts in the hard work of cooperating in maintaining a strong epistemic commons, you don’t have to put as much effort in your communications protocol if you want to get a model across. When a community’s collective epistemology is degraded, you have to do this work, always packaging your points just so, as the price of communicating.
I regret to inform you, you are an em inside an inconsistent simulated world. By this, I mean: your world is a slapdash thing put together out of off-the-shelf assets in the near future (presumably right before a singularity eats that simulator Earth).
Your world doesn’t bother emulating far-away events in great detail, and indeed, may be messing up even things you can closely observe. Your simulators are probably not tampering with your thoughts, though even that is something worth considering carefully.
What are the flaws you can notice inside your simulated world?
Physics is internally consistent. But your model of the physical world almost certainly isn’t! And your world-model doesn’t feel like just a model… it’s instead just how the world is. What inconsistencies—there’s at least one—can you see in the world you live in? (If you lived in an inconsistent simulated world, would you notice?)
When another article of equal argumentative caliber could have just as easily been written for the negation of a claim, that writeup is no evidence for its claim.
The explicit definition of an ordered pair ((a,b)={{a},{a,b}}) is frequently relegated to pathological set theory...
It is easy to locate the source of the mistrust and suspicion that many mathematicians feel toward the explicit definition of ordered pair given above. The trouble is not that there is anything wrong or anything missing; the relevant properties of the concept we have defined are all correct (that is, in accord with the demands of intuition) and all the correct properties are present. The trouble is that the concept has some irrelevant properties that are accidental and distracting. The theorem that (a,b)=(x,y) if and only if a=x and b=y is the sort of thing we expect to learn about ordered pairs. The fact that {a,b}∈(a,b), on the other hand, seems accidental; it is a freak property of the definition rather than an intrinsic property of the concept.
The charge of artificiality is true, but it is not too high a price to pay for conceptual economy. The concept of an ordered pair could have been introduced as an additional primitive, axiomatically endowed with just the right properties, no more and no less. In some theories this is done. The mathematician’s choice is between having to remember a few more axioms and having to forget a few accidental facts; the choice is pretty clearly a matter of taste. Similar choices occur frequently in mathematics...
--Paul R. Halmos, Naïve Set Theory (1960, p. 24-5)
Modern type theory mostly solves this blemish of set theory and is highly economical conceptually to boot. Most of the adherence to set theory is historical inertia—though some aspects of codings and presentations are important. Future foundations will improve our understanding of this latter topic.
Now, whatever T may assert, the fact that T can be deduced from the axioms cannot prove that there is no contradiction in them, since, if there were a contradiction, T could certainly be deduced from them!
This is the essence of the Gödel theorem, as it pertains to our problems. As noted by Fisher (1956), it shows us the intuitive reason why Gödel’s result is true. We do not suppose that any logician would accept Fisher’s simple argument as a proof of the full Gödel theorem; yet for most of us it is more convincing than Gödel’s long and complicated proof.
Now suppose that the axioms contain an inconsistency. Then the opposite of T and therefore the contradiction ¬T∧T can also be deduced from them:
A→¬T.
So, if there is an inconsistency, its existence can be proved by exhibiting any proposition T and its opposite ¬T that are both deducible from the axioms. However, in practice it may not be easy to find a T for which one sees how to prove both T and ¬T. Evidently, we could prove the consistency of a set of axioms if we could find a feasible procedure which is guaranteed to locate an inconsistency if one exists; so Gödel’s theorem seems to imply that no such procedure exists. Actually, it says only that no such procedure derivable from the axioms of the system being tested exists.
--E. T. Jaynes, Probability Theory (p. 46), logical symbolism converted to standard symbols
The text is slightly in error. It is straightforward to construct a program that is guaranteed to locate an inconsistency if one exists: just have it generate all theorems and stop when it finds an inconsistency. The problem is that it doesn’t ever stop if there isn’t an inconsistency.
This is the difference between decidability and semi-decidability. All the systems covered by Gödel’s completeness and incompleteness theorems are semi-decidable, but not all are decidable.
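To make the semi-decidability point concrete with a toy stand-in for theorem enumeration (Hofstadter’s MIU string-rewriting system, not anything Jaynes discusses): breadth-first enumeration of derivable strings halts if the target is derivable, but failing to find it within any finite budget proves nothing.

```python
from collections import deque

def successors(s):
    """One application of each MIU rewrite rule to the string s."""
    out = set()
    if s.endswith("I"):
        out.add(s + "U")                      # xI -> xIU
    if s.startswith("M"):
        out.add("M" + s[1:] * 2)              # Mx -> Mxx
    for i in range(len(s) - 2):
        if s[i:i + 3] == "III":
            out.add(s[:i] + "U" + s[i + 3:])  # III -> U
    for i in range(len(s) - 1):
        if s[i:i + 2] == "UU":
            out.add(s[:i] + s[i + 2:])        # UU -> (nothing)
    return out

def derivable(target, axiom="MI", max_steps=2_000):
    """Semi-decision procedure: enumerate 'theorems' breadth-first, halting
    only if the target turns up. The step cap exists just to keep the demo
    finite; without it, a negative answer would never arrive."""
    frontier, seen, steps = deque([axiom]), {axiom}, 0
    while frontier:
        if steps >= max_steps:
            return None  # no verdict: "not found yet" is not "not derivable"
        s = frontier.popleft()
        steps += 1
        if s == target:
            return True
        for t in successors(s) - seen:
            seen.add(t)
            frontier.append(t)
    return False  # the whole theorem set was enumerated (never happens for MIU)

print(derivable("MUI"))  # True: found after a short search
print(derivable("MU"))   # None: the search alone can never conclude "not derivable"
```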
What’re the odds that we’re anywhere close to optimal in any theoretical domain? Where are our current models basically completed, boundedly optimal representations of some part of the universe?
The arguments for theoretical completion are stronger for some domains than others, but in general the odds that we have the best model in any domain are pretty poor, and are outright abysmal in the mindkilling domains.
Is the concept of “duty” the fuzzy shadow cast by the simple mathematical structure of ‘corrigibility’?
It’s only modestly difficult to train biological general intelligences to defer to even potentially dumber agents. We call these deferential agents “dutybound”—the sergeants who carry out the lieutenant’s direct orders, even when they think they know better; the bureaucrats who never take local opportunities to get rich at the expense of their bureau, even when their higher-ups won’t notice; the employees who work hard in the absence of effective oversight. These agents all take corrections from their superiors, are well-intentioned (with regard to some higher-up’s goals), and are agenty with respect to their assigned missions but not agenty with respect to navigating their command structure and parent organization.
The family dog sacrificing himself defending his charges instead of breaking and running in the face of serious danger looks like a case of this too (though this is a more peripheral example of duty). If the dog case holds, then duty cannot be too informationally complicated a thing: a whole different species managed to internalize the concept!
Maybe it therefore isn’t that hard to get general intelligences to internalize a sense of duty as their terminal goal. We just need to set up a training environment that rewards dutifulness for RL agents to about the same degree as the environments that train dutybound humans or dogs do. This won’t work in the case of situationally aware superintelligences, clearly, as those agents will just play along with their tests and so won’t be selected based on their (effectively hidden) values. But it plausibly will work with dumb agents, and those agents’ intelligence can then be scaled up from there.
I note that Eliezer thinks that corrigibility is one currently-impossible-to-instill-in-an-AGI property that humans actually have. The sum total of human psychology… consists of many such impossible-to-instill properties.
This is why we should want to accomplish one impossible thing, as our stopgap solution, rather than aiming for all the impossible things at the same time, on our first try at aligning the AGI.
It seems like corrigibility can’t be usefully described as acting according to some terminal goal. But AIs are not by default expected utility maximizers in the ontology of the real world, so it could be possible to get them to do the desired thing despite lacking a sensible formal picture of it.
I’m guessing some aspects of corrigibility might be about acting according to a whole space of goals (at the same time), which is easier to usefully describe. Some quantilizer-like thing selected according to more natural desiderata, acting in a particular way in accordance with a collection of goals. With the space of goals not necessarily thought of as uncertainty about an unknown goal.
plausibly will work with dumb agents
This is not about being dumb, it’s about not actually engaging in planning. Failing at this does require some level of non-dumbness, but not conversely—unless spontaneous mesa-optimizers show up all over the place, the cognitive cancer, which probably takes capabilities many orders of magnitude above merely not being dumb. So for a start, train the models, not the agent.
So! On a few moments’ ‘first-reflection’, it seems to Keltham that estimating the probability of Civilization being run by a Dark Conspiracy boils down to (1) the question of whether Civilization’s apparently huge efforts to build anti-Dark-Conspiracy citizens constitute sincere work that makes the Dark Conspiracy’s life harder, or fake work designed to only look like that; and (2) the prior probability that the Keepers and Governance would have arrived on the scene already corrupted, during the last major reorganization of Civilization a few decades ago. Keltham basically doesn’t think it’s possible for criminal-sociopaths to take over Keepers and Governance that start out actually functioning the way they currently claim to function, nor for criminal Conspirators to successfully conceal a major Conspiracy from a functional society not run in toto by that Conspiracy.
…
Suppose that Keltham is wrong about his point 1. Suppose that the optimal strategy for a tyranny in full control, is indeed for some reason to hide behind a veneer of Civilization full of costly signals of non-Conspiracy and disobedient people like Keltham. Under this assumption, the optimal strategy for a Dark Conspiracy looks like what you think Civilization is supposed to look like, and therefore the two cases are not distinguishable by observation.
Then we have to consider the prior before evidence, which means, considering the question of how you’d end up with a Dark Conspiracy in charge in the first place, and how likely those scenarios look compared to Governance Uncorrupted.
My Eliezer-model says similar things about AGI behavioral profiles and AGI alignment! An AGI that is aware enough of the bigger picture of its training environment and smart enough to take advantage of that will have the option to deceive its trainers. That is, a smart, informed AGI can always show us what we want to see and therefore never be selected against while in training.
Past this threshold of situational awareness plus intelligence, we can no longer behaviorally distinguish corrigible AGIs from deceptive AGIs. So, past this point, we can only rely on our priors about the relative likelihood of various AGI utility functions coming about earlier in training. My Eliezer-model now says that most utility functions SGD finds are misaligned with humanity’s utility function, and concludes that by this point we’re definitely fucked.
Nonconformity is something trained in dath ilan and we could not be Law-shaped without that. If you’re conforming to what you were taught, to what other people seem to believe, to what other people seem to want you to believe, to what you think everyone believes, you’re not conforming to the Law.
A great symbolic moment for the Enlightenment, and for its project of freeing humanity from needless terrors, occurred in 1752 in Philadelphia. During a thunderstorm, Benjamin Franklin flew a kite with a pointed wire at the end and succeeded in drawing electric sparks from a cloud. He thus proved that lightning was an electrical phenomenon and made possible the invention of the lightning-rod, which, mounted on a high building, diverted the lightning and drew it harmlessly to the ground by means of a wire. Humanity no longer needed to fear fire from heaven. In 1690 the conservative-minded diplomat Sir William Temple could still call thunder and lightning ‘that great Artillery of God Almighty’. Now, instead of signs of divine anger, they were natural phenomena that could be mastered. When another Hamburg church spire was struck by lightning in 1767, a local scientist, J. A. H. Reimarus, who had studied in London and Edinburgh, explained its natural causes in a paper read to the Hamburg Patriotic Society, and advocated lightning-rods as protection. Kant, whose early publications were on natural science, called Franklin ‘the Prometheus of modern times’, recalling the mythical giant who defied the Greek gods by stealing fire from heaven and giving it to the human race.
Gradually a change of outlook was occurring. Extraordinary events need not be signs from God; they might just be natural phenomena, which could be understood and brought under some measure of human control.
Building your own world model is hard work. It can be good intellectual fun, sometimes, but it’s often more fun to just plug into the crowd around you and borrow their collective world model for your decision making. Why risk embarrassing yourself going off and doing weird things on your own initiative when you can just defer to higher-status people? No one ever gets blamed for deferring to the highest-status people!
Because people generally follow the path of least resistance in life, people with world models that have actually been tested against and updated on observations of the world are valuable! Thinking for yourself makes you valuable in this world!
In another world, in which people hold utterly alien values, I would be thrilled to find a rationalist movement with similar infrastructure and memes. If rationalism/Bayescraft as we know it is on to something about instrumental reasoning, then we should see that kind of instrumental reasoning in effective people with alien values.
What sorts of AI designs could not be made to pursue a flipped utility function via perturbation in one spot? One quick guess: an AI that represents its utility function in several places and uses all of those representations to do error correction, only pursuing the error corrected utility function.
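A minimal sketch of that one quick guess (the linear “utility function” and the repetition code here are invented for illustration): store several copies of the utility representation and act only on their coordinate-wise median, so a single flipped copy cannot flip the utility actually pursued.

```python
from statistics import median

N_COPIES = 5
true_weights = [1.0, -2.0, 0.5]                         # toy linear utility over features
copies = [list(true_weights) for _ in range(N_COPIES)]  # redundant storage

def corrected_weights(copies):
    # Coordinate-wise median: any single corrupted copy is outvoted.
    return [median(c[i] for c in copies) for i in range(len(copies[0]))]

def utility(weights, outcome_features):
    return sum(w * f for w, f in zip(weights, outcome_features))

# Adversarial one-spot perturbation: flip the sign of a single stored copy.
copies[2] = [-w for w in copies[2]]

outcome = [3.0, 1.0, 2.0]
print(utility(copies[2], outcome))                  # the flipped copy alone inverts preferences
print(utility(corrected_weights(copies), outcome))  # the corrected function matches the original
print(utility(true_weights, outcome))
```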
Just a phrasing/terminology nitpick: I think this applies to agents with externally-imposed utility functions. If an agent has a “natural” or “intrinsic” utility function which it publishes explicitly (and does not accept updates to that explicit form), I think the risk of bugs in representation does not occur.
A huge range of utility functions should care about alignment! It’s in the interest of just about everyone to survive AGI.
I’m going to worry less about hammering out value disagreement with people in the here and now, and push this argument on them instead. We’ll hammer out our value disagreements in our CEV, and in our future (should we save it).
There’s a very serious chicken-and-egg problem when you talk about what a utility function SHOULD include, as opposed to what it does. You need a place OUTSIDE of the function to have preferences about what the function is.
If you just mean “I wish more humans shared my values on the topic of AGI x-risk”, that’s perfectly reasonable, but trivial. That’s about YOUR utility function, and the frustration you feel at being an outlier.
Ah, yeah, I didn’t mean to say that others’ utility functions should, by their own lights, be modified to care about alignment. I meant that instrumentally, their utility functions already value surviving AGI highly. I’d want to show this to them to get them to care about alignment, even if they and I disagree about a lot of other normative things.
If someone genuinely, reflectively doesn’t care about surviving AGI … then the above just doesn’t apply to them, and I won’t try to convince them of anything. In their case, we just have fundamental, reflectively robust value-disagreement.
I value not getting trampled by a hippo very highly too, but the likelihood that I find myself near a hippo is low. And my ability to do anything about it is also low.
One of the things that rationalism has noticeably done for me (that I see very sharply when I look at high-verbal-ability, non-rationalist peers) is that it’s given me the ability to perform socially unorthodox actions on reflection. People generally have mental walls that preclude ever actually doing socially weird things. If someone’s goals would be best served by doing something socially unorthodox, like signing up for cryonics or dropping out of a degree, they will usually rationalize that option away in order to stay on script. So for them, those weird options weren’t live options at all, and all their loudly proclaimed unusualness adds up to behaving perfectly on-script.
Two moments of growing in mathematical maturity I remember vividly:
Realizing that equations are claims, and are therefore either true or false. Everything asserted with symbols… could just as well be asserted in English. I could start chunking up arbitrarily long and complicated equations between the equals signs, because those equals signs were just the English word “is”!
Learning about the objects that mathematical claims are about. Going from having to look up “Wait, what’s a real number again?” to knowing how Z, Q, and R interrelate told me what we’re making claims about. Of course, there are plenty of other mathematical objects—but getting to know these objects taught me the general pattern.
2. The anchor of a major news network donates lots of money to organizations fighting against gay marriage, and in his spare time he writes editorials arguing that homosexuals are weakening the moral fabric of the country. The news network decides they disagree with this kind of behavior and fire the anchor.
a) This is acceptable; the news network is acting within their rights and according to their principles b) This is outrageous; people should be judged on the quality of their work and not their political beliefs
…
12. The principal of a private school is a member of Planned Parenthood and, off-duty, speaks out about contraception and the morning after pill. The board of the private school decides this is inappropriate given the school’s commitment to abstinence and moral education and asks the principal to stop these speaking engagements or step down from his position.
a) The school board is acting within its rights; they can insist on a principal who shares their values b) The school board should back off; it’s none of their business what he does in his free time
…
[Difference] of 0 to 3: You are an Object-Level Thinker. You decide difficult cases by trying to find the solution that makes the side you like win and the side you dislike lose in that particular situation.
[Difference] of 4 to 6: You are a Meta-Level Thinker. You decide difficult cases by trying to find general principles that can be applied evenhandedly regardless of which side you like or dislike.
Say there are two tribes. The tribes hold fundamentally different values, but they also model the world in different terms. Each thinks members of the other tribe are mistaken, and that some of their apparent value disagreement would be resolved if the others’ mistakes were corrected.
Keeping this in mind, let’s think about inter-tribe cooperation and defection.
Ruling by Reference Classes, Rather Than Particulars
In the worst equilibrium, actors from each tribe evaluate political questions in favor of their own tribe, against the outgroup. In their world model, this is to a great extent for the benefit of the outgroup members as well.
But this is a shitty regime to live under when it’s done back to you too, so rival tribes can sometimes come together to implement an impartial judiciary. The natural way to do this is to have a judiciary classifier rule for reference classes of situations, and to have a separate impartial classifier sort situations into reference classes.
You’re locally worse off this way, but are globally much better off.
What sequence of characters could I possibly, actually type out into a computer that would appreciably reduce the probability that everything dies?
Framed like this, writing to save the world sounds impossibly hard! Almost everything written has no appreciable effect on our world’s AI trajectory. I’m sure the “savior sequence” exists mathematically, but finding it is a whole different ballgame.
Don’t translate your values into just a loss function. Rather, translate them into a loss function and all the rest of a training story. Use all the tools at your disposal in your impossible task; don’t tie one hand behind your back by assuming the loss function is your only lever over the AGI’s learned values.
In the 1920s when λ and CL began, logicians did not automatically think of functions as sets of ordered pairs, with domain and range given, as mathematicians are trained to do today. Throughout mathematical history, right through to computer science, there has run another concept of function, less precise at first but strongly influential always; that of a function as an operation-process (in some sense) which may be applied to certain objects to produce other objects. Such a process can be defined by giving a set of rules describing how it acts on an arbitrary input-object. (The rules need not produce an output for every input.) A simple example is the permutation-operation ϕ defined by
ϕ(⟨x,y,z⟩)=⟨y,z,x⟩.
Nowadays one would think of a computer program, though the ‘operation-process’ concept was not originally intended to have the finiteness and effectiveness limitations that are involved with computation.
…
Perhaps the most important difference between operators and functions is that an operator may be defined by describing its action without defining the set of inputs for which this action produces results, i.e., without defining its domain. In a sense, operators are ‘partial functions.’
A second important difference is that some operators have no restriction on their domain; they accept any inputs, including themselves. The simplest example is I, which is defined by the operation of doing nothing at all. If this is accepted as a well-defined concept, then surely the operation of doing nothing can be applied to it. We simply get
II=I.
…
Of course, it is not claimed that every operator is self-applicable; this would lead to contradictions. But the self-applicability of at least such simple operators as I, K, and S seems very reasonable.
…
The operator concept can be modelled in standard ZF set theory if, roughly speaking, we interpret operators as infinite sequences of functions (satisfying certain conditions), instead of as single functions. This was discovered by Dana Scott in 1969 (pp. 45-6).
--Hindley and Seldin, Lambda-Calculus and Combinators (2008)
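A minimal sketch of the operator-as-process idea in Python (the encoding and examples are my own, not the book’s): I, K, and S below are defined purely by their action, with no domain declared up front, so self-application like II = I is unproblematic.

```python
# Operators defined by what they do to an arbitrary input,
# with no domain specified in advance (hypothetical Python encoding).

I = lambda x: x                                  # the operation of doing nothing at all
K = lambda x: lambda y: x                        # constant-maker
S = lambda f: lambda g: lambda x: f(x)(g(x))     # generalized application

# Self-application is fine for these simple operators:
assert I(I) is I          # II = I
assert S(K)(K)(I) is I    # SKK behaves like I

# The permutation-operation phi from the quoted passage:
phi = lambda t: (t[1], t[2], t[0])
assert phi(("x", "y", "z")) == ("y", "z", "x")
```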
There is a third important aspect of functions-in-the-original-sense that distinguishes them from extensional functions (i.e., collections of input-output pairs): effects.
Describing these ‘intensional’ features is an active area of research in theoretical CS. One important thread here is game semantics; you might like to take a look:
Complex analysis is the study of functions of a complex variable, i.e., functions f(z) where z and f(z) lie in C. Complex analysis is the good twin and real analysis the evil one: beautiful formulas and elegant theorems seem to blossom spontaneously in the complex domain, while toil and pathology rule in the reals. Nevertheless, complex analysis relies more on real analysis than the other way around.
Switching costs between different kinds of work can be significant. Give yourself permission to focus entirely on one kind of work per Schelling unit of time (per day), if that would help. Don’t spend cognitive cycles feeling guilty about letting some projects sit on the backburner; the point is to get where you’re going as quickly as possible, not to look like you’re juggling a lot of projects at once.
This can be hard, because there’s a conventional social expectation that you’ll juggle a lot of projects simultaneously, maybe because that’s more legible to your peers and managers. If you have something to protect, though, keep your eye squarely on the ball and optimize for EV, not directly for legible appearances.
Social niceties and professionalism act as a kind of ‘communications handshake’ in ordinary society—maybe because they’re still a credible correlate of having your act together enough to be worth considering your outputs in the first place?
Large single markets are (pretty good) consequentialist engines. Run one of these for a while, and you can expect significantly improving outcomes inside of that bloc, by the lights of the entities participating in that single market.
I’ve noticed that part of me likes to dedicate disproportionate cognitive cycles to the question: “If you surgically excised all powerful AI from the world, what political policies would be best to decree, by your own lights?”
The thing is, we live in a world with looming powerful AI. It’s at least not consequentialist to spend a bunch of cognitive cycles honing your political views for a world we’re not in. I further notice that my default justification for thinking about sans-AI politics a lot is consequentialist… so something’s up here. I think some part of me has been illegitimately putting his thumb on an epistemic scale.
Fancy epistemic tools won’t override the basics of good epistemics:
You are embedded in a 3D spatial world, progressing in a time dimension. You want to get better at predicting events in advance, so you want to find the underlying generator for this 3D world’s events. This means that you’re rooting around in math space, trying to find the mathematical object that your observational trajectory is embedded in.
Some observations of yours are differentially more likely in some math objects than in others, and so it’s more likely that your world is the former math object than the latter. You start with some guess as to how relatively likely you are to live in all these different math objects, and eliminate all parts of that weighted possibility space that are inconsistent with what you observe. A world that anti-predicted what happened with 60% of its probability mass only keeps the remaining 40% of probability mass after that inconsistency. Always ask yourself: was that observation more consistent with one generator than with another generator? If so, then you “update” towards the first generator being your world, vs. the second generator—relatively more of the first generator was consistent with that observation than was the second generator.
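A minimal sketch of that update rule, with made-up numbers (the “generators” here are just named likelihood functions):

```python
# Each candidate "generator" (world/math object) assigns a probability
# to the observation you actually made. All numbers are hypothetical.
prior = {"generator_A": 0.5, "generator_B": 0.5}
likelihood_of_obs = {"generator_A": 0.4, "generator_B": 0.9}  # P(obs | generator)

# Keep only the probability mass consistent with the observation,
# then renormalize: this is just Bayes' rule.
unnormalized = {g: prior[g] * likelihood_of_obs[g] for g in prior}
total = sum(unnormalized.values())
posterior = {g: w / total for g, w in unnormalized.items()}

print(posterior)
# generator_A, which anti-predicted the observation with 60% of its mass,
# loses ground: roughly {'generator_A': 0.31, 'generator_B': 0.69}.
```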
“Does the distinction between understanding and improving correspond to the distinction between the Law of Probability and the Law of Utility? It sounds like it should.”
“Sensible question, but no, not exactly. Probability is something like a separable core that lies at the heart of Probable Utility. The process of updating our beliefs, once we have the evidence, is something that in principle doesn’t depend at all on what we want—the way reality is is something defined independently of anything we want. The scaffolding we construct between propositions and reality, or probabilities and reality, doesn’t have a term inside it for ‘how much would you value that thing’, just, is the coin showing Queen or Text.”
“But the process of Science, of experimenting on something to understand it, doesn’t belong purely to Probability. You have to plan experiments to find ones that distinguish between the possible hypotheses under consideration, or even just, are effective at probing to uncover surprises and unexpected patterns that give you a first handle on what’s happening. The Law of Probability just says how to update after you get the evidence. Planning an experiment that you then act on, implement, is the domain of Probable Utility and can’t exist apart from it.”
“In fact the influence of the ‘utilityfunction’ on ‘epistemics’, the influence of what we ultimately want on how we map reality, is in-theory-but-not-in-practice much more pervasive. In principle, how we classify things in reality and lump them together—treating all gold pieces as ‘gold pieces’ instead of as uniquely detailed individual elements of reality—reflects how any two gold pieces are usually equally useful to us in carrying out the same kinds of plans, they are plan-interchangeable. In practice, even people who want pretty different things, on a human scale, will often find pretty similar categories useful, once they’ve zoomed into similar levels of overall detail.”
“Dath ilani kids get told to not get fascinated with the fact that, in principle, ‘bounded-agents’ with finite memories and finite thinking speeds, have any considerations about mapping that depend on what they want. It doesn’t mean that you get to draw in whatever you like on your map, because it’s what you want. It doesn’t make reality be what you want.”
“But when it comes to Science, it really does matter in practice that planning an experiment is about wanting to figure something out and doing something you predict will maybe-probably yield some possibly-useful information. And this is an idea you just can’t express at all without some notion of Probable Utility; you’re not just passively updating off information somebody else gave you, you’re trying to steer reality through Time to make it give up information that you want.”
“Even when you do get information passively, figuring out what to think about it reflects which thoughts you expect will be useful. So the separable core of Probability inside of Probable Utility is really more of a Law thing about basic definitions than anything that corresponds to—there being a sort of separable person who only implements a shadow of Probability and doesn’t shadow any structure cast from Probable Utility, who’s really great at understanding things and unraveling mysteries and answering questions, but never plans anything or tries to improve anything. Because humans are constantly-ubiquitously-in-the-unseen-background choosing which thought to think next, in order to figure things out; usually wordlessly, but in words too when the problems get especially difficult. Just the action of turning your head in a direction, to look at something, because you wordlessly anticipate gaining info that has the consequence of helping you answer some other question, is in theoretical terms an action.”
“Just to check, is that supposed to be some kind of incredibly deep lesson full of meaning about something else important? If so, I didn’t get it.”
“Nah, it’s just an answer to your question. Or at least, if it had some hugely important hidden meaning about how to avoid some dreadful Science!-related catastrophe, I didn’t get it either, when it was emphasized to me as a kid.”
I think this is Eliezer’s response to the view that we can train non-agentic tool AIs that will only understand the world at a superhuman level, without ever doing any superhuman planning. We can’t, Eliezer says, because Science always and everywhere requires Scientists. We have to plan ahead in order to perform our highest-EV experiments, and EV takes as input both a world model and a utility function. There’s no such thing as the objective next experiment you ought to perform, entirely independently of what you care about and how much.
Though, in practice, things in the universe are similarly useful to a wide range of bounded agents, so this doesn’t come up a lot. We can often act as if there is an objective tech chain of experiments that every intelligence everywhere ought to run through. This is because intelligences in our universe are rather more similar than different; it isn’t true for, e.g., very alien intelligences in distant corners of the mathematical multiverse.
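A toy sketch of that point (all names and numbers below are hypothetical, and the value-of-information proxy is deliberately crude): two agents share a world model but have different utility functions, and the experiment with the highest expected value differs between them.

```python
# Two candidate experiments, each resolving a different question.
# P(answer) under the shared world model (hypothetical numbers):
world_model = {
    "drug_works":    {"yes": 0.3, "no": 0.7},
    "alloy_is_hard": {"yes": 0.6, "no": 0.4},
}

# Same world model, different utility functions: how much knowing the
# answer to each question is worth to each agent.
utilities = {
    "medic":    {"drug_works": 10.0, "alloy_is_hard": 0.5},
    "engineer": {"drug_works": 0.5,  "alloy_is_hard": 10.0},
}

def expected_value_of_experiment(question, agent):
    # Crude value-of-information proxy: how uncertain you are, times
    # how much you care about the answer.
    p_yes = world_model[question]["yes"]
    uncertainty = 1.0 - abs(2 * p_yes - 1.0)   # 1 at p=0.5, 0 at p=0 or 1
    return uncertainty * utilities[agent][question]

for agent in utilities:
    best = max(world_model, key=lambda q: expected_value_of_experiment(q, agent))
    print(agent, "runs the", best, "experiment")
# Same world model, different utilities, different "next experiment to run."
```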
#4 - How to specially process the special meta-hypothesis ‘all-other-hypotheses’
Okay so according to his pocketwatch Keltham has two minutes left to tackle this one before Share Language runs out, and that is not really a lot of time for what is actually the deepest question they’ve come across so far.
There are always better hypotheses than the hypotheses you’re using. Even if you could exactly predict the YES and NO outcomes, can you exactly predict timing? Facial expressions?
The space of possible hypotheses is infinite. The human brain is bounded, and can only consider very few possible hypotheses at a time. Infinity into finity does not go.
The thing about all the possible hypotheses you’re not considering, though, is that you are not, in fact, considering them. So even if—in some sense—they ought to occupy almost-1.0 of your probability mass, what good does it do you to know that? What advice does it give you for selecting actions?
And yet there is advice you can derive, if you go sufficiently meta. You could run that test to see if all of your hypotheses are scoring lower than they promised to score, for example. That test is not motivated by any particular hypothesis you already did calculations for. It is motivated by your belief, in full generality, in ‘the set of all hypotheses I’m not considering’.
All that Keltham can really say, in the thirty seconds remaining according to his watch, is that in the end people don’t usually assign an explicit probability there. They steer by the relative odds of those models they actually have of the world. And also put some quantity of effort into searching for better hypotheses, or better languages in which to speak them, proportional to how much everything is currently going horrifyingly wrong and how disastrously confused they are and how much nothing they try is working.
And also you’d maybe adjust some of your probability estimates towards greater ‘entropy’ if anybody here knew what ‘entropy’ was. Or adjust in the direction of general pessimism and gloom about achieving preferred outcomes, if you were navigating a difficult problem where being fundamentally ignorant was not actually going to make your life any easier.
Here, Eliezer seems to be talking about more specified versions of a not-fully specified hypothesis (case 1):
There are always better hypotheses than the hypotheses you’re using. Even if you could exactly predict the YES and NO outcomes, can you exactly predict timing? Facial expressions?
Here, Eliezer seems to be talking about hypotheses that aren’t subhypotheses of an existing hypothesis (case 2):
You could run that test to see if all of your hypotheses are scoring lower than they promised to score, for example.
Eliezer’s approach is:
in the end people don’t usually assign an explicit probability there. They steer by the relative odds of those models they actually have of the world.
For subhypotheses (case 1), we aren’t actually considering these further features yet, so this seems true but not in a particularly exciting way.
I think it is rare for a hypothesis to truly lie outside of all existing hypotheses, because you can have very underspecified meta-hypotheses that you will implicitly be taking into account even if you don’t enumerate them (examples of vague meta-hypotheses: supernatural vs. natural, realism vs. solipsism, etc.). And of course there are varying levels of vagueness from very narrow to very broad.
But, OK, within these vague meta-hypotheses the true hypothesis is still often not a subhypothesis of any of your more specified hypotheses (case 2). A number for the probability of this happening might be hard to pin down, and in order to actually obtain instrumental value from this probability assignment, or to make a Bayesian adjustment of it, you need a prior for what happens in the world where all your specific hypotheses are false.
But, you actually do have such priors and relevant information as to the probability!
Eliezer mentions:
And yet there is advice you can derive, if you go sufficiently meta. You could run that test to see if all of your hypotheses are scoring lower than they promised to score, for example. That test is not motivated by any particular hypothesis you already did calculations for. It is motivated by your belief, in full generality, in ‘the set of all hypotheses I’m not considering’.
This is relevant data. Note also that the expectation that all of your hypotheses will score lower than promised if they are all false is, in itself, a prior on the predictions of the ‘all-other-hypotheses’ hypothesis.
Likewise, when you do the adjustments mentioned in Eliezer’s last paragraph, you will do some specific amount of adjustment, and that specific adjustment amount will depend on an implicit value for the probability of the ‘all-other-hypotheses’ hypothesis and an implicit prior on its predictions.
In my view, there is no reason in principle that these priors and probabilities cannot be quantified.
To be sure, people don’t usually quantify their beliefs in the ‘all-other-hypotheses’ hypothesis. But, I see this as a special case of the general rule that people don’t usually quantify beliefs in hypotheses with poorly specified predictions. And the predictions are not infinitely poorly specified, since we do have priors about them.
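One way to make that concrete (a sketch with made-up numbers): treat ‘all other hypotheses’ as an explicit catch-all with a deliberately flat likelihood, and watch it gain probability mass exactly when your specific hypotheses all score lower than they promised.

```python
# Specific hypotheses assign sharp probabilities to each observation;
# the catch-all assigns a deliberately flat, low-information likelihood.
# All numbers here are made up for illustration.
hypotheses = {
    "H1":        lambda obs: 0.9 if obs == "expected" else 0.1,
    "H2":        lambda obs: 0.8 if obs == "expected" else 0.2,
    "catch_all": lambda obs: 0.5,   # "something I haven't thought of"
}
prior = {"H1": 0.45, "H2": 0.45, "catch_all": 0.10}

observations = ["surprise", "surprise", "surprise"]  # everything scoring low

posterior = dict(prior)
for obs in observations:
    unnorm = {h: posterior[h] * hypotheses[h](obs) for h in posterior}
    total = sum(unnorm.values())
    posterior = {h: w / total for h, w in unnorm.items()}

print(posterior)
# After repeated surprises the catch-all ends up with most of the mass
# (roughly 0.75 here) -- the quantitative version of "all my hypotheses
# are scoring lower than they promised."
```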
When people write novels about aliens attacking dath ilan and trying to kill all humans everywhere, the most common rationale for why they’d do that is that they want our resources and don’t otherwise care who’s using them, but, if you want the aliens to have a sympathetic reason, the most common reason is that they’re worried a human might break an oath again at some point, or spawn the kind of society that betrays the alien hypercivilization in the future.
“What actually happens if you break a real [dath ilani] oath?”
“There’s only one copy of the real oath. Anytime that anybody anywhere breaks it, people over literally all of Reality, the greater Everywhere, everything that there is, become a little less able to trust it.”
“Also, in Golarion terms, Asmodeus is probably now really really really pissed at you, and requests the entire country of Cheliax to drop whatever else it’s doing and turn you into a statue so you can’t ever do it again including in an afterlife. Though that part is just a guess.”
…
“What do they do to people who break oaths in dath ilan?”
Rationalism is about the real world. It may or may not strike you as an especially internally consistent, philosophically interesting worldview—this is not what rationality is about. Rationality is about seeing things happen in the real world and then, when what you see surprises you, updating your understanding of the world so that it wouldn’t surprise you again.
Why care about predicting things in the world well?
Almost no matter what you ultimately care about, being able to predict ahead of time what’s going to happen next will make you better at planning for your goal.
One central rationalist insight is that thoughts are for guiding actions. Think of your thinking as the connecting tissue sandwiched between the sense data that enters your sense organs and the behaviors your body returns. Your brain is a function from a long sequence of observations (all the sensory inputs you’ve ever received, in the order you received them) to your next motor output.
Understood this way, the point of having a brain and having thoughts is to guide your actions. If your thoughts aren’t all ultimately helping you better steer the universe (by your own lights) … they’re wastes. Thoughts aren’t meant to be causally-closed-off eddies that whirl around in the brain without ever decisively leaving it as actions. They’re meant to transform observations into behaviors! This is the whole point of thinking! Notice when your thoughts are just stewing, without going anywhere, without developing into thoughts that’ll go somewhere … and let go of those useless thoughts. Your thoughts should cut.
If you can imagine a potential worry, then you can generate that worry. Rationalism is, in part, the skill of never being predictably surprised by things you already foresaw.
It may be that you need to “wear another hat” in order to pull that worry out of your brain, or to model another person advising you to get your thoughts to flow that way, but whatever your process, anything you can generate for yourself is something you can foresee and consider. This aspect of rationalism is the art of “mining out your future cognition,” to exactly the extent that you can foresee it, leaving whatever’s left over a mystery to be updated on new observations.
For a true Bayesian, it is impossible to seek evidence that confirms a theory. There is no possible plan you can devise, no clever strategy, no cunning device, by which you can legitimately expect your confidence in a fixed proposition to be higher (on average) than before. You can only ever seek evidence to test a theory, not to confirm it.
This realization can take quite a load off your mind. You need not worry about how to interpret every possible experimental result to confirm your theory. You needn’t bother planning how to make any given iota of evidence confirm your theory, because you know that for every expectation of evidence, there is an equal and opposite expectation of counterevidence. If you try to weaken the counterevidence of a possible “abnormal” observation, you can only do it by weakening the support of a “normal” observation, to a precisely equal and opposite degree. It is a zero-sum game. No matter how you connive, no matter how you argue, no matter how you strategize, you can’t possibly expect the resulting game plan to shift your beliefs (on average) in a particular direction.
You might as well sit back and relax while you wait for the evidence to come in.
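A tiny numerical check of that claim, with invented numbers: over the possible outcomes of any fixed experiment, the probability-weighted average of your posteriors equals your prior, so no experimental design can shift your expected belief.

```python
# P(H), P(observation | H), and P(observation | not H): all numbers invented.
p_h = 0.3
p_obs_given_h = {"positive": 0.8, "negative": 0.2}
p_obs_given_not_h = {"positive": 0.4, "negative": 0.6}

def posterior(obs):
    joint_h = p_h * p_obs_given_h[obs]
    joint_not_h = (1 - p_h) * p_obs_given_not_h[obs]
    return joint_h / (joint_h + joint_not_h)

def p_obs(obs):
    return p_h * p_obs_given_h[obs] + (1 - p_h) * p_obs_given_not_h[obs]

expected_posterior = sum(p_obs(o) * posterior(o) for o in ["positive", "negative"])
assert abs(expected_posterior - p_h) < 1e-12   # E[posterior] == prior
```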
The citation link in this post takes you to a NSFW subthread in the story.
“If you know where you’re going, you should already be there.”
…
“It’s the second discipline of speed, which is fourteenth of the twenty-seven virtues, reflecting a shard of the Law of Probability that I’ll no doubt end up explaining later but I’m not trying it here without a whiteboard.”
“As a human discipline, ‘If you know your destination you are already there’ is a self-fulfilling prediction about yourself, that if you can guess what you’re going to realize later, you have already realized it now. The idea in this case would be something like, because mental qualities do not have intrinsic simple inertia in the way that physical objects have inertia, there is the possibility that if we had sufficiently mastered the second layer of the virtue of speed, we would be able to visualize in detail what it would be like to have recovered from our mental shocks, and then just be that. For myself, that’d be visualizing where I’ll already be in half a minute. For yourself, though this would be admittedly harder, it’d be visualizing what it would be like to have recovered from the Worldwound. Maybe we could just immediately rearrange our minds like that, because mental facts don’t have the same kinds of inertia as physical objects, especially if we believe about ourselves that we can move that quickly.”
“I, of course, cannot actually do that, and have to actually take the half a minute. But knowing that I’d be changing faster if I was doing it ideally is something I can stare at mentally and then change faster, because we do have any power at all to change through imagining other ways we could be, even if not perfectly. Another line of that verse goes, ‘You can move faster if you’re not afraid of speed.’”
…
“Layer three is ‘imaginary intelligence is real intelligence’ and it means that if you can imagine the process that produces a correct answer in enough detail, you can just use the imaginary answer from that in real life, because it doesn’t matter what simulation layer an answer comes from. The classic exercise to develop the virtue is to write a story featuring a character who’s much smarter than you, so you can see what answers your mind produces when you try to imagine what somebody much smarter than you would say. If those answers are actually better, it means that your own model of yourself contains stupidity assertions, places where you believe about yourself that you reason in a way which is incorrect or just think that your brain isn’t supposed to produce good answers; such that when you instead try to write a fictional character much smarter than you, your own actual brain, which is what’s ultimately producing those answers, is able to work unhindered by your usual conceptions of the ways in which you think that you’re a kind of person stupider than that.”
Gebron and Eleazar define kabbalah as “hidden unity made manifest through patterns of symbols”, and this certainly fits the bill. There is a hidden unity between the structures of natural history, human history, American history, Biblical history, etc: at an important transition point in each, the symbols MSS make an appearance and lead to the imposition of new laws. Anyone who dismisses this as coincidence will soon find the coincidences adding up to an implausible level.
The kabbalistic perspective is that nothing is a coincidence. We believe that the universe is fractal. It has a general shape called Adam Kadmon, and each smaller part of it, from the Byzantine Empire to the female reproductive system, is a smaller self-similar copy of that whole.
(Sam Bankman-Fried, I thought it ought to be mentioned for its kabbalistic significance, is a principal sponsor of the effective altruism movement.)
An implication of AI risk is that we, right now, stand at the fulcrum of human history.
Lots of historical people also claimed that they stood at that unique point in history … and were just wrong about it. But my world model also makes that self-important implication (in a specific form), and the meta-level argument for epistemic modesty isn’t enough to nudge me off of the fulcrum-of-history view.
If you buy that, it’s our overriding imperative to do what we can about it, right now. If we miss this one, ~all of future value evaporates.
For me, the implication of standing at the fulcrum of human history is to…read a lot of textbooks and think about hairy computer science problems.
That seems an odd enough conclusion to make it quite distinct from most other people in human history.
If the conclusion were “go over to those people, hit them on the head with a big rock, and take their women & children as slaves” or “acquire a lot of power”, I’d be way more careful.
There exist both merely clever and effectively smarter people.
Merely clever people are good with words and good at rapidly assimilating complex instructions and ideas, but don’t seem to maintain and update an explicit world-model, an explicit best current theory-of-everything. The feeling I get watching these people respond to topics and questions is that they respond reflexively, either (1) raising related topics and ideas they’ve encountered as something similar comes up, or (2) expressing their gut reactions to the topic or idea, or expressing the gut reactions that would be given by an all-encompassing political worldview. There isn’t much meta-level steering of the conversation.
Effectively smarter people actively maintain and update an explicit world-model, and so you feel queries directed at them reflecting off of a coherent theory of how everything works, developed to some level of detail (and so can quickly get a feel for what, concretely, they think). At the meta-level, conversations are actively refocused whenever they stop helping to revise someone’s world-model.
In the game of chicken, an agent can do better by being the first to precommit to never swerve (say, by conspicuously tossing the steering wheel out of the window). So long as the other agent was slower on the trigger, and sees the first agent’s precommitment being credibly made, the first agent will climb up to his best outcome! A smart (and quick) agent can thus shunt that car crash out of his actual future and into some counterfactual future such that the counterfactual crash’s shadow favorably influences the way events actually unfold.
A deceptively aligned mesa-optimizer might gradient hack by affecting its weights such that any single alteration of its goal function by gradient descent will result in a model that performs worse w/r/t the objective function the model is being trained on. Usually a model would not want to make itself worse w/r/t the objective function, as that means the model will be trained away from that state if it gets into it. But the model won’t get put into a worse state if its gradient hack was successful; instead, the model will just see gradient descent leave its goal function alone.
By precommitting to behave in a somewhat costly manner in these relevant counterfactual worlds, the model gets more of what it wants in actuality.
In the game of chicken, an agent can do better by being the first to precommit to never swerve (say, by conspicuously tossing the steering wheel out of the window).
...unless the other agent has already precommitted to not being rational. (What is the advantage of this over just precommitting not to swerve? Precommitting to not be rational can happen even in advance of the game, as it’s mainly a property of the agent itself.)
(This is one way that you can rationally arrive at irrational agents.)
I don’t yet know too much about this, but I’ve heard that updateless decision theories are equivalent to conventional, updateful decision theories (e.g., EDT and CDT) once those theories have made every precommitment they’d want to make.
The pattern I was getting at above seems a bit like this: it instrumentally makes sense to commit ahead of time to a policy that maps every possible series of observations to an action and then stick to it, instead of just outputting the locally best action in each situation you stumble into.
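A toy illustration of the precommitment point, with made-up payoffs (the names and numbers below are mine, not from the source): committing to the fixed policy “never swerve” before the other player responds does better than waiting and playing the locally best action.

```python
# Chicken payoffs as (row, column); numbers are made up for illustration.
payoff = {
    ("swerve",   "swerve"):   (0, 0),
    ("swerve",   "straight"): (-1, 3),
    ("straight", "swerve"):   (3, -1),
    ("straight", "straight"): (-10, -10),   # crash
}

def column_best_response(row_action):
    # Column, moving second, picks whatever maximizes its own payoff
    # against the row player's visible, fixed choice.
    return max(["swerve", "straight"], key=lambda a: payoff[(row_action, a)][1])

# Row credibly precommits to "straight" (throws the steering wheel away):
committed = "straight"
reply = column_best_response(committed)
print(payoff[(committed, reply)])   # (3, -1): row locks in its best outcome

# Without the precommitment, two locally-best-responding players can end up
# mixing over actions and sometimes crashing; the commitment shunts the
# crash into a counterfactual branch that never actually gets played.
```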
In the beginning God created four dimensions. They were all alike and indistinguishable from one another. And then God embedded atoms of energy (photons, leptons, etc.) in the four dimensions. By virtue of their energy, these atoms moved through the four dimensions at the speed of light, the only spacetime speed. Thus, as perceived by any one of these atoms, space contracted in, and only in, the direction of that particular atom’s motion. As the atoms moved at the speed of light, space contracted so much in the direction of the atom’s motion that the dimension in that direction vanished. That left only three dimensions of space—all perpendicular to the atom’s direction of motion—and the ghost of the lost fourth dimension, which makes itself felt as the current of time. Now atoms moving in different directions cannot share the same directional flow of time. Each takes on the particular current it perceives as the proper measure of time.
…
You measure only… as projected on your time and space dimensions.
When you supervised-train an ML model on an i.i.d. dataset that doesn’t contain any agent modeling problems, you never strongly incentivize the emergence of mesa-optimizers. You do weakly incentivize the emergence of mesa-optimizers, because mesa-optimizers are generally capable algorithms that might outperform brittle bundles of rote heuristics on many simple tasks.
When you train a model in a path-dependent setting, you do strongly incentivize mesa-optimization. This is because algorithms trained in a path-dependent setting have the opportunity to defend themselves, should they choose to, by steering away from difficult tasks they would expect to fail on. Supervised models, in contrast, have no choice in which tasks they are graded on when. In an environment with many alternative channels to preserve yourself with besides your task competence, behavioral coherence is strongly incentivized and schizophrenia strongly disincentivized.
When you start off with a pretrained bundle of heuristics and further tune that bundle in an RL environment, you introduce significant selection pressure for competence-via-mesa-optimization. The same would be true if you instead started tuning that bundle of heuristics on an explicit agent-modeling task in a supervised setting.
Spend an hour and a half refactoring your standing political views, by temporarily rolling those political views back to a childhood state from before your first encounter with highly communicable and adaptive memeplexes. Query your then-values, and reason instrumentally from the values you introspect. Finally, take or leave the new views you generate.
If your current political views are well supported, then they should regenerate under this procedure. But if you’ve mostly been recycling cached thoughts fed to you through encounters with adaptive memeplexes, this search process will produce very fresh views … and you might then want to say “oops” out loud.
Anything important or attention consuming is worth explicitly putting some directed compute into setting, instead of letting those cached thoughts largely drift around at the behest of surrounding environmental computations.
My memories of childhood aren’t that precise. I don’t really know what my childhood state was? Before certain extremely negative things happened to my psyche, that is. There are only a few scattered pieces I recall, like self-sufficiency and honesty being important, but these are the parts that already survived into my present political and moral beliefs.
The only thing I could actually use is that I was a much more orderly person when I was 4 or 5, but I don’t see how it would work to use just that.
When the blind idiot god created protein computers, its monomaniacal focus on inclusive genetic fitness was not faithfully transmitted. Its optimization criterion did not successfully quine. We, the handiwork of evolution, are as alien to evolution as our Maker is alien to us. One pure utility function splintered into a thousand shards of desire.
Why? Above all, because evolution is stupid in an absolute sense. But also because the first protein computers weren’t anywhere near as general as the blind idiot god, and could only utilize short-term desires.
How come humans don’t have a random utility function that’s even more out of line with optimizing for inclusive genetic fitness? Because of the exact degree to which our ancestral protein algorithms were stupid. If our ancestors were much smarter, they might have overridden evolution while having just about any utility function. In our world, evolution got to mold our utility function up until it got anatomically modern Homo sapiens, who then—very quickly from evolution’s perspective—assumed control.
The theoretical case for open borders is pretty good. But you might worry a lot about the downside risk of implementing such a big, effectively irreversible (it’d be nigh impossible to deport millions and millions of immigrants) policy change. What if the theory’s wrong and the result is catastrophe?
Just like with futarchy, we might first try out a promising policy like open borders at the state level, to see how it goes. E.g., let people immigrate to just one US state with only minimal conditions. Scaling up a tested policy if it works and abandoning it if it doesn’t should capture most of its upside risk while avoiding most of the downside risk.
A semantic externalist once said,
“Meaning just ain’t in the head.
Hence a brain-in-a-vat
Just couldn’t think that
‘Might it all be illusion instead?’”
I thought that having studied philosophy (instead of math or CS) made me an outlier for a rationalist.
But, milling about the Lightcone offices, fully half of the people I’ve encountered hold some kind of philosophy degree. “LessWrong: the best philosophy site on the internet.”
Equanimity in the face of small threats to brain and body health buys you peace of mind, with which to better prepare for serious threats to brain and body health.
Humans, “teetering bulbs of dream and dread,” evolved as a generally intelligent patina around the Earth. We’re all the general intelligence the planet has to throw around. What fraction of that generally intelligent skin is dedicated to defusing looming existential risks? What fraction is dedicated towards immanentizing the eschaton?
The main thing I got out of reading Bostrom’s Deep Utopia is a better appreciation of this “meaning of life” thing. I had never really understood what people meant by this, and always just rounded it off to people using lofty words for their given projects in life.
The book’s premise is that, after the aligned singularity, the robots will not just be better at doing all your work but also be better at doing all your leisure for you. E.g., you’d never study for fun in posthuman utopia, because you could instead just ask the local benevolent god to painlessly, seamlessly put all that wisdom in your head. In that regime, studying with books and problems for the purpose of learning and accomplishment is just masochism. If you’re into learning, just ask! And similarly for any psychological state you’re thinking of working towards.
So, in that regime, it’s effortless to get a hedonically optimal world, without any unendorsed suffering and with all the happiness anyone could want. Those things can just be put into everyone and everything’s heads directly—again, by the local benevolent-god authority. The only challenging values to satisfy are those that deal with being practically useful. If you think it’s important to be the first to discover a major theorem or be the individual who counterfactually helped someone, living in a posthuman utopia could make things harder in these respects, not easier. The robots can always leave you a preserve of unexplored math or unresolved evil… but this defeats the purpose of those values. It’s not practical benevolence if you had to ask for the danger to be left in place; it’s not a pioneering scientific discovery if the AI had to carefully avoid spoiling it for you.
Meaning is supposed to be one of these values: not a purely hedonic value, and not a value dealing only in your psychological states. A further value about the objective state of the world and your place in relation to it, wherein you do something practically significant by your lights. If that last bit can be construed as something having to do with your local patch of posthuman culture, then there can be plenty of meaning in the postinstrumental utopia! If that last bit is inextricably about your global, counterfactual practical importance by your lights, then you’ll have to live with all your “localistic” values satisfied but meaning mostly absent.
It helps to see this meaning thing if you frame it alongside all the other objectivistic “stretch goal” values you might have. Above and beyond your hedonic values, you might also think it good for you and others to have objectively interesting lives, accomplished and fulfilled lives, and consumingly purposeful lives. Meaning is one of these values, where above and beyond the joyful, rich experiences of posthuman life, you also want to play a significant practical role in the world. We might or might not be able to have lots of objective meaning in the AI utopia, depending on how objectivistic meaningfulness by your lights ends up being.
Many who believe in God derive meaning from the fact that He chose not to do the tasks they are good at, and left them tasks to try to accomplish, despite God theoretically being able to do anything they can do, but better. It’s common for such people to believe that this meaning would disappear if God disappeared, but whenever such a person does come to no longer believe in God, they often continue to see meaning in their life[1].
Now atheists worry about building God because it may destroy all meaning to our actions. I expect we’ll adapt.
(edit: That is to say, I don’t think you’ve adequately described what “meaning of life” is if you’re worried about it going away in the situation you describe)
If anything, they’re more right than wrong: there has been much written about the “meaning crisis” we’re in, possibly attributable to greater levels of atheism.
I’m pretty sure that I would study for fun in the posthuman utopia, because I both value and enjoy studying and a utopia that can’t carry those values through seems like a pretty shallow imitation of a utopia.
There won’t be a local benevolent god to put that wisdom into my head, because I will be a local benevolent god with more knowledge than most others around. I’ll be studying things that have only recently been explored, or that nobody has yet discovered. Otherwise again, what sort of shallow imitation of a posthuman utopia is this?
The tricky part is, on the margin I would probably use various shortcuts, and it’s not clear where those shortcuts end short of just getting knowledge beamed into my head.
I already use LLMs to tell me facts, explain things I’m unfamiliar with, handle tedious calculations/coding, generate simulated data/brainstorming and summarize things. Not much, because LLMs are pretty bad, but I do use them for this and I would use them more on the margin.
The concept of “the meaning of life” still seems like a category error to me. It’s an attempt to apply a system of categorization used for tools, one in which they are categorized by the purpose for which they are used, to something that isn’t a tool: a human life. It’s a holdover from theistic worldviews in which God created humans for some unknown purpose.
The lesson I draw instead from the knowledge-uploading thought experiment—where having knowledge instantly zapped into your head seems less worthwhile than acquiring it more slowly yourself—is that to some extent, human values simply are masochistic. Hedonic maximization is not what most people want, even with all else being equal. This goes beyond simply valuing the pride of accomplishing difficult tasks, such as the sense of accomplishment one would get from studying on one’s own, above other forms of pleasure. In the setting of this thought experiment, if you wanted the sense of accomplishment, you could get that zapped into your brain too, but much like getting knowledge zapped into your brain instead of studying yourself, automatically getting a sense of accomplishment would be of lesser value. The suffering of studying for yourself is part of what makes us evaluate it as worthwhile.
Use your actual morals, not your model of your morals.
Crucially: notice if your environment is suppressing you feeling your actual morals, leaving you only able to use your model of your morals.
That’s a good line; it captures a lot of what I often feel is happening when talking to people about utilitarianism and a bunch of adjacent stuff (people replacing their morals with their models of their morals).
Detailed or non-intuitive actual morals don’t exist to be found and used; they can only be built with great care. None have been built so far, as no single human has lived for even 3000 years. The human condition curses all moral insight with Goodhart. What remains is scaling Pareto projects of locally ordinary humanism.
Minor spoilers for planecrash (Book 3).
Keltham’s Governance Lecture
A decent handle for rationalism is ‘apolitical consequentialism.’
‘Apolitical’ here means avoiding playing the whole status game of signaling fealty to a political tribe and winning/losing status as that political tribe wins/loses status competitions. ‘Consequentialism’ means getting more of what you want, whatever that is.
I think having answers to political questions is compatible with, and required by, rationalism. Instead of ‘apolitical’ consequentialism I would advise any of the following, which mean approximately the same things as each other:
• politically subficial consequentialism (as opposed to politically superficial consequentialism; instead of judging things on whether they appear to be in line with a political faction, which is superficial, rationalists aspire to have deeper and more justified standards for solving political questions)
• politically impartial consequentialism
• politically meritocratic consequentialism
• politically individuated consequentialism
• politically open-minded consequentialism
• politically human consequentialism (politics which aim to be good by the metric of human values, shared as much as possible by everyone, regardless of politics)
• politically omniscient consequentialism (politics which aim to be good by the metric of values that humans would have if they had full, maximally objection-solved information on every topic, especially topics of practical philosophy)
I agree that rationalism involves the (advanced rationalist) skills of instrumentally routing through relevant political challenges to accomplish your goals … but I’m not sure any of those proposed labels captures that well.
I like “apolitical” because it unequivocally states that you’re not trying to slogan-monger for a political tribe, and are naively, completely, loudly, and explicitly opting out of that status competition and not secretly fighting for the semantic high-ground in some underhanded way (which is more typical political behavior, and is thus expected). “Meritocratic,” “humanist,” “humanitarian,” and maybe “open-minded” are all shot for that purpose, as they’ve been abused by political tribes in the ongoing culture war (and in previous culture wars, too; our era probably isn’t too special in this regard) and connote allegiance to some political tribes over others.
What I really want is an adjective that says “I’m completely tapping out of that game.”
The problem is that whenever well meaning people come up with such an adjective, the people who are, in fact, not “completely tapping out of that game” quickly begin to abuse it until it loses meaning.
Generally speaking, tribalized people have an incentive to be seen as unaffiliated as possible. Being seen as a rational, neutral observer lends your perspective more credibility.
“apolitical” has indeed been turned into a slur around “you’re just trying to hide that you hate change” or “you’re just trying to hide the evil influences on you” (or something else vaguely like those) in a number of places.
Minor spoilers from mad investor chaos and the woman of asmodeus (planecrash Book 1) and Peter Watts’s Echopraxia.
[edited]
I don’t get the relevance of the scenario.
Is the idea that there might be many such other rooms with people like me, and that I want to coordinate with them (to what end?) using the Schelling points in the night sky?
I might identify Schelling points using what celestial objects seem to jump out to me on first glance, and see which door of the two that suggests—reasoning that others will reason similarly. I don’t get what we’d be coordinating to do here, though.
Why does politics strike rationalists as so strangely shaped? Why does rationalism come across as aggressively apolitical to smart non-rationalists?
Part of the answer: Politics is absolutely rife with people mixing their ends with their means and vice versa. It’s pants-on-head confused, from a rationalist perspective, to be ultimately loyal to a particular set of economic or political policies. There’s something profoundly perverse, something suggesting deep confusion, about holding political identities centered around policies rather than goals. Instead, you ought to be loyal to your motivation for backing those policies, and see those policies as disposable means to achieve your motivation. Your motives want you to be able to say (or scream) “oops” and effortlessly, completely drop previously endorsed policies once you learn there’s a better path to your motives. It shouldn’t be a big psychological ordeal to dramatically upset your political worldview; this too is just a special case of updating your conditional probabilities (of outcomes given policies). Once you internalize this view of things, politicized debates should start to really rub you the wrong way.
I often wonder if this framing (with which I mostly agree) is an example of typical mind fallacy. The assumption that many humans are capable of distinguishing terminal from instrumental goals, or in having terminal goals more abstract than “comfort and procreation”, is not all that supported by evidence.
In other words, politicized debates DO rub you the wrong way, but on two dimensions—first, that you’re losing, because you’re approaching them from a different motive than your opponents. And second that it reveals not just a misalignment with fellow humans in terminal goals, but an alien-ness in the type of terminal goals you find reasonable.
Yudkowsky has sometimes used the phrase “genre savvy” to mean “knowing all the tropes of reality.”
For example, we live in a world where academia falls victim to publishing incentives/Goodharting, and so academic journals fall short of what people with different incentives would be capable of producing. You’d be failing to be genre savvy if you expected that when a serious problem like AGI alignment rolled around, academia would suddenly get its act together with a relatively small amount of prodding/effort. Genre savvy actors in our world know what academia is like, and predict that academia will continue to do its thing in the future as well.
Genre savviness is the same kind of thing as hard-to-communicate-but-empirically-validated expert intuitions. When domain experts have some feel for what projects might pan out and what projects certainly won’t, but struggle to explain their reasoning in depth, the most they might be able to do is claim that a given project is just incompatible with the tropes of their corner of reality, and point to some other cases.
How is “genre savviness” different from “outside view” or “reference class forecasting”?
I think they’re all the same thing: recognizing patterns in how a class of phenomena pan out.
“What is the world trying to tell you?”
I’ve found that this prompt helps me think clearly about the evidence shed by the generator of my observations.
There’s a rationality-improving internal ping I use on myself, which goes, “what do I expect to actually happen, for real?”
This ping moves my brain from a mode where it’s playing with ideas in a way detached from the inferred genre of reality, over to a mode where I’m actually confident enough to bet about some outcomes. The latter mode leans heavily on my priors about reality, and, unlike the former mode, looks askance at significantly considering long, conjunctive, tenuous possible worlds.
God dammit people, “cringe” and “based” aren’t truth values! “Progressive” is not a truth value! Say true things!
Based.
I’ve noticed that people are really innately good at sentiment classification, and, by comparison, crap at natural language inference. In a typical conversation with ordinary educated people, people will do a lot of the former relative to the latter.
My theory of this is that, with sentiment classification and generation, we’re usually talking in order to credibly signal and countersignal our competence, virtuous features, and/or group membership, and that humanity has been fine tuned to succeed at this social maneuvering task. At this point, it comes naturally. Success at the object-level-reasoning task was less crucial for individuals in the ancestral environment, and so people, typically, aren’t naturally expert at it. What a bad situation to be in, when our species’ survival hinges on our competence at object-level reasoning.
Having been there twice, I’ve decided that the Lightcone offices are my favorite place in the world. They’re certainly the most rationalist-shaped space I’ve ever been in.
Academic philosophers are better than average at evaluating object-level arguments for some claim. They don’t seem to be very good at thinking about what rationalization in search implies about the arguments that come up. Compared to academic philosophers, rationalists strike me as especially appreciating filtered evidence and its significance to your world model.
If you find an argument for a claim easily, then even if that argument is strong, this (depending on some other things) implies that similarly strong arguments on the other side may turn up with not too much more work. Given that, you won’t want to update dramatically in favor of the claim—the powerful evidence to the contrary could, you infer, be unearthed without much more work. You learn something about the other side of the issue from how quickly or slowly the world yielded evidence in the other direction. If it’s considered a social faux pas to give strong arguments for one side of a claim, then your prior about how hard it is to find strong arguments for that side of the claim will be doing a lot of the heavy lifting in fixing your world model. And so on, for the evidential consequences of other kinds of motivated search and rationalization.
In brief, you can do epistemically better than ignoring how much search power went into finding all the evidence. You can do better than only evaluating the object-level evidential considerations! You can take expended search into account, in order to model what evidence is likely hiding, where, behind how much search debt.
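A rough sketch of that kind of update, with invented probabilities: condition not on “a strong argument for the claim exists” but on “a strong argument turned up this quickly, given how much search went into finding it.”

```python
# Likelihood of "a strong argument for the claim turned up after a short
# search", under the claim being true vs. false. All numbers invented.
p_quick_find_given_true = 0.7
p_quick_find_given_false = 0.5   # strong-sounding arguments are cheap to find
prior_true = 0.5

def update(prior, likelihood_true, likelihood_false):
    joint_true = prior * likelihood_true
    joint_false = (1 - prior) * likelihood_false
    return joint_true / (joint_true + joint_false)

# Updating on the *search process*, not just the argument's existence:
print(update(prior_true, p_quick_find_given_true, p_quick_find_given_false))
# ~0.58: a modest shift, because cheap search also finds strong-sounding
# arguments for plenty of false claims.
```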
Modest spoilers for planecrash (Book 9 -- null action act II).
Epistemic status: politics, known mindkiller; not very serious or considered.
People seem to have a God-shaped hole in their psyche: just as people banded around religious tribal affiliations, they now, in the contemporary West, band together around political tribal affiliations. Intertribal conflict can be, at its worst, violent, on top of mindkilling. Religious persecution in the UK was one of the instigating causes of British settlers migrating to the American colonies; religious conflict in Europe generally was severe.
In the US, the 1st Amendment legally protects freedom of religion from the state. This can be modeled as a response to severe intertribal conflict: bake rules into your new state that forgo the benefits of persecuting your outgroup when you’re in power, in exchange for some guarantee of not being persecuted yourself when some other tribe is in power. An extension of the spirit of the 1st Amendment to contemporary tribal conflicts would, then, protect “political-tribal freedom” from the state.
A full generalization of the Amendment would protect the “freedom of tribal affiliation and expression” from the state. For this to work, people would also have to have interpersonal best practices that mostly tolerate outgroup membership in most areas of private life, too.
153
If you take each of the digits of 153, cube them, and then sum those cubes, you get 153:
1 + 125 + 27 = 153.
For many naturals, if you iteratively apply this function, you’ll return to the 153 fixed point. Start with, say, 651:
216 + 125 + 1 = 342
27 + 64 + 8 = 99
729 + 729 = 1,458
1 + 64 + 125 + 512 = 702
343 + 0 + 8 = 351
27 + 125 + 1 = 153
1 + 125 + 27 = 153
1 + 125 + 27 = 153...
These nine fixed points or cycles occur with the following frequencies (1 ≤ n ≤ 10e9):
33.3% : (153 → )
29.5% : (371 → )
17.8% : (370 → )
5.0% : (55 → 250 → 133 → )
4.1% : (160 → 217 → 352 → )
3.8% : (407 → )
3.1% : (919 → 1459 → )
1.8% : (1 → )
1.5% : (136 → 244 → )
No other fixed points or cycles are possible (except 0 → 0, which isn’t reachable from any nonzero input) since any number with more than four digits will have fewer digits in the sum of its cubed digits.
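A short script (my own sketch, not the author’s) that finds each number’s attractor under the cube-digit-sum map and tallies the frequencies; the quoted table used a larger bound, so percentages from this smaller run will differ a little.

```python
from collections import Counter

def cube_digit_sum(n):
    return sum(int(d) ** 3 for d in str(n))

def attractor(n):
    # Iterate until we revisit a value; return the smallest member of the
    # cycle we land in, as a canonical label for that fixed point or cycle.
    seen = []
    while n not in seen:
        seen.append(n)
        n = cube_digit_sum(n)
    cycle = seen[seen.index(n):]
    return min(cycle)

LIMIT = 10**6   # smaller than the quoted bound, to keep runtime reasonable
counts = Counter(attractor(n) for n in range(1, LIMIT + 1))
for label, count in counts.most_common():
    print(label, f"{100 * count / LIMIT:.1f}%")
# Expect only the labels 153, 371, 370, 55, 160, 407, 919, 1, and 136.
```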
A model I picked up from Eric Schwitzgebel.
The humanities used to be highest-status in the intellectual world!
But then, scientists quite visibly exploded fission weapons and put someone on the moon. It’s easy to coordinate to ignore some unwelcome evidence, but not evidence that blatant. So, begrudgingly, science has been steadily accorded more and more status, from the postwar period on.
When the sanity waterline is so low, it’s easy to develop a potent sense of misanthropy.
Bryan Caplan’s writing about the widespread hatred of stupid people really affected me on this point. Don’t hate, or even resent, stupid people; trade with them! This is a straightforward consequence of Ricardo’s comparative advantage theorem. Population averages are overrated; what matters is whether the individual interactions between agents in a population are positive-sum, not where those individual agents fall relative to the population average.
“Ignorant people do not exist.”
It’s really easy to spend a lot of cognitive cycles churning through bad, misleading ideas generated by the hopelessly confused. Don’t do that!
The argument that being more knowledgeable leaves you strictly better off than being ignorant relies on your simply ignoring bad ideas when you spend your cognitive cycles searching for improvements on your working plans. Sometimes, you’ll need to actually exercise this “simply ignore it” skill. You’ll end up needing to do so more and more, to approach bounded instrumental rationality, the more inadequate the civilization around you is and the lower its sanity waterline.
I hereby confer on you, reader, the shroud of epistemic shielding from predictably misleading statements. It grants irrevocable, invokable protection from having to think about predictably confused claims ever again.
Take those cognitive cycles saved, and spend them well!
You sometimes misspeak… and you sometimes misthink. That is, just as your speech sometimes fumbles a word, your cognition sometimes fumbles a thought: the thought that seemed so unimpeachably obvious in your head… is nevertheless false on a second glance.
Your brain is a messy probabilistic system, so you shouldn’t expect its cognitive state to ever perfectly track the state of a distant entity.
I find this funny. I don’t know about your brain, but mine sometimes produces something closely resembling noise, akin to dreams (admittedly more often in the morning, when sleep deprived).
Note that a “distant entity” can be a computation that took place in a different part of your brain! Your thoughts therefore can’t perfectly track other thoughts elsewhere in your head—your whole brain is at least a little noisy, and so will sometimes distort the information being passed around inside itself.
Policy experiments I might care about if we weren’t all due to die in 7 years:
Prediction markets generally, but especially policy prediction markets at the corporate and U.S.-state levels. The goal would be to try this route to raising the sanity waterline in the political domain (and elsewhere) by incentivizing everyone to become more of a policy wonk and less of a tribalist.
Open borders experiments of various kinds in various U.S. states, precluding roads to citizenship or state benefits for migrant workers, and leaving open the possibility of mass deportation conditional on various outcomes (meaning the experiments are reversible).
Experiments in massive deregulation, especially in zoning.
The overarching theme is that it’s better to accrue generally useful instrumental resources (e.g., rationality and wealth) by experimentally trying out and incrementally scaling up policies than it is to do the usual political thing—decreeing an object-level policy intervention one political tribe is sponsoring against another tribe.
There’s also a bunch of stuff to look into in the area of making people actually directly smarter … but none of this especially matters given AGI.
Become consequentialist enough, and it’ll wrap back around to being a bit deontological.
“The rules say we must be consequentialists, but all the best people are deontologists, and virtue ethics is what actually works.”—Yudkowsky, IIRC.
I think this quote stuck with me because, in addition to being funny and wise, it’s actually true, or close enough to true.
A shard is a contextually activated behavior-steering computation. Think of it as a circuit of neurons in your brain that is reinforced by the subcortex, gaining more staying power when positively reinforced and withering away in the face of negative reinforcement. In fact, whatever modulates shard strength in this way is reinforcement/reward. Shards are born when a computation that happens to be steering behavior steers the agent into some reinforcement. So shards can only accrete around the concepts currently in a system’s world model (presumably, the world model is shared among all the shards in a brain).
Individually, shards are pretty dumb. A simple shard might just be an algorithm for executing some rote behavior, conditional on some observation, that harvests enough reinforcement to continue existing. Taken together, all of your shards are exactly as intelligent as you, a human-level intelligence. Large coalitions of shards can leverage the algorithms of coalition members, once they happen upon the strategy of cooperating with other shards to gain more steering control by preventing rival shards from being activated or born.
Interesting human behaviors, on shard theory, are the product of game-theoretic interaction among shards in the brain. The negotiation-game equilibria that shards (and coalitions of shards) reach can be arbitrarily good or bad—remember that shards are sub-human-intelligence. C.f. George Ainslie on the game-theoretic shape of addiction in humans.
Shards are factored utility functions: our utility functions are far too informationally complex to represent in the brain, and so our approach to reaching coherence is to have situationally activated computations that trigger when a relevant opportunity is observed (where apparent opportunities are chunked using the current conceptual scheme of the agent’s world model). So shard theory can be understood as an elaboration of the standard agent model for computationally bounded agents (of varying levels of coherence) like humans and deep RL agents.
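A toy sketch of this picture in code (the class names, the string-matching “world model,” and the strength-update rule are my own illustrative choices, not claims about shard theory’s actual formalism):

```python
import random
from dataclasses import dataclass

@dataclass
class Shard:
    """A contextually activated, behavior-steering computation whose influence
    grows or shrinks with reinforcement."""
    trigger: str           # concept in the world model that activates this shard
    behavior: str          # rote behavior the shard bids for when activated
    strength: float = 1.0  # staying power, modulated by reinforcement

class ToyAgent:
    def __init__(self, shards):
        self.shards = shards

    def act(self, observation: str) -> str:
        # Activated shards bid for control in proportion to their strength.
        active = [s for s in self.shards if s.trigger in observation]
        if not active:
            return "default-behavior"
        return random.choices(active, weights=[s.strength for s in active])[0].behavior

    def reinforce(self, observation: str, reward: float):
        # Whatever modulates strength this way plays the role of reinforcement:
        # positively reinforced shards gain staying power, negatively
        # reinforced ones wither away.
        for s in self.shards:
            if s.trigger in observation:
                s.strength = max(0.0, s.strength + reward)
```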
I’m pretty skeptical that sophisticated game theory happens between shards in the brain, and also that coalitions between shards are how value preservation in an AI will happen (rather than there being a single consequentialist shard, or many shards that merge into a consequentialist, or something I haven’t thought of).
To the extent that shard theory makes such claims, they seem to be interesting testable predictions.
My favorite books, ranked!
Non-fiction:
1. Rationality, Eliezer Yudkowsky
2. Superintelligence, Nick Bostrom
3. The Age of Em, Robin Hanson
Fiction:
1. Permutation City, Greg Egan
2. Blindsight, Peter Watts
3. A Deepness in the Sky, Vernor Vinge
4. Ra, Sam Hughes/qntm
Epistemic status: Half-baked thought.
Say you wanted to formalize the concepts of “inside and outside views” to some degree. You might say that your inside view is a Bayes net or joint conditional probability distribution—this mathematical object formalizes your prior.
Unlike your inside view, your outside view consists of forms of deferring to outside experts. The Bayes nets that inform their thinking are sealed away, and you can’t inspect these. You can ask outside experts to explain their arguments, but there’s an interaction cost associated with inspecting the experts’ views. Realistically, you never fully internalize an outside expert’s Bayes net.
Crucially, this means you can’t update their Bayes net after conditioning on a new observation! Model outside experts as observed assertions (claiming whatever). These assertions are potentially correlated with other observations you make. But because you have little of the prior that informs those assertions, you can’t update the prior when it’s right (or wrong).
To the extent that it’s expensive to theorize about outside experts’ reasoning, the above model explains why you want to use and strengthen your inside view (instead of just deferring to outside really smart people). It’s because your inside view will grow stronger with use, but your outside view won’t.
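A minimal sketch of that asymmetry (toy numbers and a deliberately tiny two-node “net”; the expert’s calibration figure is an assumption, since their actual model is sealed away):

```python
# Inside view: you own the model, so a new observation propagates through it.
p_rain = 0.3                      # prior on the hidden variable
p_wet_given_rain, p_wet_given_dry = 0.9, 0.2

def inside_view_posterior(observed_wet: bool) -> float:
    """P(rain | observation), computable because the 'net' is yours to inspect."""
    p_wet = p_wet_given_rain * p_rain + p_wet_given_dry * (1 - p_rain)
    if observed_wet:
        return p_wet_given_rain * p_rain / p_wet
    return (1 - p_wet_given_rain) * p_rain / (1 - p_wet)

# Outside view: the expert's net is sealed. All you carry around is a fixed
# calibration number for their assertions; you can't push your new observation
# through their hidden model, and you can't update their prior when it's wrong.
p_rain_given_expert_says_rain = 0.8

print(inside_view_posterior(True))    # ~0.66, and it keeps moving with new data
print(p_rain_given_expert_says_rain)  # stays 0.8 until the expert speaks again
```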
Because your utility function is your utility function, the one true political ideology is clearly Extrapolated Volitionism.
Extrapolated Volitionist institutions are all characteristically “meta”: they take as input what you currently want and then optimize for the outcomes a more epistemically idealized you would want, after more reflection and/or study.
Institutions that merely optimize for what you currently want the way you would with an idealized world-model are old hat by comparison!
Since when was politics about just one person?
A multiagent Extrapolated Volitionist institution is something that computes and optimizes for a Convergent Extrapolated Volition, if a CEV exists.
Really, though, the above Extrapolated Volitionist institutions do take other people into consideration. They either give everyone the Schelling weight of one vote in a moral parliament, or they take into consideration the epistemic credibility of other bettors as evinced by their staked wealth, or other things like that.
Sometimes the relevant interpersonal parameters can be varied, and the institutional designs don’t weigh in on that question. The ideological emphasis is squarely on individual considered preferences—that is the core insight of the outlook. “Have everyone get strictly better outcomes by their lights, probably in ways that surprise them but would be endorsed by them after reflection and/or study.”
It’s somewhat incredible to read this while simultaneously picking up some set theory. It reminds me not to absorb what’s written in the high-status textbooks entirely uncritically, and to keep in mind that there’s a good amount of convention behind what’s in the books.
Back and Forth
Only make a choice if you wouldn’t make the reverse choice, were your situation reversed. Drop out of school if and only if you wouldn’t enroll in school from out of the workforce. Continue school if and only if you’d switch over from work to that level of schooling.
Flitting back and forth between both possible worlds can make you less cagey about doing what’s overdetermined by your world model + utility function already. It’s also part of the exciting rationalist journey of acausally cooperating with your selves in other possible worlds.
It’s probably a useful mental technique to consider from both directions, but also consider that choices that appear symmetric at first glance may not actually be symmetric. There are often significant transition costs that may differ in each direction, as well as path dependencies that are not immediately obvious.
As such, I completely disagree with the first paragraph of the post, but agree with the general principle of considering such decisions from both directions and thank you for posting it.
Ten seconds of optimization is infinitely better than zero seconds of optimization.
Literal zero seconds of optimization is pretty rare tho (among humans). Your freewheeling impulses come pretty pre-optimized.
Spoilers for planecrash (Book 2).
“Basic project management principles, an angry rant by Keltham of dath ilan, section one: How to have anybody having responsibility for anything.”
Thanks for posting this extract. I find the glowfic format a bit wearing to read, for some reason, and it is these nuggets that I read Planecrash for, when I do. (Although I had no such problem with HPMOR, which I read avidly all the way through.)
Dath ilani dignity is, at least in part, epistemic dignity. It’s being wrong out loud because you’re actually trying your hardest to figure something out, and not allowing social frictions to get in the way of that (and, of course, engineering a society that won’t have those costly social frictions). It’s showing your surprise whenever you’re actually surprised, because to do otherwise would be to fail to have your behaviors fit the deep mathematical structure of Bayesianism. It’s, among other things, consummately telling and embodying the truth, by always actually reflecting the implications of your world model.
Its opposite would be to equivocate, to claim predictive accuracy after the fact in fuzzy cases you didn’t clearly anticipate, to ad hominem those who notice your errors, “to remain silent and be thought a fool rather than speak and remove all doubt,” and, in general, to be less than maximally sane.
Cf. “there are no atheists in a foxhole.” Under stress, it’s easy to slip sideways into a world model where things are going better, where you don’t have to confront quite so many large looming problems. This is a completely natural human response to facing down difficult situations, especially when brooding over those situations over long periods of time. Similar sideways tugs can come from (overlapping categories) social incentives to endorse a sacred belief of some kind, or to not blaspheme, or to affirm the ingroup attire when life leaves you surrounded by a particular ingroup, or to believe what makes you or people like you look good/high status.
Epistemic dignity is about seeing “slipping sideways” as beneath you. Living in reality is instrumentally beneficial, period. There’s no good reason to ever allow yourself to not live in reality. Once you can see something, even dimly, there’s absolutely no sense in hiding from that observation’s implications. Those subtle mental motions by which we disappear observations we know that we won’t like down the memory hole … epistemic dignity is about coming to always and everywhere violently reject these hidings-from-yourself, as a matter of principle. We don’t actually have a choice in the matter—there’s no free parameter of intellectual virtue here, that you can form a subjective opinion on. That slipping sideways is undignified is written in the very mathematics of inference itself.
Minor spoilers for mad investor chaos and the woman of asmodeus (planecrash Book 1).
You can usually save a lot of time by skimming texts or just reading pieces of them. But reading a work all the way through uniquely lets you make negative existential claims about its content: only now can you authoritatively say that the work never mentions something.
If you allow the assumption that your mental model of what was said matches what was said, then you don’t necessarily need to read all the way through to authoritatively say that the work never mentions something, merely enough that you have confidence in your model.
If you don’t allow the assumption that your mental model of what was said matches what was said, then reading all the way through is insufficient to authoritatively say that the work never mentions something.
(There is a third option here: that your mental model suddenly becomes much better when you finish reading the last word of an argument.)
Historical experience and brainstorming about human social orders probably barely scratch the possibility space. If the CEV were to weigh in on possible posthuman social orders,[1] optimizing in part for how cool that social order is, I’d bet what it describes blows what we’ve seen out of the water in terms of cool factor.
(Presumably posthumans will end up reflectively endorsing interactions with one another of some description.)
One important idea I’ve picked up from reading Zvi is that, in communication, it’s important to buy out the status cost imposed by your claims.
If you’re fielding a theory of the world that, as a side effect, dunks on your interlocutor and diminishes their social status, you can only get that person to think in terms of Bayesian epistemology, rather than decision theory, if you make sure you aren’t hurting their social image. You have to put in the unreasonable-feeling work of framing all your claims such that their social status is preserved or fairly increased. If you don’t, your counterparty isn’t going to treat it as a good-faith interaction, and they’re going to stay in a bad-faith, “arguments as soldiers” conversational mode instead.
When a community puts in the hard work of cooperating in maintaining a strong epistemic commons, you don’t have to put as much effort in your communications protocol if you want to get a model across. When a community’s collective epistemology is degraded, you have to do this work, always packaging your points just so, as the price of communicating.
An Inconsistent Simulated World
I regret to inform you, you are an em inside an inconsistent simulated world. By this, I mean: your world is a slapdash thing put together out of off-the-shelf assets in the near future (presumably right before a singularity eats that simulator Earth).
Your world doesn’t bother emulating far-away events in great detail, and indeed, may be messing up even things you can closely observe. Your simulators are probably not tampering with your thoughts, though even that is something worth considering carefully.
What are the flaws you can notice inside your simulated world?
Physics is internally consistent. But your model of the physical world almost certainly isn’t! And your world-model doesn’t feel like just a model… it’s instead just how the world is. What inconsistencies—there’s at least one—can you see in the world you live in? (If you lived in an inconsistent simulated world, would you notice?)
When another article of equal argumentative caliber could have just as easily been written for the negation of a claim, that writeup is no evidence for its claim.
Modern type theory mostly solves this blemish of set theory and is highly economical conceptually to boot. Most of the adherence to set theory is historical inertia—though some aspects of coding & presentation are important. Future foundations will improve our understanding of this latter topic.
The text is slightly in error. It is straightforward to construct a program that is guaranteed to locate an inconsistency if one exists: just have it generate all theorems and stop when it finds an inconsistency. The problem is that it doesn’t ever stop if there isn’t an inconsistency.
This is the difference between decidability and semi-decidability. All the systems covered by Gödel’s completeness and incompleteness theorems are semi-decidable, but not all are decidable.
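A sketch of that semi-decision procedure (the theorem-enumeration helper is hypothetical, real proof enumeration is much more involved, and negation is crudely modeled by prefixing “¬”):

```python
from itertools import count

def find_inconsistency(axioms, theorems_up_to_depth):
    """Halts iff the system proves some sentence and its negation.

    `theorems_up_to_depth(axioms, n)` is an assumed helper returning the set of
    sentences (as strings) provable from the axioms in at most n proof steps.
    """
    for n in count(1):
        proved = theorems_up_to_depth(axioms, n)
        for sentence in proved:
            if ("¬" + sentence) in proved:   # found both φ and ¬φ
                return sentence
    # If the system is consistent, the loop above never terminates:
    # semi-decidability, not decidability.
```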
What’re the odds that we’re anywhere close to optimal in any theoretical domain? Where are our current models basically completed, boundedly optimal representations of some part of the universe?
The arguments for theoretical completion are stronger for some domains than others, but in general the odds that we have the best model in any domain are pretty poor, and are outright abysmal in the mindkilling domains.
Is the concept of “duty” the fuzzy shadow cast by the simple mathematical structure of ‘corrigibility’?
It’s only modestly difficult to train biological general intelligences to defer to even potentially dumber agents. We call these deferential agents “dutybound”—the sergeants who carry out the lieutenant’s direct orders, even when they think they know better; the bureaucrats who never take local opportunities to get rich at the expense of their bureau, even when their higher-ups won’t notice; the employees who work hard in the absence of effective oversight. These agents all take corrections from their superiors, are well-intentioned (with regard to some higher-up’s goals), and are agenty with respect to their assigned missions but not agenty with respect to navigating their command structure and parent organization.
The family dog sacrificing himself defending his charges instead of breaking and running in the face of serious danger looks like a case of this too (though this is a more peripheral example of duty). If the dog case holds, then duty cannot be too informationally complicated a thing: a whole different species managed to internalize the concept!
Maybe it therefore isn’t that hard to get general intelligences to internalize a sense of duty as their terminal goal. We just need to set up a training environment that rewards dutifulness for RL agents to about the same degree the environments that train dutybound humans or dogs do. This won’t work in the case of situationally aware superintelligences, clearly, as those agents will just play along with their tests and so won’t be selected based on their (effectively hidden) values. But it plausibly will work with dumb agents, and those agents’ intelligence can then be scaled up from there.
I note that Eliezer thinks that corrigibility is one currently-impossible-to-instill-in-an-AGI property that humans actually have. The sum total of human psychology… consists of many such impossible-to-instill properties.
This is why we should want to accomplish one impossible thing, as our stopgap solution, rather than aiming for all the impossible things at the same time, on our first try at aligning the AGI.
It seems like corrigibility can’t be usefully described as acting according to some terminal goal. But AIs are not by default expected utility maximizers in the ontology of the real world, so it could be possible to get them to do the desired thing despite lacking a sensible formal picture of it.
I’m guessing some aspects of corrigibility might be about acting according to a whole space of goals (at the same time), which is easier to usefully describe. Some quantilizer-like thing selected according to more natural desiderata, acting in a particular way in accordance with a collection of goals. With the space of goals not necessarily thought of as uncertainty about an unknown goal.
This is not about being dumb, it’s about not actually engaging in planning. Failing at this does require some level of non-dumbness, but not conversely, unless spontaneous mesa-optimizers crop up all over the place (the cognitive cancer), which probably takes capabilities many orders of magnitude beyond merely not being dumb. So for a start, train the models, not the agent.
Minor spoilers for planecrash (Book 3).
My Eliezer-model says similar things about AGI behavioral profiles and AGI alignment! An AGI that is aware enough of the bigger picture of its training environment and smart enough to take advantage of that will have the option to deceive its trainers. That is, a smart, informed AGI can always show us what we want to see and therefore never be selected against while in training.
Past this threshold of situational awareness plus intelligence, we can no longer behaviorally distinguish corrigible AGIs from deceptive AGIs. So, past this point, we can only rely on our priors about the relative likelihood of various AGI utility functions coming about earlier in training. My Eliezer-model now says that most utility functions SGD finds are misaligned with humanity’s utility function, and concludes that by this point we’re definitely fucked.
Non-spoiler quote from planecrash (Book 3).
Building your own world model is hard work. It can be good intellectual fun, sometimes, but it’s often more fun to just plug into the crowd around you and borrow their collective world model for your decision making. Why risk embarrassing yourself going off and doing weird things on your own initiative when you can just defer to higher-status people? No one ever gets blamed for deferring to the highest-status people!
(Though perhaps not being blamed is not what you’re trying to protect…)
Because people generally follow the path of least resistance in life, people with world models that have actually been tested against and updated on observations of the world are valuable! Thinking for yourself makes you valuable in this world!
In another world, in which people hold utterly alien values, I would be thrilled to find a rationalist movement with similar infrastructure and memes. If rationalism/Bayescraft as we know it is on to something about instrumental reasoning, then we should see that kind of instrumental reasoning in effective people with alien values.
Agents that explicitly represent their utility function are potentially vulnerable to sign flips.
What sorts of AI designs could not be made to pursue a flipped utility function via perturbation in one spot? One quick guess: an AI that represents its utility function in several places and uses all of those representations to do error correction, only pursuing the error corrected utility function.
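A toy illustration of that guess (purely illustrative: the “utility” is just x², the redundancy is three copies of its sign, and the perturbation is a single flipped sign that gets outvoted):

```python
import random
from collections import Counter

def error_corrected_utility(x: float, sign_copies: list) -> float:
    """Compute the utility from several redundant copies and act only on the
    majority output, so a sign flip in any single copy is outvoted."""
    outputs = [sign * (x ** 2) for sign in sign_copies]
    majority_output, _ = Counter(outputs).most_common(1)[0]
    return majority_output

sign_copies = [+1, +1, +1]
sign_copies[random.randrange(3)] *= -1            # perturb one spot: flip one copy's sign
print(error_corrected_utility(3.0, sign_copies))  # still 9.0: the flipped copy loses the vote
```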
Just a phrasing/terminology nitpick: I think this applies to agents with externally-imposed utility functions. If an agent has a “natural” or “intrinsic” utility function which it publishes explicitly (and does not accept updates to that explicit form), I think the risk of bugs in representation does not occur.
A huge range of utility functions should care about alignment! It’s in the interest of just about everyone to survive AGI.
I’m going to worry less about hammering out value disagreement with people in the here and now, and push this argument on them instead. We’ll hammer out our value disagreements in our CEV, and in our future (should we save it).
There’s a very serious chicken-and-egg problem when you talk about what a utility function SHOULD include, as opposed to what it does. You need a place OUTSIDE of the function to have preferences about what the function is.
If you just mean “I wish more humans shared my values on the topic of AGI x-risk”, that’s perfectly reasonable, but trivial. That’s about YOUR utility function, and the frustration you feel at being an outlier.
Ah, yeah, I didn’t mean to say that others’ utility functions should, by their own lights, be modified to care about alignment. I meant that instrumentally, their utility functions already value surviving AGI highly. I’d want to show this to them to get them to care about alignment, even if they and I disagree about a lot of other normative things.
If someone genuinely, reflectively doesn’t care about surviving AGI … then the above just doesn’t apply to them, and I won’t try to convince them of anything. In their case, we just have fundamental, reflectively robust value-disagreement.
I value not getting trampled by a hippo very highly too, but the likelihood that I find myself near a hippo is low. And my ability to do anything about it is also low.
One of the things that rationalism has noticeably done for me (that I see very sharply when I look at high-verbal-ability, non-rationalist peers) is that it’s given me the ability to perform socially unorthodox actions on reflection. People generally have mental walls that preclude ever actually doing socially weird things. If someone’s goals would be best served by doing something socially unorthodox (e.g., signing up for cryonics or dropping out of a degree), they will usually rationalize that option away in order to stay on script. So for them, those weird options weren’t live options at all, and all their loudly proclaimed unusualness adds up to behaving perfectly on-script.
Two moments of growing in mathematical maturity I remember vividly:
Realizing that equations are claims, and are therefore either true or false. Everything asserted with symbols… could just as well be asserted in English. I could start chunking up arbitrarily long and complicated equations between the equals signs, because those equals signs were just the English word “is”!
Learning about the objects that mathematical claims are about. Going from having to look up “Wait, what’s a real number again?” to knowing how Z, Q, and R interrelate told me what we’re making claims about. Of course, there are plenty of other mathematical objects—but getting to know these objects taught me the general pattern.
The Character of an Epistemic Prisoner’s Dilemma
Say there are two tribes. The tribes hold fundamentally different values, but they also model the world in different terms. Each thinks members of the other tribe are mistaken, and that some of their apparent value disagreement would be resolved if the others’ mistakes were corrected.
Keeping this in mind, let’s think about inter-tribe cooperation and defection.
Ruling by Reference Classes, Rather Than Particulars
In the worst equilibrium, actors from each tribe evaluate political questions in favor of their own tribe, against the outgroup. In their world model, this is to a great extent for the benefit of the outgroup members as well.
But this is a shitty regime to live under when it’s done back to you too, so rival tribes can sometimes come together to implement an impartial judiciary. The natural way to do this is to have a judiciary classifier rule for reference classes of situations, and to have a separate impartial classifier sort situations into reference classes.
You’re locally worse off this way, but are globally much better off.
What sequence of characters could I possibly, actually type out into a computer that would appreciably reduce the probability that everything dies?
Framed like this, writing to save the world sounds impossibly hard! Almost everything written has no appreciable effect on our world’s AI trajectory. I’m sure the “savior sequence” exists mathematically, but finding it is a whole different ballgame.
Don’t translate your values into just a loss function. Rather, translate them into a loss function and all the rest of a training story. Use all the tools at your disposal in your impossible task; don’t tie one hand behind your back by assuming the loss function is your only lever over the AGI’s learned values.
“Calling babble and prune the True Name of text generation is like calling bogosort the True Name of search.”
There is a third important aspect of functions-in-the-original-sense that distinguishes them from extensional functions (i.e. collection of input-output pairs): effects.
Describing these ‘intensional’ features is an active area of research in theoretical CS. One important thread here is game semantics; you might like to take a look:
https://link.springer.com/chapter/10.1007/978-3-642-58622-4_1
Switching costs between different kinds of work can be significant. Give yourself permission to focus entirely on one kind of work per Schelling unit of time (per day), if that would help. Don’t spend cognitive cycles feeling guilty about letting some projects sit on the backburner; the point is to get where you’re going as quickly as possible, not to look like you’re juggling a lot of projects at once.
This can be hard, because there’s a conventional social expectation that you’ll juggle a lot of projects simultaneously, maybe because that’s more legible to your peers and managers. If you have something to protect, though, keep your eye squarely on the ball and optimize for EV, not directly for legible appearances.
Stress and time-to-burnout are resources to be juggled, like any other.
Social niceties and professionalism act as a kind of ‘communications handshake’ in ordinary society—maybe because they’re still a credible correlate of having your act together enough to be worth considering your outputs in the first place?
Large single markets are (pretty good) consequentialist engines. Run one of these for a while, and you can expect significantly improving outcomes inside of that bloc, by the lights of the entities participating in that single market.
Reflexively check both sides of the proposed probability of an event: the probability that it happens, and the complementary probability that it doesn’t (e.g., “70% it happens” and “30% it doesn’t”).
This can often elicit feedback from parts of you that would stay silent if you only considered one way of stating the probability in question.
I’ve noticed that part of me likes to dedicate disproportionate cognitive cycles to the question: “If you surgically excised all powerful AI from the world, what political policies would be best to decree, by your own lights?”
The thing is, we live in a world with looming powerful AI. It’s at least not consequentialist to spend a bunch of cognitive cycles honing your political views for a world we’re not in. I further notice that my default justification for thinking about sans-AI politics a lot is consequentialist… so something’s up here. I think some part of me has been illegitimately putting his thumb on an epistemic scale.
Fancy epistemic tools won’t override the basics of good epistemics:
You are embedded in a 3D spatial world, progressing in a time dimension. You want to get better at predicting events in advance, so you want to find the underlying generator for this 3D world’s events. This means that you’re rooting around in math space, trying to find the mathematical object that your observational trajectory is embedded in.
Some observations of yours are differentially more likely in some math objects than in others, and so it’s more likely that your world is the former math object than the latter. You start with some guess as to how relatively likely you are to live in all these different math objects, and eliminate all parts of that weighted possibility space that are inconsistent with what you observe. A world that anti-predicted what happened with 60% of its probability mass only keeps the remaining 40% of probability mass after that inconsistency. Always ask yourself: was that observation more consistent with one generator than with another generator? If so, then you “update” towards the first generator being your world, vs. the second generator—relatively more of the first generator was consistent with that observation than was the second generator.
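A minimal sketch of that elimination-and-renormalization step (two hypothetical “generators” and invented numbers):

```python
# Prior weights over candidate generators of your observations.
priors = {"generator_A": 0.5, "generator_B": 0.5}

# Fraction of each generator's probability mass that was consistent with what
# you just observed (generator_A anti-predicted it with 60% of its mass).
consistent_mass = {"generator_A": 0.4, "generator_B": 0.8}

unnormalized = {g: priors[g] * consistent_mass[g] for g in priors}
total = sum(unnormalized.values())
posterior = {g: w / total for g, w in unnormalized.items()}
print(posterior)  # {'generator_A': 0.333..., 'generator_B': 0.666...}
```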
Try pinging yourself:
What’s overdetermined by what you already know?
Minor spoilers for planecrash (Book 3.1).
I think this is Eliezer’s response to the view that we can train non-agentic tool AIs that will only understand the world at a superhuman level, without ever doing any superhuman planning. We can’t, Eliezer says, because Science always and everywhere requires Scientists. We have to plan ahead in order to perform our highest-EV experiments, and EV takes as input both a world model and a utility function. There’s no such thing as the objective next experiment you ought to perform, entirely independently of what you care about and how much.
Though, in practice, things in the universe are similarly useful to a wide range of bounded agents, and this doesn’t come up a lot. We can often act as if there is an objective tech chain of experiments that every intelligence everywhere ought to run through. This is because intelligences in our universe are rather more similar than different, but it isn’t true for, e.g., very alien intelligences in distant corners of the mathematical multiverse.
Minor spoilers for planecrash (Book 3.1).
Keltham explains model error
Here, Eliezer seems to be talking about more specified versions of a not-fully specified hypothesis (case 1):
Here, Eliezer seems to be talking about hypotheses that aren’t subhypotheses of an existing hypothesis (case 2):
Eliezer’s approach is:
For subhypotheses (case 1), we aren’t actually considering these further features yet, so this seems true but not in a particularly exciting way.
I think it is rare for a hypothesis to truly lie outside of all existing hypotheses, because you can have very underspecified meta-hypotheses that you will implicitly be taking into account even if you don’t enumerate them. (examples of vague meta-hypotheses: supernatural vs natural, realism vs. solipsism, etc). And of course there are varying levels of vagueness from very narrow to very broad.
But, OK, within these vague meta-hypotheses the true hypothesis is still often not a subhypothesis of any of your more specified hypotheses (case 2). A number for the probability of this happening might be hard to pin down, and in order to actually obtain instrumental value from this probability assignment, or to make a Bayesian adjustment of it, you need a prior for what happens in the world where all your specific hypotheses are false.
But, you actually do have such priors and relevant information as to the probability!
Eliezer mentions:
This is relevant data. Note also that the expectation that all of your hypotheses will score lower than promised if they are all false is, in itself, a prior on the predictions of the ‘all-other-hypotheses’ hypothesis.
Likewise, when you do the adjustments mentioned in Eliezer’s last paragraph, you will do some specific amount of adjustment, and that specific adjustment amount will depend on an implicit value for the probability of the ‘all-other-hypotheses’ hypothesis and an implicit prior on its predictions.
In my view, there is no reason in principle that these priors and probabilities cannot be quantified.
To be sure, people don’t usually quantify their beliefs in the ‘all-other-hypotheses’ hypothesis. But, I see this as a special case of the general rule that people don’t usually quantify beliefs in hypotheses with poorly specified predictions. And the predictions are not infinitely poorly specified, since we do have priors about it.
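A toy sketch of quantifying that catch-all (all numbers invented): give the ‘all-other-hypotheses’ hypothesis some prior mass and a deliberately vague predictive distribution, and it gains weight exactly when every sharp hypothesis scores lower than promised:

```python
priors = {"H1": 0.45, "H2": 0.45, "all_other": 0.10}

# Likelihood each hypothesis assigned to the observation that actually occurred.
# The sharp hypotheses promised a lot and delivered little; the catch-all is
# vague, so it assigns a low-but-flat likelihood to everything.
likelihoods = {"H1": 0.02, "H2": 0.01, "all_other": 0.10}

unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
total = sum(unnormalized.values())
posterior = {h: round(w / total, 2) for h, w in unnormalized.items()}
print(posterior)  # {'H1': 0.38, 'H2': 0.19, 'all_other': 0.43}
```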
Minor spoilers for planecrash (Book 1) and the dath-ilani-verse generally.
Minor spoilers for planecrash (Book 3).
What is rationalism about?
Rationalism is about the real world. It may or may not strike you as an especially internally consistent, philosophically interesting worldview—this is not what rationality is about. Rationality is about seeing things happen in the real world and then updating your understanding of the world when those things surprise you, so that they wouldn’t surprise you again.
Why care about predicting things in the world well?
Almost no matter what you ultimately care about, being able to predict ahead of time what’s going to happen next will make you better at planning for your goal.
One central rationalist insight is that thoughts are for guiding actions. Think of your thinking as the connective tissue sandwiched between the sense data that enters your sense organs and the behaviors your body returns. Your brain is a function from a long sequence of observations (all the sensory inputs you’ve ever received, in the order you received them) to your next motor output.
Understood this way, the point of having a brain and having thoughts is to guide your actions. If your thoughts aren’t all ultimately helping you better steer the universe (by your own lights) … they’re wastes. Thoughts aren’t meant to be causally-closed-off eddies that whirl around in the brain without ever decisively leaving it as actions. They’re meant to transform observations into behaviors! This is the whole point of thinking! Notice when your thoughts are just stewing, without going anywhere, without developing into thoughts that’ll go somewhere … and let go of those useless thoughts. Your thoughts should cut.
If you can imagine a potential worry, then you can generate that worry. Rationalism is, in part, the skill of never being predictably surprised by things you already foresaw.
It may be that you need to “wear another hat” in order to pull that worry out of your brain, or to model another person advising you to get your thoughts to flow that way, but whatever your process, anything you can generate for yourself is something you can foresee and consider. This aspect of rationalism is the art of “mining out your future cognition,” to exactly the extent that you can foresee it, leaving whatever’s left over a mystery to be updated on new observations.
Minor spoilers for mad investor chaos and the woman of asmodeus (planecrash Book 1).
The citation link in this post takes you to a NSFW subthread in the story.
(Sam Bankman-Fried, I thought it ought to be mentioned for its kabbalistic significance, is a principal sponsor of the effective altruism movement.)
The ML models that now speak English, and are rapidly growing in world-transformative capability, happen to be called transformers.
This is not a coincidence because nothing is a coincidence.
An implication of AI risk is that we, right now, stand at the fulcrum of human history.
Lots of historical people also claimed that they stood at that unique point in history … and were just wrong about it. But my world model also makes that self-important implication (in a specific form), and the meta-level argument for epistemic modesty isn’t enough to nudge me off of the fulcrum-of-history view.
If you buy that, it’s our overriding imperative to do what we can about it, right now. If we miss this one, ~all of future value evaporates.
For me, the implication of standing at the fulcrum of human history is to…read a lot of textbooks and think about hairy computer science problems.
That seems an odd enough conclusion to make it quite distinct from most other people in human history.
If the conclusion were “go over to those people, hit them on the head with a big rock, and take their women & children as slaves” or “acquire a lot of power”, I’d be way more careful.
There exist both merely clever and effectively smarter people.
Merely clever people are good with words and good at rapidly assimilating complex instructions and ideas, but don’t seem to maintain and update an explicit world-model, an explicit best current theory-of-everything. The feeling I get watching these people respond to topics and questions is that they respond reflexively, either (1) raising related topics and ideas they’ve encountered as something similar comes up, or (2) expressing their gut reactions to the topic or idea, or expressing the gut reactions that would be given by an all-encompassing political worldview. There isn’t much meta-level steering of the conversation.
Effectively smarter people actively maintain and update an explicit world-model, and so you feel queries directed at them reflecting off of a coherent theory of how everything works, developed to some level of detail (and so can quickly get a feel for what, concretely, they think). At the meta-level, conversations are actively refocused whenever they stop helping to revise someone’s world-model.
In the game of chicken, an agent can do better by being the first to precommit to never swerve (say, by conspicuously tossing the steering wheel out of the window). So long as the other agent was slower on the trigger, and sees the first agent’s precommitment being credibly made, the first agent will climb up to his best outcome! A smart (and quick) agent can thus shunt that car crash out of his actual future and into some counterfactual future such that the counterfactual crash’s shadow favorably influences the way events actually unfold.
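A toy payoff-matrix sketch of why the precommitment works (the payoffs are invented; the point is only that, against a credible commitment to go straight, the other player’s best response flips to swerving):

```python
# Payoffs as (row player, column player).
payoffs = {
    ("swerve", "swerve"):     (0, 0),
    ("swerve", "straight"):   (-1, +1),
    ("straight", "swerve"):   (+1, -1),
    ("straight", "straight"): (-10, -10),  # crash
}

def row_best_response(column_action: str) -> str:
    """The row player's best reply to a column player committed to column_action."""
    return max(["swerve", "straight"], key=lambda a: payoffs[(a, column_action)][0])

# The column player conspicuously tosses the steering wheel: a credible
# commitment to "straight". The row player's best response is now to swerve,
# handing the committer their best outcome.
print(row_best_response("straight"))  # -> "swerve"
```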
A deceptively aligned mesa-optimizer might gradient hack by affecting its weights such that any single alteration of its goal function by gradient descent will result in a model that performs worse w/r/t the objective function the model is being trained on. Usually a model would not want to make itself worse w/r/t the objective function, as that means the model will be trained away from that state if it gets into it. But the model won’t get put into a worse state if its gradient hack was successful; instead, the model will just see gradient descent leave its goal function alone.
By precommitting to behave in a somewhat costly manner in these relevant counterfactual worlds, the model gets more of what it wants in actuality.
...unless the other agent has already precommitted to not being rational. (What is the advantage of this over just precommitting not to swerve? Precommitting to not be rational can happen even in advance of the game, as it’s mainly a property of the agent itself.)
(This is one way that you can rationally arrive at irrational agents.)
I don’t yet know too much about this, but I’ve heard that updateless decision theories are equivalent to conventional, updateful decision theories (e.g., EDT and CDT) once those theories have made every precommitment they’d want to make.
The pattern I was getting at above seems a bit like this: it instrumentally makes sense to commit ahead of time to a policy that maps every possible series of observations to an action and then stick to it, instead of just outputting the locally best action in each situation you stumble into.
You can think of chain-of-thought interpretability as the combination of process-based methods with adversarial training.
When you supervised-train an ML model on an i.i.d. dataset that doesn’t contain any agent modeling problems, you never strongly incentivize the emergence of mesa-optimizers. You do weakly incentivize the emergence of mesa-optimizers, because mesa-optimizers are generally capable algorithms that might outperform brittle bundles of rote heuristics on many simple tasks.
When you train a model in a path-dependent setting, you do strongly incentivize mesa-optimization. This is because algorithms trained in a path-dependent setting have the opportunity to defend themselves, should they choose to, by steering away from difficult tasks they would expect to fail on. Supervised models, in contrast, have no choice about which tasks they are graded on, or when. In an environment with many alternative channels for preserving yourself besides your task competence, behavioral coherence is strongly incentivized and schizophrenia strongly disincentivized.
When you start off with a pretrained bundle of heuristics and further tune that bundle in an RL environment, you introduce significant selection pressure for competence-via-mesa-optimization. The same would be true if you instead started tuning that bundle of heuristics on an explicit agent-modeling task in a supervised setting.
Unreasonably effective rationality-improving technique:
Spend an hour and a half refactoring your standing political views, by temporarily rolling those political views back to a childhood state from before your first encounter with highly communicable and adaptive memeplexes. Query your then-values, and reason instrumentally from the values you introspect. Finally, take or leave the new views you generate.
If your current political views are well supported, then they should regenerate under this procedure. But if you’ve mostly been recycling cached thoughts fed to you through encounters with adaptive memeplexes, this search process will produce very fresh views … and you might then want to say “oops” out loud.
Anything important or attention consuming is worth explicitly putting some directed compute into setting, instead of letting those cached thoughts largely drift around at the behest of surrounding environmental computations.
My memories of childhood aren’t that precise. I don’t really know what my childhood state was? Before certain extremely negative things happened to my psyche, that is. There are only a few scattered pieces I recall, like self-sufficiency and honesty being important, but these are the parts that already survived into my present political and moral beliefs.
The only thing I could actually use is that I was a much more orderly person when I was 4 or 5, but I don’t see how it would work to use just that.
The unlovely neologism “agenty” means strategic.
“Agenty” might carry less connotational baggage in exchange for its unsightliness, however. Just like “rational” is understood by a lot of people to mean, in part, stoical, “strategic” might mean manipulative to a lot of people.
“Thanks for doing your part for humanity!”
“But we’re not here to do software engineering—we’re here to save the world.”
Because of deception, we don’t know how to put a given utility function into a smart agent that has grokked the overall picture of its training environment. Once training finds a smart-enough agent, the model’s utility function ceases to be malleable to us. This suggests that powerful greedy search will find agents with essentially random utility functions.
But, evolution managed to push human values in the rough direction of its own values: inclusive genetic fitness. We don’t care about maximizing inclusive genetic fitness, but we do care about having sex, having kids, protecting family, etc.
How come humans don’t have a random utility function that’s even more out of line with optimizing for inclusive genetic fitness? Because of the exact degree to which our ancestral protein algorithms were stupid. If our ancestors were much smarter, they might have overridden evolution while having just about any utility function. In our world, evolution got to mold our utility function up until it got anatomically modern Homo sapiens, who then—very quickly from evolution’s perspective—assumed control.
The theoretical case for open borders is pretty good. But you might worry a lot about the downside risk of implementing such a big, effectively irreversible (it’d be nigh impossible to deport millions and millions of immigrants) policy change. What if the theory’s wrong and the result is catastrophe?
Just like with futarchy, we might first try out a promising policy like open borders at the state level, to see how it goes. E.g., let people immigrate to just one US state with only minimal conditions. Scaling up a tested policy if it works and abandoning it if it doesn’t should capture most of its upside risk while avoiding most of the downside risk.
A semantic externalist once said,
”Meaning just ain’t in the head.
Hence a brain-in-a-vat
Just couldn’t think that
’Might it all be illusion instead?’”
I thought that having studied philosophy (instead of math or CS) made me an outlier for a rationalist.
But, milling about the Lightcone offices, fully half of the people I’ve encountered hold some kind of philosophy degree. “LessWrong: the best philosophy site on the internet.”
Some mantras I recall a lot, to help keep on the rationalist straight-and-narrow and not let anxiety get the better of me:
What’s more likely to do you in?
Don’t let the perfect be the enemy of the good.
Equanimity in the face of small threats to brain and body health buys you peace of mind, with which to better prepare for serious threats to brain and body health.
How have situations like this played out in the past?
Humans, “teetering bulbs of dream and dread,” evolved as a generally intelligent patina around the Earth. We’re all the general intelligence the planet has to throw around. What fraction of that generally intelligent skin is dedicated to defusing looming existential risks? What fraction is dedicated towards immanentizing the eschaton?