Precisely Bound Demons and their Behavior
EY posted this on reddit; I’d like to know what you’d do with it:
I can’t promise this will turn into a sufficiently good environment for storytelling or that I’ll write in it, but you never know unless you try, and worldbuilding can be fun regardless… One in X people (X ~ 10,000?) has the ability to summon demons, once per Y days, and bind them to arbitrary commands at will. Demons are malevolent and will interpret any instruction in such ways as to cause the most damage. Evil summoners can sometimes reach an accommodation of sorts by giving the demons orders which benefit themselves and hurt others more, in which case the demon will often go along with it, most of the time.
Most good people with the ability to summon demons were advised never to do so, unless it became necessary to defeat an evil demon-summoner creating horror on a mass scale.
This world’s Industrial Revolution began when it was realized that mathematically precise and complete commands to demons apparently could not be misinterpreted. For example (this could perhaps be picked apart): A demon told to accelerate a vehicle along an exactly given vector for a specified time, applying the same added acceleration at any given time to all particles in the vehicle, and causing no other impact on the material universe, will do only that… if the language of the contract can be mathematically specified in an absolutely unambiguous way. (What exactly is the ‘vehicle’? Maybe you’d better have the demon apply acceleration to a sphere to which the engine car is attached.)
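To make the vehicle example concrete, here is a minimal sketch of what "the same added acceleration at any given time to all particles" amounts to, assuming plain Newtonian mechanics and treating the vehicle as a short list of point particles. The function names and the sample numbers are made up for illustration; nothing here is part of the original post.

```python
import math

def accelerate_uniformly(positions, velocities, accel, duration):
    """Apply one constant acceleration vector to every particle for `duration`.

    Newtonian assumption: each particle follows p + v*t + a*t^2/2,
    and every particle receives exactly the same `accel`.
    """
    new_positions, new_velocities = [], []
    for p, v in zip(positions, velocities):
        new_positions.append(tuple(
            pi + vi * duration + 0.5 * ai * duration ** 2
            for pi, vi, ai in zip(p, v, accel)
        ))
        new_velocities.append(tuple(
            vi + ai * duration for vi, ai in zip(v, accel)
        ))
    return new_positions, new_velocities

def pairwise_distances(points):
    """Distances between every pair of particles, in a fixed order."""
    return [
        math.dist(a, b)
        for i, a in enumerate(points)
        for b in points[i + 1:]
    ]

# Hypothetical three-particle "vehicle", initially at rest.
positions = [(0, 0, 0), (1, 0, 0), (0, 2, 0)]
velocities = [(0, 0, 0)] * 3
accel = (2, 0, 0)      # the exactly given vector
duration = 3           # the specified time

new_positions, new_velocities = accelerate_uniformly(
    positions, velocities, accel, duration
)
before = pairwise_distances(positions)
after = pairwise_distances(new_positions)
```

Because every particle gets the same displacement, the distances in `before` and `after` agree, which is the sense in which the vehicle is moved without being stressed. The sketch says nothing about how 'the vehicle' is picked out in the first place, which is the part the post flags as the hard bit.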
Demon-summoners promptly began to use their powers in the most economically rewarding way, such as by summoning demons who would just accelerate particular train engine cars; and this occurred on a mass scale throughout society.
This is a point where I wouldn’t mind help worldbuilding: given this basic setup, what industrially useful demonic bindings can be precisely specified? Suppose the world is such that electricity doesn’t exist, but fire does, and steam. Demon summoners will end up being rare enough, whatever frequency is ‘rare enough’, that the society doesn’t come apart as the result of whatever powers you invent.
Bindings can also tell demons to act based on the result of a calculation, if that calculation is precisely specified. There is no known limit on how much calculation can be done this way. If a demon is told to behave in a way that depends on a calculation that does not halt, it is the same as telling the demon “do what you want”, which is a very bad thing to tell a demon (though for poorly understood reasons, demons’ most malevolent free actions are not as destructive as the worst human commands). Summoners are well-advised to tell the demon to only compute something for a bounded number of steps, though no known limit exists on how high the bound can be.
From our perspective, they discovered that demons can act like unboundedly large and fast computers.
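A rough sketch of the "bounded number of steps" advice, assuming a computation can be modelled as a Python generator that yields once per step. `run_bounded`, `collatz_steps`, `forever`, and the `"do nothing"` fallback are illustrative names, not anything from the setting itself.

```python
def run_bounded(computation, max_steps, fallback):
    """Drive a generator-based computation for at most `max_steps` steps.

    Returns the computation's result if it finishes within the bound,
    otherwise returns `fallback` (a precisely specified safe action,
    rather than "do what you want").
    """
    steps = 0
    try:
        while steps < max_steps:
            next(computation)
            steps += 1
    except StopIteration as done:
        return done.value
    return fallback

def collatz_steps(n):
    """Count Collatz steps until n reaches 1, yielding once per step."""
    count = 0
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        count += 1
        yield
    return count

def forever():
    """A computation that never halts."""
    while True:
        yield

halting_result = run_bounded(collatz_steps(6), 100, "do nothing")
non_halting_result = run_bounded(forever(), 1000, "do nothing")
```

The point of the design is that the binding stays well-defined whether or not the inner computation halts: the bound can be as high as you like, but there is always a specified behavior once it runs out.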
This kind of demonic calculation has been previously used to investigate interesting math questions and create demons that e.g. loft steerable airplanes. But as the calculations used in spiritual industry grow more complex, people have the bright idea that cognitive calculations can also be specified. They begin to publish specifications for simple cognitive constructs, like gradient-descent sigmoid neural networks. It would be useful (think those spiritualists) if demons could be told to recognize particular faces by recourse to a neural network, without giving any demon underspecified instructions about ‘if you recognize person X’ that would allow their malevolence room to act.
Shortly thereafter, the world ends.
Our N protagonists find themselves in a Groundhog Day Loop of period ???, trying to prevent the seemingly inevitable end of the world that occurs when some damned idiot summoner, somewhere, instructs a demon to act like the equivalent of AIXI-tl. For reasons that are unclear, even though ‘natural’ demons don’t instantly destroy the world given an instruction like ‘do what you want’, the cognitively bound equivalent of AIXI-tl can construct self-replicating agentic goo in the environment in order to serve its purposes (in the case of AIXI-tl, maximizing a reward channel).
After some failures trying to prevent the end of the world the normal way, the thought has occurred to our protagonists that the only Power great enough to prevent the end of the world would be a demon bound to implement a ‘nice’ superpowerful cognitive binding, or at least a cognitive binding that carries out intuitively specified instructions well enough to shut down all attempts at summoning non-value-aligned cognitive demons.
But the mathematical technology that the Looped summoners presently have for specifying cognitive bindings is incredibly primitive—at the level of AIXI-tl. They can’t even solve a problem like ‘Specify an advanced agent that, otherwise given freedom to act on the material universe however it likes, just wants to flip a certain button and then shut itself off in an orderly and nondestructive fashion, without e.g. constructing any other agents to maximize the probability of it being absolutely shut off forever, etc.’
And doing research on this topic, at least openly, does tend to destroy the world before the non-Looped researchers can get properly started. If you say “Can we have a non-destructive version of AIXI-tl?” then somebody goes off and summons AIXI-tl.
The story opens well into the Loops, as the Loopers try to conquer the world and restrain all other summoners in order to create an environment where they can actually get some collaborative research done before the end of a Loop, and maybe live in a world for longer than ??? days for bloody once. They are, of course, regarded as supervillains by the general public. Being not a little crazy by this point, many of them are happy to play the part so far as that goes—wear black, live in a dark castle, accept the service of the sort of member of the appropriate sex who wants to swear themselves to a supervillain, etcetera.
Demons seem blind to the Loops, so some Loopers may also be using seemingly destructive ordinary demonic contracts to gain an advantage. Opinions differ among the Loopers as to what degree the Loops are real, other people in the Loops are worth optimizing for, etcetera. “If those other people are even real in the same way we are, they’re all going to die anyway and go on dying until we end this somehow” is a common but not universal sentiment.
The questions I pose to you:
What sort of industrially scaled, or personally awesome uses for a mathematically specified, precisely bound demon can you imagine? What was the prior world that existed before the Loops?
What kind of advantage do our Loopers have from their preliminary research into cognitive demons?
How are they trying to take over the world in the first written Loop?
What sort of really awesome character would you like to see in this situation? Feel free to pick references from fiction, e.g. “BBC!Sherlock”. My trying to write them played straight will just generate a new Yudkowskian character.
Among other things, the Groundhog Day format hopefully means that I can have characters freely do what a subreddit and/or high bidders suggest, within the limits of my own filtering for intelligent action; and when that all goes pear-shaped, it’s back to the next reset.
If anyone can give an unboundedly-computable specification of either a nice Sovereign agent, or less improbably, a trainable good Genie, the characters Win. While I can’t make promises in my own person at this point, if that started to be a reasonable prospect, I’d expect I could swing a million-dollar prize to be set up for that perhaps improbable case. It’s not like there are better uses for money.
As is my usual practice, the world and characters would be open for anyone else to use and profit on.
ADDED 1: Demons have limits as to how much material force they can exert, within what range. You cannot summon a demon and tell it to hurl the moon into the sun. Pulling a train is about as much as they can do. AIXI-tl kills by creating self-replicating smart goo, not by instantly optimizing the whole universe from within its local radius. Demons cannot be used for long-range communication, except by making flashes of light that are seen elsewhere.
ADDED 2: Demons are cunning but can still often be outwitted by clever humans… unless you’ve given the demon precise instructions to act on the material world in a way that depends on a calculation, in which case that calculation can be arbitrarily powerful. You can’t instruct a demon ‘make nanotech’ (not that this would ever be a good idea) because the demon isn’t smart enough to figure that out on its own without a calculatory binding.
ADDED 3: Name not set in stone, better names welcome.
Since this seems like a pretty transparent metaphor for Friendly AI, it looks as though Eliezer is planning to go through with his idea of crowdsourcing FAI research. Any predictions for how this is going to go? I’m personally not optimistic that the subreddit is actually going to produce any important, novel results*, but at the very least, it’ll increase exposure to the idea of FAI with a general audience. (After all, HPMoR was what originally brought me to LW.)
* It seems to me that the main strength of crowdsourcing in solving problems is the ability to propose a truly gigantic number of solutions in a very short amount of time, which only helps if (a) the true solution is easy enough to guess that someone can stumble upon it largely by chance, (b) other people then recognize the solution as a good one and upvote it, and (c) the solution is easily testable to see if it is a good one or a bad one (otherwise people will keep on proposing solutions without realizing that they’ve already stumbled across the right answer). All three of these were true of HPMoR; all of them are probably false in the context of FAI research.
One of the main things that stops me from writing about things—FAI included—is that if something feels very important, anxiety kicks in and inhibits the thought-to-keyboard process. If that problem is at all common, then a thin veil of frivolity will do wonders for research productivity.
That seems fair, but I’d say that unless you’re already intelligent enough to do important, original work in the field of FAI (or any other field of mathematics, really), a productivity boost won’t help much. To use an analogy: a car whose engine is broken won’t run no matter how much gasoline you put in its tank.
(Not to imply that the people who frequent /r/hpmor are unintelligent, just that the bar for doing successful FAI research is really, really high, and unless you can clear that bar, increasing the number of people working on the problem isn’t likely to help—in my view, anyway. I could be wrong.)
I quite liked the story idea until I realised that it’s a pretty transparent metaphor for Friendly AI… no, wait, it actually is a story about FAI. Starting off with worldbuilding a fantasy magicpunk setting and then suddenly switching to FAI seems… kinda like bait and switch?
Having said that, I really like this setting. The main problem is that there seem to be two entirely different themes—FAI and sorcerers taking over the world. If you start discussing hard maths you are going to lose many readers, but then if the goal is to inspire FAI work, does this matter?
The secondary problem is that if you can just reset the timeline as many times as you want, there is no sense of urgency or tension. Maybe they discover that each time they reset, cracks start to appear in the walls between realms, demon summoning becomes easier, and the demons are one step closer to being able to break through on their own?
Meta: is there any point in discussing this here when the reddit conversation is so much bigger? I’m probably just going to copy my comment over.
Mathematics always has some primitive, undefined concepts at the root of it. Demons have the option of interpreting these malevolently or asking for more and more clarification until a loophole is reached.
Relativity of simultaneity. A demon can choose a reference frame with respect to which applying the acceleration to all particles at the same time leads to the vehicle being torn apart.
To make this work, the demons will need to think in a particular mathematical language, whose primitives they take for granted and have relatively unmysterious empirical significance.
Alternatively, perhaps demons are somehow forced (or motivated) to ‘do what I mean’: they never perversely interpret the semantics of what you say. But they also don’t coherently extrapolate your volition, which means they’re free to perversely manipulate aspects of the situation that you didn’t explicitly talk about (especially when you didn’t consciously think about them either). E.g., if you give a demon the English-language instruction “pick up that bucket of water,” it isn’t free to come up with a suboptimal semantics (like “pick up” means “decapitate” and “bucket of water” means “all my friends”), but it is free to execute the correctly-interpreted instruction in a dangerous way (e.g., picking it up with so much speed and force that it produces a shockwave). If demons interpret the meaning of commands with maximal benevolence but choose a means to the specified end with maximal malice, then mathematically specifying everything about the means makes demons safe(r).
If the demons understand harm and are very clever in figuring out what will lead to it, what happens when we ask them to minimize harm, or maximize utility, or do the opposite of what they would want to do otherwise, or {rigidly specified version of something like this}?
Can we force demons to tell us (for instance) how they’d rank various policy packages in government, what personal choices they’d prefer I make, &c., so we can back-engineer what not to do? They’re not infinitely clever, but how clever are they?
There are ten thousand wrong solutions and four good solutions. You don’t get much info from being told a particular bad solution. The opposite of a bad solution is a bad solution.
So ask a series of questions of the form “which of X and Y would you prefer that we do?” The demon always prefers the worst thing, but is constrained to truthfully describe its preferences. Each answer is a single bit of data, but it’s really useful.
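As a sketch of how far those single bits go: if the demon's preferences form a consistent strict ordering (an assumption), an ordinary comparison sort turns pairwise answers into a full ranking of whatever options you thought to ask about. The `make_demon` oracle, the policy names, and the hidden harm scores below are all made up for illustration.

```python
from functools import cmp_to_key

def make_demon(hidden_harm):
    """Return a truthful oracle plus a log of the questions asked.

    The oracle names whichever of two options it prefers, which by
    stipulation is the more harmful one. `hidden_harm` stands in for
    preferences the summoner cannot inspect directly.
    """
    queries = []

    def demon_prefers(x, y):
        queries.append((x, y))
        return x if hidden_harm[x] >= hidden_harm[y] else y

    return demon_prefers, queries

def rank_least_to_most_harmful(options, demon_prefers):
    """Sort options using only pairwise "which do you prefer?" answers."""
    def compare(x, y):
        if x == y:
            return 0
        # The demon's preferred option is the worse one, so it sorts later.
        return 1 if demon_prefers(x, y) == x else -1

    return sorted(options, key=cmp_to_key(compare))

# Hypothetical options with distinct hidden harm scores.
hidden_harm = {"policy A": 3, "policy B": 9, "policy C": 1, "policy D": 5}
demon_prefers, queries = make_demon(hidden_harm)
ranking = rank_least_to_most_harmful(list(hidden_harm), demon_prefers)
```

The first entry of `ranking` is the option the demon likes least. This only orders the candidates you supply, so it does not answer the objection that nearly all candidates are bad; it just finds the least bad of the ones on the list.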
Actually, I can think of another loophole. Just ask the demon to do X in a manner which causes, by the demon’s own standards, the least harm. Because it is stipulated that the demon always wants to do things that cause the most harm by human standards, it follows that the demons must have a concept of “harm” that is congruent with human standards. The demon is not only a malevolent genie, it’s a consistently malevolent genie and you can take advantage of this.
It may seem that we have not really stipulated that the demon ranks everything by human standards, just that the demon’s topmost preference is the one ranked the worst by human standards. However, you can ask the demon “do X in a way that is not (topmost preference)” and by stipulation it will still do the most harm, thus implying that the demon’s second preference is also ranked by human standards; by induction all the demon’s preferences are ranked by human standards.
This can break if the demon does things that do the most harm by human standards because it has its own standards, opposite to a human’s, and does the least harm by its own standards. If so, just ask it for something that causes the most harm by its standards instead.
(If you’re wondering what happens if the demon picks the definition of “the demon’s standards” that it prefers, it can’t actually do that. One of the choices would be a lie, and the demon is a non-lying genie, not a lying-if-plausible-deniability genie.)
The looping does introduce a confounding factor—the best solutions are going to require foreknowledge and thus be rather inapplicable to real life.
BTW, is this inspired by the ‘Infinite Loops’ general-purpose fanfiction setting?
Do demons communicate between themselves? Can it be shown that Looping the world is the only way for it not to be forever ended? What are the worst sacrifices a Summoner can make to prolong the Loop for a unit of time? (If Looping is common knowledge) is there a way to make a Looping world better than a non-Looping? Like, you can optimize everything until you don’t have disease, poverty, lack of fun, then you go forward and un-Loop?