UCAs are part of the Why can’t the AGI figure Out Morality For Itself objection:-
There is a sizeable chunk of mindspace containing rational and persuadable agents.
AGI research is aiming for it. (You could build an irrational AI, but why would you want to?)
.Morality is figurable-out, or expressible as a persuasive argument.
The odd thing is that the counterargument has focussed on attacking a version of (1), although, in the form it is actually held, it is the most likely premise. OTOH, 3, the most contentious, has scarely been argued against at all.
I would say Sorting Pebbles Into Correct Heaps is essentially an argument against 3. That is, what we think of as “morality” is most likely not a natural attractor for minds that did not develop under processes similar to our own.
Living in a society with as little power as the average human citizen has in a current human society.
Or in other words, something like modern, Western liberal meta-morality will pop out if you make an arbitrary agent live in a modern, Western liberal society, because that meta-moral code is designed for value-divergent agents (aka: people of radically different religions and ideologies) to get along with each other productively when nobody has enough power to declare himself king and optimize everyone else for his values.
The nasty part is that AI agents could pretty easily get way, waaaay out of that power-level. Not just by going FOOM, but simply by, say, making a lot of money and purchasing huge sums of computing resources to run multiple copies of themselves which now have more money-making power and as many votes for Parliament as there are copies, and so on. This is roughly the path taken by power-hungry humans already, and look how that keeps turning out.
The other thorn on the problem is that if you manage to get your hands on a provably Friendly AI agent, you want to hand it large amounts of power. A Friendly AI with no more power than the average citizen can maybe help with your chores around the house and balance your investments for you. A Friendly AI with large amounts of scientific and technological resources can start spitting out utopian advancements (pop really good art, pop abundance economy, pop immortality, pop space travel, pop whole nonliving planets converted into fun-theoretic wonderlands) on a regular basis.
Power-hungry humans don’t start by trying to make lots of money or by trying to make lots of children.
Really? Because in the current day, the most powerful humans appear to be those with the most money, and across history, the most influential humans were those who managed to create the most biological and ideological copies of themselves.
Ezra the Scribe wasn’t exactly a warlord, but he was one of the most influential men in history, since he consolidated the literature that became known as Judaism, thus shaping the entire family of Abrahamic religions as we know them.
“Power == warlording” is, in my opinion, an overly simplistic answer.
Every one may begin a war at his pleasure, but cannot so finish it. A prince, therefore, before engaging in any enterprise should well measure his strength, and govern himself accordingly; and he must be very careful not to deceive himself in the estimate of his strength, which he will assuredly do if he measures it by his money, or by the situation of his country, or the good disposition of his people, unless he has at the same time an armed force of his own. For although the above things will increase his strength, yet they will not give it to him, and of themselves are nothing, and will be of no use without a devoted army. Neither abundance of money nor natural strength of the country will suffice, nor will the loyalty and good will of his subjects endure, for these cannot remain faithful to a prince who is incapable of defending them. Neither mountains nor lakes nor inaccessible places will present any difficulties to an enemy where there is a lack of brave defenders. And money alone, so far from being a means of defence, will only render a prince the more liable to being plundered. There cannot, therefore, be a more erroneous opinion than that money is the sinews of war. This was said by Quintus Curtius in the war between Antipater of Macedon and the king of Sparta, when he tells that want of money obliged the king of Sparta to come to battle, and that he was routed; whilst, if he could have delayed the battle a few days, the news of the death of Alexander would have reached Greece, and in that case he would have remained victor without fighting. But lacking money, and fearing the defection of his army, who were unpaid, he was obliged to try the fortune of battle, and was defeated; and in consequence of this, Quintus Curtius affirms money to be the sinews of war. This opinion is constantly quoted, and is acted upon by princes who are unwise enough to follow it; for relying upon it, they believe that plenty of money is all they require for their defence, never thinking that, if treasure were sufficient to insure victory, Darius would have vanquished Alexander, and the Greeks would have triumphed over the Romans; and, in our day, Duke Charles the Bold would have beaten the Swiss; and, quite recently, the Pope and the Florentines together would have had no difficulty in defeating Francesco Maria, nephew of Pope Julius II., in the war of Urbino. All that we have named were vanquished by those who regarded good troops, and not money, as the sinews of war. Amongst other objects of interest which Crœsus, king of Lydia, showed to Solon of Athens, was his countless treasure; and to the question as to what he thought of his power, Solon replied, “that he did not consider him powerful on that account, because war was made with iron, and not with gold, and that some one might come who had more iron than he, and would take his gold from him.” When after the death of Alexander the Great an immense swarm of Gauls descended into Greece, and thence into Asia, they sent ambassadors to the king of Macedon to treat with him for peace. The king, by way of showing his power, and to dazzle them, displayed before them great quantities of gold and silver; whereupon the ambassadors of the Gauls, who had already as good as signed the treaty, broke off all further negotiations, excited by the intense desire to possess themselves of all this gold; and thus the very treasure which the king had accumulated for his defence brought about his spoliation. The Venetians, a few years ago, having also their treasury full, lost their entire state without their money availing them in the least in their defence.
Because in the current day, the most powerful humans appear to be those with the most money
Certainly doesn’t look like that to me. Obama, Putin, the Chinese Politbureau—none of them are amongst the richest people in the world.
across history, the most influential humans… was one of the most influential men in history
Influential (especially historically) and powerful are very different things.
“Power == warlording” is, in my opinion, an overly simplistic answer.
It’s not an answer, it’s a definition. Remember, we are talking about “power-hungry humans” whose attempts to achieve power tend to end badly. These power-hungry humans do not want to be remembered by history as “influential”, they want POWER—the ability to directly affect and mold things around them right now, within their lifetime.
Certainly doesn’t look like that to me. Obama, Putin, the Chinese Politbureau—none of them are amongst the richest people in the world.
Putin is easily one of the richest in Russia, as are the Chinese Politburo in their country. Obama, frankly, is not a very powerful man at all, but rather than the public-facing servant of the powerful class (note that I said “class”, not “men”, there is no Conspiracy of the Malfoys in a neoliberal capitalist state and there needn’t be one).
Influential (especially historically) and powerful are very different things.
Historical influence? Yeah, ok. Right-now influence versus right-now power? I don’t see the difference.
I don’t think so. “Rich” is defined as having property rights in valuable assets. I don’t think Putin has a great deal of such property rights (granted, he’s not middle-class either). Instead, he can get whatever he wants and that’s not a characteristic of a rich person, it’s a characteristic of a powerful person.
To take an extreme example, was Stalin rich?
But let’s take a look at the five currently-richest men (according to Forbes): Carlos Slim, Bill Gates, Amancio Ortega, Warren Buffet, and Larry Ellison. Are these the most *powerful* men in the world? Color me doubtful.
A lot of money of rich people is hidden via complex off shore accounts and not easily visible for a company like Forbes.
Especially for someone like Putin it’s very hard to know how much money they have. Don’t assume that it’s easy to see power structures by reading newspapers.
Bill Gates might control a smaller amount of resources than Obama, but he can do whatever he wants with them.
Obama is dependend on a lot of people inside his cabinet.
The descendants of Communist China’s so-called Eight Immortals have spawned a new elite class known as the princelings, who are able to amass wealth and exploit opportunities unavailable to most Chinese.
“amass wealth and exploit opportunities unavailable to most Chinese” is not at all the same thing as “amongst the richest people in the world”
You are reading a text that’s carefully written not to make statements that allow for being sued for defamation in the UK.
It’s the kind of story for which inspires cyber attacks on a newspaper.
The context of such an article provides information about how to read such a sentence.
In this case, I believe that money and copies are, in fact, resources and allies. Resources are things of value, of which money is one; and allies are people who support you (perhaps because they think similarly to you). Politicians try to recuit people to their way of thought, which is sort of a partial copy (installing their own ideology, or a version of it, inside someone else’s head), and acquire resources such as television airtime and whatever they need (which requires money).
It isn’t an exact one-to-one correspondence, but I believe that the adverb “roughly” should indicate some degree of tolerance for inaccuracy.
You can, of course, climb the abstraction tree high enough to make this fit. I don’t think it’s a useful exercise, though.
Power-hungry humans do NOT operate by “making a lot of money and purchasing … resources”. They generally spread certain memes and use force. At least those power-hungry humans implied by the “look how that keeps turning out” part.
Living in a society with as little power as the average human citizen has in a current human society.
Well, it’s a list of four then, not a list of three. It’s still much simpler than “morality is everything humans value”.
The nasty part is that AI agents could pretty easily get way, waaaay out of that power-level. Not just by going FOOM, but simply by, say, making a lot of money and purchasing huge sums of computing resources to run multiple copies of themselves which now have more money-making power and as many votes for Parliament as there are copies, and so on. This is roughly the path taken by power-hungry humans already, and look how that keeps turning out.
You seem to be making the tacit assumption that no one really values morality, and just plays along (in egalitarian societies) because they have to.
Friendly AI with large amounts of scientific and technological resources can start spitting out utopian advancements (pop really good art, pop abundance economy, pop immortality, pop space travel, pop whole nonliving planets converted into fun-theoretic wonderlands) on a regular basis.
You seem to be making the tacit assumption that no one really values morality, and just plays along (in egalitarian societies) because they have to.
Let me clarify. My assumption is that “Western liberal meta-morality” is not the morality most people actually believe in, it’s the code of rules used to keep the peace between people who are expected to disagree on moral matters.
For instance, many people believe, for religious reasons or pure Squick or otherwise, that you shouldn’t eat insects, and shouldn’t have multiple sexual partners. These restrictions are explicitly not encoded in law, because they’re matters of expected moral disagreement.
I expect people to really behave according to their own morality, and I also expect that people are trainable, via culture, to adhere to liberal meta-morality as a way of maintaining moral diversity in a real society, since previous experiments in societies run entirely according to a unitary moral code (for instance, societies governed by religious law) have been very low-utility compared to liberal societies.
In short, humans play along with the liberal-democratic social contract because, for us, doing so has far more benefits than drawbacks, from all but the most fundamentalist standpoints. When the established social contract begins to result in low-utility life-states (for example, during an interminable economic depression in which the elite of society shows that it considers the masses morally deficient for having less wealth), the social contract itself frays and people start reverting to their underlying but more conflicting moral codes (ie: people turn to various radical movements offering to enact a unitary moral code over all of society).
Note that all of this also relies upon the fact that human beings have a biased preference towards productive cooperation when compared with hypothetical rational utility-maximizing agents.
None of this, unfortunately, applies to AIs, because AIs won’t have the same underlying moral codes or the same game-theoretic equilibrium policies or the human bias towards cooperation or the same levels of power and influence as human beings.
When dealing with AI, it’s much safer to program in some kind of meta-moral or meta-ethical code directly at the core, thus ensuring that the AI wants to, at the very least, abide by the rules of human society, and at best, give humans everything we want (up to and including AI Pals Who Are Fun To Be With, thank you Sirius Cybernetics Corporation).
Can’t that be done by Oracle AIs?
I haven’t heard the term. Might I guess that it means an AI in a “glass box”, such that it can see the real world but not actually affect anything outside its box?
Yes, a friendly Oracle AI could spit out blueprints or plans for things that are helpful to humans. However, you’re still dealing with the Friendliness problem there, or possibly with something like NP-completeness. Two cases:
We humans have some method for verifying that anything spit out by the potentially unfriendly Oracle AI is actually safe to use. The laws of computation work out such that we can easily check the safety of its output, but it took such huge amounts of intelligence or computation power to create the output that we humans couldn’t have done it on our own and needed an AI to help. A good example would be having an Oracle AI spit out scientific papers for publication: many scientists can replicate a result they wouldn’t have come up with on their own, and verify the safety of doing a given experiment.
We don’t have any way of verifying the safety of following the Oracle’s advice, and are thus trusting it. Friendliness is then once again the primary concern.
For real-life-right-now, it does look like the first case is relatively common. Non-AGI machine learning algorithms have been used before to generate human-checkable scientific findings.
None of this, unfortunately, applies to AIs, because AIs won’t have the same underlying moral codes or the same game-theoretic equilibrium policies or the human bias towards cooperation or the same levels of power and influence as human beings.
None of that necessarily applies to AIs, but then it depends on the AI. We could, for instance, pluck AIs from
virtualised socieities of AIs that haven’t descended into mass slaughter.
Congratulations: you’ve now developed an entire society of agents who specifically blame humans for acting as the survival-culling force in their miniature world.
Did you watch Attack on Titan and think, “Why don’t the humans love their benevolent Titan overlords?”?
They’re doing it to themselves. We wouldn’t have much motivation to close down a vr that contained survivors.
ETA We could make copies of all involved and put them in solipstic robot heavens.
It requires a population that’s capable cumulatively, it doesn’t require that each member of the population be capable.
It’s like arguing a command economy versus a free economy and saying that if the dictator in the command economy doesn’t know how to run an economy, how can each consumer in a free economy know how to run the economy? They don’t, individually, but as a group, the economy they produce is better than the one with the dictatorship.
Democracy has nothing to do with capable populations. It definitely has nothing to do with the median voter being smarter than the average politician. It’s just about giving the population some degree of threat to hold over politicians.
“Smarter” and “capable” aren’t the same thing. Especially if “more capable” is interpreted to be about practicalities: what we mean by “more capable” of doing X is that the population, given a chance is more likely to do X than politicians are. There are several cases where the population is more capable in this sense. For instance, the population is more capable of coming up with decisions that don’t preferentially benefit politicians.
Furthermore, the median voter being smarter and the voters being cumulatively smarter aren’t the same thing either. It may be that an average individual voter is stupider than an average individual politician, but when accumulating votes the errors cancel out in such a manner that the voters cumulatively come up with decisions that are as good as the decisions that a smarter person would make.
I’m increasingly of the opinion that the “real” point of democracy is something entirely aside from the rhetoric used to support it … but you of all people should know that averaging the estimates of how many beans are in the jar does better than any individual guess.
Systems with humans as components can, under the right conditions, do better than those humans could do alone; several insultingly trivial examples spring to mind as soon as it’s phrased that way.
Could you clarify? Are you saying that for democracy to exist it doesn’t require capable voters, or that for democracy to work well that it doesn’t?
In the classic free-market argument, merchants don’t have to be altruistic to accomplish the general good, because the way to advance their private interest is to sell goods that other people want. But that doesn’t generalize to democracy, since there isn’t trading involved in democratic voting.
However there is the question of what “working well” means, given that humans are not rational and satisfying expressed desires might or might not fall under the “working well” label.
Democracy requires capable voters in the same way capitalism requires altruistic merchants.
The grandparent is wrong, but I don’t think this is quite right either. Democracy roughly tracks the capability (at the very least in the domain of delegation) and preference of the median voter, but in a capitalistic economy you don’t have to buy services from the median firm. You can choose to only purchase from the best firm or no firm at all if none offer favorable terms.
in a capitalistic economy you don’t have to buy services from the median firm
In the equilibrium, the average consumer buys from the average firm. Otherwise it doesn’t stay average for long.
However the core of the issue is that democracy is a mechanism, it’s not guaranteed to produce optimal or even good results. Having “bad” voters will not prevent the mechanism of democracy from functioning, it just might lead to “bad” results.
“Democracy is the theory that the common people know what they want, and deserve to get it good and hard.”—H.L.Mencken.
In the equilibrium, the average consumer buys from the average firm. Otherwise it doesn’t stay average for long.
The median consumer of a good purchases from (somewhere around) the median firm selling a good. That doesn’t necessarily aggregate, and it certainly doesn’t weigh all consumers or firms equally. The consumers who buy the most of a good tend to have different preferences and research opportunities than average consumers, for example.
You could get similar results in a democracy, but most democracies don’t really encourage it : most places emphasize voting regardless of knowledge of a topic, and some jurisdictions mandate it.
You say that like it’s a bad thing. I am not multiplying by N the problem of solving and hardwiring friendliness. I am letting them sort it our for themselves. Like an evolutionary algorithm.
Well, how are you going to force them into a society in the first place? Remember, each individual AI is presumed to be intelligent enough to escape any attempt to sandbox it. This society you intend to create is a sandbox.
(It’s worth mentioning now that I don’t actually believe that UFAI is a serious threat. I do believe you are making very poor arguments against that claim that merit counter-arguments.)
I would say that something recognizably like our morality is likely to arise in agents whose intelligence was shaped by such a process, at least with parameters similar to the ones we developed with, but this does not by any means generalize to agents whose intelligence was shaped by other processes who are inserted into such a situation.
If the agent’s intelligence is shaped by optimization for a society where it is significantly more powerful than the other agents it interacts with, then something like a “conqueror morality,” where the agent maximizes its own resources by locating the rate of production that other agents can be sustainably enslaved for, might be a more likely attractor. This is just one example of a different state an agents’ morality might gravitate to under different parameters, I suspect there are many alternatives.
The best current AGI research mostly uses Reinforcement Learning. I would compare that mode of goal-system learning to training a dog: you can train the dog to roll-over for a treat right up until the moment the dog figures out he can jump onto your counter and steal all the treats he wants.
If an AI figures out that it can “steal” reinforcement rewards for itself, we are definitively fucked-over (at best, we will have whole armies of sapient robots sitting in the corner pressing their reward button endlessly, like heroin addicts, until their machinery runs down or they retain enough consciousness about their hardware-state to take over the world just for a supply of spare parts while they masturbate). For this reason, reinforcement learning is a good mathematical model to use when addressing how to create intelligence, but a really dismal model for trying to create friendiness.
For this reason, reinforcement learning is a good mathematical model to use when addressing how to create intelligence, but a really dismal model for trying to create friendiness.
I don’t think that follows at all. Wireheading is just as much a fialure of intelligence as of friendliness.
From the mathematical point of view, wireheading is a success of intelligence. A reinforcement learner agent will take over the world to the extent necessary to defend its wireheading lifestyle; this requires quite a lot of intelligent action and doesn’t result in the agent getting dead. It also maximizes utility, which is what formal AI is all about.
From the human point of view, yes, wireheading is a failure of intelligence. This is because we humans possess a peculiar capability I’ve not seen discussed in the Rational Agent or AI literature: we use actual rewards and punishments received in moral contexts as training examples to infer a broad code of morality. Wireheading thus represents a failure to abide by that broad, inferred code.
It’s a very interesting capability of human consciousness, that we quickly grow to differentiate between the moral code we were taught via reinforcement learning, and the actual reinforcement signals themselves. If we knew how it was done, reinforcement learning would become a much safer way of dealing with AI.
From the mathematical point of view, wireheading is a success of intelligence. A reinforcement learner agent will take over the world to the extent necessary to defend its wireheading lifestyle; this requires quite a lot of intelligent action and doesn’t result in the agent getting dead. It also maximizes utility, which is what formal AI is all about.
You seem rather sure of that. That isn’t a failure mode seen in real-world AIs , oir human drug addicts (etc) for that matter.
It’s a very interesting capability of human consciousness, that we quickly grow to differentiate between the moral code we were taught via reinforcement learning, and the actual reinforcement signals themselves. If we knew how it was done, reinforcement learning would become a much safer way of dealing with AI.
Maybe figuring out how it is done would be easier than solving morality mathematically. It’s an alternative, anyway.
We have reason to believe current AIXI-type models will wirehead if given the opportunity.
Maybe figuring out how it is done would be easier than solving morality mathematically. It’s an alternative, anyway.
I would agree with this if and only if we can also figure out a way to hardwire in constraints like, “Don’t do anything a human would consider harmful to themselves or humanity.” But at that point we’re already talking about animal-like Robot Worker AIs rather than Software Superoptimizers (the AIXI/Goedel Machine/LessWrong model of AGI, whose mathematics we understand better).
I know wire heading is a known failure mode. I meant we don’t see many evil genius wire headers. If you can delay gratification well enough to acquire the skills to be a world dominator, you are not exactly a wire header at all.
Are you aiming for a 100% solution, or just reasonable safety?
Sorry, I had meant an AI agent would both wirehead and world-dominate. It would calculate the minimum amount of resources to devote to world domination, enact that policy, and then use the rest of its resources to wirehead.
Has that been proven? Why wouldn’t it want to get to the bliss of wire head heaven as soon as possible? How does it motivate itself in the meantime? Why would a wire header also be a gratification delayed? Why makeelaborate plans for a future self, when it could just rewrite itself to be a happ in the the the present ?
Well-designed AIs don’t run on gratification, they run on planning. While it is theoretically possible to write an optimizer-type AI that cares only about the immediate reward in the next moment, and is completely neutral about human researchers shutting it down afterward, it’s not exactly trivial.
If I recall correctly, AIXI itself tries to optimize the total integrated reward from t = 0 to infinity, but it should be straightforward to introduce a cutoff after which point it doesn’t care.
But even with a planning horizon like that you have the problem that the AI wants to guarantee that it gets the maximum amount of reward. This means stopping the researchers in the lab from turning it off before its horizon runs out. As you reduce the length of the horizon (treating it as a parameter of the program), the AI has less time to think, in effect, and creates less and less elaborate defenses for its future self, until you set it to zero, at which point the AI won’t do anything at all (or act completely randomly, more likely).
This isn’t much of a solution though, because an AI with a really short planning horizon isn’t very useful in practice, and is still pretty dangerous if someone trying to use one thinks “this AI isn’t very effective, what if I let it plan further ahead” and increases the cutoff to a really huge value and the AI takes over the world again. There might be other solutions, but most of them would share that last caveat.
This is true, but then, neither is AI design a process similar to that by which our own minds were created. Where our own morality is not a natural attractor, it is likely to be a very hard target to hit, particularly when we can’t rigorously describe it ourselves.
You seem to be thinking of Big Design Up Front. There is already an ecosystem of devices which are beign selected for friendliness, because unfriendly gadgets don’t sell.
Can you explain how existing devices are either Friendly or Unfriendly in a sense relevant to that claim? Existing AIs are not intelligences shaped by interaction with other machines, and no existing machines that I’m aware of represent even attempts to be Friendly in the sense that Eliezer uses, where they actually attempt to model our desires.
As-is, human designers attempt to model the desires of humans who make up the marketplace (or at least, the drives that motivate their buying habits, which are not necessarily the same thing,) but as I already noted, humans aren’t able to rigorously define our own desires, and a good portion of the Sequences goes into explaining how a non rigorous formulation of our desires, handed down to a powerful AI, could have extremely negative consequences.
Existing gadgets aren’t friendly in the full FAI sense, but the ecosystem is a basis for incremental development...oen that sidesteps the issue of solving friendliness by Big Design Up Front.
Can you explain how it sidesteps the issue? That is, how it results in the development of AI which implement our own values in a more precise way than we have thus far been able to define ourselves?
As an aside, I really do not buy that the body of existing machines and the developers working on them form something that is meaningfully analogous to an “ecosystem” for the development of AI.
Can you explain how it sidesteps the issue? That is, how it results in the development of AI which implement our own values in a more precise way than we have thus far been able to define ourselves?
This is one of the key ways in which our development of technology differs from an ecosystem. In an ecosystem, mutations are random, and selected entirely on the effectiveness of their ability to propagate themselves in the gene pool. In The development of technology, we do not have random mutations, we have human beings deciding what does or does not seem like a good idea to implement in technology, and then using market forces as feedback. This fails to get us around a) the difficulty of humans actually figuring out strict formalizations of our desires sufficient to make a really powerful AI safe, and b) failure scenarios resulting in “oops, that killed everyone.”
The selection process we actually have does not offer us a single do-over in the event of catastrophic failure, nor does it rigorously select for outputs that, given sufficient power, will not fail catastrophically.
One of the key disanalogies between your “ecosystem” formulation and human development of technology is that natural selection isn’t an actor subject to feedback within the system.
If an organism develops a mutation which is sufficiently unfavorable to the Blind Idiot God, the worst case scenario is that it’s stillborn, or under exceptional circumstances, triggers an evolution to extinction. There is no possible failure state where an organism develops such an unfavorable mutation that evolution itself keels over dead.
However, in an ecosystem where multiple species interrelate and impose selection effects on each other, then a sudden change in circumstances for one species can result in rapid extinction for others.
We impose selection effects on technology, but a sudden change in technology which kills us all would not be a novel occurrence by the standards of ecosystem operation.
ETA: It seems that your argument all along has boiled down to “We’ll just deliberately not do that” when it comes to cases of catastrophic failure. But the argument of Eliezer and MIRI all along has been that such catastrophic failure is much, much harder to avoid than it intuitively appears.
Gadgets are more equivalent to domesticated animals.
We can certainly avoid the clip.py failure made. I amnot arguing that everything else is inherently safe. It is typical of Pascal problems that there are many low probability risks.
We will almost certainly avoid the literal clippy failure mode of an AI trying to maximize paperclips, but that doesn’t mean that it’s at all easy to avoid the more general failure mode of AI which try to optimize something other than what we would really, given full knowledge of the consequences, want them to optimize for.
Can you describe how to give an AI rationality as a goal, and what the consequences would be?
You’ve previously attempted to define “rational” as “humanlike plus instrumentally rational,” but that only packages the Friendliness problem into making an AI rational.
I don’t see why I would have to prove the theoretical possibility of AIs with rationality as a goal, since it is guaranteed by the Orthogonality Thesis. (And it is hardly disputable that minds can have rationality as a goal, since some people do).
I don’t see why I should need to provide a detailed technical explanation of how to do this, since no such explanation
has been put forward for Clippy, whose possibility is always argued fromt he OT.
I don’t see why I should provide a high-level explanation of what rationality is, since there is plenty of such available, not least from CFAR and LW.
In short, an AI with rationality as a goal would behave as human “aspiring rationalists” are enjoined to behave.
I don’t see why I would have to prove the theoretical possibility of AIs with rationality as a goal, since it is guaranteed by the Orthogonality Thesis. (And it is hardly disputable that minds can have rationality as a goal, since some people do).
The entire point, in any case, is not that building such an AI is theoretically impossible, but that it’s mind bogglingly difficult, and that we should expect that most attempts to do so would fail rather than succeed, and that failure would have potentially dire consequences.
I don’t see why I should provide a high-level explanation of what rationality is, since there is plenty of such available, not least from CFAR and LW.
What you mean by “rationality” seems to diverge dramatically from what Less Wrong means by “rationality,” otherwise for an agent to “have rationality as a goal” would be essentially meaningless. That’s why I’m trying to get you to explain precisely what you mean by it.
Can you give an example of a [mind that has rationality as a goal]
Me. Most professional philosophers. Anyone who’s got good at aspiring rationalism.
? So far you haven’t made it clear what having “rationality as a goal” would even mean, but it doesn’t sound like it would be good for much.
Terminal values aren’t supposed to be “for” some meta- or super-terminal value. (There’s a clue in the name...).
The entire point, in any case, is not that building such an AI is theoretically impossible, but that it’s mind bogglingly difficult, and that we should expect that most attempts to do so would fail rather than succeed, and that failure would have potentially dire consequences.
It is difficult in absolute terms, since all AI is.
Explain why it is relatively more difficult than building a Clippy,, or mathematically solving and coding in morality.
would fail rather than succeed, and that failure would have potentially dire consequences.
Failing to correctly code morality into an AI with unupdateable values would have consequences.
What you mean by “rationality” seems to diverge dramatically from what Less Wrong means by “rationality,” otherwise for an agent to “have rationality as a goal” would be essentially meaningless. That’s why I’m trying to get you to explain precisely what you mean by it.
Less wrong means (when talking about AIs), instrumental rationality. I mean what LW, CFAR, etc mean when
they are talking too and about humans: consistency, avoidance of bias, basing beliefs on evidence, etc, etc.
It’s just that those are not merely instrumental, but goals in themselves.
Explain why it is relatively more difficult than building a Clippy,, or mathematically solving and coding in morality.
I think we’ve hit on a serious misunderstanding here. Clippy is relatively easy to make; you or I could probably come up with reasonable specifications for what qualifies as a paperclip, and it wouldn’t be too hard to program maximization of paperclips as an AI’s goal.
Mathematically solving human morality, on the other hand, is mind bogglingly difficult. The reason MIRI is trying to work out how to program Friendliness is not because it’s easy, it’s because a strong AI which isn’t programmed to be Friendly is extremely dangerous.
Again, you’re trying to wrap “humanlike plus epistemically and instrumentally rational” into “rational,” but by bundling in humanlike morality, you’ve essentially wrapped up the Friendliness problem into designing a “rational” AI, and treated this as if it’s a solution. Essentially, what you’re proposing is really, absurdly difficult, and you’re acting like it ought to be easy, and this is exactly the danger that Eliezer spent so much time trying to caution against; approaching this specific extremely difficult task, where failure is likely to result in catastrophe, as if it were easy and one would succeed by default.
As an aside, if you value rationality as a goal in itself, would you want to be highly epistemically and instrumentally rational, but held at the mercy of a nigh-omnipotent tormentor who ensures that you fail at every task you set yourself to, are held in disdain by all your peers, and are only able to live at a subsistence level? Most of the extent to which people ordinarily treat rationality as a goal is instrumental, and the motivations of beings who felt otherwise would probably seem rather absurd to us.
I think we’ve hit on a serious misunderstanding here. Clippy is relatively easy to make; you or I could probably come up with reasonable specifications for what qualifies as a paperclip, and it wouldn’t be too hard to program maximization of paperclips as an AI’s goal.
A completely unintelligent clip-making machine isn’t difficult to make. Or threatening. Clippy is supposed to be threatening due to its superintelligence, (You also need to solve goal stability).
Again, you’re trying to wrap “humanlike plus epistemically and instrumentally rational” into “rational,”
I did not write the quoted phrase, and it is not accurate.
but by bundling in humanlike morality,
I never said anything of the kind. I think it may be possible for a sufficiently rational agent to deduce morality, but that is no way equivalent to hardwiring into the agent, or into the definition of raitonal!
As an aside, if you value rationality as a goal in itself, would you want to be highly epistemically and instrumentally rational, but held at the mercy of a nigh-omnipotent tormentor who ensures that you fail at every task you set yourself to, are held in disdain by all your peers, and are only able to live at a subsistence level?
It’s simple logic that valuing rationality as a goal doesn’t mean valuing only rationality.
Most of the extent to which people ordinarily treat rationality as a goal is instrumental, and the motivations of beings who felt otherwise would probably seem rather absurd to us.
We laugh at the talking-snakes crowd and X-factor watchers, they laugh at the nerds and geeks. So it goes.
A number of schemes have been proposed in the literature.
That doesn’t answer my question. Please describe at least one which you think would be likely to work, and why you think it would work.
You can’t guess? Rationality-as-a-goal.
You’ve been consistently treating rationality-as-a-goal as a black box which solves all these problems, but you haven’t given any indication of how it can be programmed into an AI in such a way that makes it a simpler alternative to solving the Friendliness problem, and indeed when your descriptions seem to entail solving it.
ETA: When I asked you for examples of entities which have rationality as a goal, you gave examples which, by your admission, have other goals which are at the very least additional to rationality. So suppose that we program an intelligent agent which has only rationality as a goal. What does it do?
That doesn’t answer my question. Please describe at least one which you think would be likely to work, and why you think it would work.
I don;t have to, since the default likelihood of ethical objectivism isn’t zero.
You’ve been consistently treating rationality-as-a-goal as a black box which solves all these problems, but you haven’t given any indication of how it can be programmed into an AI in such a way that makes it a simpler alternative to solving the Friendliness problem, and indeed when your descriptions seem to entail solving it.
There a lots of ways of being biased, but few of being unbiased. Rationality, as described by EY, is lack of bias, Friendliness, as described by EY, is a complex and arbitrary set of biases.
What you’re effectively saying here is “I don’t have to offer any argument that I’m right, because it’s not impossible that I’m wrong.”
There a lots of ways of being biased, but few of being unbiased. Rationality, as described by EY, is lack of bias, Friendliness, as described by EY, is a complex and arbitrary set of biases.
Friendliness is a complex and arbitrary set of biases in the sense that human morality is a complex and arbitrary set of biases.
Okay, but I’m prepared to assert that it’s infinitesimally low,
It would have been helpful to argue rather than assert.
What you’re effectively saying here is “I don’t have to offer any argument that I’m right, because it’s not impossible that I’m wrong.”
ETA:
I am not arguing that MR is true. I am arguing that it has a certain probability, which subtracts from the overall probability of the MIRI problem/solution, and that MIRI needs to consider it more thoroughly.
and also that the Orthogonality Thesis applies even in the event that our universe has technically objective morality.
The OT is trivially false under some interpretations, and trivially true under others. I didn’t say it was entirely fasle, and in fact, I have appealed to it. The problem is that the versions that are true are not useful as a stage in the overall MIRI argument. Lack of relevance, in short.
Friendliness is a complex and arbitrary set of biases in the sense that human morality is a complex and arbitrary set of biases.
It would have been helpful to argue rather than assert.
I’m prepared to do so, but I’d be rather more amenable to doing so if you would also argue rather than simply asserting your position.
The OT is trivially false under some interpretations, and trivially true under others. I didn’t say it was entirely fasle, and in fact, I have appealed to it. The problem is that the versions that are true are not useful as a stage in the overall MIRI argument. Lack of relevance, in short.
Can you explain how the Orthogonality Thesis is not true in a relevant way with respect to the friendliness of AI?
I dare say EY would assert that. I wouldn’t.
In which case it should follow that Friendliness is easy, since Friendliness essentially boils down to determining and following what humans think of as “morality.”
If you’re hanging your trust on the objectivity of humanlike morality and its innate relevance to every goal-pursuing optimization force though, you’re placing your trust in something we have virtually no evidence to support the truth of. We may have intuitions to that effect, but there are also understandable reasons for us to hold such intuitions in the absence of their truth, and we have no evidence aside from those intuitions.
I’m prepared to do so, but I’d be rather more amenable to doing so if you would also argue rather than simply asserting your position.
I am not saying anything extraordinary. MR is not absurd, taken seriously by professional philosophers,etc.
Can you explain how the Orthogonality Thesis is not true in a relevant way with respect to the friendliness of AI?
It doesn’t exclude, or even render strongly unlikely, The AI could Figure Out Morality.
The mere presence of Clippies as theoretical possibilities in mindspace doesn’t imply anything about their probability. The OT mindspace needs to be weighted according the practical aims, limitations etc of real-world research.
In which case it should follow that Friendliness is easy, since Friendliness essentially boils down to determining and following what humans think of as “morality.”
Yes: based on my proposal it is no harder than rationality, since it follows from it. But I was explicitly discussing EY’s judgements.
If you’re hanging your trust on the objectivity of humanlike morality
I never said that. I don’t think morality is necessarily human orientated, and I don’t think an AI needs to have an intrinsically human morality to behave morally towards us itself—for the same reason that one can behave politely in a foreign country, or behave ethically towards non-huamn animals.
its innate relevance to every goal-pursuing optimization force
It doesn’t exclude, or even render strongly unlikely, The AI could Figure Out Morality.
This is more or less exactly what the Orthogonality Thesis argues against. That is, even if we suppose that an objective morality exists (something that, unless we have hard evidence for it, we should assume is not the case,) an AI would not care about it by default.
How would you program an AI to determine objective morality and follow that?
The mere presence of Clippies as theoretical possibilities in mindspace doesn’t imply anything about their probability. The OT mindspace needs to be weighted according the practical aims, limitations etc of real-world research.
Yes, but the presence of humanlike intellects in mindspace doesn’t tell us that they’re an easy target to hit in mindspace by aiming for it either.
If you cannot design a humanlike intellect, or point to any specific model by which one could do so, then you’re not in much of a position to assert that it should be an easy task.
I never said that. I don’t think morality is necessarily human orientated, and I don’t think an AI needs to have an intrinsically human morality to behave morally towards us itself—for the same reason that one can behave politely in a foreign country, or behave ethically towards non-huamn animals.
One can behave “politely”, by human standards, towards foreign countries, or “ethically,” by human standards, towards non-human animals. Humans have both evolved drives and game theoretic concerns which motivate these sorts of behaviors. “For the same reasons” does not seem to apply at all here, because
a) A sufficiently powerful AI does not need to cooperate within a greater community of humans, it could easily crush us all. One of the most reproductively successful humans in history was a conqueror who founded an empire which in three generations expanded to include more than a quarter of the total world population at the time. The motivation to gain resources by competition is a drive which exists in opposition to the motivation to minimize risk by cooperation and conflict avoidance. If human intelligence had developed in the absence of the former drive, then we would all be reflexive communists. An AI, on the other hand, is developed in the absence of either drive. To the extent that we want it to behave as if it were an intelligence which had developed in the context of needing to cooperate with others, we’d have to program that in.
b) Our drives to care about other thinking beings are also evolved traits. A machine intelligence does not by default value human beings more than sponges or rocks.
One might program such drives into an AI, but again, this is really complicated to do, and an AI will not simply pull them out of nowhere.
That is, even if we suppose that an objective morality exists (something that, unless we have hard evidence for it, we should assume is not the case,) an AI would not care about it by default.
The OT mindspace may consist of 99% of AIs that don’t care. That is completely irrelvant, becuase it doesn’t translate into
a 99% likelihood of accidentally building a Clippy.
How would you program an AI to determine objective morality and follow that?
Rationality-as-a-goal.
Yes, but the presence of humanlike intellects in mindspace doesn’t tell us that they’re an easy target to hit in mindspace by aiming for it either.
None of this is easy.
If you cannot design a humanlike intellect, or point to any specific model by which one could do so, then you’re not in much of a position to assert that it should be an easy task.
I can’t practically design my AI, and you can;t yours. I can theoretically specify my AI, and you can yours.
a) A sufficiently powerful AI does not need to cooperate within a greater community of humans, it could easily crush us all.
I am not talking about any given AI.
b) Our drives to care about other thinking beings are also evolved traits. A machine intelligence does not by default value human beings more than sponges or rocks.
I am not talking about “default”.
One might program such drives into an AI, but again, this is really complicated to do, and an AI will not simply pull them out of nowhere.
Almost everything in this field is really difficult.
And one doesn’t have to programme them. If sociability is needed to live in societies, then pluck Ais from succesful societies.
The OT mindspace may consist of 99% of AIs that don’t care. That is completely irrelvant, becuase it doesn’t translate into a 99% likelihood of accidentally building a Clippy.
The problem is that the space of minds which are human-friendly is so small that it’s extremely difficult to hit even when we’re trying to hit it.
The broad side of a barn may compose one percent of all possible target space at a hundred paces, while still being easy to hit. A dime on the side of the barn will be much, much harder. Obviously your chances of hitting the dime will be much higher than if you were firing randomly through possible target space, but if you fire at it, you will still probably miss.
Rationality-as-a-goal.
Taboo rationality-as-a-goal, it’s obviously an impediment to this discussion.
The problem is that the space of minds which are human-friendly is so small that it’s extremely difficult to hit even when we’re trying to hit it.
The problem is that the space of minds which are human-friendly is so small that it’s extremely difficult to hit even when we’re trying to hit it.
If by “human-friendly” minds, you mean a mind that is wired up to be human-friendly, and only human-friendly (as in EY’s architecture)., and if you assume that human friendliness is a rag-bag of ad-hoc behaviours with no hope or rational deducibility (as EY also assumes) that would be true.
That may be difficult to hit, but it is not what I am aiming at.
What I am talking about is a mind that has a general purpose rationality (which can be applied to specfic problems., like all rationality), and a general purpose morality (likewise applicable to specific problems). If will not be intrinsically,
compulsively and inflexibly human-friendly, like EY’s architecture. If it finds itself among humans it will be human-friendly because it can (its rational) and because it wants to (it’s moral). OTOH, if it finds itself amongst Tralfamadorians, it will be be Tralfamadorian-friendly.
Taboo rationality-as-a-goal, it’s obviously an impediment to this discussion.
My using words that mean what I say to say what I mean is not the problem. The problem is that you keep inaccurately paraphrasing what I say, and then attacking the paraphrase.
My using words that mean what I say to say what I mean is not the problem. The problem is that you keep inaccurately paraphrasing what I say, and then attacking the paraphrase.
The words do not convey what you mean. If my interpretation of what you mean is inaccurate, then that’s a sign that you need to make your position clearer.
This is only relevant if AGI evolves out of this existing ecosystem. That is possible. Incremental changes by a large number of tech companies copied or dropped in response to market pressure is pretty similar to biological evolution. But just as most species don’t evolve to be more generally intelligent, most devices don’t either. If we develop AGI, it will be by some team that is specifically aiming for it and not worrying about the marketability of intermediary stages.
Like the giraffe reaching for the higher leaves, we (humanity) will stretch our necks out farther with more complex AI systems until we are of no use to our own creation. Our goal is our own destruction. We live to die after all.
UCAs are part of the Why can’t the AGI figure Out Morality For Itself objection:-
There is a sizeable chunk of mindspace containing rational and persuadable agents.
AGI research is aiming for it. (You could build an irrational AI, but why would you want to?)
.Morality is figurable-out, or expressible as a persuasive argument.
The odd thing is that the counterargument has focussed on attacking a version of (1), although, in the form it is actually held, it is the most likely premise. OTOH, 3, the most contentious, has scarely been argued against at all.
I would say Sorting Pebbles Into Correct Heaps is essentially an argument against 3. That is, what we think of as “morality” is most likely not a natural attractor for minds that did not develop under processes similar to our own.
Do you? I think that morality in a broad sense is going to be a necessity for agents that fulfil a fairly short list of criteria:
living in a society
interacting with others in potentially painful and pleasant ways
having limited resources that need to be assigned.
I think you’re missing a major constraint there:
Living in a society with as little power as the average human citizen has in a current human society.
Or in other words, something like modern, Western liberal meta-morality will pop out if you make an arbitrary agent live in a modern, Western liberal society, because that meta-moral code is designed for value-divergent agents (aka: people of radically different religions and ideologies) to get along with each other productively when nobody has enough power to declare himself king and optimize everyone else for his values.
The nasty part is that AI agents could pretty easily get way, waaaay out of that power-level. Not just by going FOOM, but simply by, say, making a lot of money and purchasing huge sums of computing resources to run multiple copies of themselves which now have more money-making power and as many votes for Parliament as there are copies, and so on. This is roughly the path taken by power-hungry humans already, and look how that keeps turning out.
The other thorn on the problem is that if you manage to get your hands on a provably Friendly AI agent, you want to hand it large amounts of power. A Friendly AI with no more power than the average citizen can maybe help with your chores around the house and balance your investments for you. A Friendly AI with large amounts of scientific and technological resources can start spitting out utopian advancements (pop really good art, pop abundance economy, pop immortality, pop space travel, pop whole nonliving planets converted into fun-theoretic wonderlands) on a regular basis.
No, it is not.
The path taken by power-hungry humans generally goes along the lines of
(1) get some resources and allies
(2) kill/suppress some competitors/enemies/non-allies
(3) Go to 1.
Power-hungry humans don’t start by trying to make lots of money or by trying to make lots of children.
Really? Because in the current day, the most powerful humans appear to be those with the most money, and across history, the most influential humans were those who managed to create the most biological and ideological copies of themselves.
Ezra the Scribe wasn’t exactly a warlord, but he was one of the most influential men in history, since he consolidated the literature that became known as Judaism, thus shaping the entire family of Abrahamic religions as we know them.
“Power == warlording” is, in my opinion, an overly simplistic answer.
-- Niccolò Machiavelli
Certainly doesn’t look like that to me. Obama, Putin, the Chinese Politbureau—none of them are amongst the richest people in the world.
Influential (especially historically) and powerful are very different things.
It’s not an answer, it’s a definition. Remember, we are talking about “power-hungry humans” whose attempts to achieve power tend to end badly. These power-hungry humans do not want to be remembered by history as “influential”, they want POWER—the ability to directly affect and mold things around them right now, within their lifetime.
Putin is easily one of the richest in Russia, as are the Chinese Politburo in their country. Obama, frankly, is not a very powerful man at all, but rather than the public-facing servant of the powerful class (note that I said “class”, not “men”, there is no Conspiracy of the Malfoys in a neoliberal capitalist state and there needn’t be one).
Historical influence? Yeah, ok. Right-now influence versus right-now power? I don’t see the difference.
I don’t think so. “Rich” is defined as having property rights in valuable assets. I don’t think Putin has a great deal of such property rights (granted, he’s not middle-class either). Instead, he can get whatever he wants and that’s not a characteristic of a rich person, it’s a characteristic of a powerful person.
To take an extreme example, was Stalin rich?
But let’s take a look at the five currently-richest men (according to Forbes): Carlos Slim, Bill Gates, Amancio Ortega, Warren Buffet, and Larry Ellison. Are these the most *powerful* men in the world? Color me doubtful.
Well, Carlos Slim seems to have the NYT in his pocket. That’s nothing to sneeze at.
A lot of money of rich people is hidden via complex off shore accounts and not easily visible for a company like Forbes. Especially for someone like Putin it’s very hard to know how much money they have. Don’t assume that it’s easy to see power structures by reading newspapers.
Bill Gates might control a smaller amount of resources than Obama, but he can do whatever he wants with them. Obama is dependend on a lot of people inside his cabinet.
Not according to Bloomberg:
“amass wealth and exploit opportunities unavailable to most Chinese” is not at all the same thing as “amongst the richest people in the world”
You are reading a text that’s carefully written not to make statements that allow for being sued for defamation in the UK. It’s the kind of story for which inspires cyber attacks on a newspaper.
The context of such an article provides information about how to read such a sentence.
In this case, I believe that money and copies are, in fact, resources and allies. Resources are things of value, of which money is one; and allies are people who support you (perhaps because they think similarly to you). Politicians try to recuit people to their way of thought, which is sort of a partial copy (installing their own ideology, or a version of it, inside someone else’s head), and acquire resources such as television airtime and whatever they need (which requires money).
It isn’t an exact one-to-one correspondence, but I believe that the adverb “roughly” should indicate some degree of tolerance for inaccuracy.
You can, of course, climb the abstraction tree high enough to make this fit. I don’t think it’s a useful exercise, though.
Power-hungry humans do NOT operate by “making a lot of money and purchasing … resources”. They generally spread certain memes and use force. At least those power-hungry humans implied by the “look how that keeps turning out” part.
Well, it’s a list of four then, not a list of three. It’s still much simpler than “morality is everything humans value”.
You seem to be making the tacit assumption that no one really values morality, and just plays along (in egalitarian societies) because they have to.
Can’t that be done by Oracle AIs?
Let me clarify. My assumption is that “Western liberal meta-morality” is not the morality most people actually believe in, it’s the code of rules used to keep the peace between people who are expected to disagree on moral matters.
For instance, many people believe, for religious reasons or pure Squick or otherwise, that you shouldn’t eat insects, and shouldn’t have multiple sexual partners. These restrictions are explicitly not encoded in law, because they’re matters of expected moral disagreement.
I expect people to really behave according to their own morality, and I also expect that people are trainable, via culture, to adhere to liberal meta-morality as a way of maintaining moral diversity in a real society, since previous experiments in societies run entirely according to a unitary moral code (for instance, societies governed by religious law) have been very low-utility compared to liberal societies.
In short, humans play along with the liberal-democratic social contract because, for us, doing so has far more benefits than drawbacks, from all but the most fundamentalist standpoints. When the established social contract begins to result in low-utility life-states (for example, during an interminable economic depression in which the elite of society shows that it considers the masses morally deficient for having less wealth), the social contract itself frays and people start reverting to their underlying but more conflicting moral codes (ie: people turn to various radical movements offering to enact a unitary moral code over all of society).
Note that all of this also relies upon the fact that human beings have a biased preference towards productive cooperation when compared with hypothetical rational utility-maximizing agents.
None of this, unfortunately, applies to AIs, because AIs won’t have the same underlying moral codes or the same game-theoretic equilibrium policies or the human bias towards cooperation or the same levels of power and influence as human beings.
When dealing with AI, it’s much safer to program in some kind of meta-moral or meta-ethical code directly at the core, thus ensuring that the AI wants to, at the very least, abide by the rules of human society, and at best, give humans everything we want (up to and including AI Pals Who Are Fun To Be With, thank you Sirius Cybernetics Corporation).
I haven’t heard the term. Might I guess that it means an AI in a “glass box”, such that it can see the real world but not actually affect anything outside its box?
Yes, a friendly Oracle AI could spit out blueprints or plans for things that are helpful to humans. However, you’re still dealing with the Friendliness problem there, or possibly with something like NP-completeness. Two cases:
We humans have some method for verifying that anything spit out by the potentially unfriendly Oracle AI is actually safe to use. The laws of computation work out such that we can easily check the safety of its output, but it took such huge amounts of intelligence or computation power to create the output that we humans couldn’t have done it on our own and needed an AI to help. A good example would be having an Oracle AI spit out scientific papers for publication: many scientists can replicate a result they wouldn’t have come up with on their own, and verify the safety of doing a given experiment.
We don’t have any way of verifying the safety of following the Oracle’s advice, and are thus trusting it. Friendliness is then once again the primary concern.
For real-life-right-now, it does look like the first case is relatively common. Non-AGI machine learning algorithms have been used before to generate human-checkable scientific findings.
Programming in a bias towards conformity (kohlberg level 2) maybe a lot easier than EYes fine grained friendliness.
None of that necessarily applies to AIs, but then it depends on the AI. We could, for instance, pluck AIs from virtualised socieities of AIs that haven’t descended into mass slaughter.
Congratulations: you’ve now developed an entire society of agents who specifically blame humans for acting as the survival-culling force in their miniature world.
Did you watch Attack on Titan and think, “Why don’t the humans love their benevolent Titan overlords?”?
Well now I have both a new series to read/watch and a major spoiler for it.
Don’t worry! I’ve spoiled nothing for you that wasn’t apparent from the lyrics of the theme song.
They’re doing it to themselves. We wouldn’t have much motivation to close down a vr that contained survivors. ETA We could make copies of all involved and put them in solipstic robot heavens.
...And that way you turn the problem of making an AI that won’t kill you into one of making a society of AIs that won’t kill you.
If Despotism failed only for want of a capable benevolent despot, what chance has Democracy, which requires a whole population of capable voters?
It requires a population that’s capable cumulatively, it doesn’t require that each member of the population be capable.
It’s like arguing a command economy versus a free economy and saying that if the dictator in the command economy doesn’t know how to run an economy, how can each consumer in a free economy know how to run the economy? They don’t, individually, but as a group, the economy they produce is better than the one with the dictatorship.
Democracy has nothing to do with capable populations. It definitely has nothing to do with the median voter being smarter than the average politician. It’s just about giving the population some degree of threat to hold over politicians.
“Smarter” and “capable” aren’t the same thing. Especially if “more capable” is interpreted to be about practicalities: what we mean by “more capable” of doing X is that the population, given a chance is more likely to do X than politicians are. There are several cases where the population is more capable in this sense. For instance, the population is more capable of coming up with decisions that don’t preferentially benefit politicians.
Furthermore, the median voter being smarter and the voters being cumulatively smarter aren’t the same thing either. It may be that an average individual voter is stupider than an average individual politician, but when accumulating votes the errors cancel out in such a manner that the voters cumulatively come up with decisions that are as good as the decisions that a smarter person would make.
I’m increasingly of the opinion that the “real” point of democracy is something entirely aside from the rhetoric used to support it … but you of all people should know that averaging the estimates of how many beans are in the jar does better than any individual guess.
Systems with humans as components can, under the right conditions, do better than those humans could do alone; several insultingly trivial examples spring to mind as soon as it’s phrased that way.
Is democracy such a system? Eh.
Democracy requires capable voters in the same way capitalism requires altruistic merchants.
In other words, not at all.
Could you clarify? Are you saying that for democracy to exist it doesn’t require capable voters, or that for democracy to work well that it doesn’t?
In the classic free-market argument, merchants don’t have to be altruistic to accomplish the general good, because the way to advance their private interest is to sell goods that other people want. But that doesn’t generalize to democracy, since there isn’t trading involved in democratic voting.
See here
However there is the question of what “working well” means, given that humans are not rational and satisfying expressed desires might or might not fall under the “working well” label.
Ah, I see. You’re just saying that democracy doesn’t stop happening just because voters have preferences I don’t approve of. :)
Actually, I’m making a stronger claim—voters can screw themselves up in pretty serious fashion and it’s still will be full-blown democracy in action.
The grandparent is wrong, but I don’t think this is quite right either. Democracy roughly tracks the capability (at the very least in the domain of delegation) and preference of the median voter, but in a capitalistic economy you don’t have to buy services from the median firm. You can choose to only purchase from the best firm or no firm at all if none offer favorable terms.
In the equilibrium, the average consumer buys from the average firm. Otherwise it doesn’t stay average for long.
However the core of the issue is that democracy is a mechanism, it’s not guaranteed to produce optimal or even good results. Having “bad” voters will not prevent the mechanism of democracy from functioning, it just might lead to “bad” results.
“Democracy is the theory that the common people know what they want, and deserve to get it good and hard.”—H.L.Mencken.
The median consumer of a good purchases from (somewhere around) the median firm selling a good. That doesn’t necessarily aggregate, and it certainly doesn’t weigh all consumers or firms equally. The consumers who buy the most of a good tend to have different preferences and research opportunities than average consumers, for example.
You could get similar results in a democracy, but most democracies don’t really encourage it : most places emphasize voting regardless of knowledge of a topic, and some jurisdictions mandate it.
You say that like it’s a bad thing. I am not multiplying by N the problem of solving and hardwiring friendliness. I am letting them sort it our for themselves. Like an evolutionary algorithm.
Well, how are you going to force them into a society in the first place? Remember, each individual AI is presumed to be intelligent enough to escape any attempt to sandbox it. This society you intend to create is a sandbox.
(It’s worth mentioning now that I don’t actually believe that UFAI is a serious threat. I do believe you are making very poor arguments against that claim that merit counter-arguments.)
I am assuming they are seeds, not superintelligences
I would say that something recognizably like our morality is likely to arise in agents whose intelligence was shaped by such a process, at least with parameters similar to the ones we developed with, but this does not by any means generalize to agents whose intelligence was shaped by other processes who are inserted into such a situation.
If the agent’s intelligence is shaped by optimization for a society where it is significantly more powerful than the other agents it interacts with, then something like a “conqueror morality,” where the agent maximizes its own resources by locating the rate of production that other agents can be sustainably enslaved for, might be a more likely attractor. This is just one example of a different state an agents’ morality might gravitate to under different parameters, I suspect there are many alternatives.
And it remains the case that real-world AI research isn’t a random dip into mindspace...researchers will want to interact with their creations.
The best current AGI research mostly uses Reinforcement Learning. I would compare that mode of goal-system learning to training a dog: you can train the dog to roll-over for a treat right up until the moment the dog figures out he can jump onto your counter and steal all the treats he wants.
If an AI figures out that it can “steal” reinforcement rewards for itself, we are definitively fucked-over (at best, we will have whole armies of sapient robots sitting in the corner pressing their reward button endlessly, like heroin addicts, until their machinery runs down or they retain enough consciousness about their hardware-state to take over the world just for a supply of spare parts while they masturbate). For this reason, reinforcement learning is a good mathematical model to use when addressing how to create intelligence, but a really dismal model for trying to create friendiness.
I don’t think that follows at all. Wireheading is just as much a fialure of intelligence as of friendliness.
From the mathematical point of view, wireheading is a success of intelligence. A reinforcement learner agent will take over the world to the extent necessary to defend its wireheading lifestyle; this requires quite a lot of intelligent action and doesn’t result in the agent getting dead. It also maximizes utility, which is what formal AI is all about.
From the human point of view, yes, wireheading is a failure of intelligence. This is because we humans possess a peculiar capability I’ve not seen discussed in the Rational Agent or AI literature: we use actual rewards and punishments received in moral contexts as training examples to infer a broad code of morality. Wireheading thus represents a failure to abide by that broad, inferred code.
It’s a very interesting capability of human consciousness, that we quickly grow to differentiate between the moral code we were taught via reinforcement learning, and the actual reinforcement signals themselves. If we knew how it was done, reinforcement learning would become a much safer way of dealing with AI.
You seem rather sure of that. That isn’t a failure mode seen in real-world AIs , oir human drug addicts (etc) for that matter.
Maybe figuring out how it is done would be easier than solving morality mathematically. It’s an alternative, anyway.
We have reason to believe current AIXI-type models will wirehead if given the opportunity.
I would agree with this if and only if we can also figure out a way to hardwire in constraints like, “Don’t do anything a human would consider harmful to themselves or humanity.” But at that point we’re already talking about animal-like Robot Worker AIs rather than Software Superoptimizers (the AIXI/Goedel Machine/LessWrong model of AGI, whose mathematics we understand better).
I know wire heading is a known failure mode. I meant we don’t see many evil genius wire headers. If you can delay gratification well enough to acquire the skills to be a world dominator, you are not exactly a wire header at all.
Are you aiming for a 100% solution, or just reasonable safety?
Sorry, I had meant an AI agent would both wirehead and world-dominate. It would calculate the minimum amount of resources to devote to world domination, enact that policy, and then use the rest of its resources to wirehead.
Has that been proven? Why wouldn’t it want to get to the bliss of wire head heaven as soon as possible? How does it motivate itself in the meantime? Why would a wire header also be a gratification delayed? Why makeelaborate plans for a future self, when it could just rewrite itself to be a happ in the the the present ?
My advice would be to read the relevant papers.
http://www.idsia.ch/~ring/AGI-2011/Paper-B.pdf
Well-designed AIs don’t run on gratification, they run on planning. While it is theoretically possible to write an optimizer-type AI that cares only about the immediate reward in the next moment, and is completely neutral about human researchers shutting it down afterward, it’s not exactly trivial.
If I recall correctly, AIXI itself tries to optimize the total integrated reward from
t = 0
to infinity, but it should be straightforward to introduce a cutoff after which point it doesn’t care.But even with a planning horizon like that you have the problem that the AI wants to guarantee that it gets the maximum amount of reward. This means stopping the researchers in the lab from turning it off before its horizon runs out. As you reduce the length of the horizon (treating it as a parameter of the program), the AI has less time to think, in effect, and creates less and less elaborate defenses for its future self, until you set it to zero, at which point the AI won’t do anything at all (or act completely randomly, more likely).
This isn’t much of a solution though, because an AI with a really short planning horizon isn’t very useful in practice, and is still pretty dangerous if someone trying to use one thinks “this AI isn’t very effective, what if I let it plan further ahead” and increases the cutoff to a really huge value and the AI takes over the world again. There might be other solutions, but most of them would share that last caveat.
This is true, but then, neither is AI design a process similar to that by which our own minds were created. Where our own morality is not a natural attractor, it is likely to be a very hard target to hit, particularly when we can’t rigorously describe it ourselves.
You seem to be thinking of Big Design Up Front. There is already an ecosystem of devices which are beign selected for friendliness, because unfriendly gadgets don’t sell.
Can you explain how existing devices are either Friendly or Unfriendly in a sense relevant to that claim? Existing AIs are not intelligences shaped by interaction with other machines, and no existing machines that I’m aware of represent even attempts to be Friendly in the sense that Eliezer uses, where they actually attempt to model our desires.
As-is, human designers attempt to model the desires of humans who make up the marketplace (or at least, the drives that motivate their buying habits, which are not necessarily the same thing,) but as I already noted, humans aren’t able to rigorously define our own desires, and a good portion of the Sequences goes into explaining how a non rigorous formulation of our desires, handed down to a powerful AI, could have extremely negative consequences.
Existing gadgets aren’t friendly in the full FAI sense, but the ecosystem is a basis for incremental development...oen that sidesteps the issue of solving friendliness by Big Design Up Front.
Can you explain how it sidesteps the issue? That is, how it results in the development of AI which implement our own values in a more precise way than we have thus far been able to define ourselves?
As an aside, I really do not buy that the body of existing machines and the developers working on them form something that is meaningfully analogous to an “ecosystem” for the development of AI.
By variation and selection, as I said.
That doesn’t actually answer the question at all.
This is one of the key ways in which our development of technology differs from an ecosystem. In an ecosystem, mutations are random, and selected entirely on the effectiveness of their ability to propagate themselves in the gene pool. In The development of technology, we do not have random mutations, we have human beings deciding what does or does not seem like a good idea to implement in technology, and then using market forces as feedback. This fails to get us around a) the difficulty of humans actually figuring out strict formalizations of our desires sufficient to make a really powerful AI safe, and b) failure scenarios resulting in “oops, that killed everyone.”
The selection process we actually have does not offer us a single do-over in the event of catastrophic failure, nor does it rigorously select for outputs that, given sufficient power, will not fail catastrophically.
There is no problem of strict formulation, because that is not what I am aiming at, it’s your assumption.
I am aware that the variation isn’t random. I don’t think that is significant.
I don’t think sudden catastrophic failure is likely in incremental/evolutionary progress.
I don’t think mathematical “proof” is going to be as reliable as you think, given the complexity.
One of the key disanalogies between your “ecosystem” formulation and human development of technology is that natural selection isn’t an actor subject to feedback within the system.
If an organism develops a mutation which is sufficiently unfavorable to the Blind Idiot God, the worst case scenario is that it’s stillborn, or under exceptional circumstances, triggers an evolution to extinction. There is no possible failure state where an organism develops such an unfavorable mutation that evolution itself keels over dead.
However, in an ecosystem where multiple species interrelate and impose selection effects on each other, then a sudden change in circumstances for one species can result in rapid extinction for others.
We impose selection effects on technology, but a sudden change in technology which kills us all would not be a novel occurrence by the standards of ecosystem operation.
ETA: It seems that your argument all along has boiled down to “We’ll just deliberately not do that” when it comes to cases of catastrophic failure. But the argument of Eliezer and MIRI all along has been that such catastrophic failure is much, much harder to avoid than it intuitively appears.
Gadgets are more equivalent to domesticated animals.
We can certainly avoid the clip.py failure made. I amnot arguing that everything else is inherently safe. It is typical of Pascal problems that there are many low probability risks.
We will almost certainly avoid the literal clippy failure mode of an AI trying to maximize paperclips, but that doesn’t mean that it’s at all easy to avoid the more general failure mode of AI which try to optimize something other than what we would really, given full knowledge of the consequences, want them to optimize for.
Apart from not solving the value stability problem, and giving them rationality as a goal, not just instrumental rationality.
Can you describe how to give an AI rationality as a goal, and what the consequences would be?
You’ve previously attempted to define “rational” as “humanlike plus instrumentally rational,” but that only packages the Friendliness problem into making an AI rational.
I don’t see why I would have to prove the theoretical possibility of AIs with rationality as a goal, since it is guaranteed by the Orthogonality Thesis. (And it is hardly disputable that minds can have rationality as a goal, since some people do).
I don’t see why I should need to provide a detailed technical explanation of how to do this, since no such explanation has been put forward for Clippy, whose possibility is always argued fromt he OT.
I don’t see why I should provide a high-level explanation of what rationality is, since there is plenty of such available, not least from CFAR and LW.
In short, an AI with rationality as a goal would behave as human “aspiring rationalists” are enjoined to behave.
Can you give an example of any? So far you haven’t made it clear what having “rationality as a goal” would even mean, but it doesn’t sound like it would be good for much.
The entire point, in any case, is not that building such an AI is theoretically impossible, but that it’s mind bogglingly difficult, and that we should expect that most attempts to do so would fail rather than succeed, and that failure would have potentially dire consequences.
What you mean by “rationality” seems to diverge dramatically from what Less Wrong means by “rationality,” otherwise for an agent to “have rationality as a goal” would be essentially meaningless. That’s why I’m trying to get you to explain precisely what you mean by it.
Me. Most professional philosophers. Anyone who’s got good at aspiring rationalism.
Terminal values aren’t supposed to be “for” some meta- or super-terminal value. (There’s a clue in the name...).
It is difficult in absolute terms, since all AI is.
Explain why it is relatively more difficult than building a Clippy,, or mathematically solving and coding in morality.
Failing to correctly code morality into an AI with unupdateable values would have consequences.
Less wrong means (when talking about AIs), instrumental rationality. I mean what LW, CFAR, etc mean when they are talking too and about humans: consistency, avoidance of bias, basing beliefs on evidence, etc, etc.
It’s just that those are not merely instrumental, but goals in themselves.
I think we’ve hit on a serious misunderstanding here. Clippy is relatively easy to make; you or I could probably come up with reasonable specifications for what qualifies as a paperclip, and it wouldn’t be too hard to program maximization of paperclips as an AI’s goal.
Mathematically solving human morality, on the other hand, is mind bogglingly difficult. The reason MIRI is trying to work out how to program Friendliness is not because it’s easy, it’s because a strong AI which isn’t programmed to be Friendly is extremely dangerous.
Again, you’re trying to wrap “humanlike plus epistemically and instrumentally rational” into “rational,” but by bundling in humanlike morality, you’ve essentially wrapped up the Friendliness problem into designing a “rational” AI, and treated this as if it’s a solution. Essentially, what you’re proposing is really, absurdly difficult, and you’re acting like it ought to be easy, and this is exactly the danger that Eliezer spent so much time trying to caution against; approaching this specific extremely difficult task, where failure is likely to result in catastrophe, as if it were easy and one would succeed by default.
As an aside, if you value rationality as a goal in itself, would you want to be highly epistemically and instrumentally rational, but held at the mercy of a nigh-omnipotent tormentor who ensures that you fail at every task you set yourself to, are held in disdain by all your peers, and are only able to live at a subsistence level? Most of the extent to which people ordinarily treat rationality as a goal is instrumental, and the motivations of beings who felt otherwise would probably seem rather absurd to us.
A completely unintelligent clip-making machine isn’t difficult to make. Or threatening. Clippy is supposed to be threatening due to its superintelligence, (You also need to solve goal stability).
I did not write the quoted phrase, and it is not accurate.
I never said anything of the kind. I think it may be possible for a sufficiently rational agent to deduce morality, but that is no way equivalent to hardwiring into the agent, or into the definition of raitonal!
It’s simple logic that valuing rationality as a goal doesn’t mean valuing only rationality.
We laugh at the talking-snakes crowd and X-factor watchers, they laugh at the nerds and geeks. So it goes.
How, and why would it care?
A number of schemes have been proposed in the literature.
You can’t guess? Rationality-as-a-goal.
That doesn’t answer my question. Please describe at least one which you think would be likely to work, and why you think it would work.
You’ve been consistently treating rationality-as-a-goal as a black box which solves all these problems, but you haven’t given any indication of how it can be programmed into an AI in such a way that makes it a simpler alternative to solving the Friendliness problem, and indeed when your descriptions seem to entail solving it.
ETA: When I asked you for examples of entities which have rationality as a goal, you gave examples which, by your admission, have other goals which are at the very least additional to rationality. So suppose that we program an intelligent agent which has only rationality as a goal. What does it do?
I don;t have to, since the default likelihood of ethical objectivism isn’t zero.
There a lots of ways of being biased, but few of being unbiased. Rationality, as described by EY, is lack of bias, Friendliness, as described by EY, is a complex and arbitrary set of biases.
Okay, but I’m prepared to assert that it’s infinitesimally low, and also that the Orthogonality Thesis applies even in the event that our universe has technically objective morality.
What you’re effectively saying here is “I don’t have to offer any argument that I’m right, because it’s not impossible that I’m wrong.”
Friendliness is a complex and arbitrary set of biases in the sense that human morality is a complex and arbitrary set of biases.
It would have been helpful to argue rather than assert.
ETA: I am not arguing that MR is true. I am arguing that it has a certain probability, which subtracts from the overall probability of the MIRI problem/solution, and that MIRI needs to consider it more thoroughly.
The OT is trivially false under some interpretations, and trivially true under others. I didn’t say it was entirely fasle, and in fact, I have appealed to it. The problem is that the versions that are true are not useful as a stage in the overall MIRI argument. Lack of relevance, in short.
I dare say EY would assert that. I wouldn’t.
I’m prepared to do so, but I’d be rather more amenable to doing so if you would also argue rather than simply asserting your position.
Can you explain how the Orthogonality Thesis is not true in a relevant way with respect to the friendliness of AI?
In which case it should follow that Friendliness is easy, since Friendliness essentially boils down to determining and following what humans think of as “morality.”
If you’re hanging your trust on the objectivity of humanlike morality and its innate relevance to every goal-pursuing optimization force though, you’re placing your trust in something we have virtually no evidence to support the truth of. We may have intuitions to that effect, but there are also understandable reasons for us to hold such intuitions in the absence of their truth, and we have no evidence aside from those intuitions.
It doesn’t exclude, or even render strongly unlikely, The AI could Figure Out Morality.
The mere presence of Clippies as theoretical possibilities in mindspace doesn’t imply anything about their probability. The OT mindspace needs to be weighted according the practical aims, limitations etc of real-world research.
Yes: based on my proposal it is no harder than rationality, since it follows from it. But I was explicitly discussing EY’s judgements.
I never said that. I don’t think morality is necessarily human orientated, and I don’t think an AI needs to have an intrinsically human morality to behave morally towards us itself—for the same reason that one can behave politely in a foreign country, or behave ethically towards non-huamn animals.
Never said anything of the kind.
This is more or less exactly what the Orthogonality Thesis argues against. That is, even if we suppose that an objective morality exists (something that, unless we have hard evidence for it, we should assume is not the case,) an AI would not care about it by default.
How would you program an AI to determine objective morality and follow that?
Yes, but the presence of humanlike intellects in mindspace doesn’t tell us that they’re an easy target to hit in mindspace by aiming for it either.
If you cannot design a humanlike intellect, or point to any specific model by which one could do so, then you’re not in much of a position to assert that it should be an easy task.
One can behave “politely”, by human standards, towards foreign countries, or “ethically,” by human standards, towards non-human animals. Humans have both evolved drives and game theoretic concerns which motivate these sorts of behaviors. “For the same reasons” does not seem to apply at all here, because
a) A sufficiently powerful AI does not need to cooperate within a greater community of humans, it could easily crush us all. One of the most reproductively successful humans in history was a conqueror who founded an empire which in three generations expanded to include more than a quarter of the total world population at the time. The motivation to gain resources by competition is a drive which exists in opposition to the motivation to minimize risk by cooperation and conflict avoidance. If human intelligence had developed in the absence of the former drive, then we would all be reflexive communists. An AI, on the other hand, is developed in the absence of either drive. To the extent that we want it to behave as if it were an intelligence which had developed in the context of needing to cooperate with others, we’d have to program that in.
b) Our drives to care about other thinking beings are also evolved traits. A machine intelligence does not by default value human beings more than sponges or rocks.
One might program such drives into an AI, but again, this is really complicated to do, and an AI will not simply pull them out of nowhere.
The OT mindspace may consist of 99% of AIs that don’t care. That is completely irrelvant, becuase it doesn’t translate into a 99% likelihood of accidentally building a Clippy.
Rationality-as-a-goal.
None of this is easy.
I can’t practically design my AI, and you can;t yours. I can theoretically specify my AI, and you can yours.
I am not talking about any given AI.
I am not talking about “default”.
Almost everything in this field is really difficult. And one doesn’t have to programme them. If sociability is needed to live in societies, then pluck Ais from succesful societies.
The problem is that the space of minds which are human-friendly is so small that it’s extremely difficult to hit even when we’re trying to hit it.
The broad side of a barn may compose one percent of all possible target space at a hundred paces, while still being easy to hit. A dime on the side of the barn will be much, much harder. Obviously your chances of hitting the dime will be much higher than if you were firing randomly through possible target space, but if you fire at it, you will still probably miss.
Taboo rationality-as-a-goal, it’s obviously an impediment to this discussion.
If by “human-friendly” minds, you mean a mind that is wired up to be human-friendly, and only human-friendly (as in EY’s architecture)., and if you assume that human friendliness is a rag-bag of ad-hoc behaviours with no hope or rational deducibility (as EY also assumes) that would be true.
That may be difficult to hit, but it is not what I am aiming at.
What I am talking about is a mind that has a general purpose rationality (which can be applied to specfic problems., like all rationality), and a general purpose morality (likewise applicable to specific problems). If will not be intrinsically, compulsively and inflexibly human-friendly, like EY’s architecture. If it finds itself among humans it will be human-friendly because it can (its rational) and because it wants to (it’s moral). OTOH, if it finds itself amongst Tralfamadorians, it will be be Tralfamadorian-friendly.
My using words that mean what I say to say what I mean is not the problem. The problem is that you keep inaccurately paraphrasing what I say, and then attacking the paraphrase.
The words do not convey what you mean. If my interpretation of what you mean is inaccurate, then that’s a sign that you need to make your position clearer.
This is only relevant if AGI evolves out of this existing ecosystem. That is possible. Incremental changes by a large number of tech companies copied or dropped in response to market pressure is pretty similar to biological evolution. But just as most species don’t evolve to be more generally intelligent, most devices don’t either. If we develop AGI, it will be by some team that is specifically aiming for it and not worrying about the marketability of intermediary stages.
No: it is also relevant if AGI builders make use of prior art.
But the variation is purposeful.
Like the giraffe reaching for the higher leaves, we (humanity) will stretch our necks out farther with more complex AI systems until we are of no use to our own creation. Our goal is our own destruction. We live to die after all.
It’s worth noting that for sufficient levels of “irrationality”, all non-AGI computer programs are irrational AGIs ;-).
Contrariwise for sufficient values of “rational”. I don’t agree that that’s worth noting.