I don’t think intelligence explosion is imminent either. But I believe it’s certain to eventually happen, absent the end of civilization before that. And I believe that its outcome depends exclusively on the values of the agents driving it, hence we need to be ready, with good understanding of preference theory at hand when the time comes. To get there, we need to start somewhere. And right now, almost nobody is doing anything in that direction, and there is very poor level of awareness of the problem and poor intellectual standards of discussing the problem where surface awareness is present.
Either right now, or 50, or 100 years from now, a serious effort has to be taken on, but the later it starts, the greater the risk of being too late to guide the transition in a preferable direction. The problem itself, as a mathematical and philosophical challenge, sounds like something that could easily take at least 100 years to reach clear understanding, and that is the deadline we should worry about, starting 10 years too late to finish in time 100 years from now.
Vladimir, I agree with you that people should be thinking about intelligence explosion, that there’s a very poor level of awareness of the problem, and that the intellectual standards for discourse about this problem in the general public are poor.
I have not been convinced but am open toward the idea that a paperclip maximizer is the overwhelmingly likely outcome if we create a superhuman AI. At present, my thinking is that if some care is taken in the creation of a superhuman AI, more likely than a paperclip maximizer is an AI which partially shares human values, that is, the dichotomy “paper clip maximizer vs. Friendly AI” seems like a false dichotomy—I imagine that the sort of AI that people would actually build would be somewhere in the middle. Any recommended reading on this point appreciated.
SIAI seems to have focused on the existential risk of “unfriendly intelligence explosion” and it’s not clear to me that this existential risk is greater than the risks coming from world war and natural resource shortage.
the dichotomy “paper clip maximizer vs. Friendly AI” seems like a false dichotomy—I imagine that the sort of AI that people would actually build would be somewhere in the middle. Any recommended reading on this point appreciated.
Mainly Complexity of value. There is no way for human values to magically jump inside the AI, so if it’s not specifically created to reflect them, it won’t have them, and whatever the AI ends up with won’t come close to human values, because human values are too complex to be resembled by any given structure that happens to be formed in the AI.
The more AI’s preference diverges from ours, the more we lose, and this loss is on astronomic scale (even if preference diverges relatively little). The falloff with imperfect reflection of values might be so sharp that any ad-hoc solution turns the future worthless. Or maybe not, with certain classes of values that contain a component of sympathy that reflects values perfectly while giving them smaller weight in the overall game, but then we’d want to technically understand this “sympathy” to have any confidence in the outcome.
The more AI’s preference diverges from ours, the more we lose, and this loss is on astronomic scale (even if preference diverges relatively little).
This depends on something like aggregative utilitarianism. If additional resources have diminishing marginal value in fulfilling human aims, then getting a little slice of the universe (in the course of negotiating terms of surrender with the inhuman AI, if it can make credible commitments, or because we serve as acausal bargaining chips with other civilizations elsewhere in the universe) may be enough. Is getting 100% of the lightcone a hundred times better than 1%?
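To make the question concrete, here is a minimal sketch comparing how much better the whole lightcone is than a 1% slice under two utility functions. Both functions and the resource total `TOTAL` are illustrative assumptions, not something anyone in the thread has specified.

```python
import math

# Hypothetical total resources in the lightcone, in arbitrary units (assumption).
TOTAL = 1e50

def linear_utility(resources):
    """Aggregative-style utility: value scales directly with resources."""
    return resources

def log_utility(resources):
    """Diminishing marginal value: each doubling of resources adds the same amount."""
    return math.log(resources)

def ratio_full_to_slice(utility, share=0.01):
    """How many times better the whole lightcone is than a `share` of it."""
    return utility(TOTAL) / utility(share * TOTAL)

linear_ratio = ratio_full_to_slice(linear_utility)
log_ratio = ratio_full_to_slice(log_utility)
print(f"linear: {linear_ratio:.1f}x, log: {log_ratio:.3f}x")
```

Under the linear assumption the full lightcone is a hundred times better than the slice; under the logarithmic one it is only slightly better, which is the case where a small slice "may be enough". The real disagreement is over which shape human preference actually has.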
Is getting 100% of the lightcone a hundred times better than 1%?
I think yes, if we take into account that the more of the lightcone we (our FAI) get, the more trading opportunities we would have with UFAI in other possible worlds. Diminishing marginal value shouldn’t apply across possible worlds, because otherwise it would imply gross violations of expected utility maximization.
Also, I suspect that there are possible worlds with much greater resources than our universe (perhaps with physics that allow hypercomputation, or just many orders of magnitude more total exploitable resources), and some of them would have potential trading partners who are willing to give us a small share of their world for a large share of ours. We may eventually achieve most of our value from trading with them. But of course such trade wouldn’t be possible if we didn’t have something to trade with!
Interesting. This suggests thinking about FAI not as using its control to produce terminal value in its own world, but as using its control to buy as much terminal value as it can, in various world-programs. Since it doesn’t matter where the value is produced, most of the value doesn’t have to be produced in the possible worlds with FAIs in them. Indeed, it sounds unlikely that specifically the FAI worlds will be optimal for FAI-value optimization. FAIs (and the worlds they control) act as instrumental leverage, a way of controlling the global mathematical universe into having more value for our preference.
Thus, more FAIs means stronger control over the mathematical universe, while more UFAIs mean that the mathematical universe is richer, and so the FAIs can get more value out of it with the same control. The metaphors of trade and comparative advantage start applying again, not on the naive level of cohabitation on the same world, but on the level of the global ontology. Mathematics grants you total control over your domain, so that your “atoms” can’t be reused for something else by another stronger agent, and so you do benefit from most superintelligent “aliens”.
Yes, assuming that trading across possible worlds can be done in the first place. One thing that concerns me is the combinatorial explosion of potential trading partners. How do they manage to “find” each other?
It’s the same combinatorial explosion as with the future possible worlds. Even though you can’t locate individual valuable future outcomes (through certain instrumental sequences of exact events), you can still make decisions about your actions leading to certain consequences “in bulk”, and I expect the trade between possible worlds can be described similarly (after all, it does work on exactly the same decision-making algorithm). Thus, you usually won’t know who you are trading with, exactly, but can estimate on net that your actions are in the right direction.
I currently agree it’s a bad analogy and I no longer endorse the position that global acausal trade is probably feasible, although its theoretical possibility seems to be a stable conclusion.
There are two distinct issues here: (1) how high would a human with original preference value a universe which only gives a small weight to their preference, and (2) how likely is the changed preference to give any weight whatsoever to the original preference, in other words to produce a universe to any extent valuable to the original preference, even if original preference values universes only weakly optimized in its direction.
Moving to a different preference is different from lowering weight of the original preference. A slightly changed (formal definition of) preference may put no weight at all on the preceding preference. The optimal outcome according to the modified preference can thus be essentially moral noise, paperclips, to the original preference. Giving a small slice of the universe, on the other hand, is what you get out of aggregation of preference, and a changed preference doesn’t necessarily have a form of aggregation that includes original preference. (On the other hand, there is a hope that human-like preferences include sympathy, which does make them aggregate preferences of other persons with some weight.)
We should assign some substantial probability to getting some weighting of our preference (from bargaining with transparency, acausal trade, altruistic brain emulations, etc). If a moderate weighting of our preferences gets most of the potential utility, then the expected utility of inhuman AIs getting powerful won’t be astronomically less than the expected utility of, e.g. a ‘Friendly AI’ acting on idealized human preferences.
We should assign some substantial probability to getting some weighting of our preference (from bargaining with transparency, acausal trade, altruistic brain emulations, etc).
Game-theoretic considerations are only relevant when you have non-trivial control, not when your atoms are used for something else. If singleton’s preference gives some weight to your preference, this is a case of having control directly through the singleton’s preference, but the origin of this control is not game-theoretic. If the singleton’s preference has sympathy for your preference, your explicit instantiation in the world doesn’t need to have any control, in order to win through the implicit control via singleton’s preference.
Game-theoretic aggregation, on the other hand, doesn’t work by influence on other agent’s preference. You only get your slice of the world because you already control it. Another agent may perform trade, but this is trade of control, rearranging what specifically each of you controls, without changing your preferences.
I assume that control will be winner-takes-all, so preferences of other agents existing at the time only matter if the winner’s preference directly pays to their preferences any attention, but not if they had some limited control from the start.
If a moderate weighting of our preferences gets most of the potential utility, then the expected utility of inhuman AIs getting powerful won’t be astronomically less than the expected utility of, e.g. a ‘Friendly AI’ acting on idealized human preferences.
My point is that inhuman AI may give no weight to our preference, while FAI may give at least some weight to everyone’s preference. Game-theoretic trade won’t matter here because agents other than the singleton have no control to bargain with. FAI gives weight to other preferences not because of trade, but by construction from the start, even if people it gives weight to don’t exist at all (FAI giving them weight in optimization might cause them to appear, or a better event at least as good from their perspective).
You only get your slice of the world because you already control it.
This isn’t obviously the most natural way to describe a scenario in which an AI thinks it has a 90% chance of winning a conflict with humanity, but also has the ability to jointly create (with humanity) agents to enforce an agreement (and can do this quickly enough to be relevant), so cuts a deal splitting up the resources of the light cone at a 9:1 ratio.
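A minimal sketch of why both sides might prefer such a deal, from humanity's side. The 90% win probability and the 9:1 split come from the scenario above; the square-root utility is purely a hypothetical stand-in for "diminishing returns in resource share", and conflict costs are ignored.

```python
import math

P_AI_WINS = 0.9             # from the scenario: AI's chance of winning a conflict
HUMAN_SHARE_IN_DEAL = 0.1   # the 9:1 split gives humanity one tenth

def concave_utility(share):
    """Hypothetical utility with diminishing returns in resource share (assumption)."""
    return math.sqrt(share)

def expected_utility_of_conflict(utility):
    """Winner takes all: humanity gets everything with probability 0.1, else nothing."""
    p_human_wins = 1.0 - P_AI_WINS
    return p_human_wins * utility(1.0) + P_AI_WINS * utility(0.0)

def utility_of_deal(utility):
    """Humanity receives its agreed share with certainty."""
    return utility(HUMAN_SHARE_IN_DEAL)

conflict = expected_utility_of_conflict(concave_utility)
deal = utility_of_deal(concave_utility)
print(f"conflict: {conflict:.3f}, deal: {deal:.3f}")
```

With a linear utility the two options would come out equal, so the deal is attractive only if preferences are concave in resources or if conflict itself destroys value. Either way, the slice is obtained before anyone "already controls" it.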
I assume that control will be winner-takes-all,
Given that there are plausible sets of parameter values where this assumption is false, we can’t use it to assess overall expected value to astronomical precision.
Game-theoretic considerations are only relevant when you have non-trivial control,
I specifically mentioned acausal trade, a la Rolf Nelson’s AI-deterrence scheme, which needs non-trivial control only in some region of the ensemble of possibilities the AI considers. Indeed, the AI might treat us well simply because of the chance that benevolent non-human aliens will respond positively if its algorithm has this output (as the benevolent aliens might be modeling the AI’s algorithm).
I specifically mentioned acausal trade, a la Rolf Nelson’s AI-deterrence scheme, which needs non-trivial control only in some region of the ensemble of possibilities the AI considers.
Yes, I forgot about that (though I remain uncertain about how well this argument works, not having worked out a formal model). To summarize the arguments for why the future is still significantly more valuable than what we have now, even if we run into Unfriendly AI:
(1) if there is a non-negligible chance that we’ll have FAI in the future, or that we could’ve created FAI if some morally random facts in the past (such as the coin in counterfactual mugging) were different, then we can estimate the present expected value of the world as pretty high, as a factor of getting whole universes (counterfactually or probably) optimized towards your specific preference is present in the expected utility computation. The counterfactual value is present even if it’s certain that the future contains Unfriendly AI.
(2) It’s even better, because the unfriendly singletons will also optimize their worlds towards your preference a little for game-theoretic reasons, even if they don’t care at all about your preference. This game is not with you personally, a human that controls very little and whose control can’t compel a singleton to any significant extent, but with the counterfactual FAIs. The FAIs that could be created, but weren’t, can act as Omega in counterfactual mugging, making it profitable for the indifferent singletons to pay the FAI a little in FAI-favored kind of world-optimization.
(3) Some singletons that don’t follow your preference in particular, but have remotely human-like preference, will have a component of sympathy in their preference, and will dole your preference some fair portion of control in their world, that is much greater than the portion of control you held originally. This sympathy seems to be godshatter of game-theoretic considerations that compel even singletons with non-evolved (artificial, random) preferences according to arguments (1) and (2).
The conclusion to this seems to be that creating an Unfriendly AI is significantly better than ending up with no rational singleton at all (existential disaster that terminates civilization), but significantly worse than a small chance of FAI.
Your comments are mostly good, but I dispute the final assumption that no singleton ⇒ disaster. There has as yet been no investigation into the merits of singleton vs. an economy (or ecosystem) of independent agents.
If we were living in the 18th century, it would be reasonable to suppose that the only stable situation is one where one agent is king. But we are not.
So there’s the utility difference between business-as-usual (no AI), and getting a small share of resources optimized for your preference, and the utility difference between getting small and large shares of resources. If the second difference is much larger than the first, then (1) is crucial, and (2) and (3) are not so good. But if the first difference is much bigger than the second, the pattern is the reverse.
And if we’re comparing expected utility conditioning on no local FAI here and EU conditioning on FAI here, moderate credences can suffice (depending on the shape of your utility function).
Whether FAI is local or not can’t matter, whether something is real or counterfactual is morally irrelevant. If we like small control, it means that the possible worlds with UFAI are significantly valuable, just as the worlds with FAI, provided there are enough worlds with FAI to weakly control the UFAIs; and if we like only large control, it means that the possible worlds with UFAI are not as valuable, and it’s mostly the worlds with FAI that matter.
But if the first difference is much bigger than the second, the pattern is the reverse.
It’s not literally the reverse, because if you don’t create those FAIs, nobody will, and so the UFAIs won’t have the incentive to give you your small share. It’s never good to increase probability of UFAI at the expense of probability of FAI. I’m not sure whether there is any policy guideline suggested by these considerations, conditional on the pattern in utility you discuss. What should we do differently depending on how much we value small vs. large control? It’s still clearly preferable to have UFAI to having no future AI, and to have FAI to having UFAI, in both cases.
There is no way for human values to magically jump inside the AI, so if it’s not specifically created to reflect them, it won’t have them, and whatever the AI ends up with won’t come close to human values, because human values are too complex to be resembled by any given structure that happens to be formed in the AI.
I’m not convinced by the claim that human values have high Kolmogorov complexity.
In particular, Eliezer’s article Not for the Sake of Happiness Alone is totally at odds with my own beliefs. In my mind, it’s incoherent to give anything other than subjective experiences ethical consideration. My own preference for real science over imagined science is entirely instrumental and not at all terminal.
Now, maybe Eliezer is confused about what his terminal values are, or maybe I’m confused about what my terminal values are, or maybe our terminal values are incompatible. In any case, it’s not obvious that an AI should care about anything other than the subjective experiences of sentient beings.
Suppose that it’s okay for an AI to exclude everything but subjective experience from ethical consideration. Is there then still reason to expect that human values have high Kolmogorov complexity?
I don’t have a low complexity description to offer, but it seems to me that one can get a lot of mileage out of the principles “if an individual prefers state A to state B whenever he/she/it is in either of state A or state B, then state A is superior for that individual to state B” and “when faced with two alternatives, the moral alternative is the one that you would prefer if you were going to live through the lives of all sentient beings involved.”
Of course “sentient being” is ill-defined and one would have to do a fair amount of work to frame the things that I just said in more formal terms, but anyway, it’s not clear to me that there’s a really serious problem here.
The more AI’s preference diverges from ours, the more we lose, and this loss is on astronomic scale (even if preference diverges relatively little).
I totally agree that if the creation of a superhuman AI is going to precede all other existential threats then we should focus all of our resources on trying to get the superhuman AI to be as friendly as possible.
Have you read the Heaven post by denisbider and the two follow-ups constituting a mini-wireheading series? There have been other posts on the difference between wanting and liking; but it illustrates a fairly strong problem with wireheading: Even if all we’re worried about is “subjective states,” many people won’t want to be put in that subjective state, even knowing they’ll like it. Forcing them into it or changing their value system so they do want it are ethically suboptimal solutions.
So, it seems to me that if anything other than maximized absolute wireheading for everyone is the AI’s goal, it’s gonna start to get complicated.
Thanks for the references to the posts which I had not seen before and which I find relevant. I’m sympathetic toward denisbider’s view, but will read the comments to see if I find diverging views compelling.
But I would qualify the last sentence of my reply by saying that the best way to get a superhuman AI to be as friendly as possible may not be to work on friendly AI or advocate for friendly AI. For example, it may be best to work toward geopolitical stability to minimize the chances of some country rashly creating a potentially unsafe AI out of a sense of desperation during wartime.
I totally agree that if the creation of a superhuman AI is going to precede all other existential threats then we should focus all of our resources on trying to get the superhuman AI to be as friendly as possible.
Yes, I was agreeing with what I inferred your attitude to be rather than agreeing with something that you said. (I apologize if I distorted your views—if you’d like I can edit my comment to remove the suggestion that you hold the position that I attributed to you.)
I don’t believe that we “should focus all of our resources” on FAI, as there are many other worthy activities to focus on. The argument is that this particular problem gets disproportionally little attention, and while with other risks we can in principle luck out even if they get no attention, it isn’t so for AI. Failing to take FAI seriously is fatal, failing to take nanotech seriously isn’t necessarily fatal.
Thus, although strictly speaking I agree with your implication, I don’t see its condition as plausible, and so I don’t see the implication as a whole as relevant.
Re: “Is there then still reason to expect that human values have high Kolmogorov complexity?”
Human values are mostly a product of their genes and their memes. There is an awful lot of information in those. However, it is true that you can fairly closely approximate human values—or those of any other creature—by the directive to make as many grandchildren as possible—which seems reasonably simple.
Most of the arguments for humans having complex values appear to list a whole bunch of proximate goals—as though that constitutes evidence.
I disagree. You need to know much more than just the drive for grandchildren, given the massively diverse ways we observe even in our present world for species to propagate, all of which correspond to different articulable values once they reach human intelligence.
Human values should be expected to have a high K-complexity because you would need to specify both the genes/early environment, and the precise place in history/Everett branches where humans are now.
The idea was to “approximate human values”—not to express them in precise detail: nobody cares much if Jim likes strawberry jam more than he likes raspberry jam.
The idea was to “approximate human values”—not to express them in precise detail
Sure, but I take “approximation” to mean something like getting you within 10 or so bits of the true distribution, but the heuristic you gave still leaves you maybe 500 or so bits away, which is huge, and far more than you implied.
The environment mostly drops out of the equation—because most of it is shared between the agents involved—and because of the phenomenon of Canalisation
That would help you on message length if you had already stored one person’s values and were looking to store a second person’s. It does not for describing the first person’s value, or some aggregate measure of humans’ values.
The idea of a shared environment arises because the proposed machine—in which the human-like values are to be implemented—is to live in the same world as the human. So, one does not need to specify all the details of the environment—since these are shared naturally between the agents in question.
10 bits short of the needed message, not a 10-bit message. I mean that e.g. an approximation gives 100 bits when full accuracy would be 110 bits (and 10 bits is an upper bound).
The idea of a shared environment arises because the proposed machine—in which the human-like values are to be implemented—is to live in the same world as the human. So, one does not need to specify all the details of the environment—since these are shared naturally between the agents in question.
That still doesn’t answer my point; it just shows how once you have one agent, adding others is easy. It doesn’t show how getting the first, or the “general” agent is easy.
Re: “That still doesn’t answer my point; it just shows how once you have one agent, adding others is easy. It doesn’t show how getting the first, or the “general” agent is easy.”
To specify the environment, choose the universe, galaxy, star, planet, latitude, longitude and time. I am not pretending that information is simple, just that it is already there, if your project is building an intelligent agent.
Yes, I got that the first time. I don’t think you are appreciating the difficulty of coding even relatively simple utility functions. A couple of ASCII characters is practically nothing!
ASCII characters aren’t a relevant metric here. Getting within 10 bits of the correct answer means that you’ve narrowed it down to 2^10 = 1024 distinct equiprobable possibilities [1], one of which is correct. Sounds like an approximation to me! (if a bit on the lower end of the accuracy expected out of one)
[1] or probability distribution with the same KL divergence from the true governing distribution
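For scale, a minimal sketch of what "N bits short" means as a count of equiprobable possibilities still left to distinguish. The 100-bit and 110-bit message lengths and the 500-bit figure are the ones used earlier in this exchange; the function name is just for illustration.

```python
def candidates_remaining(bits_short):
    """Number of equiprobable possibilities left when `bits_short` bits are missing."""
    return 2 ** bits_short

# The example above: an approximation of 100 bits where full accuracy needs 110.
full_message_bits = 110
approximation_bits = 100
close_miss = candidates_remaining(full_message_bits - approximation_bits)

# The "grandchildren" heuristic was estimated to be roughly 500 bits short.
far_miss = candidates_remaining(500)

print(close_miss)
print(f"a {len(str(far_miss))}-digit number of candidates")
```

Being 10 bits short leaves about a thousand candidates to choose among; being 500 bits short leaves a number of candidates with over a hundred and fifty digits, which is why the two cases are not comparable as "approximations".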
SIAI seems to have focused on the existential risk of “unfriendly intelligence explosion” and it’s not clear to me that this existential risk is greater than the risks coming from world war and natural resource shortage.
Not clear to me either that unfriendly AI is the greatest risk, in the sense of having the most probability of terminating the future (though “resource shortage” as existential risk sounds highly implausible—we are talking about extinction risks, not merely potential serious issues; and “world war” doesn’t seem like something particularly relevant for the coming risks, dangerous technology doesn’t need war to be deployed).
But Unfriendly AI seems to be the only unavoidable risk, something we’d need to tackle in any case if we get through the rest. On other problems we can luck out, not on this one. Without solving this problem, the efforts to solve the rest are for naught (relatively speaking).
“resource shortage” as existential risk sounds highly implausible—we are talking about extinction risks, not merely potential serious issues;
I mean “existential risk” in a broad sense.
Suppose we run out of a source of, oh, say, electricity too fast to find a substitute. Then we would be forced to revert to a preindustrial society. This would be a permanent obstruction to technological progress—we would have no chance of creating a transhuman paradise or populating the galaxy with happy sentient machines and this would be an astronomical waste.
Similarly if we ran out of any number of things (say, one of the materials that’s currently needed to build computers) before finding an adequate substitute.
“world war” doesn’t seem like something particularly relevant for the coming risks, dangerous technology doesn’t need war to be deployed.
My understanding is that a large scale nuclear war could seriously damage infrastructure. I could imagine this preventing technological development as well.
But Unfriendly AI seems to be the only unavoidable risk, something we’d need to tackle in any case if we get through the rest. On other problems we can luck out, not on this one. Without solving this problem, the efforts to solve the rest are for naught (relatively speaking).
On the other hand, it’s equally true that if another existential risk hits us before we create friendly AI, all of our friendly AI directed efforts will be for naught.
Suppose we run out of a source of, oh, say, electricity too fast to find a substitute.
That’s not how economics works. If one source of electricity becomes scarce, that means it’s more expensive, so people will switch to cheaper alternatives. All the energy we use ultimately comes from either decaying isotopes (fission, geothermal) or the sun; neither of those will run out in the next thousand years.
Modern computer chips are doped silicon semiconductors. We’re not going to run out of sand any time soon, either. Of course, purification is the hard part, but people have been thinking up clever ways to purify stuff since before they stopped calling it ‘alchemistry.’
The energy requirements for running modern civilization aren’t just a scalar number—we need large amounts of highly concentrated energy, and an infrastructure for distributing it cheaply. The normal economics of substitution don’t work for energy.
A “tradeoff” exists between using resources (including energy and material inputs of fossil origin) to feed the growth of material production (industry and agriculture) and to support the economy’s structural transformation.
As the substitution of renewable for nonrenewable (primarily fossil) energy continues, nature exerts resistance at some point; the scale limit begins to bind. Either economic growth or transition must halt. Both alternatives lead to severe disequilibrium. The first because increased pauperization and the apparent irreducibility of income differentials would endanger social peace. Also, since an economic order built on competition among private firms cannot exist without expansion, the free enterprise system would flounder.
The second alternative is equally untenable because the depletion of nonrenewable resources, proceeding along a rising marginal cost curve or, equivalently, along a descending Energy Return on Energy Invested (EROI) schedule, increases production costs across the entire spectrum of activities. Supply curves shift upwards.
It’s entirely possible that failure to create a superintelligence before the average EROI drops too low for sustainment would render us unable to create one for long enough to render other existential risks inevitabilities.
“Substitution economics” seems unlikely to stop us eventually substituting fusion power and biodiesel for oil. Meanwhile, we have an abundance of energy in the form of coal—more than enough to drive progress for a loooong while yet. The “energy apocalypse”-gets-us-first scenario is just very silly.
Energy economics is interconnected enough with politics to make me lower my expectation of rationality from both of us for the remainder of the discussion due to reference class forecasting. Also, we are several inferential steps away from each other, so any discussion is going to be long and full of details. Regardless, I’m going to go ahead, assuming agreement that market forces cannot necessarily overcome resource shortages (or the Easter Islanders would still be with us).
Historically, the world switched from coal to petroleum before developing any technologies we’d regard as modern. The reason, unlike so much else in economics, is simple: the energy density of coal is 24 MJ/kg; the energy density of gasoline is 44 MJ/kg. Nearly doubling the energy density makes many things practical that wouldn’t otherwise be, like cars, trucks, airplanes, etc. Coal cannot be converted into a higher energy density fuel except at high expense and with large losses, making the expected reserves much smaller. The fuels it can be converted to require significant modifications to engines and fuel storage.
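As a quick check on the "nearly doubling" claim, a minimal sketch using only the two energy densities quoted above. The 1000 MJ trip energy is a made-up figure for illustration, and engine efficiency is ignored.

```python
COAL_MJ_PER_KG = 24.0       # figure quoted above
GASOLINE_MJ_PER_KG = 44.0   # figure quoted above

def fuel_mass_kg(energy_mj, density_mj_per_kg):
    """Mass of fuel needed to carry `energy_mj` of chemical energy."""
    return energy_mj / density_mj_per_kg

density_ratio = GASOLINE_MJ_PER_KG / COAL_MJ_PER_KG

# Hypothetical trip needing 1000 MJ of fuel energy (assumption for illustration).
trip_energy_mj = 1000.0
coal_kg = fuel_mass_kg(trip_energy_mj, COAL_MJ_PER_KG)
gasoline_kg = fuel_mass_kg(trip_energy_mj, GASOLINE_MJ_PER_KG)

print(f"density ratio: {density_ratio:.2f}")
print(f"coal: {coal_kg:.1f} kg, gasoline: {gasoline_kg:.1f} kg")
```

The ratio comes out a little over 1.8, so a vehicle has to carry a bit less than twice the fuel mass on coal for the same stored energy, before counting the extra weight of handling a solid fuel.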
Coal is at least plausible, although a stop-gap measure with many drawbacks. It’s your hopes for fusion that really show the wishful thinking. Fusion is 20 years away from being a practical energy source, just like it was in 1960. The NIF has yet to reach break-even; economically practical power generation is far beyond that point; assuming a substantial portion of US energy generation needs is farther still. It’d be nice if Polywell/Bussard fusion proved practical, but that’s barely a speck on the horizon, getting its first big basic research grant from the US Navy. And nothing but Mr. Fusion will help unless someone makes an order of magnitude improvement in battery or ultracapacitor energy density.
No matter which of the alternatives you plan to replace the energy infrastructure with, you needed to start about 20 years ago. World petroleum production is no longer sufficient to sustain economic growth and infrastructure transition simultaneously. Remember, the question isn’t whether it’s theoretically possible to substitute more plentiful energy sources for the ones that are getting more difficult to extract, it’s whether the declining EROI of current energy sources will remain high enough for the additional economic activity of converting infrastructure to other sources while still feeding people, let alone indulging in activities with no immediate payoff like GAI research.
We seem to be living in a world where the EROI is declining faster than willingness to devote painful amounts of the GDP to energy source conversion is increasing. This doesn’t mean an immediate biker zombie outlaw apocalypse, but it does mean a slow, unevenly distributed “catabolic collapse” of decreasing standards of living, security, and stability.
Energy economics is interconnected enough with politics to make me lower my expectation of rationality from both of us for the remainder of the discussion due to reference class forecasting.
but I appreciate the analysis. (I am behind on reading comments, so I will be continuing downthread now.)
And nothing but Mr. Fusion will help unless someone makes an order of magnitude improvement in battery or ultracapacitor energy density.
I don’t know why you focus so much on fusion although I agree it isn’t practical at this point. But note that batteries and ultracapacitors are just energy storage devices. Even if they become far more energy dense they don’t provide a source of energy.
Unfortunately, that appears to be part of the bias I’d expected in myself: since timtyler mentioned fusion, biofuels, and coal, I was thinking about refuting his arguments instead of laying out the best view of probable futures that I could.
The case for wind, solar, and other renewables failing to take up petroleum’s slack before it’s too late is not as overwhelmingly probable as fusion’s, but it takes the same form—they form roughly 0.3% of current world power generation, and even if the current exponential growth curve is somehow sustainable indefinitely they won’t replace current capacity until the late 21st century.
With the large-scale petroleum supply curve, that leaves a large gap between 2015 and 2060 where we’re somehow continuing to build renewable energy infrastructure with a steadily diminishing total supply of energy. I expect impoverished people to loot energy infrastructure for scrap metal to sell for food faster than other impoverished people can keep building it.
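To make the arithmetic behind the renewables timeline explicit, here is a minimal sketch. It takes the roughly 0.3% share quoted above and asks how long sustained exponential growth would need to reach 100% of today's capacity. The 7% annual growth rate and the 2010 start year are illustrative assumptions, not figures from the comment, and `years_to_replace` is a hypothetical helper name. Growth in total demand is ignored.

```python
import math

def years_to_replace(current_share: float, annual_growth: float) -> float:
    """Years of compound growth needed for a share to reach 100% of today's capacity.

    Solves current_share * (1 + annual_growth) ** t = 1 for t.
    """
    if not 0 < current_share < 1:
        raise ValueError("current_share must be strictly between 0 and 1")
    if annual_growth <= 0:
        raise ValueError("annual_growth must be positive")
    return math.log(1 / current_share) / math.log(1 + annual_growth)

# Assumed inputs: 0.3% share (from the comment), 7% yearly growth (illustrative only).
years = years_to_replace(0.003, 0.07)
print(f"{years:.0f} years, i.e. around {2010 + years:.0f}")
```

Under these assumed inputs the crossover lands late in the 21st century. A faster assumed growth rate pulls it earlier, which is why the estimate is only as good as the growth figure plugged in.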
That we will eventually substitute fusion power and biodiesel for oil seems pretty obvious to me. You are saying it represents “wishful thinking”—because of the possibility of civilisation not “making it” at all? If so, be aware that I think that the chances of that happening seem to be grossly exaggerated around these parts.
It seems very doubtful that we’ll have practical fusion power any time soon or necessarily ever. The technical hurdles are immense. Note that any form of fusion plant will almost certainly be using deuterium-tritium fusion. That means you need tritium sources. This also means that the internal structure will undergo constant low-level neutron bombardment which seriously reduces the lifespan of basic parts such as the electromagnets used. If we look at the form of proposed fusion that has had the most work and has the best chance of success, tokamaks, then we get to a number of other serious problems such as plasma leaks. Other forms of magnetic containment have also not solved the plasma leak problem. Forms of reactors that don’t use magnetic containment suffer from other similarly serious problems. For example, the runner up to magnetic containment is laser confinement but no one has a good way to actually get energy out of laser confinement.
That said, I think that there are enough other potential sources of energy (nuclear fission, solar (and space based solar especially), wind, and tidal to name a few) that this won’t be an issue.
...the runner up to magnetic containment is laser confinement but no one has a good way to actually get energy out of laser confinement...
Um.. not sure what you mean. The energy out of inertial (i.e., laser) confinement is thermal. You implode and heat a ball of D-T, causing fusion, releasing heat energy, which is used to generate steam for a turbine.
Fusion has a bad rap, because the high benefits that would accrue if it were accomplished encourage wishful thinking. But that doesn’t mean it’s all wishful thinking. Lawrence Livermore has seen some encouraging results, for example.
Yeah, but a lot of that energy that is released isn’t in happy forms. D-T releases not just high energy photons but also neutrons which are carrying away a lot of the energy. So what you actually need is something that can absorb the neutrons in a safe fashion and convert that to heat. Lithium blankets are a commonly suggested solution since a lot of the time lithium will form tritium after you bombard it with neutrons (so you get more tritium as a result). There’s also the technically simpler solution of just using paraffin. But the conversion of the resulting energy into heat for steam is decidedly non-trivial.
Imagine what people must have thought in 1910 about the feasibility of getting to the Moon or generating energy by artificially splitting atoms (especially within the 20th century).
Two problems with that sort of comparison: First, something like going to the Moon is a goal, not a technology. Thus, if we have other sources of power, the incentive to work out the details for fusion becomes small. Second, one shouldn’t forget how many technologies have been tried and have fallen by the wayside as not very practical or not at all practical. A good way of getting a handle on this is to read old issues of something like Scientific American from the 1950s and 1960s. Or read scifi from that time period. One example of a historical technology that never showed up on any substantial scale is nuclear powered airplanes, despite a lot of research in the 1950s about them. Similarly, nuclear thermal rockets have not been made. This isn’t because they are impossible, but because they are extremely impractical compared to other technologies. It seems likely that fusion power will fall into the same category. See this article about Project Pluto for example.
These are perfectly valid arguments and I admit that I share your skepticism concerning the economic competitiveness of the fusion technology. I admit, if I had a decision to make about buying some security, the payout of which would depend on the amount of energy produced by fusion power within 30 years, I would not hurry to place any bet.
What I lack is your apparent confidence in ruling out the technology based on the technological difficulties we face at this point in time.
I am always surprised how the opinions of so-called experts diverge when it comes to estimating the feasibility and cost of different energy production options (even excluding fusion power). For example, there is a recent TED video where people discuss the pros and cons of nuclear power. The whole discussion boils down to the question: what resources do we need in order to produce X amount of energy using nuclear, wind, solar, biofuel, or geothermal power? For me, the disturbing thing was that the statements about the resource usage (e.g. area consumption, but also risks) of the different technologies were sometimes off by orders of magnitude.
If we lack the information to produce numbers in the same ballpark even for technologies that we have been using for decades (if not longer), then how much confidence can we have about the viability, costs, risks and competitiveness of a technology, like fusion, that we have not even started to tap?
Re: “Second, one shouldn’t forget how many technologies have been tried and have fallen by the wayside as not very practical or not at all practical. [...] It seems likely that fusion power will fall into the same category.”
Er, not to the governments that have already invested many billions of dollars in fusion research it doesn’t! They have looked into the whole issue of the chances of success.
It seems very doubtful that we’ll have practical fusion power any time soon or necessarily ever. [...] This also means that the internal structure will undergo constant low-level neutron bombardment which seriously reduces the lifespan of basic parts such as the electromagnets used.
Automatically self-repairing nanotech construction? (To suggest a point where a straightforward way of dealing with this becomes economically viable.)
You would need not only self-repairing nanotech but such technology that could withstand both large amounts of radiation as well as strong magnetic fields. Of the currently proposed major methods of nanotech I’m not aware of any that has anything resembling a chance to meet those criteria (with the disclaimer that I’m not a chemist.) If we had nanotech that was that robust it would bump up so many different technologies that fusion would look pretty unnecessary. For example the main barrier to space elevators is efficient reliable synthesis of long chains of carbon nanotubes that could be placed in a functional composite (see this NASA Institute for Advanced Concepts Report for a discussion of these and related issues). We’d almost certainly have that technology well before anything like self-repairing nanotech that stayed functional in high radiation environments. And if you have functional space elevators then you get cheap solar power because it becomes very easy to launch solar power satellites.
I’m not talking about plausible now, but plausible some day, as a reply to your “It seems very doubtful … any time soon or necessarily ever”. The sections being repaired could be offline. “Self-repair” doesn’t assume repair within the volume of an existing/operating structure; it could be all cleared out and rebuilt anew, for example. That it’s done more or less automatically is the economic requirement. Any other methods of relatively cheap and fast production, assembly and recycling will work too.
Ah ok. That’s a lot more plausible. There’s still the issue that once you have cheap solar the resources it takes to make fusion power will simply cost so much more as to likely not be worth it. But if it could be substantially more efficient than straight fission then maybe it would get used for stuff not directly on Earth if/when we have large installations that aren’t in the inner solar system.
Estimating feasibility using exploratory engineering is much simpler than estimating what will actually happen. I’m only arguing that this technology will almost certainly be feasible at the human level in the not absurdly distant future, not that it’ll ever actually be used.
That’s not how economics works. If one source of electricity becomes scarce, that means it’s more expensive, so people will switch to cheaper alternatives.
I would have thought that those ‘cheaper alternatives’ could still be more expensive than the initial cost of the original source of electricity...? In which case losing that original source of electricity could still bite pretty hard (albeit maybe not to the extent of being an existential risk).
On the other hand, it’s equally true that if another existential risk hits us before we build friendly AI, all of our friendly AI directed efforts will be for naught.
But Unfriendly AI seems to be the only unavoidable risk, something we’d need to tackle in any case if we get through the rest.
A stably benevolent stable world government/singleton could take its time solving AI, or inching up to it with biological and cultural intelligence enhancement. From our perspective we should count that as almost a maximal win in terms of existential risks.
I don’t see your point. It would take an unrealistic world dictatorship (whether it’s “benevolent” seems like irrelevant hair-splitting at that point) to stop the risks (stop the technological progress in the wild!) and allow more time for development of FAI. And in the end, solving FAI still remains a necessary step, even if done by modified/improved people, even if given a safe environment to work in.
I don’t see your point. It would take an unrealistic world dictatorship (whether it’s “benevolent” seems like irrelevant hair-splitting at that point) to stop the risks (stop the technological progress in the wild!) and allow more time for development of FAI.
You were talking about hundred year time scales. That’s time enough for neuroscience lie detectors, whole brain emulation, democratization in authoritarian countries, continued expansion of EU-like arrangements, and many other things to occur. That’s time for lie detectors/neuroscience to advance a lot, whole brain emulation to take off
And in the end, solving FAI still remains a necessary step, even if done by modified/improved people, even if given a safe environment to work in.
But from our perspective, if we can get the benevolent non-AI (but perhaps WBE) singleton, it can do the FAI work at leisure and we don’t need to. So the relative marginal impact of our working on say, FAI theory or institutional arrangements for WBE, need to be weighed against one another.
You were talking about hundred year time scales. That’s time enough for neuroscience lie detectors, whole brain emulation, democratization in authoritarian countries, continued expansion of EU-like arrangements, and many other things to occur. That’s time for lie detectors/neuroscience to advance a lot, whole brain emulation to take off
It’s also time enough for any of the huge number of other outcomes. It’s not outright impossible, but pretty improbable, that the world will go this exact road. And don’t underestimate how crazy people are.
But from our perspective, if we can get the benevolent non-AI (but perhaps WBE) singleton, it can do the FAI work at leisure and we don’t need to. So the relative marginal impact of our working on say, FAI theory or institutional arrangements for WBE, need to be weighed against one another.
After the change of mind about value of drifted human preference, I agree that WBE/intelligence enhancement is a viable road. Here’re my arguments about the impact of these paths at this point.
WBE is still at least decades away, probably more than a hundred years if you take planning fallacy into account, and depends on the development of global technological efforts that are not easily influenced. Value of any “institutional arrangements” and viability of arguing for them given the remoteness (hence irrelevance at present) and implausibility (to most people) of WBE, also seems doubtful at present. This in my mind makes the marginal value of any present effort related to WBE relatively small. This will go up sharply as WBE tech gets closer.
I suspect that FAI theory, once understood, will still be simple enough (if any general theory is possible), and can be developed by vanilla humans (on unknown timescale, probably decades to hundreds of years, but at some point WBEs overtake the timescale estimates). By the time WBE becomes viable, the risk situation will be already very explosive, so if we can get a good understanding earlier, we could possibly avoid that risky period entirely. Also, having a viable technical Friendliness programme might give academic recognition to the problem (that these risks are as unavoidable as laws of physics, and not just something to talk with your friends about, like politics or football), which might spread awareness of the AI risks to an otherwise unachievable level, helping with institutional change promoting measures against wild AI and other existential risks. On the other hand, I won’t underestimate human craziness on this point either: technical recognition of the problem may still live side by side with global indifference.
I have not been convinced but am open toward the idea that a paperclip maximizer is the overwhelmingly likely outcome if we create a superhuman AI. At present, my thinking is that if some care is taken in the creation of a superhuman AI, more likely than a paperclip maximizer is an AI which partially shares human values, that is, the dichotomy “paper clip maximizer vs. Friendly AI” seems like a false dichotomy; I imagine that the sort of AI that people would actually build would be somewhere in the middle. Any recommended reading on this point appreciated.
I believed similarly until I read Steve Omohundro’s The Basic AI Drives. It convinced me that a paperclip maximizer is the overwhelmingly likely outcome of creating an AGI.
That paper makes a convincing case that the ‘generic’ AI (some distribution of AI motivations weighted by our likelihood of developing them) will most prefer outcomes that rank low in our preference ordering, i.e. the free energy and atoms needed to support life as we know it or would want it will get reallocated to something else. That means that an AI given arbitrary power (e.g. because of a very hard takeoff, or easy bargaining among AIs but not humans, or other reasons) would be lethal. However, the situation seems different and more sensitive to initial conditions when we consider AIs with limited power that must trade off chances of conquest with a risk of failure and retaliation. I’m working on a write up of those issues.
But I believe it’s certain to eventually happen, absent the end of civilization before that.
And I will live 1000 years, provided I don’t die first.
(As opposed to gradual progress, of course. I could make a case with your analogy facing an unexpected distinction also, as in what happens if you got overrun by a Friendly intelligence explosion, and persons don’t prove to be a valuable pattern, but death doesn’t adequately describe the transition either, as value doesn’t get lost.)
I don’t think intelligence explosion is imminent either. But I believe it’s certain to eventually happen, absent the end of civilization before that. And I believe that its outcome depends exclusively on the values of the agents driving it, hence we need to be ready, with good understanding of preference theory at hand when the time comes. To get there, we need to start somewhere. And right now, almost nobody is doing anything in that direction, and there is very poor level of awareness of the problem and poor intellectual standards of discussing the problem where surface awareness is present.
Either right now, or 50, or 100 years from now, a serious effort has to be taken on, but the later it starts, the greater the risk of being too late to guide the transition in a preferable direction. The problem itself, as a mathematical and philosophical challenge, sounds like something that could easily take at least 100 years to reach clear understanding, and that is the deadline we should worry about, starting 10 years too late to finish in time 100 years from now.
Vladimir, I agree with you that people should be thinking about intelligence explosion, that there’s a very poor level of awareness of the problem, and that the intellectual standards for discourse about this problem in the general public are poor.
I have not been convinced but am open toward the idea that a paperclip maximizer is the overwhelmingly likely outcome if we create a superhuman AI. At present, my thinking is that if some care is taken in the creation of a superhuman AI, more likely than a paperclip maximizer is an AI which partially shares human values, that is, the dichotomy “paper clip maximizer vs. Friendly AI” seems like a false dichotomy; I imagine that the sort of AI that people would actually build would be somewhere in the middle. Any recommended reading on this point appreciated.
SIAI seems to have focused on the existential risk of “unfriendly intelligence explosion” and it’s not clear to me that this existential risk is greater than the risks coming from world war and natural resource shortage.
Mainly Complexity of value. There is no way for human values to magically jump inside the AI, so if it’s not specifically created to reflect them, it won’t have them, and whatever the AI ends up with won’t come close to human values, because human values are too complex to be resembled by any given structure that happens to be formed in the AI.
The more the AI’s preference diverges from ours, the more we lose, and this loss is on an astronomical scale (even if preference diverges relatively little). The falloff with imperfect reflection of values might be so sharp that any ad-hoc solution turns the future worthless. Or maybe not, with certain classes of values that contain a component of sympathy that reflects values perfectly while giving them smaller weight in the overall game, but then we’d want to technically understand this “sympathy” to have any confidence in the outcome.
This depends on something like aggregative utilitarianism. If additional resources have diminishing marginal value in fulfilling human aims, then getting a little slice of the universe (in the course of negotiating terms of surrender with the inhuman AI, if it can make credible commitments, or because we serve as acausal bargaining chips with other civilizations elsewhere in the universe) may be enough. Is getting 100% of the lightcone a hundred times better than 1%?
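Whether 100% is a hundred times better than 1% depends entirely on the shape of the utility function. Here is a small sketch with three utility functions chosen purely for illustration (none comes from the discussion): linear, square-root, and a saturating curve whose scale `k` is an arbitrary assumption. The helper `value_ratio` is likewise a hypothetical name.

```python
import math

def linear(share: float) -> float:
    """Aggregative case: value is proportional to resources controlled."""
    return share

def concave(share: float) -> float:
    """Diminishing marginal value, using a square root as one possible shape."""
    return math.sqrt(share)

def saturating(share: float, k: float = 0.01) -> float:
    """Sharply diminishing value: most of the benefit arrives by share ~ k (k is assumed)."""
    return 1.0 - math.exp(-share / k)

def value_ratio(utility, large: float = 1.0, small: float = 0.01) -> float:
    """How many times better the large share is than the small one under `utility`."""
    return utility(large) / utility(small)

for name, u in [("linear", linear), ("sqrt", concave), ("saturating", saturating)]:
    print(f"{name:>10}: 100% is {value_ratio(u):.2f}x as good as 1%")
```

Only the linear case gives a factor of a hundred. The more sharply value saturates, the closer the ratio gets to one, which is the sense in which a small slice "may be enough".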
I think yes, if we take into account that the more of the lightcone we (our FAI) get, the more trading opportunities we would have with UFAI in other possible worlds. Diminishing marginal value shouldn’t apply across possible worlds, because otherwise it would imply gross violations of expected utility maximization.
Also, I suspect that there are possible worlds with much greater resources than our universe (perhaps with physics that allow hypercomputation, or just many orders of magnitude more total exploitable resources), and some of them would have potential trading partners who are willing to give us a small share of their world for a large share of ours. We may eventually achieve most of our value from trading with them. But of course such trade wouldn’t be possible if we didn’t have something to trade with!
Interesting. This suggests thinking about FAI not as using its control to produce terminal value in its own world, but as using its control to buy as much terminal value as it can, in various world-programs. Since it doesn’t matter where the value is produced, most of the value doesn’t have to be produced in the possible worlds with FAIs in them. Indeed, it sounds unlikely that specifically the FAI worlds will be optimal for FAI-value optimization. FAIs (and the worlds they control) act as instrumental leverage, a way of controlling the global mathematical universe into having more value for our preference.
Thus, more FAIs means stronger control over the mathematical universe, while more UFAIs mean that the mathematical universe is richer, and so the FAIs can get more value out of it with the same control. The metaphors of trade and comparative advantage start applying again, not on the naive level of cohabitation on the same world, but on the level of the global ontology. Mathematics grants you total control over your domain, so that your “atoms” can’t be reused for something else by another stronger agent, and so you do benefit from most superintelligent “aliens”.
Yes, assuming that trading across possible worlds can be done in the first place. One thing that concerns me is the combinatorial explosion of potential trading partners. How do they manage to “find” each other?
It’s the same combinatorial explosion as with the future possible worlds. Even though you can’t locate individual valuable future outcomes (through certain instrumental sequences of exact events), you can still make decisions about your actions leading to certain consequences “in bulk”, and I expect the trade between possible worlds can be described similarly (after all, it does work on exactly the same decision-making algorithm). Thus, you usually won’t know who you are trading with, exactly, but on the net estimate that your actions are in the right direction.
Isn’t the set of future worlds with high measure a lot smaller?
I currently agree it’s a bad analogy and I no longer endorse the position that global acausal trade is probably feasible, although its theoretical possibility seems to be a stable conclusion.
Robin Hanson would be so pleased that it turns out economics is the fundamental law of the entire ensemble universe.
There are two distinct issues here: (1) how high would a human with original preference value a universe which only gives a small weight to their preference, and (2) how likely is the changed preference to give any weight whatsoever to the original preference, in other words to produce a universe to any extent valuable to the original preference, even if original preference values universes only weakly optimized in its direction.
Moving to a different preference is different from lowering weight of the original preference. A slightly changed (formal definition of) preference may put no weight at all on the preceding preference. The optimal outcome according to the modified preference can thus be essentially moral noise, paperclips, to the original preference. Giving a small slice of the universe, on the other hand, is what you get out of aggregation of preference, and a changed preference doesn’t necessarily have a form of aggregation that includes original preference. (On the other hand, there is a hope that human-like preferences include sympathy, which does make them aggregate preferences of other persons with some weight.)
We should assign some substantial probability to getting some weighting of our preference (from bargaining with transparency, acausal trade, altruistic brain emulations, etc). If a moderate weighting of our preferences gets most of the potential utility, then the expected utility of inhuman AIs getting powerful won’t be astronomically less than the expected utility of, e.g. a ‘Friendly AI’ acting on idealized human preferences.
Game-theoretic considerations are only relevant when you have non-trivial control, not when your atoms are used for something else. If singleton’s preference gives some weight to your preference, this is a case of having control directly through the singleton’s preference, but the origin of this control is not game-theoretic. If the singleton’s preference has sympathy for your preference, your explicit instantiation in the world doesn’t need to have any control, in order to win through the implicit control via singleton’s preference.
Game-theoretic aggregation, on the other hand, doesn’t work by influence on other agent’s preference. You only get your slice of the world because you already control it. Another agent may perform trade, but this is trade of control, rearranging what specifically each of you controls, without changing your preferences.
I assume that control will be winner-takes-all, so the preferences of other agents existing at the time only matter if the winner’s preference directly pays any attention to their preferences, not if they had some limited control from the start.
My point is that inhuman AI may give no weight to our preference, while FAI may give at least some weight to everyone’s preference. Game-theoretic trade won’t matter here because agents other than the singleton have no control to bargain with. FAI gives weight to other preferences not because of trade, but by construction from the start, even if people it gives weight to don’t exist at all (FAI giving them weight in optimization might cause them to appear, or a better event at least as good from their perspective).
This isn’t obviously the most natural way to describe a scenario in which an AI thinks it has a 90% chance of winning a conflict with humanity, but also has the ability to jointly create (with humanity) agents to enforce an agreement (and can do this quickly enough to be relevant), so cuts a deal splitting up the resources of the light cone at a 9:1 ratio.
Given that there are plausible sets of parameter values where this assumption is false, we can’t use it to assess overall expected value to astronomical precision.
I specifically mentioned acausal trade, a la Rolf Nelson’s AI-deterrence scheme, which needs non-trivial control only in some region of the ensemble of possibilities the AI considers. Indeed, the AI might treat us well simply because of the chance that benevolent non-human aliens will respond positively if its algorithm has this output (as the benevolent aliens might be modeling the AI’s algorithm).
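The 9:1 bargaining scenario above can be put into numbers with a rough sketch. Everything beyond the 90% win chance and the 9:1 split is an assumption made for illustration: the AI is given linear utility in resources, humanity a square-root utility, and conflict is assumed to destroy 5% of the resources at stake. `fight_utility` is a hypothetical helper.

```python
import math

def fight_utility(p_win: float, utility, conflict_loss: float = 0.0) -> float:
    """Expected utility of fighting: the winner takes whatever survives, the loser gets nothing."""
    surviving = 1.0 - conflict_loss
    return p_win * utility(surviving) + (1.0 - p_win) * utility(0.0)

def linear(share: float) -> float:
    return share

def concave(share: float) -> float:
    return math.sqrt(share)

CONFLICT_LOSS = 0.05  # assumed fraction of resources destroyed by fighting

# AI side: 90% chance of winning versus a guaranteed 0.9 share from the deal.
ai_fight = fight_utility(0.9, linear, CONFLICT_LOSS)
ai_deal = linear(0.9)

# Human side: 10% chance of winning versus a guaranteed 0.1 share from the deal.
human_fight = fight_utility(0.1, concave, CONFLICT_LOSS)
human_deal = concave(0.1)

print(f"AI:    fight {ai_fight:.3f} vs deal {ai_deal:.3f}")
print(f"Human: fight {human_fight:.3f} vs deal {human_deal:.3f}")
```

With any positive conflict loss both sides prefer the deal in this toy model, so the enforceability of the agreement, not the payoff structure, is the open question.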
Yes, I forgot about that (though I remain uncertain about how well this argument works, not having worked out a formal model). To summarize the arguments for why the future is still significantly more valuable than what we have now, even if we run into Unfriendly AI:
(1) if there is a non-negligible chance that we’ll have FAI in the future, or that we could’ve created FAI if some morally random facts in the past (such as the coin in counterfactual mugging) were different, then we can estimate the present expected value of the world as pretty high, as a factor of getting whole universes (counterfactually or probably) optimized towards your specific preference is present in the expected utility computation. The counterfactual value is present even if it’s certain that the future contains Unfriendly AI.
(2) It’s even better, because the unfriendly singletons will also optimize their worlds towards your preference a little for game-theoretic reasons, even if they don’t care at all about your preference. This game is not with you personally, a human that controls very little and whose control can’t compel a singleton to any significant extent, but with the counterfactual FAIs. The FAIs that could be created, but weren’t, can act as Omega in counterfactual mugging, making it profitable for the indifferent singletons to pay the FAI a little in FAI-favored kind of world-optimization.
(3) Some singletons that don’t follow your preference in particular, but have remotely human-like preference, will have a component of sympathy in their preference, and will dole your preference some fair portion of control in their world, that is much greater than the portion of control you held originally. This sympathy seems to be godshatter of game-theoretic considerations that compel even singletons with non-evolved (artificial, random) preferences according to arguments (1) and (2).
The conclusion to this seems to be that creating an Unfriendly AI is significantly better than ending up with no rational singleton at all (existential disaster that terminates civilization), but significantly worse than a small chance of FAI.
Your comments are mostly good, but I dispute the final assumption that no singleton ⇒ disaster. There has as yet been no investigation into the merits of singleton vs. an economy (or ecosystem) of independent agents.
If we were living in the 18th century, it would be reasonable to suppose that the only stable situation is one where one agent is king. But we are not.
Yep, these are key considerations.
So there’s the utility difference between business-as-usual (no AI), and getting a small share of resources optimized for your preference, and the utility difference between getting small and large shares of resources. If the second difference is much larger than the first, then (1) is crucial, and (2) and (3) are not so good. But if the first difference is much bigger than the second, the pattern is the reverse.
And if we’re comparing expected utility conditioning on no local FAI here and EU conditioning on FAI here, moderate credences can suffice (depending on the shape of your utility function).
Whether FAI is local or not can’t matter, whether something is real or counterfactual is morally irrelevant. If we like small control, it means that the possible worlds with UFAI are significantly valuable, just as the worlds with FAI, provided there are enough worlds with FAI to weakly control the UFAIs; and if we like only large control, it means that the possible worlds with UFAI are not as valuable, and it’s mostly the worlds with FAI that matter.
What do “small control” and “large control” mean?
It’s not literally the reverse, because if you don’t create those FAIs, nobody will, and so the UFAIs won’t have the incentive to give you your small share. It’s never good to increase probability of UFAI at the expense of probability of FAI. I’m not sure whether there is any policy guideline suggested by these considerations, conditional on the pattern in utility you discuss. What should we do differently depending on how much we value small vs. large control? It’s still clearly preferable to have UFAI to having no future AI, and to have FAI to having UFAI, in both cases.
Worrying less about our individual (or national) shares, and being more cooperative with other humans or uploads seems like an important upshot.
I’m not convinced by the claim that human values have high Kolmogorov complexity.
In particular, Eliezer’s article Not for the Sake of Happiness Alone is totally at odds with my own beliefs. In my mind, it’s incoherent to give anything other than subjective experiences ethical consideration. My own preference for real science over imagined science is entirely instrumental and not at all terminal.
Now, maybe Eliezer is confused about what his terminal values are, or maybe I’m confused about what my terminal values are, or maybe our terminal values are incompatible. In any case, it’s not obvious that an AI should care about anything other than the subjective experiences of sentient beings.
Suppose that it’s okay for an AI to exclude everything but subjective experience from ethical consideration. Is there then still reason to expect that human values have high Kolmogorov complexity?
I don’t have a low complexity description to offer, but it seems to me that one can get a lot of mileage out of the principles “if an individual prefers state A to state B whenever he/she/it is in either of state A or state B, then state A is superior for that individual to state B” and “when faced with two alternatives, the moral alternative is the one that you would prefer if you were going to live through the lives of all sentient beings involved.”
Of course “sentient being” is ill-defined and one would have to do a fair amount of work to frame the things that I just said in more formal terms, but anyway, it’s not clear to me that there’s a really serious problem here.
I totally agree that if the creation of a superhuman AI is going to precede all other existential threats then we should focus all of our resources on trying to get the superhuman AI to be as friendly as possible.
Have you read the Heaven post by denisbider and the two follow-ups constituting a mini-wireheading series? There have been other posts on the difference between wanting and liking; but it illustrates a fairly strong problem with wireheading: Even if all we’re worried about is “subjective states,” many people won’t want to be put in that subjective state, even knowing they’ll like it. Forcing them into it or changing their value system so they do want it are ethically suboptimal solutions.
So, it seems to me that if anything other than maximized absolute wireheading for everyone is the AI’s goal, it’s gonna start to get complicated.
Thanks for the references to the posts which I had not seen before and which I find relevant. I’m sympathetic toward denisbider’s view, but will read the comments to see if I find diverging views compelling.
Maybe you should start with what’s linked from fake fake utility functions then (the page on the wiki wasn’t organized quite as I expected).
But I would qualify the last sentence of my reply by saying that the best way to get a superhuman AI to be as friendly as possible may not be to work on friendly AI or advocate for friendly AI. For example, it may be best to work toward geopolitical stability to minimize the chances of some country rashly creating a potentially unsafe AI out of a sense of desperation during wartime.
(?) I never said that.
Yes, I was agreeing with what I inferred your attitude to be rather than agreeing with something that you said. (I apologize if I distorted your views—if you’d like I can edit my comment to remove the suggestion that you hold the position that I attributed to you.)
I don’t believe that we “should focus all of our resources” on FAI, as there are many other worthy activities to focus on. The argument is that this particular problem gets disproportionally little attention, and while with other risks we can in principle luck out even if they get no attention, it isn’t so for AI. Failing to take FAI seriously is fatal, failing to take nanotech seriously isn’t necessarily fatal.
Thus, although strictly speaking I agree with your implication, I don’t see its condition as plausible, and so I don’t see the implication as a whole as relevant.
Re: “Is there then still reason to expect that human values have high Kolmogorov complexity?”
Human values are mostly a product of their genes and their memes. There is an awful lot of information in those. However, it is true that you can fairly closely approximate human values (or those of any other creature) by the directive to make as many grandchildren as possible, which seems reasonably simple.
Most of the arguments for humans having complex values appear to list a whole bunch of proximate goals—as though that constitutes evidence.
I disagree. You need to know much more than just the drive for grandchildren, given the massively diverse ways we observe even in our present world for species to propagate, all of which correspond to different articulable values once they reach human intelligence.
Human values should be expected to have a high K-complexity because you would need to specify both the genes/early environment, and the precise place in history/Everett branches where humans are now.
The idea was to “approximate human values”—not to express them in precise detail: nobody cares much if Jim likes strawberry jam more than he likes raspberry jam.
The environment mostly drops out of the equation—because most of it is shared between the agents involved—and because of the phenomenon of Canalisation: http://en.wikipedia.org/wiki/Canalisation_%28genetics%29
Sure, but I take “approximation” to mean something like getting you within 10 or so bits of the true distribution, but the heuristic you gave still leaves you maybe 500 or so bits away, which is huge, and far more than you implied.
That would help you on message length if you had already stored one person’s values and were looking to store a second person’s. It does not help with describing the first person’s values, or some aggregate measure of humans’ values.
10 bits!!! That’s not much of a message!
The idea of a shared environment arises because the proposed machine—in which the human-like values are to be implemented—is to live in the same world as the human. So, one does not need to specify all the details of the environment—since these are shared naturally between the agents in question.
10 bits short of the needed message, not a 10-bit message. I mean that e.g. an approximation gives 100 bits when full accuracy would be 110 bits (and 10 bits is an upper bound).
That still doesn’t answer my point; it just shows how once you have one agent, adding others is easy. It doesn’t show how getting the first, or the “general” agent is easy.
Re: “That still doesn’t answer my point; it just shows how once you have one agent, adding others is easy. It doesn’t show how getting the first, or the “general” agent is easy.”
To specify the environment, choose the universe, galaxy, star, planet, latitude, longitude and time. I am not pretending that information is simple, just that it is already there, if your project is building an intelligent agent.
Re: “10 bits short of the needed message”.
Yes, I got that the first time. I don’t think you are appreciating the difficulty of coding even relatively simple utility functions. A couple of ASCII characters is practically nothing!
ASCII characters aren’t a relevant metric here. Getting within 10 bits of the correct answer means that you’ve narrowed it down to 2^10 = 1024 distinct equiprobable possibilities [1], one of which is correct. Sounds like an approximation to me! (if a bit on the lower end of the accuracy expected out of one)
[1] or probability distribution with the same KL divergence from the true governing distribution
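The arithmetic behind "within 10 bits" can be sketched in a few lines. This is only an illustration: the function names are hypothetical, and the example distributions (a uniform guess over 1024 candidates versus a true distribution concentrated on one of them) are assumptions chosen to match the footnote.

```python
import math

def candidates_remaining(bits_short):
    """Number of equiprobable possibilities left when a description
    falls `bits_short` bits short of the full message."""
    return 2.0 ** bits_short

def kl_divergence_bits(p, q):
    """KL divergence D(p || q) in bits; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# 10 bits short leaves 2^10 candidates.
print(candidates_remaining(10))  # 1024.0

# Assumed example: the truth is candidate 0; the approximation is uniform over 1024.
n = 1024
true_dist = [1.0] + [0.0] * (n - 1)
approx = [1.0 / n] * n
print(kl_divergence_bits(true_dist, approx))  # 10.0
```

By the same formula, being 500 bits short leaves 2^500 candidates, which is why the two figures in this exchange describe very different levels of approximation.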
Or you can implement a constant-K-complexity learn-by-example algorithm and get all the rest from the environment.
How about “Do as your creators do (generalize this as your creators generalize)”?
Not clear to me either that unfriendly AI is the greatest risk, in the sense of having the most probability of terminating the future (though “resource shortage” as existential risk sounds highly implausible—we are talking about extinction risks, not merely potential serious issues; and “world war” doesn’t seem like something particularly relevant for the coming risks, dangerous technology doesn’t need war to be deployed).
But Unfriendly AI seems to be the only unavoidable risk, something we’d need to tackle in any case if we get through the rest. On other problems we can luck out, not on this one. Without solving this problem, the efforts to solve the rest are for naught (relatively speaking).
I mean “existential risk” in a broad sense.
Suppose we run out of a source of, oh, say, electricity too fast to find a substitute. Then we would be forced to revert to a preindustrial society. This would be a permanent obstruction to technological progress—we would have no chance of creating a transhuman paradise or populating the galaxy with happy sentient machines and this would be an astronomical waste.
Similarly if we ran out of any number of things (say, one of the materials that’s currently needed to build computers) before finding an adequate substitute.
My understanding is that a large scale nuclear war could seriously damage infrastructure. I could imagine this preventing technological development as well.
On the other hand, it’s equally true that if another existential risk hits us before we build friendly AI, all of our friendly AI directed efforts will be for naught.
That’s not how economics works. If one source of electricity becomes scarce, that means it’s more expensive, so people will switch to cheaper alternatives. All the energy we use ultimately comes from either decaying isotopes (fission, geothermal) or the sun; neither of those will run out in the next thousand years.
Modern computer chips are doped silicon semiconductors. We’re not going to run out of sand any time soon, either. Of course, purification is the hard part, but people have been thinking up clever ways to purify stuff since before they stopped calling it ‘alchemistry.’
The energy requirements for running modern civilization aren’t just a scalar number—we need large amounts of highly concentrated energy, and an infrastructure for distributing it cheaply. The normal economics of substitution don’t work for energy.
It’s entirely possible that failure to create a superintelligence before the average EROI drops too low for sustainment would render us unable to create one for long enough to render other existential risks inevitabilities.
“Substitution economics” seems unlikely to stop us eventually substituting fusion power and biodiesel for oil. Meanwhile, we have an abundance of energy in the form of coal, more than enough to drive progress for a looong while yet. The “energy apocalypse”-gets-us-first scenario is just very silly.
Energy economics is interconnected enough with politics to make me lower my expectation of rationality from both of us for the remainder of the discussion due to reference class forecasting. Also, we are several inferential steps away from each other, so any discussion is going to be long and full of details. Regardless, I’m going to go ahead, assuming agreement that market forces cannot necessarily overcome resource shortages (or the Easter Islanders would still be with us).
Historically, the world switched from coal to petroleum before developing any technologies we’d regard as modern. The reason, unlike so much else in economics, is simple: the energy density of coal is 24 MJ/kg; the energy density of gasoline is 44 MJ/kg. Nearly doubling the energy density makes many things practical that wouldn’t otherwise be, like cars, trucks, airplanes, etc. Coal cannot be converted into a higher energy density fuel except at high expense and with large losses, making the expected reserves much smaller. The fuels it can be converted to require significant modifications to engines and fuel storage.
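As a rough check on the "nearly doubling" claim, the quoted energy densities can be compared directly. This is a sketch only: the 1000 MJ energy budget is a made-up figure, `fuel_mass_kg` is a hypothetical helper, and engine or conversion efficiency is ignored entirely.

```python
COAL_MJ_PER_KG = 24.0      # figure quoted in the comment above
GASOLINE_MJ_PER_KG = 44.0  # figure quoted in the comment above

def fuel_mass_kg(energy_mj, density_mj_per_kg):
    """Mass of fuel needed to carry `energy_mj` of chemical energy."""
    return energy_mj / density_mj_per_kg

ratio = GASOLINE_MJ_PER_KG / COAL_MJ_PER_KG
print(f"gasoline/coal energy density ratio: {ratio:.2f}")  # 1.83

# Hypothetical energy budget for a vehicle, in MJ (an assumption).
energy = 1000.0
print(f"coal:     {fuel_mass_kg(energy, COAL_MJ_PER_KG):.1f} kg")      # 41.7 kg
print(f"gasoline: {fuel_mass_kg(energy, GASOLINE_MJ_PER_KG):.1f} kg")  # 22.7 kg
```

The ratio of about 1.83 means a vehicle carries a little over half the fuel mass for the same stored energy, before accounting for the handling differences between a solid and a liquid fuel.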
Coal is at least plausible, although a stop-gap measure with many drawbacks. It’s your hopes for fusion that really show the wishful thinking. Fusion is 20 years away from being a practical energy source, just like it was in 1960. The NIF has yet to reach break-even; economically practical power generation is far beyond that point; assuming a substantial portion of US energy generation needs is farther still. It’d be nice if Polywell/Bussard fusion proved practical, but that’s barely a speck on the horizon, getting its first big basic research grant from the US Navy. And nothing but Mr. Fusion will help unless someone makes an order of magnitude improvement in battery or ultracapacitor energy density.
No matter which of the alternatives you plan to replace the energy infrastructure with, you needed to start about 20 years ago. World petroleum production is no longer sufficient to sustain economic growth and infrastructure transition simultaneously. Remember, the question isn’t whether it’s theoretically possible to substitute more plentiful energy sources for the ones that are getting more difficult to extract, it’s whether the declining EROI of current energy sources will remain high enough for the additional economic activity of converting infrastructure to other sources while still feeding people, let alone indulging in activities with no immediate payoff like GAI research.
We seem to be living in a world where the EROI is declining faster than willingness to devote painful amounts of the GDP to energy source conversion is increasing. This doesn’t mean an immediate biker zombie outlaw apocalypse, but it does mean a slow, unevenly distributed “catabolic collapse” of decreasing standards of living, security, and stability.
Upvoted chiefly for
but I appreciate the analysis. (I am behind on reading comments, so I will be continuing downthread now.)
I don’t know why you focus so much on fusion although I agree it isn’t practical at this point. But note that batteries and ultracapacitors are just energy storage devices. Even if they become far more energy dense they don’t provide a source of energy.
Unfortunately, that appears to be part of the bias I’d expected in myself—since timtyler mentioned fusion, biofuels, and coal; I was thinking about refuting his arguments instead of laying out the best view of probable futures that I could.
The case for wind, solar, and other renewables failing to take up petroleum’s slack before it’s too late is not as overwhelmingly probable as fusion’s, but it takes the same form—they form roughly 0.3% of current world power generation, and even if the current exponential growth curve is somehow sustainable indefinitely they won’t replace current capacity until the late 21st century.
With the large-scale petroleum supply curve, that leaves a large gap between 2015 and 2060 where we’re somehow continuing to build renewable energy infrastructure with a steadily diminishing total supply of energy. I expect impoverished people to loot energy infrastructure for scrap metal to sell for food faster than other impoverished people can keep building it.
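The renewables estimate above (roughly 0.3% of generation, growing exponentially) can be explored with a simple compound-growth calculation. This is a sketch under stated assumptions: the annual growth rates are invented for illustration and do not come from the thread, total demand is held fixed at today's level, and `years_to_reach` is a hypothetical helper.

```python
import math

def years_to_reach(target_share, current_share, annual_growth):
    """Years of compound growth needed for current_share to reach target_share."""
    return math.log(target_share / current_share) / math.log(1.0 + annual_growth)

CURRENT_SHARE = 0.003  # "roughly 0.3%" from the comment above

# Doublings needed to grow from 0.3% to 100% of today's generation.
print(round(math.log2(1.0 / CURRENT_SHARE), 1))  # 8.4

# Assumed annual growth rates (illustrative only).
for rate in (0.05, 0.10, 0.20):
    years = years_to_reach(1.0, CURRENT_SHARE, rate)
    print(f"{rate:.0%} per year: about {round(years)} years")
```

With these assumed rates the sketch gives roughly 119, 61, and 32 years respectively, so the conclusion is quite sensitive to the growth rate one assumes and to whether that rate can be sustained.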
That we will eventually substitute fusion power and biodiesel for oil seems pretty obvious to me. You are saying it represents “wishful thinking” because of the possibility of civilisation not “making it” at all? If so, be aware that I think that the chances of that happening seem to be grossly exaggerated around these parts.
It seems very doubtful that we’ll have practical fusion power any time soon or necessarily ever. The technical hurdles are immense. Note that any form of fusion plant will almost certainly be using deuterium-tritium fusion. That means you need tritium sources. This also means that the internal structure will undergo constant low-level neutron bombardment, which seriously reduces the lifespan of basic parts such as the electromagnets used. If we look at the form of proposed fusion that has had the most work and has the best chance of success, tokamaks, then we get to a number of other serious problems such as plasma leaks. Other forms of magnetic containment have also not solved the plasma leak problem. Forms of reactors that don’t use magnetic containment suffer from other similarly serious problems. For example, the runner-up to magnetic containment is laser confinement, but no one has a good way to actually get energy out of laser confinement.
That said, I think that there are enough other potential sources of energy (nuclear fission, solar (and space based solar especially), wind, and tidal to name a few) that this won’t be an issue.
Um.. not sure what you mean. The energy out of inertial (i.e., laser) confinement is thermal. You implode and heat a ball of D-T, causing fusion, releasing heat energy, which is used to generate steam for a turbine.
Fusion has a bad rap, because the high benefits that would accrue if it were accomplished encourage wishful thinking. But that doesn’t mean it’s all wishful thinking. Lawrence Livermore has seen some encouraging results, for example.
EDIT: for fact checking vis-a-vis LLNL.
Yeah, but a lot of that energy that is released isn’t in happy forms. D-T releases not just high energy photons but also neutrons which are carrying away a lot of the energy. So what you actually need is something that can absorb the neutrons in a safe fashion and convert that to heat. Lithium blankets are a commonly suggested solution since a lot of the time lithium will form tritium after you bombard it with neutrons (so you get more tritium as a result). There’s also the technically simpler solution of just using paraffin. But the conversion of the resulting energy into heat for steam is decidedly non-trivial.
I see, thanks.
Imagine what people must have thought in 1910 about the feasibility of getting to the Moon or generating energy by artificially splitting atoms (especially within the 20th century).
Two problems with that sort of comparison: First, something like going to the Moon is a goal, not a technology. Thus, if we have other sources of power, the incentive to work out the details for fusion becomes small. Second, one shouldn’t forget how many technologies have been tried and have fallen by the wayside as not very practical or not at all practical. A good way of getting a handle on this is to read old issues of something like Scientific American from the 1950s and 1960s. Or read scifi from that time period. One example of a historical technology that never showed up on any substantial scale is nuclear powered airplanes, despite a lot of research in the 1950s about them. Similarly, nuclear thermal rockets have not been made. This isn’t because they are impossible, but because they are extremely impractical compared to other technologies. It seems likely that fusion power will fall into the same category. See this article about Project Pluto for example.
These are perfectly valid arguments and I admit that I share your skepticism concerning the economic competitiveness of the fusion technology. I admit, if I had a decision to make about buying some security, the payout of which would depend on the amount of energy produced by fusion power within 30 years, I would not hurry to place any bet.
What I lack is your apparent confidence in ruling out the technology based on the technological difficulties we face at this point in time.
I am always surprised how the opinions of so-called experts diverge when it comes to estimating the feasibility and cost of different energy production options (even excluding fusion power). For example, there is a recent TED video where people discuss the pros and cons of nuclear power. The whole discussion boils down to the question: What are the resources we need in order to produce X amount of energy using nuclear, wind, solar, biofuel, or geothermal power? For me, the disturbing thing was that the statements about the resource usage (e.g. area consumption, but also risks) of the different technologies were sometimes off by orders of magnitude.
If we lack the information to produce numbers in the same ballpark even for technologies that we have been using for decades (if not longer), then how much confidence can we have about the viability, costs, risks and competitiveness of a technology, like fusion, that we have not even started to tap?
Ask and ye shall receive: David MacKay, Sustainable energy without the hot air. A free online book that reads like porn for LessWrong regulars.
Yes, I’ve read that (pretty good) book quite a while ago and it is also referenced in the TED talk I mentioned.
This was one of the reasons I was surprised that there is still such a huge disagreement about the figures even among experts.
Re: “Second, one shouldn’t forget how many technologies have been tried and have fallen by the wayside as not very practical or not at all practical. [...] It seems likely that fusion power will fall into the same category.”
Er, not to the governments that have already invested many billions of dollars in fusion research it doesn’t! They have looked into the whole issue of the chances of success.
Automatically self-repairing nanotech construction? (To suggest a point where a straightforward way of dealing with this becomes economically viable.)
You would need not only self-repairing nanotech but such technology that could withstand both large amounts of radiation as well as strong magnetic fields. Of the currently proposed major methods of nanotech I’m not aware of any that has anything resembling a chance to meet those criteria (with the disclaimer that I’m not a chemist.) If we had nanotech that was that robust it would bump up so many different technologies that fusion would look pretty unnecessary. For example the main barrier to space elevators is efficient reliable synthesis of long chains of carbon nanotubes that could be placed in a functional composite (see this NASA Institute for Advanced Concepts Report for a discussion of these and related issues). We’d almost certainly have that technology well before anything like self-repairing nanotech that stayed functional in high radiation environments. And if you have functional space elevators then you get cheap solar power because it becomes very easy to launch solar power satellites.
I’m not talking about plausible now, but plausible some day, as a reply to your “It seems very doubtful … any time soon or necessarily ever”. The sections being repaired could be offline. “Self-repair” doesn’t assume repair within the volume of an existing/operating structure; it could be all cleared out and rebuilt anew, for example. That it’s done more or less automatically is the economic requirement. Any other methods of relatively cheap and fast production, assembly and recycling will work too.
Ah ok. That’s a lot more plausible. There’s still the issue that once you have cheap solar the resources it takes to make fusion power will simply cost so much more as to likely not be worth it. But if it could be substantially more efficient than straight fission then maybe it would get used for stuff not directly on Earth if/when we have large installations that aren’t in the inner solar system.
Estimating feasibility using exploratory engineering is much simpler than estimating what will actually happen. I’m only arguing that this technology will almost certainly be feasible on human level in not absurdly distant future, not that it’ll ever be actually used.
In that case, there’s no substantial disagreement.
There don’t seem to be too many electromagnets at the NIF: https://lasers.llnl.gov/
It seems to me that the problems are relatively minor, and so that we will have fusion power, with high probability, this century.
[Wow—LW codebase doesn’t know about https!]
I would have thought that those ‘cheaper alternatives’ could still be more expensive than the initial cost of the original source of electricity...? In which case losing that original source of electricity could still bite pretty hard (albeit maybe not to the extent of being an existential risk).
Yes.
A stably benevolent world government/singleton could take its time solving AI, or inching up to it with biological and cultural intelligence enhancement. From our perspective we should count that as almost a maximal win in terms of existential risks.
I don’t see your point. It would take an unrealistic world dictatorship (whether it’s “benevolent” seems like irrelevant hair-splitting at that point) to stop the risks (stop the technological progress in the wild!) and allow more time for development of FAI. And in the end, solving FAI still remains a necessary step, even if done by modified/improved people, even if given a safe environment to work in.
You were talking about hundred year time scales. That’s time enough for neuroscience lie detectors, whole brain emulation, democratization in authoritarian countries, continued expansion of EU-like arrangements, and many other things to occur.
But from our perspective, if we can get the benevolent non-AI (but perhaps WBE) singleton, it can do the FAI work at leisure and we don’t need to. So the relative marginal impacts of our working on, say, FAI theory or institutional arrangements for WBE need to be weighed against one another.
It’s also time enough for any of the huge number of other outcomes. It’s not outright impossible, but pretty improbable, that the world will go this exact road. And don’t underestimate how crazy people are.
After the change of mind about value of drifted human preference, I agree that WBE/intelligence enhancement is a viable road. Here’re my arguments about the impact of these paths at this point.
WBE is still at least decades away, probably more than a hundred years if you take planning fallacy into account, and depends on the development of global technological efforts that are not easily influenced. The value of any “institutional arrangements”, and the viability of arguing for them given the remoteness (hence irrelevance at present) and implausibility (to most people) of WBE, also seem doubtful at present. This in my mind makes the marginal value of any present effort related to WBE relatively small. This will go up sharply as WBE tech gets closer.
I suspect that FAI theory, once understood, will still be simple enough (if any general theory is possible), and can be developed by vanilla humans (on unknown timescale, probably decades to hundreds of years, but at some point WBEs overtake the timescale estimates). By the time WBE becomes viable, the risk situation will be already very explosive, so if we can get a good understanding earlier, we could possibly avoid that risky period entirely. Also, having a viable technical Friendliness programme might give academic recognition to the problem (that these risks are as unavoidable as laws of physics, and not just something to talk with your friends about, like politics or football), which might spread awareness of the AI risks on an otherwise unachievable level, helping with institutional change promoting measures against wild AI and other existential risks. On the other hand, I won’t underestimate human craziness on this point either: technical recognition of the problem may still live side by side with global indifference.
I believed similarly until I read Steve Omohundro’s The Basic AI Drives. It convinced me that a paperclip maximizer is the overwhelmingly likely outcome of creating an AGI.
That paper makes a convincing case that the ‘generic’ AI (some distribution of AI motivations weighted by our likelihood of developing them) will most prefer outcomes that rank low in our preference ordering, i.e. the free energy and atoms needed to support life as we know it or would want it will get reallocated to something else. That means that an AI given arbitrary power (e.g. because of a very hard takeoff, or easy bargaining among AIs but not humans, or other reasons) would be lethal. However, the situation seems different and more sensitive to initial conditions when we consider AIs with limited power that must trade off chances of conquest with a risk of failure and retaliation. I’m working on a write up of those issues.
Thanks Craig, I’ll check it out!
“But I believe it’s certain to eventually happen, absent the end of civilization before that.”
And I will live 1000 years, provided I don’t die first.
(As opposed to gradual progress, of course. I could make a case with your analogy facing an unexpected distinction also, as in what happens if you got overrun by a Friendly intelligence explosion, and persons don’t prove to be a valuable pattern, but death doesn’t adequately describe the transition either, as value doesn’t get lost.)