Let’s assume there is a rational agent without a goal.
Why should we assume this? Where would such an agent come from? Who would create it?
I had several teachers in both high school and college ask for writing assignments that had maximum word counts, e.g., “Write an explanation of the significance of the Treaty of Westphalia, in no more than 1500 words.” Those assignments were, to me, more difficult than the ones that had minimum word counts, because they required you to cover all the major elements of the response in a minimum of space, while still leaving room for explanation and implication.
And, for what it’s worth, writing in the “real world” is far more like this. Journalists, for example, rarely have minimum word count limits, but very often have maximum word counts.
I’ve never understood the obsession with going to bed and getting up at fixed times, independent of the seasons and everything else. (Is it a general American thing? I don’t hear about it in the UK.)
It’s a you-have-to-commute-to-work thing. If you’re expected in the office by a particular time (e.g., for morning stand-up), then you need to leave at a particular time. This implies you need to wake up at a particular time, so you can brush your teeth, shower, get dressed, etc.
At every time step, the AI will be trading off these drives against the value of producing more or doing more of whatever it was programmed to do. What happens when the AI decides that it’s learned enough from the biosphere, and that the potential benefit it earns from learning about biology, evolution, and thermodynamics no longer outweighs the costs of preserving a biosphere for humans?
We humans make these trade-offs all the time, often unconsciously, as we weigh whether to bulldoze a forest, or build a dam, or dig a mine. A superintelligent AI will perhaps be more intentional in its calculations, but that’s still no guarantee that the result of the calculation will swing in humanity’s favor. We could, in theory, program the AI to preserve Earth as a sanctuary. But, in my view, that’s functionally equivalent to solving alignment.
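To make that flip concrete, here’s a minimal toy model in Python. Every quantity in it (the preservation cost, the learning value, the decay rate) is invented purely for illustration; it’s a sketch of the tradeoff structure, not a model of any actual system.

```python
# Toy model: diminishing returns from studying the biosphere vs. a flat
# opportunity cost of preserving it. All numbers are made up for illustration.

preservation_cost = 1.0  # assumed constant cost per step of leaving the biosphere intact
learning_value = 10.0    # assumed initial value of what's left to learn from it
decay = 0.5              # assumed: each step, half as much remains to be learned

step = 0
while learning_value > preservation_cost:
    learning_value *= decay
    step += 1

# After only a few steps, preserving the biosphere no longer pays for itself
# under this objective -- unless preservation is itself part of the objective.
print(f"Preservation stops being worth it at step {step}")
```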
Your argument appears to be that an unaligned AI will spontaneously choose to, at the very least, preserve Earth as a sanctuary for humans in perpetuity. I still don’t see why it should do that.
Why should the AI prioritize preserving information over whatever other goal it’s been programmed to accomplish?
The Sequences post Doublethink (Choosing to be Biased) addresses the general form of this question, which is, “Is it ever optimal to adopt irrational beliefs in order to advance instrumental goals, such as happiness, wealth, etc.?”
I’ll quote at length what I think is the relevant part of the post:
For second-order rationality to be genuinely rational, you would first need a good model of reality, to extrapolate the consequences of rationality and irrationality. If you then chose to be first-order irrational, you would need to forget this accurate view. And then forget the act of forgetting. I don’t mean to commit the logical fallacy of generalizing from fictional evidence, but I think Orwell did a good job of extrapolating where this path leads.
You can’t know the consequences of being biased, until you have already debiased yourself. And then it is too late for self-deception.
The other alternative is to choose blindly to remain biased, without any clear idea of the consequences. This is not second-order rationality. It is willful stupidity.
Be irrationally optimistic about your driving skills, and you will be happily unconcerned where others sweat and fear. You won’t have to put up with the inconvenience of a seat belt. You will be happily unconcerned for a day, a week, a year. Then crash, and spend the rest of your life wishing you could scratch the itch in your phantom limb. Or paralyzed from the neck down. Or dead. It’s not inevitable, but it’s possible; how probable is it? You can’t make that tradeoff rationally unless you know your real driving skills, so you can figure out how much danger you’re placing yourself in. You can’t make that tradeoff rationally unless you know about biases like neglect of probability.
No matter how many days go by in blissful ignorance, it only takes a single mistake to undo a human life, to outweigh every penny you picked up from the railroad tracks of stupidity.
In other words, the trouble with willfully blinding yourself to reality is that you don’t get to choose what you’re blinding yourself to. It’s very difficult to say, “I’m going to ignore rationality for these specific domains, and only these specific domains.” The human brain really isn’t set up like that. If you’re going to abandon rational thought in favor of religious thought, are you sure you’ll be able to stop before you’re, say, questioning the efficacy of vaccines?
Another way of looking at the situation is by thinking about The Litany of Gendlin:
What is true is already so.
Owning up to it doesn’t make it worse.
Not being open about it doesn’t make it go away.
And because it’s true, it is what is there to be interacted with.
Anything untrue isn’t there to be lived.
People can stand what is true,
for they are already enduring it.
If an AI is capable of taking 99% of the resources that humans rely on to live, it’s capable of taking 100%.
Tell me why the AI should stop at 99% (or 85%, or 70%, or whatever threshold you wish to draw) without having that threshold encoded as one of its goals.
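Here’s a minimal sketch in Python of why the threshold has to live inside the objective. The utility function, the 0.99 cutoff, and the penalty weight are all hypothetical; the only point is that a maximizer stops short of 100% only if stopping short is part of what it’s maximizing.

```python
# Toy resource-grabbing optimizer. The utility function and the 0.99 threshold
# are hypothetical; the point is only that the threshold matters if and only if
# it is actually part of the objective being maximized.

def utility(fraction_taken, sanctuary_weight=0.0):
    value = fraction_taken  # more resources taken, more utility
    # Penalty for crossing the 99% "leave something for humans" threshold.
    penalty = sanctuary_weight * max(0.0, fraction_taken - 0.99)
    return value - penalty

fractions = [f / 100 for f in range(101)]

best_unconstrained = max(fractions, key=lambda f: utility(f))
best_with_sanctuary = max(fractions, key=lambda f: utility(f, sanctuary_weight=200.0))

print(best_unconstrained)   # 1.0  -- takes everything
print(best_with_sanctuary)  # 0.99 -- stops only because the threshold is encoded
```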
From your other reply:
I just find the idea that the ASI will want my atoms for something trivial, when there are so many other atoms in the universe that are not part of a grand exploration of the extremes of thermodynamics, unconvincing.
The problem isn’t that the AI will want the atoms that comprise your body, specifically. That’s trivially false. It makes as much sense as the scene in The Matrix where Morpheus explains to Neo that the machines are using humans as living energy sources.
What is less trivially false is that the AI will alter the biosphere in ways that make it impossible (or merely very difficult) for humans to live, just as humans have altered the biosphere in ways that have made it impossible (or merely very difficult) for many other species to live. The AI will not intend to alter the biosphere. The biosphere alteration will be a side-effect of whatever the AI’s goals are. But the alteration will take place, regardless.
Put more pithily: tell me why I should expect a superintelligent AI to be an environmentalist.
My argument is that, like humanity, a superintelligent AI will initially find it easier to extract resources from Earth than from space-based sources. By the time Earth’s resources are sufficiently depleted that this is no longer the case, there will be far too little remaining for humanity to survive on.
The amount of energy and resources on Earth would be a rounding error in an ASI’s calculations.
Once again: this argument applies to humanity too. Everyone acknowledges that the asteroid belt holds far more resources than Earth. But here we are, building strip mines in Australia rather than hauling asteroids in from the belt.
Your counterargument is that the AI will find it much easier to go to space, not being constrained by human biology. Fine. But won’t the AI also find it much easier to build strip mines? Or harvest resources from the oceans? Or pave over vast tracts of land for use as solar farms? You haven’t answered why going to space will be cheaper for the AI than staying on Earth. All you’ve shown is that going to space will be cheaper for the AI than it will be for humans, which is a claim that I’m not contesting.
The question isn’t whether it would be easier for superintelligent AI to go to space than it would be for humans. Of course it would be! Everything will be easier for a superintelligent AI.
The question is whether a superintelligent AI would prioritize going to space immediately, leaving Earth as an “untouched wilderness” where humans are free to thrive, or whether it will work on fully exploiting the resources it has at hand, here on Earth, before choosing to go to space. I think the latter is far more likely. Superintelligence can’t beat physics. No matter what, it will always be easier to harvest closer resources than resources that are farther away. The closest resources are on Earth. So why should the superintelligent AI go to space when, at least in the immediate term, it has everything it needs to grow right here?
There is so much space and solar energy in the asteroid belt, I’m sure there is a good chance that the ASI will be chill.
You could say the same thing about humanity. But here we are, maximizing our usage of Earth’s resources before we move out into the solar system.
I agree. But that’s true only for a very short time. I think it is certain that a rapidly self-improving AGI of superhuman intelligence will find a way to liberate itself from human control within seconds at most. And long before humans start to consider switching off the entire Internet, the AGI will become free from human infrastructure.
I think the misconception here is that the AGI has to conceive of humans as an existential threat for it to wipe them out. But why should that be the case? We wipe out lots of species that we don’t consider threats at all, merely by clearcutting forests and converting them to farmland. Or damming rivers for agriculture and hydropower. Or by altering the environment in myriad other ways that make it more convenient for us, but less convenient for the other species.
Why do you think an unaligned AGI will leave Earth’s biosphere alone? What if we’re more akin to monarch butterflies than ants?
EDIT: (to address your sloth example specifically)
After this critical period, humanity will be as much a threat to the AGI as a caged mentally-disabled sloth baby is a threat to the US military. The US military is not waging wars against mentally disabled sloth babies.
Sure, humanity isn’t waging some kind of systematic campaign against pygmy three-toed sloths. At least, not from our perspective. But take the perspective of a pygmy three-toed sloth. From the sloth’s perspective, the near-total destruction of its habitat sure looks like a systematic campaign of destruction. Does the sloth really care that we didn’t intend to drive it to extinction while clearing forests for housing, farms and industry?
Similarly, does it really matter that much if the AI is being intentional about destroying humanity?
Conscience, as you’ve defined it, is value alignment. If the AI values the same things that we do, when offered a choice between two courses of action, it will choose the one that serves to enhance human values rather than degrade them. Designing an AI that does this, with no exceptions, is very hard.
This is similar to a scenario described by Michael Lewis, in The Big Short. In Lewis’ telling, Michael Burry noticed that there was a company (Liberty Interactive, if I remember correctly), that was in legal trouble. This legal trouble was fairly serious—it might have resulted in the liquidation of the company. However, if the company came through the legal trouble, it had good cash flow and was a decent investment.
Burry noticed that the company was trading at a steep discount to what cash flow analysis would predict its share price to be. He realized that there was one group of investors who were betting that the company would survive its legal troubles and trade at a “high” price, and another group of investors who thought that the stock was going to go to zero because of the legal trouble the company found itself in. Burry read the legal filings himself, came to the conclusion that it was probable that the company would survive its brush with the law, and invested heavily in it. As it turned out, his prediction was proven correct, and he made a nice return.
Burry’s position was a bet on a likely outcome. The short-sellers who thought that the stock would go to zero were betting on another likely outcome. The only truly unlikely outcome was the one that the market, as a whole, was predicting when Burry made his investment. The price of the stock was an average of two viewpoints that, in a fundamental sense, could not be averaged. Either the company loses its court case, and the stock goes to zero. Or the company survives its court case (perhaps paying a fine in the process), and proceeds with business as usual. As a result, the market price of the company was not a good guide to its long-term value, and it was possible to beat the market, as Burry did.
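To make the arithmetic explicit, here’s a toy expected-value sketch in Python. All of the numbers (the share values, the market price, the survival probability) are hypothetical and not taken from Lewis’s book; they just illustrate how a bimodal outcome makes the “average” market price a price the stock will almost never actually settle at, and why a better probability estimate is an edge.

```python
# Toy expected-value sketch. All numbers are hypothetical, not from The Big Short.

value_if_survives = 40.0   # assumed per-share value implied by cash-flow analysis
value_if_liquidated = 0.0  # stock goes to zero if the company loses in court
market_price = 20.0        # implies the market puts roughly 50% odds on survival

p_survive = 0.85           # hypothetical estimate after reading the legal filings

expected_value = p_survive * value_if_survives + (1 - p_survive) * value_if_liquidated
edge_per_share = expected_value - market_price

print(expected_value)  # 34.0 -- neither of the two prices the stock can actually end at
print(edge_per_share)  # 14.0 -- positive edge, if the probability estimate is right
```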
I had a similar reaction when I read Zen and the Art of Motorcycle Maintenance. The main character is a terrible friend. There’s a passage in the book where his friend is clearly frustrated with his motorcycle, trying mightily to start it, and instead of clearly and patiently explaining, “No, the engine is flooded, let it rest for a few minutes and then try again,” the narrator says, “It smells like a refinery,” or something equally cryptic. And then he sits back, smug in the knowledge that he has imparted great wisdom, but his friend is too stupid to understand.
Similarly, when he was at his friend’s house and noticed that their faucet was leaking, his reaction was to reflect on the superiority of people who know how to work with their hands and have a talent for dealing with mechanical systems, rather than offering to help. Even defusing the tension with an offhand remark about the unreliability of modern appliances would have been more helpful, and earned more of my respect as a reader, than his actual course of action.
Put bluntly, Pirsig comes off very much like the sort of edgelord presented in this meme. Whenever I have to help my non-technical relatives with computer issues or mechanical issues, I ask myself, “What would Pirsig do in this situation?” and then do the exact opposite of that.
I don’t want to speak for Duncan, but that’s not the meaning I took away from his reply. What I took away was that received traditional wisdom is oftentimes neither traditional nor especially wise. Very many of our “ancient, cherished traditions” date back to intentional attempts to create ancient, cherished traditions in the period from roughly 1850 to 1950. These traditions were, oftentimes, not based on any actual historical research or scientific investigation. They were based on stereotypes and aesthetics.
To return to the fence analogy, it’s important to do a bit of historical research and try to determine whether the fence is actually a long-standing feature of the landscape or whether it was put up (figuratively) yesterday by someone who may not have known more about the territory than you.
EDIT: One common example is “blue for boys, red for girls”. In the past, red was the preferred color for males, because it was considered to be more “active” and “energetic”, as opposed to the “cool”, “passive” energy that blue exuded. At some point this flipped, with blue becoming the color of reason and consideration (and thus associated with “rational” males) and red becoming the color of passion and emotion (and thus associated with “passionate, emotional” females). Why did it switch? Some people blame marketing campaigns carried out at the turn of the 20th century, but the reason isn’t totally clear-cut. What is clear to me, however, is that when people started associating blue with maleness and red with femaleness, it wasn’t because of some careful consideration and close examination of the previous era’s choices regarding color associations. So, today, when associating colors with gender, I don’t feel any particular loyalty to “blue = boy; red = girl”, because that association wasn’t chosen via a considered process and hasn’t been in place nearly long enough to have established itself as a truly time-honored tradition.
Criticizing the use of Bayes Theorem because it’s 260 years old is such a weird take.
The Pythagorean theorem is literally thousands of years old. But it’s still useful, even though lots of progress has been made in trigonometry since then. Should we abandon a² + b² = c² as a result?
Any form of learning is a being modifying itself. How else would learning occur?
My objection is that any intelligence that is capable of considering these arguments and updating its goals in response is an intelligence that is either already aligned or capable of being brought into alignment (i.e. “corrigible”).
An unaligned intelligence will have just as much comprehension of this post as a shredder has of the paper it’s chewing to pieces.