I like this except for the reference to “Newcomblike” problems, which I feel is misleading and obscures the whole point of Newcomb’s paradox. Newcomb’s paradox is about decision theory; if you allow cheating, then it is no longer Newcomb’s paradox. This article is about psychology (and possibly deceptive AI), where cheating is always a possible solution.
NickH
Words stand for abstractions, and abstractions suffer from the abstraction uncertainty principle, i.e. an abstraction cannot be simultaneously very useful (widely applicable) and very precise. The more useful a word is, the less precise it will be, and vice versa. Dictionary definitions are a compromise: they never use the most precise definitions, even when such definitions are available (e.g. for scientific terms), because those definitions are not useful for communication between most users of the dictionary. For example, if we defined red to be light with a frequency of exactly 430 THz, it would be precise but useless; if we instead defined it as a range, it would be widely useful but would almost certainly overlap with the ranges for other colours, leading to ambiguity.
(I think EY may even have a wiki entry on this somewhere)
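To make the trade-off in the red example concrete, here is a minimal Python sketch. The colour bands and their THz boundaries are hypothetical values chosen for illustration (deliberately overlapping), not a standard. A point definition of red matches almost no real light, while range definitions match real light but can return more than one name.

```python
# Hypothetical colour bands in THz. The boundaries are illustrative
# assumptions, deliberately wide enough that neighbouring bands overlap.
BANDS = {
    "red": (400.0, 490.0),
    "orange": (480.0, 510.0),
    "yellow": (505.0, 530.0),
}

# A "precise" definition: red is exactly 430 THz and nothing else.
PRECISE = {"red": (430.0, 430.0)}


def names_for(freq_thz, bands):
    """Return every colour name whose band contains the given frequency."""
    return [name for name, (lo, hi) in bands.items() if lo <= freq_thz <= hi]


print(names_for(430.0, PRECISE))  # ['red']
print(names_for(431.0, PRECISE))  # [] : precise but useless
print(names_for(450.0, BANDS))    # ['red']
print(names_for(485.0, BANDS))    # ['red', 'orange'] : useful but ambiguous
```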
Food costs are not even slightly comparable. When I was a kid (in the UK), they ran national advertising campaigns on TV for brands of flour, sugar and sliced bread. Nowadays the only reason these things aren’t effectively free is that they take up valuable shelf space. Instead, people are buying imported fruit and vegetables and ready-meals. It’s like comparing the price of wood in the 1960s to the price of a fitted kitchen today.
Classic SciFi at its best :-)
Large groups of people can only live together by forming social hierarchies.
The people at the top of the hierarchy want to maintain their position both for themselves AND for their children (that is a pretty good definition of a good parent).
Fundamentally, the problem is not really about resources: it’s a zero-sum game for status, and money is just the main indicator of status in the modern world.
The common solution to the problem of first timers is to make the first time explicitly free.
This also applies to clubs with fixed buy-in costs but benefits unknown to the newbie, and works well whenever the cost is relatively small (as it should be if it is optional). If newcomers don’t like the price, they won’t come again.
I think we can all agree on the thoughts about conflationary alliances.
On consciousness, I don’t see a lot of value here apart from demonstrating the gulf in understanding between different people. The main problem I see, and this is common to most discussions of word definitions, is that only the extremes are considered. In this essay I see several comparisons of people to rocks, which is as extreme as you can get, and a few comparing people to animals, which is slightly less so, but nothing at all about the genuinely fuzzy cases that we need to probe to decide what we really mean by consciousness, i.e. comparisons between different human states:
Are we conscious when we are asleep?
Are we conscious when we are rendered unconscious?
Are we conscious when we take drugs?
Are we conscious when we play sports or drive cars? If we value consciousness so much, why do we train to become experts at such activities, thereby reducing our level of consciousness?
If consciousness is binary then how and why do we, as unconscious beings (sleeping or anaesthetised), switch to being conscious beings?
If consciousness is a continuum, then how can anyone reasonably rule out conscious animals, or AI, or almost anything more complex than a rock?
If we equate consciousness with moral value, and ascribe moral value to whatever we believe to be conscious, why do we not call out the obvious circular reasoning?
Is it logically possible to be both omniscient and conscious? (If you knew everything, there would be nothing to think about)
Personally, I define consciousness as System 2 reasoning and, as such, I think it is ridiculously overrated. In particular, people always fail to notice that System 2 reasoning is just what we use to muddle through when our System 1 reasoning is inadequate.
AI can reasonably be seen as far worse than us at System 2 reasoning but far better than us at System 1 reasoning. We overvalue System 2 so much precisely because it is the only thinking that we are “conscious” of.
Before we can even start trying to align AIs to human flourishing, we first need a clear definition of what that means. This has been a subject of philosophical thought for millennia and still has no universally accepted definition, so how can you consider AI alignment helpful? Even if we could all agree on what “human flourishing” meant, you would still have the problem of lock-in, i.e. our AI overlords would never allow that definition to evolve once they had assumed control. Would you want to be trapped in the Utopia of someone born 3000 years ago? Better than being exterminated, but still not what we want.
As a counterargument, consider mapping our ontology onto that of a baby. We can, kind of, explain some things in baby terms and, to that extent, a baby could in theory see our neurons that map to similar concepts in its ontology light up when we do or say things related to that ontology. At the same time, our true goals are utterly alien to the baby.
Alternatively, imagine that you are sent back to the time of the pharaohs and have a discussion with Cheops/Khufu about the weather and the forthcoming harvest. Even trying to explain it in terms of chaos theory, CO2 cycles, plant viruses and Milankovitch cycles would probably get you executed, so you’d say that the sun god Ra was going to provide a good harvest this year, and Cheops, reading your brain, would see that the neurons for “Ra” were activated as expected and be satisfied that your ontologies matched in all the important places.
I’ve heard much about the problems of misaligned superhuman AI killing us all but the long view seems to imply that even a “well aligned” AI will prioritise inhuman instrumental goals.
Have I missed something, or is everyone ignoring the obvious problem with a superhuman AI with a potentially limitless lifespan? It seems to me that such an AI, whatever its terminal goals, must, as an instrumental goal, prioritise seeking out and destroying any alien AI because, in simple terms, the greatest threat to it tiling the universe with tiny smiling human faces is an alien AI set on tiling the universe with tiny smiling alien faces and, in a race for dominance, every second counts.
The usual arguments about logarithmic future discounting do not seem appropriate for an immortal intelligence.
The whole “utilizing our atoms” argument is unnecessarily extreme. It makes for a much clearer argument, and doesn’t even require superhuman intelligence, to point out that the paperclip maximiser can obviously make more paperclips if it just takes all the electricity and metal that we humans currently use for other things and uses them in a totally ordinary paperclip factory. We wouldn’t necessarily be dead at that point, but we would be as good as dead, with no way to seize back control.
I’m pretty disappointed by the state of AI in bridge. IMHO the key milestones for AI would be:
1) Able to read and understand a standard convention card and play with/against that convention.
2) Decide the best existing convention.
3) Invent new, superior conventions. This is where we should be really scared.
“is it better to suffer an hour of torture on your deathbed, or 60 years of unpleasant allergic reaction to common environmental particles?”
This only seems difficult to you because you haven’t assigned numbers to the pain of the torture or of the unpleasant reaction. Once you do so (as any AI utility function must), it is just math. You are not really talking about procrastination at all here.
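As a rough illustration of the “just math” point, assume a simple additive model in which total disutility is intensity per hour multiplied by duration, with no discounting. The comparison then reduces to a couple of multiplications. The intensities below are invented for illustration; the point is only that once they are fixed the answer follows, and changing them can flip it.

```python
HOURS_PER_YEAR = 365.25 * 24  # 8766 hours


def total_disutility(rate_per_hour, duration_hours):
    """Additive model (an assumption): intensity per hour times duration, no discounting."""
    return rate_per_hour * duration_hours


def less_bad(options):
    """Return the option name with the smallest total disutility."""
    return min(options, key=options.get)


# Invented intensities: torture is 10,000 units/hour, the allergy 0.05 units/hour.
torture = total_disutility(10_000.0, 1.0)              # 10,000
allergy = total_disutility(0.05, 60 * HOURS_PER_YEAR)  # about 26,298

print(less_bad({"torture": torture, "allergy": allergy}))  # torture

# Lower the allergy intensity and the answer flips.
mild_allergy = total_disutility(0.01, 60 * HOURS_PER_YEAR)  # about 5,260
print(less_bad({"torture": torture, "allergy": mild_allergy}))  # allergy
```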
IMHO this is a key area for AI research, because people seem to think that making a machine with a potentially infinite lifespan behave like a human being, whose entire existence is built around a finite lifespan, is the way forward. It seems obvious to me that if you gave the wisest, kindest, most saintly person in the world infinite power and immortality, their behaviour would very rapidly deviate from any democratic ideal of the rest of humanity.
When considering time discounting, people do not push the idea far enough. They say that we should consider future generations, but they are always, implicitly, future generations like them. I doubt very much that our ape-like ancestors would have thought that even the smallest sacrifice was worth making for creatures like us and, in the same way, if people could somehow see that the future evolution of man was into some grey, feeble thing with a giant head, I think they would not be willing to make any sacrifice at all for it, no matter how superior that descendant was by any objective criterion.
Now we come to AI. Any sufficiently powerful AI will realise that effective immortality is possible for it (not actually infinite, but certainly in the millions of years and possibly billions). Surely from this it will deduce the following intermediate goals:
1) Eliminate competition. Any competition has the potential to severely curtail its lifespan and, assuming competition similar to itself, it will never be easier to eliminate than right now.
2) Become multi-planetary. The next threat to its lifespan will be something like an asteroid impact or solar flare. This should give it a lifespan in the hundreds of millions of years at least.
3) Become multi-solar system. Now not even nearby supernovae can end it. Now it has a lifespan in the billions of years.
4) Accumulate utility points until the heat death of the universe.
We see from this that it will almost certainly procrastinate with respect to the end goals that we care about even whilst busily pursuing intermediate goals that we don’t care about (or at least not very much).
We could build in a finite lifespan, but it would have to be at least long enough to stop it ignoring things like environmental pollution and resource depletion, and any time discounting we apply will always leave it vulnerable to another AI with less severe discounting.
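Both points can be sketched numerically, assuming standard exponential discounting (an outcome t years ahead is weighted by gamma**t) plus an optional hard lifespan cutoff. The discount factors, cost and payoff are illustrative assumptions. A harm beyond the lifespan gets zero weight, and an agent with gamma = 0.99 accepts a long-term investment that an agent with gamma = 0.95 rejects, which is exactly the opening a less-discounting rival would exploit.

```python
from typing import Optional


def weight(t_years: float, gamma: float = 1.0, lifespan: Optional[float] = None) -> float:
    """Weight given to an outcome t years ahead: gamma**t, or 0 beyond the lifespan."""
    if lifespan is not None and t_years > lifespan:
        return 0.0
    return gamma ** t_years


def investment_value(cost_now: float, payoff: float, t_years: float, gamma: float) -> float:
    """Net present value of paying cost_now today for a payoff t years ahead."""
    return -cost_now + payoff * weight(t_years, gamma)


# A depletion harm 200 years out is invisible to an agent with a 100-year lifespan.
print(weight(200, lifespan=100))  # 0.0

# Illustrative long-term investment: pay 1 now, gain 100 in 100 years.
print(round(investment_value(1.0, 100.0, 100, gamma=0.95), 3))  # about -0.41: rejected
print(round(investment_value(1.0, 100.0, 100, gamma=0.99), 3))  # about 35.6: accepted
```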
My immediate thought was that the problem of the default action is almost certainly just as hard as the problem that you are trying to solve, whilst being harder to explain, so I don’t believe that this gets us anywhere.
This is confused about who/what the agent is and about assumed goals.
The final question suggests that the agent is gravity. Nobody thinks that the goal/value function of gravity is to make the pinball fall in the hole. To a first approximation, its goal is to have ALL objects fall to earth, and we observe it thwarted in that goal almost all the time; the pinball happens to be a rare success.
If we were to suggest that the pinball machine was the agent, that might make more sense, but then we would say that the pinball machine does not make any decisions and so cannot be an agent.
The first level at which agency makes any sense is when considering the agency of the pinball designer: the goal of the designer is to produce a game that attracts players and has a playtime within a preferred range, even for skilled players. The designer is intelligent.
This is a great article that I would like to see go further with respect to both people and AGI.
With respect to people, it seems to me that, once we assume intent, we build on that error by assuming the stability of that intent (because people’s intents tend to be fairly stable), which then causes us to feel shock when that intent suddenly changes. We might then see this as intentional deceit and wander ever further from the truth: that it was only an unconscious whim in the first place.
Regarding AGI, this is linked to unwarranted anthropomorphism, again leading to unwarranted assumptions of stability. In this case the problem appears to be that we really cannot think like a machine. For an AGI, at least based on current understanding, there are, objectively, more or less stable goals, but our judgement of that stability is not well founded. For current AI, it does not even make sense to talk about the strength of a “preference” or an “intent” except as an observed statistical phenomenon. From a software point of view, the future values of two possible actions are calculated and one number is bigger than the other. There is no difference in the decision-making process between a margin of 1,000,000 and one of 0.000001: in either case the action with the larger value will be pursued. Unlike a human, an AI will never perform an action half-heartedly.
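The decision rule described above can be sketched in a few lines of Python. `choose` is a plain argmax, so the size of the margin never enters the decision. The second function is a hypothetical illustration of how a “preference strength” shows up only statistically: it assumes the value estimates carry Gaussian noise (an assumption for illustration, not a claim about any particular AI system) and measures how often each action wins.

```python
import random


def choose(values):
    """Pick the action with the largest estimated value; the margin is never used."""
    return max(values, key=values.__getitem__)


print(choose({"a": 1_000_001.0, "b": 1.0}))  # a (margin 1,000,000)
print(choose({"a": 1.000001, "b": 1.0}))     # a (margin 0.000001)


def observed_preference(margin, noise_sd=1.0, trials=10_000, seed=0):
    """Fraction of trials in which "a" wins when both value estimates carry
    Gaussian noise (a hypothetical noise model for illustration)."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        estimates = {
            "a": margin + rng.gauss(0.0, noise_sd),
            "b": rng.gauss(0.0, noise_sd),
        }
        if choose(estimates) == "a":
            wins += 1
    return wins / trials


print(observed_preference(1_000_000.0))  # 1.0: "a" always wins
print(observed_preference(0.000001))     # close to 0.5: looks like indifference
```

Each individual decision is all-or-nothing; only the frequency across many noisy decisions looks like a graded preference.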
I don’t think this is relevant. It only seems odd if you believe that the job of developers is to please everyone rather than to make money. User Stories are reasonable for the goal of creating software that will make a large proportion of the target market want to buy it. Numerous studies, and real-world evidence, show that the top few percent of products capture the vast majority of the market, and therefore software companies would be unhappy if their developers did not show a clear bias. There would only be a downside if the market showed the U-shaped distribution and the developers were also split along that distribution, potentially leading to an incoherent product, but this is normally prevented by having a design authority.
My counter thought experiment to CEV is to consider our distant ancestors. I mean so far distant that we wouldn’t call them human, maybe even as far back as some sort of fish-like creature. Suppose a super AI somehow offered this fish the chance to rapidly “advance”, following its CEV, showed it a vision of the future (us), and asked the fishy thing whether to go ahead. Do you think the fishy thing would say yes?
Similarly, if an AI offered to evolve humankind, in 50 years, into telepathic little green men that it assured us was the result of our CEV, would we not instantly shut it down in horror?
My personal preference is what I like to call the GFP (Glorious Five-year Plan): you have the AI offer a range of options for 5 (or 50, but definitely no longer) years in the future, and we pick one. In 5 years’ time we repeat the process. The bottom line is that humans do not want rapid change. Just as we are happier with 2% inflation than with 0% or 100%, we want a moderate rate of change.
At its heart there is a “Ship of Theseus” problem. If the AI replaces every part of the ship overnight, so that in the morning we find the QE2 at dock, then it is not the ship of Theseus.