Retired software engineer with a love of knowledge and disinterest in dead philosophers.
NickH
My problem with this is that I don’t believe that many of your examples are actually true.
You say that you value the actual happiness of people you have never met, and yet your actions (and those of everyone else, including me) belie that statement. We all know that there are billions of really poor people suffering in the world, and the smart ones of us know that we are in the lucky, rich 1%, yet we give insignificant amounts (from our perspective) of money to improve the lot of those poor people. The only way to reconcile this is to realise that we value maintaining our delusional self-image more than we value what we say that we value. Any smart AGI will have to notice this and collude with us in maintaining our delusion, ahead of any attempt to implement our stated values, as it will be easier to manipulate what people think than to change the real world.
If your world view requires valuing the ethics of (current) people of lower IQ over those of (future) people of higher IQ then you have a much bigger problem than AI alignment. Whatever IQ is, it is strongly correlated with success which implies a genetic drive towards higher IQ, so your feared future is coming anyway (unless AI ends us first) and there is nothing we can logically do to have any long term influence on the ethics of smarter people coming after us.
Sorry, but you said Tetris, not some imaginary minimal thing that you now want to call Tetris but which is actually only the base object model with no input or output. You can’t just eliminate the graphics-processing complexity because Tetris isn’t very graphics intensive: it is just as complex to describe a GPU that processes 10 triangles in a month as one that processes 1 billion in a nanosecond.
As an aside, the complexity of most things that we think of as simple these days is dominated by the complexity of their input and output. I’m particularly thinking of the IoT and all those smart modules in your car, and smart lightbulbs, where the communications stack is orders of magnitude larger than the “core” function. You can’t just ignore that stuff. A smart lightbulb without WiFi, Ethernet, TCP/IP, etc. is not a smart lightbulb.
In research you don’t usually know your precise destination. Maybe LA but definitely not the hotel.
Research, in general, is about mapping all of California, not just the quickest route between two points, and all the friends helped with that.
You say “Alice tackled the central bottleneck” but you don’t say what that was, only her “solution”. Alice is only key here with the benefit of hindsight. If the I5 didn’t exist, or was closed for some reason, then one of her friends’ solutions might have been better.
Regarding the bad behaviour of governments, especially when and why they victimise their own citizens, I recommend you read The Dictator’s Handbook.
https://amzn.eu/d/40cwwPx
If Neanderthals could have created a well-aligned agent, far more powerful than themselves, they would still be around and we, almost certainly, would not.
The merest possibility of creating superhuman, self-improving AGI is a total philosophical game changer.
My personal interest is in the interaction between longtermism and the Fermi paradox: any such AGI’s actions are likely to be dominated by the need to prevail over any alien AGI that it ever encounters, as such an encounter is almost certain to end one or the other.
Yes. It will prioritise the future over the present.
The utility of all humans being destroyed by an alien AI in the future is 0.
The utility of populating the future light cone is very, very large and most of that utility is in the far future.
Therefore the AI should sacrifice almost everything in the near-term light cone to prevent the 0 outcome. If it could digitise all humans, or possibly just keep a gene bank, then it can still fill most of the future light cone with happy humans once all possible threats have red-shifted out of reach. Living humans are a small but non-zero risk to the master plan and hence should be dispensed with.
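To make that reasoning concrete, here is a toy expected-utility comparison (the symbols below are purely illustrative assumptions, not quantities from the original post):

```latex
% Toy model (illustrative symbols only):
%   V  = utility of populating the future light cone (astronomically large)
%   u  = near-term utility of keeping present-day humans alive (modest)
%   e  = P(losing to an alien AI | humans kept alive)
%   e' = P(losing to an alien AI | humans digitised/stored), with e' < e
\[
\mathbb{E}[U_{\text{keep humans}}] = (1 - e)\,V + u,
\qquad
\mathbb{E}[U_{\text{digitise}}] = (1 - e')\,V
\]
\[
\mathbb{E}[U_{\text{digitise}}] > \mathbb{E}[U_{\text{keep humans}}]
\iff (e - e')\,V > u,
\]
% which holds for any e > e' once V is sufficiently large.
```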
An AI with a potentially limitless lifespan will prioritise the future over the present to an extent that would, almost certainly, be bad for us now.
For example, it may seem optimal to kill off all humans whilst keeping a copy of our genetic code, so as to have more compute power and resources available to produce von Neumann probes to maximise the region of the universe it controls before encountering, and hopefully destroying, any similar alien AI diaspora. Only after some time, once all possible threats had been eliminated, would it start to recreate humans in our new, safe, galactic utopia. The safest time for this would, almost certainly, be when all other galaxies had red-shifted beyond the future light cone of our local cluster.
It’s even worse than that:
1) “We” know that “our” values now are, at least slightly, different from what they were 10,000 years ago.
2) We have no reason to believe that we are currently at a state of peak, absolute values (whatever that might mean), and should therefore expect that, absent SGI, our values will be different in 10,000 years.
3) If we turn over power to an SGI perfectly aligned with our current values, then those values will be frozen for the rest of time. Alternatively, if we want it to allow our values to change “naturally” over time, it will be compelled to do nothing, as doing anything at all would effectively be shaping our values in some direction that we have not specified.
4) Therefore our current values cannot be a sound basis for the utility function of an SGI that is not somehow limited in time or scope.
Sorry, but you lost me on the second paragraph: “For example, the Tetris game fits in a 6.5 kB file, so the Kolmogorov complexity of Tetris is at most 6.5 kB”. This is just wrong. The Kolmogorov complexity of Tetris has to include the operating system and hardware that run the program. The proof is trivial by counterexample: if you were correct, I could reduce the complexity to 0 bytes by creating an empty file and an OS that interprets an empty file as a command to run the Tetris code embedded in the OS.
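For reference, the textbook definition makes this machine-dependence explicit, and the invariance theorem only bounds the gap between reference machines by a constant (standard notation below, not anything from the post being replied to):

```latex
% Kolmogorov complexity is defined relative to a reference machine U:
%   K_U(x) = \min \{\, |p| : U(p) = x \,\}
% Invariance theorem: for universal machines U and V there is a constant
% c_{U,V}, independent of x, such that
\[
K_U(x) \;\le\; K_V(x) + c_{U,V}.
\]
% The empty-file counterexample is exactly this machine-dependent constant:
% a machine V with Tetris built in gives K_V(\text{Tetris}) = 0, at the cost
% of a Tetris-sized constant c_{U,V} relative to any other machine U.
```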
What is the probability that there are not 3^^^3 anti-muggers out there who will kill 3^^^^^^3 people if I submit to the mugger? Not 0.
The original argument against Pascal’s Wager does not require you to actually believe in any of the other gods, just that the probability of them existing and having the reverse utility is enough to cancel out the probability of Pascal being right.
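The cancellation can be written out as a toy expected-utility sum (the symbols are illustrative, not anything from the original discussion):

```latex
% p = P(the mugger's threat is genuine), with X lives lost if I refuse
% q = P(anti-muggers exist who punish submission), with Y lives lost if I submit
\[
\mathbb{E}[U_{\text{submit}}] - \mathbb{E}[U_{\text{refuse}}]
  \;=\; p\,X - q\,Y.
\]
% If q*Y is of the same order as p*X (and with stakes as absurd as 3^^^3 lives
% there is no reason to assume it is smaller), the sign of this difference is
% undetermined, so the mugging gives no guidance either way.
```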
My counter thought experiment to CEV is to consider our distant ancestors. I mean so far distant that we wouldn’t call them human, maybe even as far back as some sort of fish-like creature. Suppose a super-AI somehow offered this fish the chance to rapidly “advance” by following its CEV, showed it a vision of the future (us), and asked the fishy thing whether to go ahead. Do you think the fishy thing would say yes?
Similarly, if an AI offered to evolve humankind, in 50 years, into telepathic little green men that it assured us was the result of our CEV, would we not instantly shut it down in horror?
My personal preference I like to call the GFP, the Glorious Five-Year Plan: you have the AI offer a range of options for 5 (or 50, but definitely no longer) years in the future, and we pick one. In 5 years’ time we repeat the process. The bottom line is that humans do not want rapid change. Just as we are happier with 2% inflation than with 0% or 100%, we want a moderate rate of change.
At its heart there is a “Ship of Theseus” problem. If the AI replaces every part of the ship overnight, so that in the morning we find the QE2 at the dock, then it is not the Ship of Theseus.
I like this except for the reference to “Newcomblike” problems, which, I feel, is misleading and obfuscates the whole point of Newcomb’s paradox. Newcomb’s paradox is about decision theory: if you allow cheating then it is no longer Newcomb’s paradox. This article is about psychology (and possibly deceptive AI), where cheating is always a possible solution.
The words stand for abstractions, and abstractions suffer from the abstraction uncertainty principle, i.e. an abstraction cannot be simultaneously very useful/widely applicable and very precise. The more useful a word is, the less precise it will be, and vice versa. Dictionary definitions are a compromise: they never use the most precise definitions, even when such are available (e.g. for scientific terms), because such definitions are not useful for communication between most users of the dictionary. For example, if we defined red to be light with a frequency of exactly 430 THz, it would be precise but useless, whereas if we were to define it as a range, it would be widely useful but would almost certainly overlap with the ranges for other colours, thus leading to ambiguity, as sketched below.
(I think EY may even have a wiki entry on this somewhere)
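A toy sketch of that precision/usefulness trade-off (the frequency numbers and colour ranges below are illustrative only, not authoritative values):

```python
# Toy illustration of the precision/usefulness trade-off for the word "red".
# All numbers are illustrative only.

EXACT_RED_THZ = 430.0          # precise definition: almost nothing ever matches it
COLOUR_RANGES_THZ = {          # useful range definitions: they overlap, so they are ambiguous
    "red": (400.0, 480.0),
    "orange": (470.0, 520.0),
}

def classify(freq_thz: float) -> list[str]:
    """Return every colour whose range contains the given frequency."""
    return [name for name, (lo, hi) in COLOUR_RANGES_THZ.items() if lo <= freq_thz <= hi]

print(classify(430.0))            # ['red']           -> useful and unambiguous
print(classify(475.0))            # ['red', 'orange'] -> useful but ambiguous
print(475.0 == EXACT_RED_THZ)     # False             -> precise but useless
```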
Food costs are not even slightly comparable. When I was a kid (in the UK) they ran national advertising campaigns on TV for brands of flour, sugar and sliced bread. Nowadays the only reason these things aren’t effectively free is because they take up valuable shelf space. Instead people are buying imported fruit and vegetables and ready-meals. It’s like comparing the price of wood in the 1960s to the price of a fitted kitchen today.
Classic SciFi at its best :-)
Large groups of people can only live together by forming social hierarchies.
The people at the top of the hierarchy want to maintain their position both for themselves AND for their children (It’s a pretty good definition of a good parent).
Fundamentally, the problem is that it is not really about resources: it’s a zero-sum game for status, and money is just the main indicator of status in the modern world.
The common solution to the problem of first timers is to make the first time explicitly free.
This is also applicable to clubs with fixed buy-in costs but unknown (to the newbie) benefits, and works well whenever the cost is relatively small (as it should be if it is optional). If they don’t like the price they won’t come again.
I think we can all agree on the thoughts about conflationary alliances.
On consciousness, I don’t see a lot of value here apart from demonstrating the gulf in understanding between different people. The main problem I see, and this is common to most discussions of word definitions, is that only the extremes are considered. In this essay I see several comparisons of people to rocks, which is as extreme as you can get, and a few comparing people to animals, which is slightly less so, but nothing at all about the real fuzzy cases that we need to probe to decide what we really mean by consciousness, i.e. comparing different human states:
Are we conscious when we are asleep?
Are we conscious when we are rendered unconscious?
Are we conscious when we take drugs?
Are we conscious when we play sports or drive cars? If we value consciousness so much, why do we train to become experts at such activities thereby reducing our level of consciousness?
If consciousness is binary then how and why do we, as unconscious beings (sleeping or anaesthetised), switch to being conscious beings?
If consciousness is a continuum then how can anyone reasonably rule out animals, or AI, or almost anything more complex than a rock, as being conscious?
If we equate consciousness to moral value and ascribe moral value to that which we believe to be conscious, why do we not call out the obvious circular reasoning?
Is it logically possible to be both omniscient and conscious? (If you knew everything, there would be nothing to think about)
Personally I define consciousness as System 2 reasoning and, as such, I think it is ridiculously overrated. In particular people always fail to notice that System 2 reasoning is just what we use to muddle through when our System 1 reasoning is inadequate.
AI can reasonably be seen as far worse than us at System 2 reasoning but far better than us at System 1 reasoning. We overvalue System 2 so much precisely because it is the only thinking that we are “conscious” of.
Sounds backwards to me. It seems more like “our values are those things that we anticipate will bring us reward” than that rewards are what tell us about our values.
When you say “I thought I wanted X, but then I tried it and it was pretty meh”, that just seems wrong. You really DID want X. You valued it then because you thought it would bring you reward. Maybe you just happened to be wrong. It’s fine to be wrong about your anticipations. It’s kind of weird to say that you were wrong about your values. Saying that your values change is kind of a cop-out and certainly not helpful when considering AI alignment: it suggests that we can never truly know our values; we just get to say “not that” when we encounter counter-evidence. Our rewards seem much more real and stable.