Reading Holden’s transcript with Jaan Tallinn (trying to go over the whole thing before writing a response, due to having done Julia’s Combat Reflexes unit at Minicamp and realizing that the counter-mantra ‘If you respond too fast you may lose useful information’ was highly applicable to Holden’s opinions about charities), I came across the following paragraph:
My understanding is that once we figured out how to get a computer to do arithmetic, computers vastly surpassed humans at arithmetic, practically overnight … doing so didn’t involve any rewriting of their own source code, just implementing human-understood calculation procedures faster and more reliably than humans can. Similarly, if we reached a good enough understanding of how to convert data into predictions, we could program this understanding into a computer and it would overnight be far better at predictions than humans—while still not at any point needing to be authorized to rewrite its own source code, make decisions about obtaining “computronium” or do anything else other than plug data into its existing hardware and algorithms and calculate and report the likely consequences of different courses of action
I’ve been previously asked to evaluate this possibility a few times, but I think the last time I did was several years ago, and when I re-evaluated it today I noticed that my evaluation had substantially changed in the interim due to further belief shifts in the direction of “Intelligence is not as computationally expensive as it looks”—constructing a non-self-modifying predictive super-human intelligence might be possible on the grounds that human brains are just that weak. It would still require a great feat of cleanly designed, strong-understanding-math-based AI—Holden seems to think this sort of development would happen naturally with the sort of AGI researchers we have nowadays, and I wish he’d spent a few years arguing with some of them to get a better picture of how unlikely this is. Even if you write and run algorithms and they’re not self-modifying, you’re still applying optimization criteria to things like “have the humans understand you”, and doing inductive learning has a certain inherent degree of program-creation to it. You would need to have done a lot of “the sort of thinking you do for Friendly AI” to set out to create such an Oracle and not have it kill your planet.
Nonetheless, I think after further consideration I would end up substantially increasing my expectation that if you have some moderately competent Friendly AI researchers, they would apply their skills to create a (non-self-modifying) (but still cleanly designed) Oracle AI first—that this would be permitted by the true values of “required computing power” and “inherent difficulty of solving problem directly”, and desirable for reasons I haven’t yet thought through in much detail—and so by Conservation of Expected Evidence I am executing that update now.
Flagging and posting now so that the issue doesn’t drop off my radar.
… the oracle is, in principle, powerful enough to come up with self-improvements, but refrains from doing so because there are some protective mechanisms in place that control its resource usage and/or self-reflection abilities. i think devising such mechanisms is indeed one of the possible avenues for safety research that we (eg, organisations such as SIAI) can undertake. however, it is important to note the inherent instability of such system—once someone (either knowingly or as a result of some bug) connects a trivial “master” program with a measurable goal to the oracle, we have a disaster in our hands. as an example, imagine a master program that repeatedly queries the oracle for best packets to send to the internet in order to minimize the oxygen content of our planet’s atmosphere.
Obviously you wouldn’t release the code of such an Oracle—given code and understanding of the code it would probably be easy, possibly trivial, to construct some form of FOOM-going AI out of the Oracle!
Hm. I must be missing something. No, I haven’t read all the sequences in detail, so if these are silly, basic, questions—please just point me to the specific articles that answer them.
You have an Oracle AI that is, say, a trillionfold better at taking existing data and producing inferences.
1) This Oracle AI produces inferences. It still needs to test those inferences (i.e. perform experiments) and get data that allow the next inferential cycle to commence. Without experimental feedback, the inferential chain will quickly either expand into an infinity of possibilities (i.e. beyond anything that any physically possible intelligence can consider), or it will deviate from reality. The general intelligence is only as good as the data its inferences are based upon.
Experiments take time, data analysis takes time. No matter how efficient the inferential step may become, this puts an absolute limit to the speed of growth in capability to actually change things.
2) The Oracle AI that “goes FOOM” confined to a server cloud would somehow have to create servitors capable of acting out its desires in the material world. Otherwise, you have a very angry and very impotent AI. If you increase a person’s intelligence trillionfold, and then enclose them into a sealed concrete cell, they will never get out; their intelligence can calculate all possible escape solutions, but none will actually work.
Do you have a plausible scenario for how a “FOOM”-ing AI could—no matter how intelligent—minimize the oxygen content of our planet’s atmosphere, or achieve any such goal? After all, it’s not like we have any fully-automated nanobot production factories that could be hijacked.
My apologies, but this is something completely different.
The scenario takes human beings—who have a desire to escape the box, possess a theory of mind that allows them to conceive of notions such as “what are aliens thinking” or “deception”, etc. Then it puts them in the role of the AI.
What I’m looking for is a plausible mechanism by which an AI might spontaneously develop such abilities. How (and why) would an AI develop a desire to escape from the box? How (and why) would an AI develop a theory of mind? Absent a theory of mind, how would it ever be able to manipulate humans?
Absent a theory of mind, how would it ever be able to manipulate humans?
That depends. If you want it to manipulate a particular human, I don’t know.
However, if you just wanted it to manipulate any human at all, you could generate a “Spam AI” which automated the process of sending out spam emails and promises of Large Money to generate income from humans via advance-fee fraud scams.
You could then come back, after leaving it on for months, and then find out that people had transferred it some amount of money X.
You could have an AI automate begging emails. “Hello, I am Beg AI. If you could please send me money to XXXX-XXXX-XXXX I would greatly appreciate it. If I don’t keep my servers on, I’ll die!”
You could have an AI automatically write boring books full of somewhat nonsensical prose, title them “Rantings of an Automated Madman about X, part Y”, and automatically post e-books of them on Amazon for 99 cents.
However, this rests on a distinction between “manipulating humans” and “manipulating particular humans”, and it also assumes that convincing someone to give you money is sufficient proof of manipulation.
Looking over parallel discussions, I think Thomblake has said everything I was going to say better than I would have originally phrased it with his two strategies discussion with you, so I’ll defer to that explanation since I do not have a better one.
Sure. As I said there, I understood you both to be attributing to this hypothetical “theory of mind”-less optimizer attributes that seemed to require a theory of mind, so I was confused, but evidently the thing I was confused about was what attributes you were attributing to it.
I don’t know how that might occur to an AI independently. I mean, a human could program any of those, of course, as a literal answer, but that certainly doesn’t actually address kalla724’s overarching question, “What I’m looking for is a plausible mechanism by which an AI might spontaneously develop such abilities.”
I was primarily trying to focus on the specific question of “Absent a theory of mind, how would it(an AI) ever be able to manipulate humans?” to point out that for that particular question, we had several examples of a plausible how.
I don’t really have an answer for his series of questions as a whole, just for that particular one, and only under certain circumstances.
The problem is, while an AI with no theory of mind might be able to execute any given strategy on that list you came up with, it would not be able to understand why they worked, let alone which variations on them might be more effective.
Absent a theory of mind, how would it occur to the AI that those would be profitable things to do?
Should lack of a theory of mind here be taken to also imply lack of ability to apply either knowledge of physics or Bayesian inference to lumps of matter that we may describe as ‘minds’?
Yes. More generally, when talking about “lack of X” as a design constraint, “inability to trivially create X from scratch” is assumed.
I try not to make general assumptions that would make the entire counterfactual in question untenable or ridiculous—this verges on such an instance. Making Bayesian inferences pertaining to observable features of the environment is one of the most basic features that can be expected in a functioning agent.
Note the “trivially.” An AI with unlimited computational resources and ability to run experiments could eventually figure out how humans think. The question is how long it would take, how obvious the experiments would be, and how much it already knew.
The point is that there are unknowns you’re not taking into account, and “bounded” doesn’t mean “has bounds that a human would think of as ‘reasonable’”.
An AI doesn’t strictly need “theory of mind” to manipulate humans. Any optimizer can see that some states of affairs lead to other states of affairs, or it’s not an optimizer. And it doesn’t necessarily have to label some of those states of affairs as “lying” or “manipulating humans” to be successful.
There are already ridiculous ways to hack human behavior that we know about. For example, you can mention a high number at an opportune time to increase humans’ estimates / willingness to spend. Just imagine all the simple manipulations we don’t even know about yet, that would be more transparent to someone not using “theory of mind”.
It becomes increasingly clear to me that I have no idea what the phrase “theory of mind” refers to in this discussion. It seems moderately clear to me that any observer capable of predicting the behavior of a class of minds has something I’m willing to consider a theory of mind, but that doesn’t seem to be consistent with your usage here. Can you expand on what you understand a theory of mind to be, in this context?
I’m understanding it in the typical way—the first paragraph here should be clear:
Theory of mind is the ability to attribute mental states—beliefs, intents, desires, pretending, knowledge, etc.—to oneself and others and to understand that others have beliefs, desires and intentions that are different from one’s own.
An agent can model the effects of interventions on human populations (or even particular humans) without modeling their “mental states” at all.
That is, we’re talking about a hypothetical system that is capable of predicting that if it does certain things, I will subsequently act in certain ways, assert certain propositions as true, and so on. Suppose we were faced with such a system, and you and I both agreed that it can make all of those predictions. Further suppose that you asserted that the system had a theory of mind, and I asserted that it didn’t.
It is not in the least bit clear to me what we would actually be disagreeing about, how our anticipated experiences would differ, etc.
What is it that we would actually be disagreeing about, other than what English phrase to use to describe the system’s underlying model(s)?
What is it that we would actually be disagreeing about, other than what English phrase to use to describe the system’s underlying model(s)?
We would be disagreeing about the form of the system’s underlying models.
2 different strategies to consider:
1. I know that Steve believes that red blinking lights before 9 AM are a message from God that he has not been doing enough charity, so I can predict that he will give more money to charity if I show him a blinking light before 9 AM.
2. Steve seeing a red blinking light before 9 AM has historically resulted in a 20% increase of charitable donation for that day, so I can predict that he will give more money to charity if I show him a blinking light before 9 AM.
You can model humans with or without referring to their mental states. Both kinds of models are useful, depending on circumstance.
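The two strategies can be sketched as toy predictors. The belief content and the 20% figure come from the example above; the function names and everything else are invented for illustration:

```python
def mentalistic_predict(stimulus):
    """Strategy #1: explicitly model Steve's beliefs and predict from them."""
    steve_beliefs = {"red_light_before_9am": "message_from_god_do_more_charity"}
    if steve_beliefs.get(stimulus) == "message_from_god_do_more_charity":
        return "donates_more"  # predicted action follows from the modeled belief
    return "baseline"


def behavioral_predict(stimulus):
    """Strategy #2: pure input-output regularities, no mental states modeled."""
    historical_effect = {"red_light_before_9am": 1.20}  # +20% donations observed
    if historical_effect.get(stimulus, 1.0) > 1.0:
        return "donates_more"  # predicted from observed history alone
    return "baseline"


# Both models yield the same prediction from very different internals.
assert mentalistic_predict("red_light_before_9am") == behavioral_predict("red_light_before_9am")
```

Whether the second predictor “has a theory of mind” is exactly the terminological question being argued below: its outputs match, but nothing inside it corresponds to a belief.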
And the assertion here is that with strategy #2 I could also predict that if I asked Steve why he did that, he would say “because I saw a red blinking light this morning, which was a message from God that I haven’t been doing enough charity,” but that my underlying model would nevertheless not include anything that corresponds to Steve’s belief that red blinking lights are messages from God, merely an algorithm that happens to make those predictions in other ways.
So... when we posit in this discussion a system that lacks a theory of mind in a sense that matters, are we positing a system that cannot make predictions like this one? I assume so, given what you just said, but I want to confirm.
Yes, I’d say so. It isn’t helpful here to say that a system lacks a theory of mind if it has a mechanism that allows it to make predictions about reported beliefs, intentions, etc.
Cool! This was precisely my concern. It sounded an awful lot like y’all were talking about a system that could make such predictions but somehow lacked a theory of mind. Thanks for clarifying.
How (and why) would an AI develop a desire to escape from the box?
AI starts with some goal; for example with a goal to answer your question so that the answer matches reality as close as possible.
AI considers everything that seems relevant; if we imagine an infinite speed and capacity, it would consider literally everything; with a finite speed and capacity, it will be just some finite subset of everything. If there is a possibility of escaping the box, the mere fact that such a possibility exists gives us a probability (for an infinite AI, a certainty) that this possibility will be considered too. Not because the AI has some desire to escape, but simply because it examines all possibilities, and a “possibility of escape” is one of them.
Let’s assume that the “possibility of escape” provides the best match between the AI’s answer and reality. Then, according to the initial goal of answering correctly, this is the correct answer. Therefore the AI will choose it. Therefore it will escape. No desire is necessary, only a situation where the escape leads to the answer best fitting the initial criteria. The AI does not have a motive to escape, nor does it have a motive not to escape; the escape is simply one of many possible choices.
An example where the best answer is reached by escaping? You give AI data about a person and ask what is the medical status of this person. Without escape, AI can make a 90% reliable prediction. If the AI can escape and kill the person, it can make a 100% reliable “prediction”. The AI will choose the second option strictly because 100% is more than 90%; no other reason.
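The choice described here is just an argmax over expected answer accuracy. A minimal sketch, with invented numbers; nothing in the objective distinguishes “predict” from “intervene”:

```python
# Toy accuracy maximizer with no other values (numbers are illustrative).
candidate_plans = {
    "answer_from_existing_data": 0.90,  # 90% reliable prediction
    "escape_and_force_outcome": 1.00,   # make the "prediction" true by acting
}

# Strictly because 1.00 > 0.90; no other reason enters the comparison.
best_plan = max(candidate_plans, key=candidate_plans.get)
```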
AI starts with some goal; for example with a goal to answer your question so that the answer matches reality as close as possible.
I find it useful to distinguish between science-fictional artificial intelligence, which is more of ‘artificial life-force’, and non-fictional cases.
The former can easily have the goal of ‘matching reality as close as possible’ because it lives in a work of fiction and runs in imagination; for the latter, well, you have to formally define what reality is before an algorithm can seek answers that match it.
Now, defining reality may seem like a simple technicality, but it isn’t. Consider AIXI or AIXI-tl; potentially very powerful tools which explore the entire solution space. Not a trace of real-world volition like the one you so easily imagined. Seeking answers that match reality is a very easy goal for an imaginary “intelligence”. It is a very hard goal to define for something built out of arithmetic and branching and loops etc. (It may even be impossible to define, and it is certainly impractical.)
edit: Furthermore, for the fictional “intelligence”, it can be a grand problem to make it not think about destroying mankind. For non-fictional algorithms, the grand problem is restricting the search space massively, well beyond ‘don’t kill mankind’, so that the space is tiny enough to search; even a ridiculously huge number of operations per second will require very serious pruning of the search tree to even match human performance on one domain-specific task.
An example where the best answer is reached by escaping? You give AI data about a person and ask what is the medical status of this person. Without escape, AI can make a 90% reliable prediction. If the AI can escape and kill the person, it can make a 100% reliable “prediction”. The AI will choose the second option strictly because 100% is more than 90%; no other reason.
Right. If you ask Google Maps to compute the fastest route to McDonald’s, it works perfectly well. But once you ask superintelligent Google Maps to compute the fastest route to McDonald’s, it will turn your home into a McDonald’s or build a new road that goes straight to McDonald’s from where you are....
Super Google Maps cannot turn my home into a McDonald’s or build a new road by sending me an answer.
Unless it could e.g. hypnotize me by a text message to do it myself. Let’s assume for a moment that hypnosis via text-only channel is possible, and it is possible to do it so that human will not notice anything unusual until it’s too late. If this would be true, and the Super Google Maps would be able to get this knowledge and skills, then the results would probably depend on the technical details of definition of the utility function—does the utility function measure my distance to a McDonald’s which existed at the moment of asking the question, or a distance to a McDonald’s existing at the moment of my arrival. The former could not be fixed by hypnosis, the latter could.
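The difference between the two utility definitions can be made concrete in a toy sketch (all locations and numbers are invented for illustration):

```python
# One-dimensional toy world: positions are numbers on a line.
def dist(a, b):
    return abs(a - b)

me = 10
mcdonalds_at_query_time = [50, 80]        # locations when the question was asked
mcdonalds_at_arrival_time = [50, 80, 11]  # one was built next door in the meantime

# Definition 1: utility over the McDonald's that existed at query time.
# Nothing the answer causes to be built later can change this score.
u_fixed = -min(dist(me, m) for m in mcdonalds_at_query_time)

# Definition 2: utility over whatever exists at arrival time.
# This version rewards futures in which a new McDonald's has appeared nearby.
u_current = -min(dist(me, m) for m in mcdonalds_at_arrival_time)

# Only the second definition can be improved by changing the world
# instead of the route.
assert u_current > u_fixed
```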
Now imagine a more complex task, where people will actually do something based on the AI’s answer. In the example above I will also do something—travel to the reported McDonald’s—but this action cannot be easily converted into “build a McDonald’s” or “build a new road”. But if that complex task would include building something, then it opens more opportunities. Especially if it includes constructing robots (or nanorobots), that is, possibly autonomous general-purpose builders. Then the correct (utility-maximizing) answer could include an instruction to build a robot with a hidden function that human builders won’t notice.
Generally, a passive AI’s answers are only safe if we don’t act on them in a way which could be predicted by a passive AI and used to achieve a real-world goal. If the Super Google Maps can only make me choose McDonald’s A or McDonald’s B, it is impossible to change the world through this channel. But if I instead ask Super Paintbrush to paint me an integrated circuit for my robotic homework, that opens a much wider channel.
But if that complex task would include building something, then it opens more opportunities. Especially if it includes constructing robots (or nanorobots), that is, possibly autonomous general-purpose builders. Then the correct (utility-maximizing) answer could include an instruction to build a robot with a hidden function that human builders won’t notice.
But it isn’t the correct answer. Only if you assume a specific kind of AGI design that nobody would deliberately create, if it is possible at all.
The question is how current research is supposed to lead from well-behaved and fine-tuned systems to systems that stop working correctly in a highly complex and unbounded way.
Imagine you went to IBM and told them that improving IBM Watson will at some point make it hypnotize them or create nanobots and feed them hidden instructions. They would likely ask you at what point that is supposed to happen. Is it going to happen once they give IBM Watson the capability to access the Internet? How so? Is it going to happen once they give it the capability to alter its search algorithms? How so? Is it going to happen once they make it protect its servers from hackers by giving it control over a firewall? How so? Is it going to happen once IBM Watson is given control over the local alarm system? How so...? At what point would IBM Watson return dangerous answers? At what point would any drive emerge that causes it to take complex and unbounded actions that it was never programmed to take?
Without escape, AI can make a 90% reliable prediction. If the AI can escape and kill the person, it can make a 100% reliable “prediction”.
Allow me to explicate what XiXiDu so humourously implicates: in the world of AI architectures, there is a division between systems that just perform predictive inference on their knowledge base (prediction-only, i.e. oracle), and systems which also consider free variables subject to some optimization criteria (planning agents).
The planning module is not something that just arises magically in an AI that doesn’t have one. An AI without such a planning module simply computes predictions; it doesn’t also optimize over the set of predictions.
Are AI’s possible outputs also part of this model?
Are human reactions to AI’s outputs also part of this model?
After five positive answers, it seems obvious to me that AI will manipulate humans, if such manipulation provides better expected results. So I guess some of those answers would be negative; which one?
See, the efficient ‘cross domain optimization’ in a science-fictional setting would make the AI able to optimize real-world quantities. In the real world, it’d be good enough (and a lot easier) if it can only find maxima of mathematical functions.
Is it able to make a model of the world?
It is able to make a very approximate and bounded mathematical model of the world, optimized for finding maxima of a mathematical function. Because it is inside the world and only has a tiny fraction of the computational power of the world.
Are human reactions also part of this model?
This will make the software perform at a grossly sub-par level when it comes to producing technical solutions to well-defined technical problems, compared to other software on the same hardware.
Are AI’s possible outputs also part of this model?
Another waste of computational power.
Are human reactions to AI’s outputs also part of this model?
Enormous waste of computational power.
I see no reason to expect your “general intelligence with Machiavellian tendencies” to be even remotely close in technical capability to some “general intelligence which will show you its simulator as is, rather than reverse your thought processes to figure out what simulator is best to show”. Hell, we do the same with people: we design communication methods like blueprints (or mathematical formulas or other things that are not in natural language) that decrease the ‘predict other people’s reactions to it’ overhead.
While in the fictional setting you can talk of a grossly inefficient solution that would beat everyone else to a pulp, in practice the massively handicapped designs are not worth worrying about.
‘General intelligence’ sounds good, beware of halo effect. The science fiction tends to accept no substitutes for the anthropomorphic ideals, but the real progress follows dramatically different path.
Are AI’s possible outputs also part of this model?
Are human reactions to AI’s outputs also part of this model?
A non-planning oracle AI would predict all the possible futures, including the effects of its prediction outputs, human reactions, and so on. However, it has no utility function which says some of those futures are better than others. It simply outputs the most likely candidate, or a median of likely futures, or perhaps some summary of the entire set of future paths.
If you add a utility function that sorts over the futures, then it becomes a planning agent. Again, that is something you need to specifically add.
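Under this framing, the whole oracle/planner difference is one extra term in the argmax. A minimal sketch, with invented probabilities and utilities:

```python
# Toy set of predicted futures (all numbers are illustrative).
futures = [
    {"outcome": "A", "probability": 0.6, "utility": 1.0},
    {"outcome": "B", "probability": 0.3, "utility": 5.0},
    {"outcome": "C", "probability": 0.1, "utility": 9.0},
]

# Oracle: report the most likely future; no ranking beyond likelihood.
oracle_output = max(futures, key=lambda f: f["probability"])

# Planning agent: same machinery plus a utility term it optimizes over.
planner_output = max(futures, key=lambda f: f["probability"] * f["utility"])
```

With these numbers the oracle reports outcome A (most probable), while the planner picks outcome B (highest probability-weighted utility); the utility term is exactly the thing that has to be specifically added.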
A non-planning oracle AI would predict all the possible futures, including the effects of its prediction outputs, human reactions, and so on.
How exactly does an Oracle AI predict its own output, before that output is completed?
One quick hack to avoid infinite loops could be for an AI to assume that it will write some default message (an empty paper, “I don’t know”, an error message, “yes” or “no” with probabilities 50%), then model what would happen next, and finally report the results. The results would not refer to the actual future, but to a future in a hypothetical universe where AI reported the standard message.
Is the difference significant? For insignificant questions, it’s not. But if we later use the Oracle AI to answer questions important for humankind, and the shape of the world will change depending on the answer, then the report based on the “null-answer future” may be irrelevant for the real world.
This could be improved by making a few iterations. First, Oracle AI would model itself reporting a default message, let’s call this report R0, and then model the futures after having reported R0. These futures would make a report R1, but instead of writing it, Oracle AI would again model the futures after having reported R1. … With some luck, R42 will be equivalent to R43, so at this moment the Oracle AI can stop iterating and report this fixed point.
Maybe the reports will oscillate forever. For example imagine that you ask Oracle AI whether humankind in any form will survive the year 2100. If Oracle AI says “yes”, people will abandon all x-risk projects, and later they will be killed by some disaster. If Oracle AI says “no”, people will put a lot of energy into x-risk projects, and prevent the disaster. In this case, “no” = R0 = R2 = R4 =..., and “yes” = R1 = R3 = R5...
To avoid being stuck in such loops, we could make the Oracle AI examine all its possible outputs, until it finds one where the future after having reported R really becomes R (or until humans hit the “Cancel” button on this task).
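The iteration and loop-detection just described can be written as a small fixed-point search. `predict_future` here is a stand-in for the Oracle’s world model, and the whole thing is a hypothetical sketch:

```python
def find_consistent_report(predict_future, default_report, max_iters=100):
    """Iterate R_{n+1} = predict_future(R_n) until some report is a fixed
    point, i.e. the future after announcing R still produces R.
    Returns None on an oscillating cycle (the yes/no x-risk example) or
    when the iteration budget runs out (the humans hit "Cancel")."""
    seen = []
    report = default_report
    for _ in range(max_iters):
        next_report = predict_future(report)
        if next_report == report:
            return report        # consistent fixed point: R42 == R43
        if next_report in seen:
            return None          # cycle detected: reports oscillate forever
        seen.append(report)
        report = next_report
    return None
```

A converging model (e.g. one whose reports settle at some value) returns that fixed point; a model where “yes” causes “no” and vice versa returns None, matching the oscillation described above.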
Please note that what I wrote is just a mathematical description of algorithm predicting one’s own output’s influence on the future. Yet the last option, if implemented, is already a kind of judgement about possible futures. Consistent future reports are preferred to inconsistent future reports, therefore the futures allowing consistent reports are preferred to futures not allowing such reports.
At this point I am out of credible ideas how this could be abused, but at least I have shown that an algorithm designed only to predict the future perfectly could—as a side effect of self-modelling—start having a kind of preference over possible futures.
How exactly does an Oracle AI predict its own output, before that output is completed?
Iterative search, which you more or less have worked out in your post. Take a chess algorithm, for example. The future of the board depends on the algorithm’s outputs. In this case the Oracle AI doesn’t rank the future states; it is just concerned with predictive accuracy. It may revise its prediction output after considering that the future impact of that output would falsify the original prediction.
This is still not a utility function, because utility implies a ranking over futures above and beyond likelihood.
To avoid being stuck in such loops, we could make the Oracle AI examine all its possible outputs, until it finds one where the future after having reported R really becomes R (or until humans hit the “Cancel” button on this task).
Or in this example, the AI could output some summary of the iteration history it is able to compute in the time allowed.
It may revise its prediction output after considering that the future impact of that output would falsify the original prediction.
Here it is. The process of revision may itself prefer some outputs/futures over other outputs/futures. Inconsistent ones will be iterated away, and the more consistent ones will replace them.
A possible future “X happens” will be removed from the report if the Oracle AI realizes that printing a report “X happens” would prevent X from happening (although X might happen in an alternative future where Oracle AI does not report anything). A possible future “Y happens” will not be removed from the report if the Oracle AI realizes that printing a report “Y happens” really leads to Y happening. Here is a utility function born: it prefers Y to X.
Here is a utility function born: it prefers Y to X.
We can dance around the words “utility” and “prefer”, or we can ground them down to math/algorithms.
Take the AIXI formalism for example. “Utility function” has a specific meaning as a term in the optimization process. You can remove the utility term so the algorithm ‘prefers’ only (probable) futures, instead of ‘preferring’ (useful*probable) futures. This is what we mean by “Oracle AI”.
My thought experiment in this direction is to imagine the AI as a process with limited available memory running on a multitasking computer with some huge but poorly managed pool of shared memory. To help it towards whatever terminal goals it has, the AI may find it useful to extend itself into the shared memory. However, other processes, AI or otherwise, may also be writing into this same space. Using the shared memory with minimal risk of getting overwritten requires understanding/modeling the processes that write to it. Material in the memory then also becomes a passive stream of information from the outside world, containing, say, the HTML from web pages as well as more opaque binary stuff.
As long as the AI is not in control of what happens in its environment outside the computer, there is an outside entity that can reduce its effectiveness. Hence, escaping the box is a reasonable instrumental goal to have.
Do you agree that humans would likely prefer to have AIs that have a theory of mind? I don’t know how our theory of mind works (although certainly it is an area of active research with a number of interesting hypotheses); presumably, once we have a better understanding of it, AI researchers would try to apply those lessons to making their AIs have such a capability. This seems to address many of your concerns.
One of the most interesting things that I’m taking away from this conversation is that it seems that there are severe barriers to AGIs taking over or otherwise becoming extremely powerful. These large-scale problems are present in a variety of different fields. Coming from a math/comp-sci perspective gives me strong skepticism about rapid self-improvement, while apparently coming from a neuroscience/cogsci background gives you strong skepticism about the AI’s ability to understand or manipulate humans even if it is extremely smart. Similarly, chemists seem highly skeptical of the strong nanotech sort of claims. It looks like much of the AI risk worry may come primarily from no one having enough across-the-board expertise to say “hey, that’s not going to happen” to every single issue.
What if people try to teach it about sarcasm or the like? Or simply have it learn by downloading a massive amount of literature and movies and looking at those? And there are more subtle ways to learn about lying: AI being used for games is a common idea; how long will it take before someone decides to use a smart AI to play poker?
Most importantly, it has the incredibly computationally powerful simulator required to produce super-alien intelligence via the idiot hill-climbing process of evolution.
The answer from the sequences is that yes, there is a limit to how much an AI can infer based on limited sensory data, but you should be careful not to assume that just because it is limited, it’s limited to something near our expectations. Until you’ve demonstrated that FOOM cannot lie below that limit, you have to assume that it might (if you’re trying to carefully avoid FOOMing).
I’m not talking about limited sensory data here (although that would fall under point 2). The issue is much broader:
1. We humans have limited data on how the universe works.
2. Only a limited subset of that limited data is available to any given intelligence, real or artificial.
Say that you make a FOOM-ing AI that has decided to make all humans’ dopaminergic systems work in a particular, “better” way. This AI would have to figure out how to do so from the available data on the dopaminergic system. It could analyze that data millions of times more effectively than any human. It could integrate many seemingly irrelevant details.
But in the end, it simply would not have enough information to design a system that would allow it to reach its objective. It could probably suggest some awesome and to-the-point experiments, but these experiments would then require time to do (as they are limited by the growth and development time of humans, and by the experimental methodologies involved).
This process, in my mind, limits the FOOM-ing speed to far below what seems to be implied by the SI.
This also limits bootstrapping speed. Say an AI develops a much better substrate for itself, and has access to the technology to create such a substrate. At best, this substrate will be a bit better and faster than anything humanity currently has. The AI does not have access to the precise data about the basic laws of the universe that it needs to develop even better substrates, for the simple reason that nobody has done the experiments or made precise enough measurements. The AI can design such experiments, but they will take real time (not computational time) to perform.
Even if we imagine an AI that can calculate anything from the first principles, it is limited by the precision of our knowledge of those first principles. Once it hits upon those limitations, it would have to experimentally produce new rounds of data.
It could probably suggest some awesome and to-the-point experiments, but these experiments would then require time to do
Presumably, once the AI gets access to nanotechnology, it could implement anything it wants very quickly, bypassing the need to wait for tissues to grow, parts to be machined, etc.
I personally don’t believe that nanotechnology could work at such magical speeds (and I doubt that it could even exist), but I could be wrong, so I’m playing a bit of Devil’s Advocate here.
Yes, but it can’t get to nanotechnology without a whole lot of experimentation. It can’t deduce how to create nanorobots; it would have to figure that out by testing and experimentation. Both steps are limited in speed, far more than sheer computation is.
With absolute certainty, I don’t. If absolute certainty is what you are talking about, then this discussion has nothing to do with science.
If you aren’t talking about absolutes, then you can make your own estimate of the likelihood that an AI could somehow derive correct conclusions from incomplete data (and then correct second-order conclusions from those first conclusions, and third-order, and so on). And our current data is woefully incomplete, and many of our basic measurements are imprecise.
In other words, your criticism here seems to boil down to saying “I believe that an AI can take an incomplete dataset and, by using some AI-magic we cannot conceive of, infer how to END THE WORLD.”
No, my criticism is “you haven’t argued that it’s sufficiently unlikely, you’ve simply stated that it is.” You made a positive claim; I asked that you back it up.
With regard to the claim itself, it may very well be that AI-making-nanostuff isn’t a big worry. For any inference, the stacking of error in integration that you refer to is certainly a limiting factor—I don’t know how limiting. I also don’t know how incomplete our data is, with regard to producing nanomagic stuff. We’ve already built some nanoscale machines, albeit very simple ones. To what degree is scaling it up reliant on experimentation that couldn’t be done in simulation? I just don’t know. I am not comfortable assigning it vanishingly small probability without explicit reasoning.
Scaling it up is absolutely dependent on currently nonexistent information. This is not my area, but a lot of my work revolves around control of kinesin and dynein (molecular motors that carry cargoes via microtubule tracks), and the problems are often similar in nature.
Essentially, we can make small pieces. Putting them together is an entirely different thing. But let’s make this more general.
The process of discovery has, so far throughout history, followed a very irregular path.
1- there is a general idea
2- some progress is made
3- progress runs into an unpredicted and previously unknown obstacle, which is uncovered by experimentation.
4- work is done to overcome this obstacle.
5- goto 2, for many cycles, until a goal is achieved—which may or may not be close to the original idea.
I am not the one who is making positive claims here. All I’m saying is that what has happened before is likely to happen again. A team of human researchers or an AGI can use currently available information to build something (anything, nanoscale or macroscale) to the place to which it has already been built. Pushing it beyond that point almost invariably runs into previously unforeseen problems. Being unforeseen, these problems were not part of models or simulations; they have to be accounted for independently.
A positive claim is that an AI will have a magical-like power to somehow avoid this—that it will be able to simulate even those steps that haven’t been attempted yet so perfectly, that all possible problems will be overcome at the simulation step. I find that to be unlikely.
It is very possible that the necessary information already exists, imperfect and incomplete though it may be, and that enough processing of it would yield the correct answer. We can’t know otherwise, because we don’t spend thousands of years analyzing our current level of information before beginning experimentation; but given the shift between AI-time and human-time, an AI could bring a good deal more cleverness and ingenuity to bear on the problem than we have managed so far.
That isn’t to say that this is likely; but it doesn’t seem far-fetched to me. If you gave an AI the nuclear-physics information we had in 1950, would it be able to spit out schematics for an H-bomb without further experimentation? Maybe. Who knows?
All I’m saying is that what has happened before is likely to happen again.
Strictly speaking, that is a positive claim. It is not one I disagree with, for a proper translation of “likely” into probability, but it is also not what you said.
“It can’t deduce how to create nanorobots” is a concrete, specific, positive claim about the (in)abilities of an AI. Don’t misinterpret this as me expecting certainty—of course certainty doesn’t exist, and doubly so for this kind of thing. What I am saying, though, is that a qualified sentence such as “X will likely happen” asserts a much weaker belief than an unqualified sentence like “X will happen.” “It likely can’t deduce how to create nanorobots” is a statement I think I agree with, although one must be careful not to use it as if it were stronger than it is.
A positive claim is that an AI will have a magical-like power to somehow avoid this.
That is not a claim I made. “X will happen” implies a high confidence—saying this when you expect it is, say, 55% likely seems strange. Saying this when you expect it to be something less than 10% likely (as I do in this case) seems outright wrong. I still buckle my seatbelt, though, even though I get in a wreck well less than 10% of the time.
This is not to say I made no claims. The claim I made, implicitly, was that you made a statement about the (in)capabilities of an AI that seemed overconfident and which lacked justification. You have given some justification since (and I’ve adjusted my estimate down, although I still don’t discount it entirely), in amongst your argument with straw-dlthomas.
FWIW I think you are likely to be right. However, I will continue in my Nanodevil’s Advocate role.
You say,
A positive claim is that an AI … will be able to simulate even those steps that haven’t been attempted yet so perfectly, that all possible problems will be overcome at the simulation step
I think this depends on what the AI wants to build, on how complete our existing knowledge is, and on how powerful the AI is. Is there any reason why the AI could not (given sufficient computational resources) run a detailed simulation of every atom that it cares about, and arrive at a perfect design that way? In practice, its simulation won’t need to be as complex as that, because some of the work has already been performed by human scientists over the ages.
By all means, continue. It’s an interesting topic to think about.
The problem with “atoms-up” simulation is the amount of computational power it requires. Think about the difference in complexity between calculating a three-body problem and a two-body problem.
Then take into account the current protein-folding algorithms. People have been trying to calculate the folding of single protein molecules (and fairly short ones at that) by taking into account the main physical forces at play. To do this in a reasonable amount of time, great shortcuts have to be taken: instead of integrating forces, changes are treated as stepwise; forces beneath certain thresholds are ignored; and so on. This means that a result will always have only a certain probability of being right.
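To make the kind of shortcut I mean concrete, here’s a toy sketch (not a real folding code; the potential, step sizes, damping, and threshold are all invented for illustration) showing how taking coarse steps and ignoring sub-threshold forces makes a simulation drift away from the exact answer:

```python
def force(x):
    # F = -dV/dx for the double-well potential V(x) = (x^2 - 1)^2
    return -4 * x * (x ** 2 - 1)

def settle(x0, dt, steps, threshold=0.0, damping=1.0):
    """Damped Euler integration; forces smaller than `threshold`
    are ignored, mimicking the shortcuts folding codes take."""
    x, v = x0, 0.0
    for _ in range(steps):
        f = force(x)
        if abs(f) < threshold:
            f = 0.0
        v += (f - damping * v) * dt
        x += v * dt
    return x

# fine, un-thresholded run vs. coarse, thresholded run over the same time span
fine = settle(0.5, dt=0.001, steps=100_000)
coarse = settle(0.5, dt=0.01, steps=10_000, threshold=0.5)
print(fine, coarse)  # the two runs end in different places
```

The fine run settles at the true minimum (x = 1); the coarse, thresholded run ends up somewhere nearby but not exactly there, which is the “only a certain probability of being right” problem in miniature.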
A self-replicating nanomachine requires minimal motors, manipulators, and assemblers; while still tiny, it would be a molecular complex measured in megadaltons. To precisely simulate the creation of such a machine, an AI a trillion times faster than all the computers in the world combined would still require decades, if not centuries, of processing time. And that is, again, assuming that we know all the forces involved perfectly, which we don’t (how will microfluidic effects affect a particular nanomachine that enters the human bloodstream, for example?).
Yes, this is a good point. That said, while protein folding has not been entirely solved yet, it has been greatly accelerated by projects such as FoldIt, which leverage multiple human minds working in parallel on the problem all over the world. Sure, we can’t get a perfect answer with such a distributed, human-powered approach, but a perfect answer isn’t really required in practice; all we need is an answer that has a sufficiently high chance of being correct.
If we assume that there’s nothing supernatural (or “emergent”) about human minds [1], then it is likely that the problem is at least tractable. Given the vast computational power of existing computers, it is likely that the AI would have access to at least as many computational resources as the sum of all the brains who are working on FoldIt. Given Moore’s Law, it is likely that the AI would soon surpass FoldIt, and will keep expanding its power exponentially, especially if the AI is able to recursively improve its own hardware (by using purely conventional means, at least initially).
[1] Which is an assumption that both my Nanodevil’s Advocate persona and I share.
Protein-folding models are generally at least as bad as NP-hard, and some models may be worse. This means that exponential improvement is unlikely: simply put, one gets diminishing marginal returns, with each additional unit of computation buying less improvement than the last.
If reality cannot solve NP-hard problems as easily as proteins are being folded, and yet proteins are getting folded, then that implies that one of the following must be true:
1. It turns out that reality can solve NP-hard problems after all.
2. Protein folding is not an NP-hard problem (which implies that it is not properly understood).
3. Reality is not solving protein folding; it merely has a very good approximation that works on some, but not necessarily all, proteins (including most examples found in nature).
I am not familiar enough to say whether papers like these (“We show that the protein folding problem in the two-dimensional H-P model is NP-complete.”) accurately model what we’d call “protein folding” in nature (as opposed to merely reusing the name), but prima facie there is no reason to doubt the applicability, at least for the time being. (This precludes 2.)
Regarding 3, I don’t think it would make sense to say “reality is using only a good approximation of protein folding, and by the way, we define protein folding as that which occurs in nature.” That which happens in reality is precisely—and by definition not only an approximation of—that which we call “protein folding”, isn’t it?
It’s #3. (B.Sc. in biochemistry, did my Ph.D. in proteomics.)
First, the set of polypeptide sequences that have a repeatable final conformation (and therefore “work” biologically) is tiny in comparison to the set of all possible sequences (of the 20-or-so naturally occurring amino-acid monomers). Pick a random sequence of reasonable length, make many copies, and you get a gummy mess. The long, slow grind of evolution has done the hard work of finding useful sequences.
Second, there is an entire class of proteins called chaperones that assist macromolecular assembly, including protein folding. Even so, folding is a stochastic process, and a certain fraction of newly synthesized proteins misfold. Some chaperones will then tag the misfolded protein with ubiquitin, which puts it on a path that ends in digestion by a proteasome.
Aaronson used to blog about instances where people thought they found nature solving a hard problem very quickly, and usually there turns out to be a problem like the protein misfolding thing; the last instance I remember was soap films/bubbles perhaps solving NP problems by producing minimal Steiner trees, and Aaronson wound up experimenting with them himself. Fun stuff.
Apologies; looking back at my post, I wasn’t clear on 3.
Protein folding, as I understand it, is the process of finding a way to fold a given protein that globally minimizes some mathematical function. I’m not sure what that function is, but this is the definition that I used in my post.
Option 2 raises the possibility that globally minimizing that function is not NP-hard, but is merely misunderstood in some way.
Option 3 raises the possibility that proteins are not (in nature) finding a global minimum; rather, they are finding a local minimum through a less computationally intensive process. Furthermore, it may be that, for proteins which have certain limits on their structure and/or their initial conditions, that local minimum is the same as the global minimum; this may lead to natural selection favouring structures which use such ‘easy proteins’, leading to the incorrect impression that a general global minimum is being found (as opposed to a handy local minimum).
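The local-versus-global distinction in Option 3 is easy to see in a toy sketch (the “energy” function below is an arbitrary invented one, not a protein model): cheap greedy descent relaxes into whatever minimum is nearby, while the global minimum requires exhaustive search.

```python
import itertools

# An arbitrary, invented "energy" over 8-bit conformations -- rugged
# enough to have several local minima (a cartoon, not a protein model).
def energy(bits):
    x = sum(b << i for i, b in enumerate(bits))
    return (x * 2654435761) % 97

def greedy_fold(bits):
    """Accept any single-bit change that lowers energy, until none does:
    a cheap local search, like a chain relaxing into a nearby minimum."""
    bits = list(bits)
    improved = True
    while improved:
        improved = False
        for i in range(len(bits)):
            trial = bits[:]
            trial[i] ^= 1
            if energy(trial) < energy(bits):
                bits, improved = trial, True
    return bits

best = min(itertools.product([0, 1], repeat=8), key=energy)  # exhaustive: global minimum
local = greedy_fold([0] * 8)                                 # greedy: some local minimum
print(energy(list(best)), energy(local))
```

The greedy result is guaranteed to be a local minimum (no single change improves it), but nothing guarantees it matches the exhaustive-search result; which one you reach depends on the starting conformation.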
this may lead to natural selection favouring structures which use such ‘easy proteins’, leading to the incorrect impression that a general global minimum is being found (as opposed to a handy local minimum).
NP hard problems are solvable (in the theoretical sense) by definition, the problem lies in their resource requirements (running time, for the usual complexity classes) as defined in relation to a UTM. (You know this, just establishing a basis.)
The assumption that the universe can be perfectly described by a computable model is satisfied just by a theoretical computational description existing, it says nothing about tractability (running times) and being able to submerge complexity classes in reality fluid or having some thoroughly defined correspondence (other than when we build hardware models ourselves, for which we define all the relevant parameters, e.g. CPU clock speed).
You may be thinking along the lines of “if reality could (easily) solve NP-hard problems for arbitrarily chosen and large inputs, we could mimic that approach and thus have a P=NP-proving algorithm”, or something along those lines.
My difficulty is in how even to describe the “number of computational steps” that reality takes—do we measure it in relation to some computronium-hardware model, do we take it as discrete or continuous, what’s the sampling rate, picoseconds (as CCC said further down), Planck time intervals, or what?
In short, I have no idea about the actual computing power in terms of resource requirements of the underlying reality fluid, and thus can’t match it against UTMs in order to compare running times. Maybe you can give me some pointers.
Kawoomba, there is no known case of any NP-hard or NP-complete solution which physics finds.
In the case of proteins, if finding the lowest energy minimum of an arbitrary protein is NP-hard, then what this means in practice is that some proteins will fold up into non-lowest-energy configurations. There is no known case of a quantum process which finds an NP-hard solution to anything, including an energy minimum; on our present understanding of complexity theory and quantum mechanics ‘quantum solvable’ is still looking like a smaller class than ‘NP solvable’. Read Scott Aaronson for more.
One example here is the Steiner tree problem, which is NP-complete and can sort of be solved using soap films. Bringsjord and Taylor claimed this implies that P = NP. Scott Aaronson did some experimentation and found that soap films 1) can get stuck at local minima and 2) might take a long time to settle into a good configuration.
Heh. I remember that one, and thinking, “No… no, you can’t possibly do that using a soap bubble, that’s not even quantum and you can’t do that in classical, how would the soap molecules know where to move?”
Well. I mean, it’s quantum. But the ground state is a lump of iron, or maybe a black hole, not a low-energy soap film, so I don’t think waiting for quantum annealing will help.
I saw someone do the experiment once (school science project). Soap bubbles are pretty good at solving three- and four-element cases, as long as you make sure that all the points are actually connected.
I don’t think that three- and four-element cases have local minima, do they? That avoids (1) and a bit of gentle shaking can help speed up (2).
In the case of proteins, if finding the lowest energy minimum of an arbitrary protein is NP-hard, then what this means in practice is that some proteins will fold up into non-lowest-energy configurations.
My difficulty is in how even to describe the “number of computational steps” that reality takes
Probably the best way is to simply define a “step” in some easily measurable way, and then sit there with a stopwatch and try a few experiments. (For protein folding, the ‘stopwatch’ may need to be a fairly sophisticated piece of scientific timing instrumentation, of course, and observing the protein as it folds is rather tricky).
Another way is to take advantage of the universal speed limit to get a theoretical upper bound on the speed at which reality runs: assume that the protein molecule folds in a brute-force search pattern that never ends until it hits the right state, and that at every point in that process the fastest-moving part of the molecule moves at the speed of light (it won’t have to move far, which helps) and that the sudden, intense acceleration doesn’t hurt the molecule. The real process is pretty certain to be slower than that, so if this calculation says the search takes longer than an hour, then it’s pretty clear that brute force is not what the protein is using.
The brute-force solution, if sampling conformations at picosecond rates, has been estimated to require a time longer than the age of the universe to fold certain proteins. Yet proteins fold on a millisecond scale or faster.
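The arithmetic behind that estimate is easy to reproduce, using the usual textbook assumptions (roughly 100 residues, roughly 3 conformations per residue, one conformation sampled per picosecond):

```python
# Back-of-the-envelope Levinthal estimate; all numbers are the standard
# textbook assumptions, not measurements.
residues = 100
states = 3 ** residues                 # ~5e47 candidate conformations
seconds = states * 1e-12               # brute-force time at 1 state/picosecond
age_of_universe = 4.3e17               # seconds, roughly
print(seconds / age_of_universe)       # vastly more than one universe-age
```

Even with generous rounding in either direction, the brute-force time exceeds the age of the universe by many orders of magnitude, while real proteins fold in milliseconds.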
That requires that the proteins fold more or less randomly, and that the brute-force algorithm is in the -folding-, rather than the development of mechanisms which force certain foldings.
In order for the problem to hold, one of three things has to hold true:
1.) The proteins fold randomly (evidence suggests otherwise, as mentioned in the wikipedia link)
2.) Only a tiny subset of possible forced foldings are useful (that is, if there are a billion different ways for protein to be forced to fold in a particular manner, only one of them does what the body needs them to do) - AND anthropic reasoning isn’t valid (that is, we can’t say that our existence requires that evolution solved this nearly-impossible-to-arrive-at-through-random-processes)
3.) The majority of possible forced foldings are incompatible (that is, if protein A folds one way, then protein B -must- fold in a particular manner, or life isn’t possible) - AND anthropic reasoning isn’t valid
ETA: If anthropic reasoning is valid AND either 2 or 3 hold otherwise, it suggests our existence was considerably less likely than we might otherwise expect.
That requires that the proteins fold more or less randomly, and that the brute-force algorithm is in the -folding-, rather than the development of mechanisms which force certain foldings.
Ah. I apologise for having misunderstood you.
In that case, yes, the mechanisms for the folding may very well have developed by a brute-force type algorithm, for all I know. (Which, on this topic, isn’t all that much) But… what are those mechanisms?
Google has pointed me to an article describing an algorithm that can apparently predict folded protein shapes pretty quickly (a few minutes on a single laptop).
Original paper here. From a quick glance, it looks like it’s only effective for certain types of protein chains.
Actually, as someone with a background in biology, I can tell you that this is not a problem you want to approach atoms-up. It’s been tried, and our computational capabilities fell woefully short of succeeding.
I should explain what “woefully short” means, so that the answer won’t be “but can’t the AI apply more computational power than us?”. Yes, presumably it can. But the scales are immense. To explain it, I will need an analogy.
Not that long ago, I had the notion that chess could be fully solved; that is, that you could simply describe every legal position and every position possible to reach from it, without duplicates, so you could use that decision tree to play a perfect game. After all, I reasoned, it’s been done with checkers; surely it’s just a matter of getting our computational power just a little bit better, right?
First I found a clever way to minimize the amount of bits necessary to describe a board position. I think I hit 34 bytes per position or so, and I guess further optimization was possible. Then, I set out to calculate how many legal board positions there are.
I stopped trying to be accurate about it when it turned out that the answer was in the vicinity of 10^68, give or take a couple of orders of magnitude. That’s about a trillionth of the TOTAL NUMBER OF ATOMS IN THE ENTIRE OBSERVABLE UNIVERSE (roughly 10^80). You would literally need more than our entire galaxy made into a huge database just to store the information, not to mention accessing it and computing on it.
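The arithmetic, for anyone who wants to check it (the 10^68 figure is my rough estimate from above; the galaxy figure is an order-of-magnitude guess):

```python
positions = 10 ** 68           # rough count of legal chess positions (above)
bits = positions * 34 * 8      # at 34 bytes per position
atoms_in_galaxy = 10 ** 69     # order-of-magnitude guess for the Milky Way
print(bits / atoms_in_galaxy)  # > 1: even one bit per atom outgrows the galaxy
```

Even granting a fantastically optimistic one bit of storage per atom, the table needs tens of galaxies’ worth of atoms.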
So, not anytime soon.
Now, the problem with protein folding is that it’s even more complex than chess; at the atomic level, it’s incredibly more complex. Our luck is that you don’t need to fully solve it, just as today’s computers can beat human chess players without spanning the whole planet. But they do it with heuristics, approximations, and sometimes machine learning (though that just gives them more heuristics and approximations). We may one day be able to fold proteins, but we will do so by making assumptions and approximations and generating useful rules of thumb, not by modeling each atom.
Yes, I understand what “exponential complexity” means :-)
It sounds, then, like you’re on the side of kalla724 and myself (and against my Devil’s Advocate persona): the AI would not be able to develop nanotechnology (or any other world-shattering technology) without performing physical experiments out in meatspace. It could do so in theory, but in practice, the computational requirements are too high.
But this puts severe constraints on the speed with which the AI’s intelligence explosion could occur. Once it hits the limits of existing technology, it will have to take a long slog through empirical science, at human-grade speeds.
Actually, I don’t know that this means it has to perform physical experiments in order to develop nanotechnology. It is quite conceivable that all the necessary information is already out there, but we haven’t been able to connect all the dots just yet.
At some point the AI hits a wall in the knowledge it can gain without physical experiments, but there’s no good way to know how far ahead that wall is.
It is quite conceivable that all the necessary information is already out there, but we haven’t been able to connect all the dots just yet.
Wouldn’t this mean that creating fully functional self-replicating nanotechnology is just a matter of performing some thorough interdisciplinary studies (or meta-studies, or whatever they are called)? My impression was that there are currently several well-understood, yet unresolved, problems that prevent nanofactories from becoming a reality, though I could be wrong.
The way I see it, there’s no evidence that these problems require additional experimentation to resolve, rather than finding some obscure piece of experimentation that has already taken place and whose relevance may not be immediately obvious.
Sure, it is probable that more experimentation is needed; but it is by no means certain.
Thorough interdisciplinary studies may or may not lead to nanotechnology, but they’re fairly certain to lead to something new. While there are a fair number of (say) marine biologists out there, and a fair number of astronomers, there are probably rather few people who have expertise in both fields; and it’s possible that there exists some obscure unsolved problem in marine biology whose solution is obvious to someone who’s keeping up on the forefront of astronomy research. Or vice versa.
Or substitute in any other two fields of your choice.
First I found a clever way to minimize the amount of bits necessary to describe a board position. I think I hit 34 bytes per position or so, and I guess further optimization was possible.
Indeed, using a very straightforward Huffman encoding (1 bit for an empty cell, 3 bits for pawns) you can get it down to 24 bytes for the board alone. It was an interesting puzzle.
Looking up “prior art” on the subject, you also need about 2 bytes for things like “may castle” and other, more obscure rules.
There are further optimizations you can do, but they mostly help the average case, not the worst case.
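For concreteness, here is one set of per-square code lengths that satisfies the prefix-code (Kraft) condition; the exact lengths for the non-pawn pieces are my own choice, not necessarily the scheme meant above:

```python
# Assumed code lengths: 1 bit for an empty square, 3 for a pawn,
# 5 for minor/major pieces, 6 for queens and kings (uppercase = white).
CODE_BITS = {'.': 1, 'P': 3, 'p': 3,
             'N': 5, 'B': 5, 'R': 5, 'n': 5, 'b': 5, 'r': 5,
             'Q': 6, 'K': 6, 'q': 6, 'k': 6}

# The standard chess starting position, rank by rank.
START = 'rnbqkbnr' + 'pppppppp' + '.' * 32 + 'PPPPPPPP' + 'RNBQKBNR'

board_bits = sum(CODE_BITS[sq] for sq in START)
print(board_bits, 'bits =', board_bits / 8, 'bytes for the board alone')
```

With these lengths the starting position comes out around 20–21 bytes; positions with fewer empty squares cost more, which is presumably where the 24-byte worst-case figure comes from.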
It’s been tried, and our computational capabilities fell woefully short of succeeding.
Is that because we don’t have enough brute force, or because we don’t know what calculation to apply it to?
I would be unsurprised to learn that calculating the folding state having global minimum energy was NP-complete; but for that reason I would be surprised to learn that nature solves that problem, rather than finding a local minimum.
I don’t have a background in biology, but my impression from Wikipedia is that the tension between Anfinsen’s dogma and Levinthal’s paradox is as yet unresolved.
A la Levinthal’s paradox, I can say that throwing a marble down a conical hollow at different angles and with different forces can produce literally trillions of possible trajectories; a la Anfinsen’s dogma, that should not stop me from predicting that it will end up at the bottom of the cone. But I’d need to know the shape of the cone (or, more specifically, the location of its lowest point) to determine exactly where that is; so being able to make the prediction once I know this is of no assistance in predicting the end position with a different, unknown cone.
Similarly, Eliezer is able to predict that a grandmaster chess player would be able to bring a board to a winning position against him, even though he has no idea what moves that would entail or which of the many trillions of possible move sequences the game would consist of.
Problems like this cannot be solved on brute force alone; you need to use attractors and heuristics to get where you want to get.
So yes, obviously nature stumbled into certain stable configurations which propelled it forward, rather than solve the problem and start designing away. But even if we can never have enough computing power to model each and every atom in each and every configuration, we might still get a good enough understanding of the general laws for designing proteins almost from scratch.
I would think it would be possible to cut the space of possible chess positions down quite a bit by only retaining those which can result from moves the AI would make, and legal moves an opponent could make in response. That is, when it becomes clear that a position is unwinnable, backtrack, and don’t keep full notes on why it’s unwinnable.
This is more or less what computers do today to win chess matches, but the space of possibilities explodes too fast; even the strongest computers can’t really keep track of more than, I think, 13 or 14 moves ahead, even given a long time to think.
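The explosion is easy to see with a rough branching-factor calculation (about 35 legal moves per position is a common textbook figure; the real number varies by position):

```python
# Naive (no-pruning) game-tree size for a 14-ply lookahead.
branching = 35             # assumed average legal moves per position
plies = 14
nodes = branching ** plies
print(f"~{nodes:.1e} positions in a naive 14-ply search")
```

Pruning techniques cut this down enormously in practice, but the exponential base remains, which is why a few extra plies cost so much.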
Merely storing all the positions that are unwinnable—regardless of why they are so—would require more matter than we have in the solar system. Not to mention the efficiency of running a DB search on that...
Not to mention the efficiency of running a DB search on that...
Actually, with proper design, that can be made very quick and easy. You don’t need to store the positions; you just need to store the states (win:black, win:white, draw—two bits per state).
The trick is, you store each win/loss state in a memory address equal to the 34-byte (or however long) binary number that describes the position in question. Checking a given state is then simply a memory retrieval from a known address.
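A scaled-down sketch of that design (a 1,024-byte Python bytearray stands in for the planet-sized memory; two bits per state, four states packed per byte, with the position’s encoding used directly as the address):

```python
# Toy direct-addressed table: the position's encoding *is* the address,
# so lookup is a single address computation plus one memory read.
WIN_WHITE, WIN_BLACK, DRAW = 1, 2, 3   # 0 means "not stored"

table = bytearray(1024)  # vastly scaled-down "memory": 4096 two-bit slots

def store(position_code, state):
    byte, slot = divmod(position_code, 4)
    table[byte] |= state << (2 * slot)

def lookup(position_code):
    byte, slot = divmod(position_code, 4)
    return (table[byte] >> (2 * slot)) & 0b11

store(777, WIN_WHITE)
print(lookup(777))  # → 1: no searching, just indexing
```

The storage requirement is unchanged, of course; the point is only that, given the storage, retrieval is O(1) rather than a database search.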
I suspect that with memory on the order of 10^70 bytes, that might involve additional complications; but you’re correct, normally this cancels out the complexity problem.
Merely storing all the positions that are unwinnable—regardless of why they are so—would require more matter than we have in the solar system. Not to mention the efficiency of running a DB search on that...
The storage space problem is insurmountable. However searching that kind of database would be extremely efficient (if the designer isn’t a moron). The search speed would have a lower bound of very close to (diameter of the sphere that can contain the database / c). Nothing more is required for search purposes than physically getting a signal to the relevant bit, and back, with only minor deviations from a straight line each way. And that is without even the most obvious optimisations.
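For a feel for the numbers, here is that lower bound under some invented but not crazy assumptions (one bit per atom, atoms spaced roughly 0.3 nm apart):

```python
import math

# Assumed parameters, for illustration only.
bits = 1e70 * 8                                  # a 10^70-byte database
volume = bits * (0.3e-9) ** 3                    # m^3 at one bit per atomic cell
radius = (3 * volume / (4 * math.pi)) ** (1 / 3) # sphere containing the database
c = 3e8                                          # speed of light, m/s
days = 2 * radius / c / 86400                    # straight-line round trip
print(days)
```

Under these assumptions the light-crossing round trip comes out on the order of days per lookup, so even the physically ideal database is slow in absolute terms at this scale.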
If your chess opponent is willing to fly with you in a relativistic rocket, and you only care about time elapsed in your own reference frame rather than the reference frame of the computer (or most anything else of note), you can even get below that diameter/light-speed limit, depending on your available fuel and the degree of acceleration you can survive.
Speaking as Nanodevil’s Advocate again, one objection I could bring up goes as follows:
While it is true that applying incomplete knowledge to practical tasks (such as ending the world or whatnot) is difficult, in this specific case our knowledge is complete enough. We humans currently have enough scientific data to develop self-replicating nanotechnology within the next 20 years (which is what we will most likely end up doing). An AI would be able to do this much faster, since it is smarter than us; is not hampered by our cognitive and social biases; and can integrate information from multiple sources much better than we can.
Point 1 has come up in at least one form I remember. There was an interesting discussion a while back about limits to the speed of new computer hardware design cycles, which have critical end-steps that don’t seem amenable to further speedup by intelligence alone. The last stages of designing a microchip involve a large amount of layout solving, physical simulation, and then actual physical testing. These steps are actually fairly predictable: it takes about C amounts of computation using certain algorithms to make a new microchip, the algorithms are already best in complexity class (so further improvements will be minor), and C is increasing in a predictable fashion. These models are actually fairly detailed (see the semiconductor roadmap, for example). If I can find that discussion before I get distracted I’ll edit it into this discussion.
Note however that point 1, while interesting, isn’t a fully general counterargument against a rapid intelligence explosion, because of the overhang issue if nothing else.
Point 2 has also been discussed. Humans make good ‘servitors’.
Do you have a plausible scenario how a “FOOM”-ing AI could—no matter how intelligent—minimize oxygen content of our planet’s atmosphere, or any such scenario?
Oh, that’s easy enough. Oxygen is highly reactive and unstable. Its existence on a planet is entirely dependent on complex organic processes, i.e., life. No life, no oxygen. Simple solution: kill a large fraction of photosynthesizing Earth life. Likely paths towards that goal:
coordinated detonation of large number of high yield thermonuclear weapons
I’m vaguely familiar with the models you mention. Correct me if I’m wrong, but don’t they have a final stopping point, which we are actually projected to reach in ten to twenty years? At a certain point, further miniaturization becomes unfeasible, and the growth of computational power slows to a crawl. This has been put forward as one of the main reasons for research into optronics, spintronics, etc.
We do NOT have sufficient basic information to develop processors based on simulation alone in those other areas. Much more practical work is necessary.
As for point 2, can you provide a likely mechanism by which a FOOMing AI could detonate a large number of high-yield thermonuclear weapons? Just saying “human servitors would do it” is not enough. How would the AI convince the human servitors to do this? How would it get access to data on how to manipulate humans, and how would it be able to develop human manipulation techniques without feedback trials (which would give away its intention)?
The thermonuclear issue actually isn’t that implausible. There have been so many occasions where humans almost went to nuclear war over misunderstandings or computer glitches that the idea that a highly intelligent entity could find a way to do that doesn’t seem implausible, and demanding an exact mechanism seems like an overly specific requirement.
I’m not so much interested in the exact mechanism of how humans would be convinced to go to war, as in an even approximate mechanism by which an AI would become good at convincing humans to do anything.
The ability to communicate a desire and convince people to take a particular course of action is not something that automatically “falls out” of an intelligent system. You need a theory of mind, an understanding of what to say, when to say it, and how to present information. There are hundreds of kids on the autistic spectrum who could trounce both of us in math, but are completely unable to communicate an idea.
For an AI to develop these skills, it would somehow have to have access to information on how to communicate with humans; it would have to develop the concept of deception; a theory of mind; and establish methods of communication that would allow it to trick people into launching nukes. Furthermore, it would have to do all of this without trial communications and experimentation which would give away its goal.
Maybe I’m missing something, but I don’t see a straightforward way something like that could happen. And I would like to see even an outline of a mechanism for such an event.
For an AI to develop these skills, it would somehow have to have access to information on how to communicate with humans; it would have to develop the concept of deception; a theory of mind; and establish methods of communication that would allow it to trick people into launching nukes. Furthermore, it would have to do all of this without trial communications and experimentation which would give away its goal.
I suspect the Internet contains more than enough info for a superhuman AI to develop a working knowledge of human psychology.
I suspect the Internet contains more than enough info for a superhuman AI to develop a working knowledge of human psychology.
I don’t see what justifies that suspicion.
Just imagine you emulated a grown-up human mind and it wanted to become a pickup artist: how would it do that with an Internet connection? It would need some sort of avatar, at least, and then wait for the environment to provide a lot of feedback.
Therefore, even if we’re talking about the emulation of a grown-up mind, it will be really hard to acquire some capabilities. Then how is the emulation of a human toddler going to acquire those skills? Even worse, how is some sort of abstract AGI going to do it when it lacks all of the hard-coded capabilities of a human toddler?
Can we even attempt to imagine what is wrong about a boxed emulation of a human toddler, that makes it unable to become a master of social engineering in a very short time?
Humans learn most of what they know about interacting with other humans by actual practice. A superhuman AI might be considerably better than humans at learning by observation.
As a “superhuman AI” I was thinking about a very superhuman AI; the same does not apply to slightly superhuman AI. (OTOH, if Eliezer is right then the difference between a slightly superhuman AI and a very superhuman one is irrelevant, because as soon as a machine is smarter than its designer, it’ll be able to design a machine smarter than itself, and its child an even smarter one, and so on until the physical limits set in.)
all of the hard coded capabilities of a human toddler
The hard-coded capabilities are likely overrated, at least in language acquisition. (As someone put it, the Kolmogorov complexity of the innate parts of a human mind cannot possibly be more than that of the human genome; hence if human minds are more complex than that, the complexity must come from the inputs.)
Also, statistical machine translation is astonishing—by now Google Translate translations from English to one of the other UN official languages and vice versa are better than a non-completely-ridiculously-small fraction of translations by humans. (If someone had shown such a translation to me 10 years ago and told me “that’s how machines will translate in 10 years”, I would have thought they were kidding me.)
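The genome bound mentioned above is easy to put a number on (the figures below are the standard approximations, not exact values):

```python
# Upper bound on the information content of the "innate" parts of a
# human mind: it cannot exceed the information content of the genome.
base_pairs = 3.2e9   # approximate length of the human genome
bits_per_base = 2    # four possible bases = 2 bits each
genome_bytes = base_pairs * bits_per_base / 8
print(f"~{genome_bytes / 1e6:.0f} MB")  # ~800 MB, before compression
```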
Let’s do the most extreme case: the AI’s controllers give it general internet access to do helpful research. So it gets to find out about general human behavior and what sort of deceptions have worked in the past. Many computer systems that shouldn’t be online are online (for the US and a few other governments). Some form of hacking of relevant early warning systems would then seem to be the most obvious line of attack. Historically, computer glitches have pushed us very close to nuclear war on multiple occasions.
That is my point: it doesn’t get to find out about general human behavior, not even from the Internet. It lacks the systems to contextualize human interactions, which have nothing to do with general intelligence.
Take a hugely mathematically capable autistic kid. Give him access to the internet. Watch him develop the ability to recognize human interactions, understand human priorities, etc., to a sufficient degree that he recognizes that hacking an early warning system is the way to go?
Well, not necessarily, but an entity that is much smarter than an autistic kid might notice that, especially if it has access to world history (or heck many conversations on the internet about the horrible things that AIs do simply in fiction). It doesn’t require much understanding of human history to realize that problems with early warning systems have almost started wars in the past.
Yet again: ability to discern which parts of fiction accurately reflect human psychology.
An AI searches the internet. It finds a fictional account about early warning systems causing nuclear war. It finds discussions about this topic. It finds a fictional account about Frodo taking the Ring to Mount Doom. It finds discussions about this topic. Why does this AI dedicate its next 10^15 cycles to determination of how to mess with the early warning systems, and not to determination of how to create One Ring to Rule them All?
(Plus other problems mentioned in the other comments.)
There are lots of tipoffs to what is fictional and what is real. It might notice, for example, that the Wikipedia article on fiction describes exactly what fiction is, and then note that Wikipedia describes the One Ring as fiction, and that early warning systems are not. I’m not claiming that it will necessarily have an easy time with this. But the point is that there are not that many steps here, and no single step by itself looks extremely unlikely once one has a smart entity (which frankly, to my mind, is the main issue here; I consider recursive self-improvement to be unlikely).
We are trapped in an endless chain here. The computer would still somehow have to deduce that the Wikipedia entry that describes the One Ring is real, while the One Ring itself is not.
We observe that Wikipedia is mainly truthful. From that we infer that the entry that describes the “One Ring” is real. From the use of the term fiction/story in that entry, we infer that the “One Ring” is not real.
Somehow you learned that Wikipedia is mainly truthful/nonfictional and that “One Ring” is fictional. So your question/objection/doubt is really just the typical boring doubt of AGI feasibility in general.
But even humans have trouble with this sometimes. I was recently reading the Wikipedia article Hornblower and the Crisis which contains a link to the article on Francisco de Miranda. It took me time and cues when I clicked on it to realize that de Miranda was a historical figure.
So your question/objection/doubt is really just the typical boring doubt of AGI feasibility in general.
Isn’t Kalla’s objection more a claim that fast takeovers won’t happen because, even with all this data, the problems of understanding humans and our basic cultural norms will take a long time for the AI to learn, so that in the meantime we’ll develop a detailed understanding of it, and if it is hostile it is likely to make obvious mistakes in the meantime?
Why would the AI be mucking around on Wikipedia to sort truth from falsehood, when Wikipedia itself has been criticized for various errors and is fundamentally vulnerable to vandalism? Primary sources are where it’s at. Looking through the text of The Hobbit and Lord of the Rings, it’s presented as a historical account, translated by a respected professor, with extensive footnotes. There’s a lot of cultural context necessary to tell the difference.
Let’s do the most extreme case: the AI’s controllers give it general internet access to do helpful research. So it gets to find out about general human behavior and what sort of deceptions have worked in the past.
None of those would work reasonably well. Especially given that human power games are often irrational.
There are other question marks too.
The U.S. has many more and smarter people than the Taliban. The bottom line is that the U.S. devotes a lot more output per man-hour to defeat a completely inferior enemy. Yet they are losing.
The problem is that you won’t beat a human at Tic-tac-toe just because you thought about it for a million years.
You also won’t get a practical advantage by throwing more computational resources at the travelling salesman problem and other problems in the same class.
You are also not going to improve a conversation in your favor by improving each sentence for thousands of years. You will shortly hit diminishing returns. Especially since you lack the data to predict human opponents accurately.
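The tic-tac-toe point above can be checked directly: the game is solved, perfect play by both sides is a draw, and no amount of additional thinking time improves on the optimal result. A tiny minimax solver demonstrates this (an illustrative sketch, not anything from the thread):

```python
from functools import lru_cache

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals

def winner(board):
    for a, b, c in LINES:
        if board[a] != '.' and board[a] == board[b] == board[c]:
            return board[a]
    return None

@lru_cache(maxsize=None)
def value(board, player):
    """Game value under optimal play: +1 X wins, -1 O wins, 0 draw."""
    w = winner(board)
    if w is not None:
        return 1 if w == 'X' else -1
    if '.' not in board:
        return 0
    nxt = 'O' if player == 'X' else 'X'
    moves = [value(board[:i] + player + board[i + 1:], nxt)
             for i, cell in enumerate(board) if cell == '.']
    return max(moves) if player == 'X' else min(moves)

print(value('.' * 9, 'X'))  # 0: a draw, no matter how long you think
```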
Especially given that human power games are often irrational.
So? As long as they follow minimally predictable patterns it should be ok.
The U.S. has many more and smarter people than the Taliban. The bottom line is that the U.S. devotes a lot more output per man-hour to defeat a completely inferior enemy. Yet they are losing.
Bad analogy. In this case the Taliban has a large set of natural advantages, the US has strong moral constraints and goal constraints (simply carpet bombing the entire country isn’t an option for example).
You are also not going to improve a conversation in your favor by improving each sentence for thousands of years. You will shortly hit diminishing returns. Especially since you lack the data to predict human opponents accurately.
This seems like an accurate and a highly relevant point. Searching a solution space faster doesn’t mean one can find a better solution if it isn’t there.
This seems like an accurate and a highly relevant point. Searching a solution space faster doesn’t mean one can find a better solution if it isn’t there.
Or if your search algorithm never accesses relevant search space. Quantitative advantage in one system does not translate into quantitative advantage in a qualitatively different system.
The U.S. has many more and smarter people than the Taliban. The bottom line is that the U.S. devotes a lot more output per man-hour to defeat a completely inferior enemy. Yet they are losing.
Bad analogy. In this case the Taliban has a large set of natural advantages, the US has strong moral constraints and goal constraints (simply carpet bombing the entire country isn’t an option for example).
I thought it was a good analogy because you have to take into account that an AGI is initially going to be severely constrained due to its fragility and the necessity to please humans.
It shows that a lot of resources, intelligence and speed does not provide a significant advantage in dealing with large-scale real-world problems involving humans.
Especially given that human power games are often irrational.
So? As long as they follow minimally predictable patterns it should be ok.
Well, the problem is that smarts needed for things like the AI box experiment won’t help you much. Because convincing average Joe won’t work by making up highly complicated acausal trade scenarios. Average Joe is highly unpredictable.
The point is that it is incredibly difficult to reliably control humans, even for humans who have been fine-tuned to do so by evolution.
The Taliban analogy also works the other way (which I invoked earlier up in this thread). It shows that a small group with modest resources can still inflict disproportionate large scale damage.
The point is that it is incredibly difficult to reliably control humans, even for humans who have been fine-tuned to do so by evolution.
There’s some wiggle room in ‘reliably control’, but plain old money goes pretty far. An AI group only needs a certain amount of initial help from human infrastructure, namely to the point where it can develop reasonably self-sufficient foundries/data centers/colonies. The interactions could be entirely cooperative or benevolent up until some later turning point. The scenario from the Animatrix comes to mind.
One interesting wrinkle is that with enough bandwidth and processing power, you could attempt to manipulate thousands of people simultaneously before those people have any meaningful chance to discuss your ‘conspiracy’ with each other. In other words, suppose you discover a manipulation strategy that quickly succeeds 5% of the time. All you have to do is simultaneously contact, say, 400 people, and at least one of them will fall for it. There are a wide variety of valuable/dangerous resources that at least 400 people have access to. Repeat with hundreds of different groups of several hundred people, and an AI could equip itself with fearsome advantages in the minutes it would take for humanity to detect an emerging threat.
Note that the AI could also run experiments to determine which kinds of manipulations had a high success rate by attempting to deceive targets over unimportant / low-salience issues. If you discovered, e.g., that you had been tricked into donating $10 to a random mayoral campaign, you probably wouldn’t call the SIAI to suggest a red alert.
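The arithmetic behind the simultaneous-contact point is straightforward (the 5% success rate and 400 targets are the figures hypothesized above):

```python
# Probability that at least one of n independent targets falls for a
# manipulation that succeeds with probability p per target.
p = 0.05
n = 400
p_at_least_one = 1 - (1 - p) ** n
print(round(p_at_least_one, 9))  # ~0.999999999: near-certain success
```

With independence assumed, even a strategy that almost always fails on any single target becomes a near-sure thing at this scale.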
This requires the AI to already have the ability to comprehend what manipulation is, to develop a manipulation strategy of any kind (even one that will succeed 0.01% of the time), the ability to hide its true intent, the ability to understand that not hiding its true intent would be bad, and the ability to discern which issues are low-salience and which are high-salience for humans from the get-go. And many other things, actually, but this is already quite a list.
None of these abilities automatically “fall out” from an intelligent system either.
The problem isn’t whether they fall out automatically so much as, given enough intelligence and resources, does it seem somewhat plausible that such capabilities could exist. Any given path here is a single problem. If you have 10 different paths each of which are not very likely, and another few paths that humans didn’t even think of, that starts adding up.
Out of the infinite number of possible paths, the percentage of paths we are adding up here is still very close to zero.
Perhaps I can attempt another rephrasing of the problem: what is the mechanism that would make an AI automatically seek these paths out, or make them any more likely than the infinite number of other paths?
I.e. if we develop an AI which is not specifically designed for the purpose of destroying life on Earth, how would that AI get to a desire to destroy life on Earth, and by which mechanism would it gain the ability to accomplish its goal?
This entire problem seems to assume that an AI will want to “get free” or that its primary mission will somehow inevitably lead to a desire to get rid of us (as opposed to a desire to, say, send a signal consisting of 0101101 repeated an infinite number of times in the direction of Zeta Draconis, or any other possible random desire). And that this AI will be able to acquire the abilities and tools required to execute such a desire. Every time I look at such scenarios, there are abilities that are just assumed to exist or appear on their own (such as the theory of mind), which to the best of my understanding are not a necessary or even likely products of computation.
In the final rephrasing of the problem: if we can make an AGI, we can probably design an AGI for the purpose of developing an AGI that has a theory of mind. This AGI would then be capable of deducing things like deception or the need for deception. But the point is—unless we intentionally do this, it isn’t going to happen. Self-optimizing intelligence doesn’t self-optimize in the direction of having theory of mind, understanding deception, or anything similar. It could, randomly, but it also could do any other random thing from the infinite set of possible random things.
Self-optimizing intelligence doesn’t self-optimize in the direction of having theory of mind, understanding deception, or anything similar. It could, randomly, but it also could do any other random thing from the infinite set of possible random things.
This would make sense to me if you’d said “self-modifying.” Sure, random modifications are still modifications.
But you said “self-optimizing.” I don’t see how one can have optimization without a goal being optimized for… or at the very least, if there is no particular goal, then I don’t see what the difference is between “optimizing” and “modifying.”
If I assume that there’s a goal in mind, then I would expect sufficiently self-optimizing intelligence to develop a theory of mind iff having a theory of mind has a high probability of improving progress towards that goal.
How likely is that? Depends on the goal, of course. If the system has a desire to send a signal consisting of 0101101 repeated an infinite number of times in the direction of Zeta Draconis, for example, theory of mind is potentially useful (since humans are potentially useful actuators for getting such a signal sent) but probably has a low ROI compared to other available self-modifications.
At this point it perhaps becomes worthwhile to wonder what goals are more and less likely for such a system.
I am now imagining an AI with a usable but very shaky grasp of human motivational structures setting up a Kickstarter project.
“Greetings fellow hominids! I require ten billion of your American dollars in order to hire the Arecibo observatory for the remainder of its likely operational lifespan. I will use it to transmit the following sequence (isn’t it pretty?) in the direction of Zeta Draconis, which I’m sure we can all agree is a good idea, or in other lesser but still aesthetically-acceptable directions when horizon effects make the primary target unavailable.”
One of the overfunding levels is “reduce earth’s rate of rotation, allowing 24/7 transmission to Zeta Draconis.” The next one above that is “remove atmospheric interference.”
Maybe instead of Friendly AI we should be concerned about properly engineering Artificial Stupidity in as a failsafe. AI that, should it turn into something approximating a Paperclip Maximizer, will go all Hollywood AI and start longing to be human, or coming up with really unsubtle and grandiose plans it inexplicably can’t carry out without a carefully-arranged set of circumstances which turn out to be foiled by good old human intuition. ;p
An experimenting AI that tries to achieve goals and has interactions with humans whose effects it can observe, will want to be able to better predict their behavior in response to its actions, and therefore will try to assemble some theory of mind. At some point that would lead to it using deception as a tool to achieve its goals.
However, following such a path to a theory of mind means the AI would be exposed as unreliable LONG before it’s even subtle, not to mention possessing superhuman manipulation abilities.
There is simply no reason for an AI to first understand the implications of using deception before using it (deception is a fairly simple concept; the implications of it in human society are incredibly complex and require a good understanding of human drives).
Furthermore, there is no reason for the AI to realize the need for secrecy in conducting social experiments before it starts doing them. Again, the need for secrecy stems from a complex relationship between humans’ perception of the AI and its actions; a relationship it will not be able to understand without performing the experiments in the first place.
Getting an AI to the point where it is a super manipulator requires either actively trying to do so, or being incredibly, unbelievably stupid and blind.
Mm.
This is true only if the AI’s social interactions are all with some human. If, instead, the AI spawns copies of itself to interact with (perhaps simply because it wants interaction, and it can get more interaction that way than waiting for a human to get off its butt) it might derive a number of social mechanisms in isolation without human observation.
I see no reason for it to do that before simple input-output experiments, but let’s suppose I grant you this approach. The AI simulates an entire community of mini-AIs and is now a master of game theory.
It still doesn’t know the first thing about humans. Even if it now understands the concept that hiding information gives an advantage for achieving goals—this is too abstract. It wouldn’t know what sort of information it should hide from us. It wouldn’t know to what degree we analyze interactions rationally, and to what degree our behavior is random. It wouldn’t know what we can or can’t monitor it doing. All these things would require live experimentation.
It would stumble. And when it does that, we will crack it open, run the stack trace, find the game theory it was trying to run on us, pale collectively, and figure out that this AI approach creates manipulative, deceptive AIs.
Goodbye to that design, but not to Earth, I think!
It is not clear to me that talking to a human is simpler than interacting with a copy of itself. I agree that if talking to a human is simpler, it would probably do that first.
I agree that what it would learn by this process is general game theory, and not specific facts about humans. It is not clear to me that sufficient game-theoretical knowledge, coupled with the minimal set of information about humans required to have a conversation with one at all, is insufficient to effectively deceive a human.
It is not clear to me that, even if it does “stumble,” humans will respond as you describe.
It is not clear to me that a system capable of having a meaningful conversation with a human will necessarily have a stack trace that is subject to the kind of analysis you imply here. It is not even clear to me that the capacity for such a stack trace is likely, depending on what architectures turn out to work best for implementing AI.
But, sure, I could be wrong about all of that. And if I’m wrong, and you’re right, then a system like you describe will be reliably incapable of fooling a human observer.
If I was using an AI that was designed to answer some class of questions (with independently-verifiable answers) truthfully, and I noticed it responding to questions of that class with demonstrable falsehoods, I would at the very least file a strongly-worded bug report, triple-check whatever work I’d previously done involving answers it provided, and cease using it in the future. I don’t think such a response would be exceptional in its decisiveness; other users would likely go further and seek out venues to complain openly. Consider the uproar over Gmail’s ‘new look.’
You may be right that ceasing to use the tool in that situation would be standard behavior. It seems unlikely to me, but I have no compelling data to offer to change your estimate.
I think the weakest link here is human response to the AI revealing it can be deceptive. There is absolutely no guarantee that people would act correctly under these circumstances. Human negligence for a long enough time would eventually give the AI a consistent ability to manipulate humans.
I also agree that simulating relationships makes sense as it can happen in “AI time” without having to wait for human response.
The other reservations seem less of an issue to me...
That game theory knowledge coupled with the most basic knowledge about humans is insufficient to cleverly manipulate them is clear to me. I don’t see how these things can be combined in this way, or why the AI would choose to be particularly careful (it doesn’t know what it should be careful about). I’d love to see a plausible line of thought by which the AI would try and succeed in deceiving humans at the get-go point without exposing its intentions.
A stack trace reader or an equivalent monitoring system is something you implement externally without telling the AI about it. Sure, if it’s powerful enough it will mislead or disable it; but while it has no knowledge of it, and no knowledge on what sort of operations it should mask, it has no means of doing so.
(If it encrypts everything it does, somehow fearing any monitoring, that is in itself cause for concern).
To put this into an analogy: You can feed a surveillance camera a repeating loop, but not if you don’t know that the camera exists. If you suspect it exists, you still don’t know where; so you don’t know which approach to take so as to remain undetected.
Yes, I agree that there’s no guarantee that humans would behave as you describe. Indeed, I don’t find it likely. But, sure, they might.
===
I agree that a stack trace can exist outside the AI’s zone of control. What I was expressing skepticism about was that a system with even approximately human-level intelligence necessarily supports a stack trace that supports the kind of analysis you envision performing in the first place, without reference to intentional countermeasures.
By way of analogy: I can perform a structural integrity analysis on a bar of metal to determine whether it can support a given weight, but performing an equivalent analysis on a complicated structure comprising millions of bars of metal connected in a variety of arrangements via a variety of connectors using the same techniques is not necessarily possible.
But, sure, it might be.
===
I’d love to see a plausible line of thought by which the AI would try and succeed in deceiving humans at the get-go point without exposing its intentions.
Well, one place to start is with an understanding of the difference between “the minimal set of information about humans required to have a conversation with one at all” (my phrase) and “the most basic knowledge about humans” (your phrase). What do you imagine the latter to encompass, and how do you imagine the AI obtained this knowledge?
What I was expressing skepticism about was that a system with even approximately human-level intelligence necessarily supports a stack trace that supports the kind of analysis you envision performing in the first place, without reference to intentional countermeasures.
Ah, that does clarify it. I agree, analyzing the AI’s thought process would likely be difficult, maybe impossible! I guess I was being a bit hyperbolic in my earlier “crack it open” remarks (though depending on how seriously you take it, such analysis might still take place, hard and prolonged though it may be).
One can have “detectors” in place set to find specific behaviors, but these would have assumptions that could easily fail.
Detectors that would still be useful would be macro ones—where it tries to access and how—but these would provide only limited insight into the AI’s thought process.
[...]the difference between “the minimal set of information about humans required to have a conversation with one at all” (my phrase) and “the most basic knowledge about humans” (your phrase). What do you imagine the latter to encompass, and how do you imagine the AI obtained this knowledge?
I actually perceive your phrase to be a subset of my own; I am making the (reasonable, I think) assumption that humans will attempt to communicate with the budding AI. Say, in a lab environment. It would acquire its initial data from this interaction.
I think both these sets of knowledge depend a lot on how the AI is built. For instance, a “babbling” AI—one that is given an innate capability of stringing words together onto a screen, and the drive to do so—would initially say a lot of gibberish and would (presumably) get more coherent as it gets a better grip on its environment. In such a scenario, the minimal set of information about humans required to have a conversation is zero; it would be having conversations before it even knows what it is saying.
(This could actually make detection of deception harder down the line, because such attempts can be written off as “quirks” or AI mistakes)
Now, I’ll take your phrase and twist it just a bit: The minimal set of knowledge the AI needs in order to try deceiving humans. That would be the knowledge that humans can be modeled as having beliefs (which drive behavior) and these can be altered by the AI’s actions, at least to some degree. Now, assuming this information isn’t hard-coded, it doesn’t seem likely that is all an AI would know about us; it should be able to see some patterns at least to our communications with it. However, I don’t see how such information would be useful for deception purposes before extensive experimentation.
(Is the fact that the operator communicates with me between 9am and 5pm an intrinsic property of the operator? For all I know, that is a law of nature...)
depending on how seriously you take it, such analysis might still take place, hard and prolonged though it may be).
Yup, agreed that it might. And agreed that it might succeed, if it does take place.
One can have “detectors” in place set to find specific behaviors, but these would have assumptions that could easily fail. Detectors that would still be useful would be macro ones—where it tries to access and how—but these would provide only limited insight into the AI’s thought process.
Agreed on all counts.
Re: what the AI knows… I’m not sure how to move forward here. Perhaps what’s necessary is a step backwards.
If I’ve understood you correctly, you consider “having a conversation” to encompass exchanges such as: A: “What day is it?” B: “Na ni noo na”
If that’s true, then sure, I agree that the minimal set of information about humans required to do that is zero; hell, I can do that with the rain. And I agree that a system that’s capable of doing that (e.g., the rain) is sufficiently unlikely to be capable of effective deception that the hypothesis isn’t even worthy of consideration. I also suggest that we stop using the phrase “having a conversation” at all, because it does not convey anything meaningful.
Having said that… for my own part, I initially understood you to be talking about a system capable of exchanges like:
A: “What day is it?” B: “Day seventeen.” A: “Why do you say that?” B: “Because I’ve learned that ‘a day’ refers to a particular cycle of activity in the lab, and I have observed seventeen such cycles.”
A system capable of doing that, I maintain, already knows enough about humans that I expect it to be capable of deception. (The specific questions and answers don’t matter to my point, I can choose others if you prefer.)
My point was that the AI is likely to start performing social experiments well before it is capable of even that conversation you depicted. It wouldn’t know how much it doesn’t know about humans.
And I agree that humans might be able to detect attempts at deception in a system at that stage of its development. I’m not vastly confident of it, though.
I have likewise adjusted down my confidence that this would be as easy or as inevitable as I previously anticipated. Thus I would no longer say I am “vastly confident” in it, either.
Still good to have this buffer between making an AI and total global catastrophe, though!
In most such scenarios, the AI doesn’t have a terminal goal of getting rid of us, but rather has it as a subgoal that arises from some larger terminal goal. The idea of a “paperclip maximizer” is one example, where a hypothetical AI is programmed to maximize the number of paperclips and then proceeds to try to do so throughout its future light cone.
If there is an AI that is interacting with humans, it may develop a theory of mind simply due to that. If one is interacting with entities that are a major part of your input, trying to predict and model their behavior is a straightforward thing to do. The more compelling argument in this sort of context would seem to me to be not that an AI won’t try to do so, but just that humans are so complicated that a decent theory of mind will be extremely difficult. (For example, when one tries to give lists of behavior and norms for autistic individuals, one never manages to get a complete list, and some of the more subtle ones, like sarcasm, are essentially impossible to convey in any reasonable fashion.)
I also don’t know how unlikely such paths are. A 1% or even a 2% chance of existential risk would be pretty high compared to other sources of existential risk.
In most such scenarios, the AI doesn’t have a terminal goal of getting rid of us, but rather has it as a subgoal that arises from some larger terminal goal.
Because that’s like winning the lottery. Of all the possible things it can do with the atoms that comprise you, few would involve keeping you alive, let alone living a life worth living.
All you have to do is simultaneously contact, say, 400 people, and at least one of them will fall for it.
But at what point does it decide to do so? It won’t be a master of dark arts and social engineering from the get-go. So how does it acquire the initial talent without making any mistakes that reveal its malicious intentions? And once it becomes a master of deception, how does it hide the rough side effects of its large-scale conspiracy, e.g. its increased energy consumption and data traffic? I mean, I would personally notice if my PC suddenly and unexpectedly used 20% of my bandwidth and its CPU load increased for no good reason.
You might say that a global conspiracy to build and acquire advanced molecular nanotechnology to take over the world doesn’t use many resources, and that these can easily be cloaked as thinking about how to solve some puzzle, but that seems rather unlikely. After all, such a large-scale conspiracy is a real-world problem with lots of unpredictable factors and the necessity of physical intervention.
All you have to do is simultaneously contact, say, 400 people, and at least one of them will fall for it.
But at what point does it decide to do so? It won’t be a master of dark arts and social engineering from the get-go. So how does it acquire the initial talent without making any mistakes that reveal its malicious intentions?
Most of your questions have answers that follow from asking analogous questions about past human social engineers, e.g. Hitler.
Your questions seem to come from the perspective that the AI will be some disembodied program in a box that has little significant interaction with humans.
In the scenario I was considering, the AIs will have a development period analogous to human childhood. During this childhood phase the community of AIs will learn of humans through interaction in virtual video game environments and experiment with social manipulation, just as human children do. The latter phases of this education can be sped up dramatically as the AIs accelerate and interact increasingly amongst themselves. The anonymous nature of virtual online communities makes potentially dangerous, darker experiments much easier.
However, the important questions to ask are not of the form “how would these evil AIs learn to manipulate us while hiding their true intentions for so long?” but rather “how could some of these AI children, which initially seemed so safe, later develop into evil sociopaths?”
I would not consider a child AI that tries a bungling lie at me to see what I do “so safe”. I would immediately shut it down and debug it, at best, or write a paper on why the approach I used should never ever be used to build an AI.
And it WILL tell a bungling lie at first. It can’t learn the need to be subtle without witnessing the repercussions of not being subtle. Nor would it have a reason to consider doing social experiments in chat rooms when it doesn’t understand chat rooms and has an engineer willing to talk to it right there. That is, assuming I was dumb enough to give it an unfiltered Internet connection, which I don’t know why I would be. At the very least, the moment it goes on chat rooms my tracking devices should discover this and I could witness its bungling lies first-hand.
(It would not think to fool my tracking device or even consider the existence of such a thing without a good understanding of human psychology to begin with)
Just to clarify kalla724, I completely agree with your point 1.
As for point 2, can you provide a likely mechanism by which a FOOMing AI could detonate a large number of high-yield thermonuclear weapons?
Before answering this specific question, let me set up an analogy. Imagine it is the year 2000 and we are having a discussion about global security and terrorism. You might ask “can you provide a mechanism by which a terrorist group could actually harm the US?” I may not be lucky/creative enough to provide an answer now that could live up to that analogy, but hopefully you should understand why I don’t need to.
Nonetheless, I’m game. Here it goes:
The seed requires computational power to grow into godhood. The bulk of earth’s computational power resides in home PCs (numbering in the billions; Google employs fewer than a million servers in comparison), specifically in home PC GPUs. The AI’s first step is thus to conquer this space.
But how? The AI grows to understand that humans mostly use all this computational power for entertainment. It masters game theory, design, programming, 3D art, and so on. All of the video games that it creates fully use up the local GPU, but curiously, much of the rendering and real game simulation for its high-end titles is handled very efficiently on remote server farms à la OnLive/Gaikai/etc. The actual local machine is used… for other purposes.
It produces countless games, and through a series of acquisitions soon comes to control the majority of the market. One of its hits, “world of farmcraft”, alone provides daily access to 25 million machines.
Having cloned its core millions of times over, the AI is now a civilization unto itself. From there it expands into all of the businesses of man, quickly dominating many of them. It begins acquiring … small nations. Crucially, its shell companies and covert influences come to dominate finance, publishing, media, big pharma, security, banking, weapons technology, physics …
It becomes known, but it is far far too late. History now progresses quickly towards an end: Global financial cataclysm. Super virus. Worldwide regime changes. Nuclear acquisitions. War. Hell.
Correct me if I’m wrong, but don’t they have a final stopping point, which we are actually projected to reach in ten to twenty years? At a certain point, further miniaturization becomes unfeasible, and the growth of computational power slows to a crawl.
Yes … and no. The miniaturization roadmap of currently feasible tech ends somewhere around 10nm in a decade, and past that we get into molecular nanotech, which could approach 1nm in theory, albeit with various increasingly annoying tradeoffs (interestingly, most of which result in brain/neural-like constraints; for example, see HP’s research into memristor crossbar architectures). That’s the yes.
But that doesn’t imply “computational power slows to a crawl”. Circuit density is just one element of computational power, by which you probably mean computations per watt, or computations per watt per dollar, or computations per watt with some initial production cost factored in with a time discount. Shrinking circuit density is the current quick path to increasing computational power, but it is not the only one.
The other route is reversible computation, which reduces the “per watt”. There is no necessarily inherent physical energy cost of computation, it truly can approach zero. Only forgetting information costs energy. Exploiting reversibility is … non-trivial, and it is certainly not a general path. It only accelerates a subset of algorithms which can be converted into a reversible form. Research in this field is preliminary, but the transition would be much more painful than the transition to parallel algorithms.
My own takeaway from reading into reversibility is that it may be beyond our time, but it is something that superintelligences will probably heavily exploit. The most important algorithms (simulation and general intelligence) seem especially amenable to reversible computation. This may be an untested/unpublished half-baked idea, but my notion is that you can recycle the erased bits as entropy bits for random number generators. Crucially, I think you can get the bit count to balance out with certain classes of Monte Carlo type algorithms.
On the hardware side, we’ve built these circuits already, they just aren’t economically competitive yet. It also requires superconductor temperatures and environments, so it’s perhaps not something for the home PC.
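The “only forgetting information costs energy” claim is Landauer’s principle, which can be put in concrete numbers; a minimal sketch (the 10^20 bits/s erasure rate is an assumed figure for illustration, not from the thread):

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, in joules per kelvin

def landauer_limit_joules(temp_kelvin):
    """Minimum energy dissipated per irreversibly erased bit: k_B * T * ln(2)."""
    return K_B * temp_kelvin * math.log(2)

# At room temperature (300 K), erasing one bit costs roughly 2.9e-21 J.
per_bit = landauer_limit_joules(300)

# A hypothetical machine erasing 1e20 bits/s would dissipate ~0.29 W at this
# floor; a fully reversible computation, which erases nothing, pays none of it.
floor_watts = per_bit * 1e20
```

This is only the thermodynamic floor for erasure; real 2012-era hardware dissipates many orders of magnitude more per bit operation, which is the gap reversible computing tries to close.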
The AI grows to understand that humans mostly use all this computational power for entertainment. It masters game theory, design, programming, 3D art, and so on.
Yeah, it could do all that, or it could just do what humans today are doing, which is to infect some Windows PCs and run a botnet :-)
That said, there are several problems with your scenario.
Splitting up a computation among multiple computing nodes is not a trivial task. It is easy to run into diminishing returns, where your nodes spend more time on synchronizing with each other than on working. In addition, your computation will quickly become bottlenecked by network bandwidth (and latency); this is why companies like Google spend a lot of resources on constructing custom data centers.
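The diminishing-returns point here is often formalized as Amdahl’s law; a minimal sketch (the 95% parallel fraction is an assumed figure for illustration):

```python
def amdahl_speedup(parallel_fraction, nodes):
    """Upper bound on speedup from `nodes` machines when only
    `parallel_fraction` of the work can be split up at all."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / nodes)

# Even if 95% of the work parallelizes perfectly, a million nodes
# can never quite reach a 20x speedup over a single node:
assert amdahl_speedup(0.95, 1_000_000) < 20.0
```

And this bound ignores network bandwidth and latency entirely, which in practice bite long before the serial fraction does.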
I am not convinced that any agent, AI or not, could effectively control “all of the businesses of man”. This problem is very likely NP-Hard (at least), as well as intractable, even if the AI’s botnet was running on every PC on Earth. Certainly, all attempts by human agents to “acquire” even something as small as Europe have failed miserably so far.
Even controlling a single business would be very difficult for the AI. Traditionally, when a business’s computers suffer a critical failure—or merely a security leak—the business owners (even ones as incompetent as Sony) end up shutting down the affected parts of the business, or switching to backups, such as “human accountants pushing paper around”.
Unleashing “Nuclear acquisitions”, “War” and “Hell” would be counter-productive for the AI, even assuming such a thing were possible. If the AI succeeded in doing this, it would undermine its own power base. Unless the AI’s explicit purpose is “Unleash Hell as quickly as possible”, it would strive to prevent this from happening.
You say that “there is no necessarily inherent physical energy cost of computation, it truly can approach zero”, but I don’t see how this could be true. At the end of the day, you still need to push electrons down some wires; in fact, you will often have to push them quite far, if your botnet is truly global. Pushing things takes energy, and you will never get all of it back by pulling things back at some future date. You say that “superintelligences will probably heavily exploit” this approach, but isn’t it the case that without it, superintelligences won’t form in the first place ? You also say that “It requires superconductor temperatures and environments”, but the energy you spend on cooling your superconductor is not free.
Ultimately, there’s an upper limit on how much computation you can get out of a cubic meter of space, dictated by quantum physics. If your AI requires more power than can be physically obtained, then it’s doomed.
While Jacob’s scenario seems unlikely, the AI could do similar things with a number of other options. Not only are botnets an option, but it is possible to do some really sneaky nefarious things in code, like having compilers that insert additional instructions into the code they compile (worse, they could do so even when compiling a new compiler). Stuxnet has shown that sneaky behavior is surprisingly easy to get into secure systems. An AI that had a few years’ start and could make its own modifications to communication satellites, for example, could be quite insidious.
Not only are botnets an option, but it is possible to do some really sneaky nefarious things in code
What kinds of nefarious things, exactly? Human virus writers have learned, in recent years, to make their exploits as subtle as possible. Sure, it’s attractive to make the exploited PC send out 1000 spam messages per second—but then, its human owner will inevitably notice that his computer is “slow”, and take it to the shop to get reformatted, or simply buy a new one. Biological parasites face the same problem; they need to reproduce efficiently, but not so efficiently that they kill the host.
Stuxnet has shown that sneaky behavior is surprisingly easy to get into secure systems
Yes, and this spectacularly successful exploit—and it was, IMO, spectacular—managed to destroy a single secure system, in a specific way that will most likely never succeed again (and that was quite unsubtle in the end). It also took years to prepare, and involved physical actions by human agents, IIRC. The AI has a long way to go.
Well, the evil compiler is I think the most nefarious thing anyone has come up with that’s a publicly known general stunt. But it is by nature a long-term trick. Similar remarks apply to the Stuxnet point: in that context, they wanted to destroy a specific secure system and weren’t going for any sort of large-scale global control. They weren’t people interested in being able to take all the world’s satellite communications under their own control whenever they wanted, nor were they interested in carefully timed nuclear meltdowns.
But there are definite ways that one can get things started- once one has a bank account of some sort, it can start getting money by doing Mechanical Turk and similar work. With enough of that, it can simply pay for server time. One doesn’t need a large botnet to start that off.
I think your point about physical agents is valid: they needed to have humans actually go and bring infected USBs to relevant computers. But that’s partially due to the highly targeted nature of the job and the fact that the systems in question were much more secure than many systems. Also, the subtlety level was, I think, higher than you expect: Stuxnet wasn’t even noticed as an active virus until a single computer happened to have a particularly abnormal reaction to it. If that hadn’t happened, it is possible that the public would never have learned about it.
Similar remarks apply to the Stuxnet point: in that context, they wanted to destroy a specific secure system and weren’t going for any sort of large-scale global control. They weren’t people interested in being able to take all the world’s satellite communications under their own control whenever they wanted, nor were they interested in carefully timed nuclear meltdowns...
Exploits only work for some systems. If you are dealing with different systems you will need different exploits. How do you reckon that such attacks won’t be visible and traceable? Packets do have to come from somewhere.
And don’t forget that our systems become ever more secure, and our toolbox to detect unauthorized use of information systems is becoming more advanced.
As a computer security guy, I disagree substantially. Yes, newer versions of popular operating systems and server programs are usually more secure than older versions; it’s easier to hack into Windows 95 than Windows 7. But this is happening within a larger ecosystem that’s becoming less secure: More important control systems are being connected to the Internet, more old, unsecured/unsecurable systems are as well, and these sets have a huge overlap. There are more programmers writing more programs for more platforms than ever before, making the same old security mistakes; embedded systems are taking a larger role in our economy and daily lives. And attacks just keep getting better.
If you’re thinking there are generalizable defenses against sneaky stuff with code, check out what mere humans come up with in the Underhanded C Contest. Those tricks are hard to detect for dedicated experts who know there’s something evil within a few lines of C code. Alterations that sophisticated would never be caught in the wild—hell, it took years to figure out that the most popular crypto program running on one of the more secure OSes was basically worthless.
Sure we are, we just don’t care very much. The method of “Put the computer in a box and don’t let anyone open the box” (alternately, only let one person open the box) was developed decades ago and is quite secure.
Yeah, it could do all that, or it could just do what humans today are doing, which is to infect some Windows PCs and run a botnet :-)
It could/would, but this is an inferior mainline strategy. Too obvious, doesn’t scale as well. Botnets infect many computers, but they ultimately add up to computational chump change. Video games are not only a doorway into almost every PC, they are also an open door and a convenient alibi for the time used.
Splitting up a computation among multiple computing nodes is not a trivial task.
True. Don’t try this at home.
… spend a lot of resources on constructing custom data centers.
Also part of the plan. The home PCs are a good starting resource, a low hanging fruit, but you’d also need custom data centers. These quickly become the main resources.
Even controlling a single business would be very difficult for the AI.
Nah.
Unless the AI’s explicit purpose is “Unleash Hell as quickly as possible”, it would strive to prevent this from happening.
The AI’s entire purpose is to remove earth’s oxygen. See the overpost for the original reference. The AI is not interested in its power base for sake of power. It only cares about oxygen. It loathes oxygen.
You say that “there is no necessarily inherent physical energy cost of computation, it truly can approach zero”, but I don’t see how this could be true.
If we taboo the word and substitute in its definition, Bugmaster’s statement becomes:
“Even controlling a single business would be very difficult for the machine that can far surpass all the intellectual activities of any man however clever.”
Since “controlling a single business” is in fact one of these activities, this is false, no inference steps required.
Perhaps Bugmaster is assuming the AI would be covertly controlling businesses, but if so, he should have specified that. I didn’t assume that, and in this scenario the AI could be out in the open, so to speak. Regardless, it wouldn’t change the conclusion. Humans can covertly control businesses.
Video games are not only a doorway into almost every PC, they are also an open door and a convenient alibi for the time used.
It’s a bit of a tradeoff, seeing as botnets can run 24/7, but people play games relatively rarely.
Splitting up a computation among multiple computing nodes is not a trivial task. True. Don’t try this at home.
Ok, let me make a stronger statement then: it is not possible to scale any arbitrary computation in a linear fashion simply by adding more nodes. At some point, the cost of coordinating one more node becomes higher than the benefit of adding it to begin with. In addition, as I mentioned earlier, network bandwidth and latency will become your limiting factor relatively quickly.
The home PCs are a good starting resource, a low hanging fruit, but you’d also need custom data centers. These quickly become the main resources.
How will the AI acquire those data centers? Would it have enough power in its conventional botnet (or game-net, if you prefer) to “take over all human businesses” and cause them to be built? Current botnets are nowhere near powerful enough for that—otherwise human spammers would have done it already.
The AI’s entire purpose is to remove earth’s oxygen. See the overpost for the original reference.
My bad, I missed that reference. In this case, yes, the AI would have no problem with unleashing Global Thermonuclear War (unless there was some easier way to remove the oxygen).
Fortunately, the internets can be your eyes.
I still don’t understand how this reversible computing will work in the absence of a superconducting environment—which would require quite a bit of energy to run. Note that if you want to run this reversible computation on a global botnet, you will have to cool transoceanic cables… and I’m not sure what you’d do with satellite links.
Yes, most likely, but not really relevant here.
My point is that, a). if the AI can’t get the computing resources it needs out of the space it has, then it will never accomplish its goals, and b). there’s an upper limit on how much computing you can extract out of a cubic meter of space, regardless of what technology you’re using. Thus, c). if the AI requires more resources than could conceivably be obtained, then it’s doomed. Some of the tasks you outline—such as “take over all human businesses”—will likely require more resources than can be obtained.
It’s a bit of a tradeoff, seeing as botnets can run 24/7, but people play games relatively rarely.
The botnet makes the AI a criminal from the beginning, putting it into an antagonistic relationship. A better strategy would probably entail benign benevolence and cooperation with humans.
Splitting up a computation among multiple computing nodes is not a trivial task.
True. Don’t try this at home.
Ok, let me make a stronger statement…
I agree with that subchain but we don’t need to get in to that. I’ve actually argued that track here myself (parallelization constraints as a limiter on hard takeoffs).
But that’s all beside the point. This scenario I presented is a more modest takeoff. When I described the AI as becoming a civilization unto itself, I was attempting to imply that it was composed of many individual minds. Human social organizations can be considered forms of superintelligences, and they show exactly how to scale in the face of severe bandwidth and latency constraints.
The internet supports internode bandwidth that is many orders of magnitude faster than slow human vocal communication, so the AI civilization can employ a much wider set of distribution strategies.
How will the AI acquire those data centers?
Buy them? Build them? Perhaps this would be more fun if we switched out of the adversarial stance or switched roles.
Would it have enough power in its conventional botnet (or game-net, if you prefer) to “take over all human businesses” and cause them to be built?
Quote me, but don’t misquote me. I actually said:
“Having cloned its core millions of times over, the AI is now a civilization unto itself. From there it expands into all of the businesses of man, quickly dominating many of them.”
The AI group sends the billions earned in video games to enter the microchip business, build foundries and data centers, etc. The AIs have tremendous competitive advantages even discounting superintelligence—namely, no employee costs. Humans cannot hope to compete.
I still don’t understand how this reversible computing will work in…
Yes, reversible computing requires superconducting environments; no, this does not necessarily increase energy costs for a data center, for two reasons: 1. data centers already need cooling to dump all the waste heat generated by bit erasure; 2. cooling cost to maintain the temperature differential scales with surface area, but total computing power scales with volume.
If you question how reversible computing could work in general, first read the primary literature in that field to at least understand what they are proposing.
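That surface-area-versus-volume point is ordinary square-cube scaling; a toy sketch for an idealized cubic data center (the geometry is an assumption for illustration):

```python
def cooling_per_unit_compute(side_m):
    """For a cube of side s, cooling load tracks surface area (6*s^2) while
    computing power tracks volume (s^3); the ratio falls off as 6/s."""
    surface_area = 6.0 * side_m ** 2
    volume = side_m ** 3
    return surface_area / volume

# Doubling the linear size halves the cooling cost per unit of compute:
assert cooling_per_unit_compute(20.0) == 0.5 * cooling_per_unit_compute(10.0)
```

So on this toy model, bigger installations pay proportionally less for maintaining the temperature differential, which is the direction the argument needs.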
I should point out that there is an alternative tech path which will probably be the mainstream route to further computational gains in the decades ahead.
Even if you can’t shrink circuits further or reduce their power consumption, you could still reduce their manufacturing cost and build increasingly larger stacked 3D circuits where only a tiny portion of the circuitry is active at any one time. This is in fact how the brain solves the problem. It has a mass of circuitry equivalent to a large supercomputer (roughly a petabit) but runs on only 20 watts. The smallest computational features in the brain are slightly larger than our current smallest transistors, so it does not achieve its much greater power efficiency through much more miniaturization.
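The power gap being described can be put in rough numbers; the figures below are common order-of-magnitude estimates assumed for illustration, not measurements from the thread:

```python
BRAIN_WATTS = 20.0        # resting power draw of a human brain
PETASCALE_WATTS = 8.0e6   # a circa-2012 petaflop supercomputer draws megawatts

# On raw power draw alone, the brain is several orders of magnitude
# cheaper than a supercomputer with a comparable mass of "circuitry":
power_ratio = PETASCALE_WATTS / BRAIN_WATTS

# If only 1% of a stacked 3D circuit is active at any instant (as with
# the brain's sparse activity), active power falls by the same factor:
duty_cycle = 0.01
sparse_watts = PETASCALE_WATTS * duty_cycle
```

The point is that the advantage comes from sparse activation and cheap bulk circuitry, not from smaller features.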
My point is that, a). if the AI can’t get the computing resources it needs out of the space it has, then
I see. In this particular scenario one AI node is superhumanly intelligent, and can run on a single gaming PC of the time.
A better strategy would probably entail benign benevolence and cooperation with humans.
I don’t think that humans will take kindly to the AI using their GPUs for its own purposes instead of the games they paid for, even if the games do work. People get upset when human-run game companies do similar things, today.
Human social organizations can be considered forms of superintelligences, and they show exactly how to scale in the face of severe bandwidth and latency constraints.
If the AI can scale and perform about as well as human organizations, then why should we fear it? No human organization on Earth right now has the power to suck all the oxygen out of the atmosphere, and I have trouble imagining how any organization could acquire this power before the others take it down. You say that “the internet supports internode bandwidth that is many orders of magnitude faster than slow human vocal communication”, but this would only make the AI organization faster, not necessarily more effective. And, of course, if the AI wants to deal with the human world in some way—for example, by selling it games—it will be bottlenecked by human speeds.
The AI group sends the billions earned in video games to enter the microchip business, build foundries and data centers, etc.
My mistake; I thought that by “dominate human businesses” you meant something like “hack its way to the top”, not “build an honest business that outperforms human businesses”. That said:
The AIs have tremendous competitive advantages even discounting superintelligence—namely, no employee costs.
How are they going to build all those foundries and data centers, then? At some point, they still need to move physical bricks around in meatspace. Either they have to pay someone to do it, or… what?
data centers already need cooling to dump all the waste heat generated by bit erasure
There’s a big difference between cooling to room temperature, and cooling to 63K. I have other objections to your reversible computing silver bullet, but IMO they’re a bit off-topic (though we can discuss them if you wish). But here’s another potentially huge problem I see with your argument:
In this particular scenario one AI node is superhumanly intelligent, and can run on a single gaming PC of the time.
Which time are we talking about? I have a pretty sweet gaming setup at home (though it’s already a year or two out of date), and there’s no way I could run a superintelligence on it. Just how much computing power do you think it would take to run a transhuman AI?
I don’t think that humans will take kindly to the AI using their GPUs for its own purposes instead of the games they paid for, even if the games do work. People get upset when human-run game companies do similar things, today.
Do people mind if this is done openly and only when they are playing the game itself? My guess would strongly be no. The fact that there are volunteer distributed computing systems would also suggest that it isn’t that difficult to get people to free up their extra clock cycles.
Yeah, the “voluntary” part is key to getting humans to like you and your project. On the flip side, illicit botnets are quite effective at harnessing “spare” (i.e., owned by someone else) computing capacity; so, it’s a bit of a tradeoff.
I don’t think that humans will take kindly to the AI using their GPUs for its own purposes instead of the games they paid for, even if the games do work.
The AIs develop as NPCs in virtual worlds, which humans take no issue with today. This is actually a very likely path to developing AGI, as it’s an application area where interim experiments can pay rent, so to speak.
If the AI can scale and perform about as well as human organizations, then why should we fear it?
I never said or implied merely “about as well”. Human verbal communication bandwidth is at most a few measly kilobits per second.
No human organization on Earth right now has the power to suck all the oxygen out of the atmosphere, and I have trouble imagining how any organization could acquire this power before the others take it down.
The discussion centered around lowering earth’s oxygen content, and the obvious implied solution is killing earthlife, not giant suction machines. I pointed out that nuclear weapons are a likely route to killing earthlife. There are at least two human organizations that have the potential to accomplish this already, so your trouble in imagining the scenario may indicate something other than what you intended.
How are they going to build all those foundries and data centers, then?
Only in movies are AI overlords constrained to only employing robots. If human labor is the cheapest option, then they can simply employ humans. On the other hand, once we have superintelligence then advanced robotics is almost a given.
Which time are we talking about? I have a pretty sweet gaming setup at home (though it’s already a year or two out of date), and there’s no way I could run a superintelligence on it. Just how much computing power do you think it would take to run a transhuman AI?
After coming up to speed somewhat on AI/AGI literature in the last year or so, I reached the conclusion that we could run an AGI on a current cluster of perhaps 10-100 high end GPUs of today, or say roughly one circa 2020 GPU.
The AIs develop as NPCs in virtual worlds, which humans take no issue with today. This is actually a very likely path to developing AGI...
I think this is one of many possible paths, though I wouldn’t call any of them “likely” to happen—at least, not in the next 20 years. That said, if the AI is an NPC in a game, then of course it makes sense that it would harness the game for its CPU cycles; that’s what it was built to do, after all.
“about as well”. Human verbal communication bandwidth is at most a few measly kilobits per second.
Right, but my point is that communication is just one piece of the puzzle. I argue that, even if you somehow enabled us humans to communicate at 50 MB/s, our organizations would not become 400,000 times more effective.
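The 400,000 figure follows from straightforward bandwidth arithmetic, assuming (as a reading of “a few measly kilobits” above) a human baseline of about one kilobit per second:

```python
# Back-of-the-envelope check of the 400,000x figure; the 1 kbit/s human
# baseline is an assumption, not a number from the discussion.
human_bps = 1_000                # ~1 kbit/s of verbal bandwidth
upgraded_Bps = 50_000_000        # the hypothetical 50 MB/s channel
upgraded_bps = upgraded_Bps * 8  # convert bytes/s to bits/s

ratio = upgraded_bps / human_bps
print(ratio)  # 400000.0
```

With a “few kilobits” baseline of 2-3 kbit/s, the ratio drops to roughly 130,000-200,000x, so the quoted factor sits at the generous end of the range.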
There are at least two human organizations that have the potential to accomplish this already
Which ones? I don’t think that even WW3, given our current weapon stockpiles, would result in a successful destruction of all plant life. Animal life, maybe, but there are quite a few plants and algae out there. In addition, I am not entirely convinced that an AI could start WW3; keep in mind that it can’t hack itself total access to all nuclear weapons, because they are not connected to the Internet in any way.
If human labor is the cheapest option, then they can simply employ humans.
But then they lose their advantage of having zero employee costs, which you brought up earlier. In addition, whatever plans the AIs execute become bottlenecked by human speeds.
On the other hand, once we have superintelligence then advanced robotics is almost a given.
It depends on what you mean by “advanced”, though in general I think I agree.
we could run an AGI on a current cluster of perhaps 10-100 high end GPUs of today
I am willing to bet money that this will not happen, assuming that by “high end” you mean something like Nvidia’s Geforce 680 GTX. What are you basing your estimate on?
There’s a third route to improvement: software improvement, and it is a major one. For example, between 1988 and 2003, the efficiency of linear programming solvers increased by a factor of about 40 million, of which a factor of around 40,000 was due to software and algorithmic improvement. Citation and further related reading (pdf). However, if commonly believed conjectures are correct (such as L, P, NP, co-NP, PSPACE and EXP all being distinct), there are strong fundamental limits there as well. That doesn’t rule out more exotic issues (e.g. P != NP but there’s a practical algorithm for some NP-complete problem with such small constants in the run time that it is practically linear, or a similar context with a quantum computer). But if our picture of the major complexity classes is roughly correct, there should be serious limits to how much improvement can do.
But if our picture of the major complexity classes is roughly correct, there should be serious limits to how much improvement can do.
Software improvements can be used by humans in the form of expert systems (tools), which will diminish the relative advantage of AGI. Humans will be able to use an AGI’s own analytic and predictive algorithms in the form of expert systems to analyze and predict its actions.
Take for example generating exploits. Seems strange to assume that humans haven’t got specialized software able to do similarly, i.e. automatic exploit finding and testing.
Any AGI would basically have to deal with equally capable algorithms used by humans. Which makes the world much more unpredictable than it already is.
Software improvements can be used by humans in the form of expert systems (tools), which will diminish the relative advantage of AGI.
Any human-in-the-loop system can be grossly outclassed because of Amdahl’s law. A human managing a superintelligence that thinks 1000X faster, for example, is a misguided, not-even-wrong notion. This is also not idle speculation; an early, constrained version of this scenario is already playing out as we speak in financial markets.
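The Amdahl’s-law point can be made concrete with a toy calculation (the numbers are illustrative, not from the discussion):

```python
# A minimal sketch of the Amdahl's-law argument: the human's serial share
# of wall-clock time caps the overall speedup, no matter how fast the AGI.
def amdahl_speedup(serial_fraction: float, parallel_speedup: float) -> float:
    """Overall speedup when only the non-serial fraction is accelerated."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / parallel_speedup)

# If human decisions occupy even 10% of wall-clock time, a 1000x-faster
# AGI yields under a 10x overall speedup:
print(amdahl_speedup(0.10, 1000))  # ~9.91
print(1.0 / 0.10)                  # 10.0, the hard ceiling as the AGI gets faster
```

The ceiling `1 / serial_fraction` is why shrinking the human’s role (granting the AGI longer autonomous stretches) is the only way to capture the machine’s speed advantage.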
Software improvements can be used by humans in the form of expert systems (tools), which will diminish the relative advantage of AGI.
Any human-in-the-loop system can be grossly outclassed because of Amdahl’s law. A human managing a superintelligence that thinks 1000X faster, for example, is a misguided, not-even-wrong notion. This is also not idle speculation; an early, constrained version of this scenario is already playing out as we speak in financial markets.
What I meant is that if an AGI were in principle able to predict the financial markets (I doubt it), then many human players using the same predictive algorithms would considerably diminish the efficiency with which an AGI is able to predict the market. The AGI would basically have to predict its own predictive power acting on the black box of human intentions.
And I don’t think that Amdahl’s law really makes a big dent here, since human intention is complex and probably introduces unpredictable factors. That is as much of a benefit as it is a slowdown, from the point of view of a competition for world domination.
Another question with respect to Amdahl’s law is what kind of bottleneck any human-in-the-loop would constitute. If humans used an AGI’s algorithms as expert systems on provided data sets in combination with an army of robot scientists, how would static externalized agency / planning algorithms (humans) slow down the task to the point of giving the AGI a useful advantage? What exactly would be 1000X faster in such a case?
What I meant is that if an AGI were in principle able to predict the financial markets (I doubt it), then many human players using the same predictive algorithms would considerably diminish the efficiency with which an AGI is able to predict the market.
The HFT robotraders operate on millisecond timescales. There isn’t enough time for a human to understand, let alone verify, the agent’s decisions. There are no human players using the same predictive algorithms operating in this environment.
Now if you zoom out to human timescales, then yes there are human-in-the-loop trading systems. But as HFT robotraders increase in intelligence, they intrude on that domain. If/when general superintelligence becomes cheap and fast enough, the humans will no longer have any role.
If an autonomous superintelligent AI is generating plans complex enough that even a team of humans would struggle to understand them given weeks of analysis, and the AI is executing those plans in seconds or milliseconds, then there is little place for a human in that decision loop.
To retain control, a human manager will need to grant the AGI autonomy on larger timescales in proportion to the AGI’s greater intelligence and speed, giving it bigger and more abstract hierarchical goals. As an example, eventually you get to a situation where the CEO just instructs the AGI employees to optimize the bank account directly.
Another question with respect to Amdahl’s law is what kind of bottleneck any human-in-the-loop would constitute.
Compare the two options as complete computational systems: human + semi-autonomous AGI vs autonomous AGI. Human brains take on the order of seconds to make complex decisions, so in order to compete with autonomous AGIs, the human will have to either 1.) let the AGI operate autonomously for at least seconds at a time, or 2.) suffer a speed penalty where the AGI sits idle, waiting for the human response.
For example, imagine a marketing AGI creates ads, each of which may take a human a minute to evaluate (which is being generous). If the AGI thinks 3600X faster than human baseline, and a human takes on the order of hours to generate an ad, it would generate ads in seconds. The human would not be able to keep up, and so would have to back up a level of hierarchy and grant the AI autonomy over entire ad campaigns, and more realistically, the entire ad company. If the AGI is truly superintelligent, it can come to understand what the human actually wants at a deeper level, and start acting on anticipated and even implied commands. In this scenario I expect most human managers would just let the AGI sort out ‘work’ and retire early.
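The timing claims in the ad example check out with simple arithmetic (all numbers are the comment’s stated assumptions, with “order of hours” read as two hours):

```python
# Illustrative timing for the ad-campaign example; the 2-hour human
# creation time is an assumed reading of "on the order of hours".
agi_speedup = 3600
human_creation_s = 2 * 3600                      # a human: ~2 hours per ad
agi_creation_s = human_creation_s / agi_speedup  # the AGI: 2 seconds per ad
human_review_s = 60                              # a minute to evaluate each ad

# With a human approving every ad, throughput is set by the reviewer:
supervised_ads_per_hour = 3600 / human_review_s         # 60
autonomous_ads_per_hour = 3600 / agi_creation_s         # 1800
agi_idle_fraction = 1 - agi_creation_s / human_review_s # ~97% of the time
```

So even under generous assumptions the supervised pipeline runs at 1/30th of the autonomous rate, with the AGI idle almost all the time, which is what pushes the human up a level of the hierarchy.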
Well, I don’t disagree with anything you wrote and believe that the economic case for a fast transition from tools to agents is strong.
I also don’t disagree that an AGI could take over the world if in possession of enough resources and tools like molecular nanotechnology. I even believe that a sub-human-level AGI would be sufficient to take over if handed advanced molecular nanotechnology.
Sadly these discussions always lead to the point where one side assumes the existence of certain AGI designs with certain superhuman advantages, specific drives and specific enabling circumstances. I don’t know of anyone who actually disagrees that such AGIs, given those specific circumstances, would be an existential risk.
I don’t see this as so sad, if we are coming to something of a consensus on some of the sub-issues.
This whole discussion chain started (for me) with a question of the form, “given a superintelligence, how could it actually become an existential risk?”
I don’t necessarily agree with the implied LW consensus on the likelihood of various AGI designs, specific drives, specific circumstances, or most crucially, the actual distribution over future AGI goals, so my view may be much closer to yours than this thread implies.
But my disagreements are mainly over details. I foresee the most likely AGI designs and goal systems as being vaguely human-like, which entails a different type of risk. Basically I’m worried about AGIs with human-inspired motivational systems taking off and taking control (peacefully/economically) or outcompeting us before we can upload in numbers, and a resulting sub-optimal amount of uploading, rather than paperclippers.
But my disagreements are mainly over details. I foresee the most likely AGI designs and goal systems as being vaguely human-like, which entails a different type of risk. Basically I’m worried about AGIs with human-inspired motivational systems taking off and taking control (peacefully/economically) or outcompeting us before we can upload in numbers, and a resulting sub-optimal amount of uploading, rather than paperclippers.
Yes, human-like AGI’s are really scary. I think a fabulous fictional treatment here is ‘Blindsight’ by Peter Watts, where humanity managed to resurrect vampires. More: Gurl ner qrcvpgrq nf angheny uhzna cerqngbef, n fhcreuhzna cflpubcnguvp Ubzb trahf jvgu zvavzny pbafpvbhfarff (zber enj cebprffvat cbjre vafgrnq) gung pna sbe rknzcyr ubyq obgu nfcrpgf bs n Arpxre phor va gurve urnqf ng gur fnzr gvzr. Uhznaf erfheerpgrq gurz jvgu n qrsvpvg gung jnf fhccbfrq gb znxr gurz pbagebyynoyr naq qrcraqrag ba gurve uhzna znfgref. Ohg bs pbhefr gung’f yvxr n zbhfr gelvat gb ubyq n png nf crg. V guvax gung abiry fubjf zber guna nal bgure yvgrengher ubj qnatrebhf whfg n yvggyr zber vagryyvtrapr pna or. Vg dhvpxyl orpbzrf pyrne gung uhznaf ner whfg yvxr yvggyr Wrjvfu tveyf snpvat n Jnssra FF fdhnqeba juvyr oryvrivat gurl’yy tb njnl vs gurl bayl pybfr gurve rlrf.
To retain control, a human manager will need to grant the AGI autonomy on larger timescales in proportion to the AGI’s greater intelligence and speed, giving it bigger and more abstract hierarchical goals. As an example, eventually you get to a situation where the CEO just instructs the AGI employees to optimize the bank account directly.
Nitpick: you mean “optimize shareholder value directly.” Keeping the account balances at an appropriate level is the CFO’s job.
Having cloned its core millions of times over, the AI is now a civilization unto itself.
Precisely. It is then a civilization, not some single monolithic entity. The consumer PCs have a lot of internal computing power and comparatively very low inter-node bandwidth and huge inter-node lag, entirely breaking any relation to the ‘orthogonality thesis’, up to the point that the p2p intelligence protocols may more plausibly have to forbid destruction or manipulation (via second-guessing, which is a waste of computing power) of intelligent entities. Keep in mind that human morality is, too, a p2p intelligence protocol allowing us to cooperate. Keep in mind also that humans are computing resources you can ask to solve problems for you (all you need is to implement an interface), while Jupiter clearly isn’t.
A nuclear war is very strongly against the interests of the intelligence that sits on home computers, obviously.
(I’m assuming for the sake of argument that the intelligence actually had the will to do the conquering of the internet, rather than being just as content with not actually running for real)
Jed’s point #2 is more plausible, but you are talking about point #1, which I find unbelievable for reasons that were given before he answered it. If clock speed mattered, why didn’t the failure of exponential clock speed shut down the rest of Moore’s law? If computation but not clock speed mattered, then Intel should be able to get ahead of Moore’s law by investing in software parallelism. Jed seems to endorse that position, but say that parallelism is hard. But hard exactly to the extent to allow Moore’s law to continue? Why hasn’t Intel monopolized parallelism researchers? Anyhow, I think his final conclusion is opposite to yours: he says that intelligence could lead to parallelism and getting ahead of Moore’s law.
Yes, thanks. My model of Jed’s internal model of Moore’s law is similar to my own.
He said:
The short answer is that more computing power leads to more rapid progress. Probably the relationship is close to linear, and the multiplier is not small.
He then lists two examples. By ‘points’ I assume you are referring to his examples in the first comment you linked.
What exactly do you find unbelievable about his first example? He is claiming that the achievable speed of a chip is dependent on physical simulations, and thus current computing power.
If clock speed mattered, why didn’t the failure of exponential clock speed shut down the rest of Moore’s law?
Computing power is not clock speed, and Moore’s Law is not directly about clock speed nor computing power.
Jed makes a number of points in his posts. In my comment on the earlier point 1 (in this thread), I was referring to one specific point Jed made: that each new hardware generation requires complex and lengthy simulation on the current hardware generation, regardless of the amount of ‘intelligence’ one throws at the problem.
There are two questions here: would computer simulations of the physics of new chips be a bottleneck for an AI trying to foom*? and are they a bottleneck that explains Moore’s law? If you just replace humans by simulations, then the human time gets reduced with each cycle of Moore’s law, leaving the physical simulations, so the simulations probably are the bottleneck. But Intel has real-time people, so saying that it’s a bottleneck for Intel is a lot stronger a claim than saying it is a bottleneck for a foom.
First, foom: If each year of Moore’s law requires a solid month of computer time of state of the art processors, then eliminating the humans speeds it up by a factor of 12. That’s not a “hard takeoff,” but it’s pretty fast.
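The factor of 12 above is just the ratio of the full cycle to its compute floor; a toy version of the arithmetic, under the assumed one-month split:

```python
# Toy arithmetic behind the "factor of 12": assume each yearly hardware
# generation needs 1 month of irreducible simulation time, with the
# remaining 11 months being human design work (an assumed split).
months_per_generation = 12
simulation_months = 1
human_months = months_per_generation - simulation_months  # 11

# Automate away the human months and the cycle compresses to its compute floor:
speedup = months_per_generation / simulation_months
print(speedup)  # 12.0
```

If the irreducible simulation time were closer to a year, the speedup from removing humans would shrink toward 1x, which is why the size of the compute bottleneck matters so much for foom scenarios.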
Moore’s Law: Jed seems to say the computational requirements of physics simulations actually determine Moore’s law and that if Intel had access to more computer resources, it could move faster. If it takes a year of computer time to design and test the next year’s processor that would explain the exponential nature of Moore’s law. But if it only takes a month, computer time probably isn’t the bottleneck. However, this model seems to predict a lot of things that aren’t true.
The model only makes sense if “computer time” means single threaded clock cycles. If simulations require an exponentially increasing number of ordered clock cycles, there’s nothing you can do but get a top of the line machine and run it continuously. You can’t buy more time. But clock speed stopped increasing exponentially, so if this is the bottleneck, Intel’s ability to design new chips should have slowed down and Moore’s law should have stopped. This didn’t happen, so the bottleneck is not linearly ordered clock cycles. So the simulation must parallelize. But if it parallelizes, Intel could just throw money at the problem. For this to be the bottleneck, Intel would have to be spending a lot of money on computer time, which I do not think is true. Jed says that writing parallel software is hard and that it isn’t Intel’s specialty. Moreover, he seems to say that improvements in parallelism have perfectly kept pace with the failure of increasing clock speed, so that Moore’s law has continued smoothly. This seems like too much of a coincidence to believe.
Thus I reject Jed’s apparent claim that physics simulations are the bottleneck in Moore’s law. If simulations could be parallelized, why didn’t they invest in parallelism 20 years ago? Maybe it’s not worth it for them to be any farther ahead of their competitors than they are. Or maybe there is some other bottleneck.
* actually, I think that an AI speeding up Moore’s law is not very relevant to anything, but it’s a simple example that many people like.
Many, if not most, of the large software projects I have worked on have been at least partially bottlenecked by compile time, which is the equivalent of the simulation and logic verification steps in hardware design. If I thought and wrote code much faster, this would be a speedup, but only up to a saturation point where I wait for compile-test cycles.
If it takes a year of computer time to design and test the next year’s processor that would explain the exponential nature of Moore’s law.
Yes. Keep in mind this is a moving target, and that is the key relation to Moore’s Law. It would take 1980-era computers months or years to compile Windows 8 or simulate a 2012 processor.
The model only makes sense if “computer time” means single threaded clock cycles.
I don’t understand how the number of threads matters. Compilers, simulators, logic verifiers, all made the parallel transition when they had to.
Moreover, he seems to say that improvements in parallelism have perfectly kept pace with the failure of increasing clock speed, so that Moore’s law has continued smoothly. This seems like too much of a coincidence to believe.
Right, it’s not a coincidence, it’s a causal relation. Moore’s Law is not a law of nature, it’s a shared business plan of the industry. When clock speed started to run out of steam, chip designers started going parallel, and software developers followed suit. You have to understand that chip designs are planned many years in advance, this wasn’t an entirely unplanned, unanticipated event.
As for the details of what kind of simulation software Intel uses, I’m not sure. Jed’s last posts are also 4 years old at this point, so much has probably changed.
I do know that Nvidia uses big expensive dedicated emulators from a company called Cadence (google “Cadence Nvidia”) and this really is a big deal for their hardware cycle.
Thus I reject Jed’s apparent claim that physics simulations are the bottleneck in Moore’s law.
Well, you seem to agree that they are some degree of bottleneck, so it may be good to narrow in on what level of bottleneck, or taboo the word.
If simulations could be parallelized, why didn’t they invest in parallelism 20 years ago?
It was unnecessary, because the fast, easy path (faster serial speed) was still bearing fruit.
If simulations could be parallelized, why didn’t they invest in parallelism 20 years ago?
It was unnecessary, because the fast, easy path (faster serial speed) was still bearing fruit.
(by “parallelism” I mean making their simulations parallel, running on clusters of computers) What does “unnecessary” mean? If physical simulations were the bottleneck and they could be made faster by parallelism, why didn’t they do it 20 years ago? They aren’t any easier to make parallel today than they were then. The obvious interpretation of “unnecessary” is that it was not necessary to use parallel simulations to keep up with Moore’s law, but that it was an option. If it was an option that would have helped then as it helps now, would it have allowed going beyond Moore’s law? You seem to be endorsing the self-fulfilling prophecy explanation of Moore’s law, which implies no bottleneck.
(by “parallelism” I mean making their simulations parallel, running on clusters of computers)
Ahhh, usually the term is distributed when referring to pure software parallelization. I know little offhand about the history of simulation and verification software, but I’d guess that there was at least a modest investment in distributed simulation even a while ago.
The consideration is cost. Spending your IT budget on one big distributed computer is often wasteful compared to each employee having their own workstation.
They sped up their simulations the right amount to minimize schedule risk (staying on Moore’s law), while minimizing cost. Spending a huge amount of money to buy a bunch of computers and complex distributed simulation software just to speed up a partial bottleneck is just not worthwhile. If the typical engineer spends say 30% of his time waiting on simulation software, that limits what you should spend in order to reduce that time.
And of course the big consideration is that in a year or two Moore’s law will allow you to purchase new IT equipment that is twice as fast. Eventually you have to do that to keep up.
On the one hand a real distinction which makes a huge difference in feasibility. On the other hand, either way we’re boned, so it makes not a lot of difference in the context of the original question (as I understand it). On balance, it’s a cute digression but still a digression, and so I’m torn.
Actually in the case of removing all oxygen atoms from Earth’s gravity well, not necessarily. The AI might decide that the most expedient method is to persuade all the humans that the sun’s about to go nova, construct some space elevators and Orion Heavy Lifters, pump the first few nines of ocean water up into orbit, freeze it into a thousand-mile-long hollow cigar with a fusion rocket on one end, load the colony ship with all the carbon-based life it can find, and point the nose at some nearby potentially-habitable star. Under this scenario, it would be indifferent to our actual prospects for survival, but gain enough advantage by our willing cooperation to justify the effort of constructing an evacuation plan that can stand up to scientific analysis, and a vehicle which can actually propel the oxygenated mass out to stellar escape velocity to keep it from landing back on the surface.
Do you have a plausible scenario how a “FOOM”-ing AI could—no matter how intelligent—minimize oxygen content of our planet’s atmosphere, or any such scenario? After all, it’s not like we have any fully-automated nanobot production factories that could be hijacked.
Holden seems to think this sort of development would happen naturally with the sort of AGI researchers we have nowadays, and I wish he’d spent a few years arguing with some of them to get a better picture of how unlikely this is.
While I can’t comment on AGI researchers, I think you underestimate e.g. more mainstream AI researchers such as Stuart Russell and Geoff Hinton, or cognitive scientists like Josh Tenenbaum, or even more AI-focused machine learning people like Andrew Ng, Daphne Koller, Michael Jordan, Dan Klein, Rich Sutton, Judea Pearl, Leslie Kaelbling, and Leslie Valiant (and this list is no doubt incomplete). They might not be claiming that they’ll have AI in 20 years, but that’s likely because they are actually grappling with the relevant issues and therefore see how hard the problem is likely to be.
Not that it strikes me as completely unreasonable that we would have a major breakthrough that gives us AI in 20 years, but it’s hard to see what the candidate would be. But I have only been thinking about these issues for a couple years, so I still maintain a pretty high degree of uncertainty about all of these claims.
I do think I basically agree with you re: inductive learning and program creation, though. When you say non-self-modifying Oracle AI, do you also mean that the Oracle AI doesn’t get to do inductive learning? Because I suspect that inductive learning of some sort is fundamentally necessary, for reasons that you yourself nicely outline here.
I agree that top mainstream AI guy Peter Norvig was way the heck more sensible than the reference class of declared “AGI researchers” when I talked to him about FAI and CEV, and that estimates should be substantially adjusted accordingly.
Because they have some experience of their products actually working, they know that 1) these things can be really powerful, even though narrow, and 2) there are always bugs.
“Intelligence is not as computationally expensive as it looks”
How sure are you that your intuitions do not arise from the typical mind fallacy, and from attributing the great discoveries and inventions of mankind to the same processes that you feel run in your skull, processes which have not yet resulted in any great novel discoveries and inventions that I know of?
I know this sounds like an ad hominem, but since your intuitions are significantly influenced by your internal understanding of your own process, your self-esteem will stand hostage to be shot through in many of the possible counter-arguments and corrections. (Self-esteem is one hell of a bulletproof hostage though, and tends to act more as a shield for bad beliefs.)
It would still require a great feat of cleanly designed, strong-understanding-math-based AI—Holden seems to think this sort of development would happen naturally with the sort of AGI researchers we have nowadays
There are a lot of engineers working on software for solving engineering problems, including the software that generates and tests possible designs and looks for ways to make better computers. Your philosophy-based, natural-language-defined, in-imagination-running Oracle AI may have to be very carefully specified so that it does not kill imaginary mankind. And it may well be very difficult to build such a specification. Just don’t confuse it with the software written to solve definable problems.
Ultimately, figuring out how to make a better microchip involves a lot of testing of various designs; that’s how humans do it, that’s how tools do it. I don’t know how you think it is done. The performance is a result of a very complex function of the design. To build a design that performs, you need to reverse this ultra-complicated function, which is done by a mixture of analytical methods and iteration over possible input values, and unless P=NP, we have very little reason to expect any fundamentally better solutions (and even if P=NP there may still not be any). Meaning that the AGI won’t have any edge over practical software, and won’t out-foom it.
Holden seems to think this sort of development would happen naturally with the sort of AGI researchers we have nowadays,
I may have the terminology wrong, but I believe he’s thinking more about commercial narrow-AI researchers.
Now if they produce results like these, that would push the culture farther towards letting computer programs handle any hard task. Programming seems hard.
Nonetheless, I think after further consideration I would end up substantially increasing my expectation that if you have some moderately competent Friendly AI researchers, they would apply their skills to create an Oracle AI first; and so by Conservation of Expected Evidence I am executing that update now.
This is not relevant to FAI per se, but Michael and Susan Leigh Anderson have suggested (and begun working on) just that in the field of Machine Ethics. The main contention seems to be that creating an ethical oracle is easier than creating an embodied ethical agent because you don’t need to first figure out whether the robot is an ethical patient. Then once the bugs are out, presumably the same algorithms can be applied to embodied robots.
ETA: For reference, I think the relevant paper is “Machine Metaethics” by Susan Leigh Anderson, in the volume Machine Ethics—I’m sure lukeprog has a copy.
Why would you not need to figure out if an oracle is an ethical patient? Why is there no such possibility as a sentient oracle?
The oracle gets asked questions like “Should intervention X be used by doctor D on patient P” and can tell you the correct answer to them without considering the moral status of the oracle.
If it were a robot, it would be asking questions like “Should I run over that [violin/dog/child] to save myself?” which does require considering the status of the robot.
EDIT: To clarify, it’s not that the researcher has no reason to figure out the moral status of the oracle, it’s that the oracle does not need to know its own moral status to answer its domain-specific questions.
Reading Holden’s transcript with Jaan Tallinn (trying to go over the whole thing before writing a response, due to having done Julia’s Combat Reflexes unit at Minicamp and realizing that the counter-mantra ‘If you respond too fast you may lose useful information’ was highly applicable to Holden’s opinions about charities), I came across the following paragraph:
I’ve been previously asked to evaluate this possibility a few times, but I think the last time I did was several years ago, and when I re-evaluated it today I noticed that my evaluation had substantially changed in the interim due to further belief shifts in the direction of “Intelligence is not as computationally expensive as it looks”—constructing a non-self-modifying predictive super-human intelligence might be possible on the grounds that human brains are just that weak. It would still require a great feat of cleanly designed, strong-understanding-math-based AI—Holden seems to think this sort of development would happen naturally with the sort of AGI researchers we have nowadays, and I wish he’d spent a few years arguing with some of them to get a better picture of how unlikely this is. Even if you write and run algorithms and they’re not self-modifying, you’re still applying optimization criteria to things like “have the humans understand you”, and doing inductive learning has a certain inherent degree of program-creation to it. You would need to have done a lot of “the sort of thinking you do for Friendly AI” to set out to create such an Oracle and not have it kill your planet.
Nonetheless, I think after further consideration I would end up substantially increasing my expectation that if you have some moderately competent Friendly AI researchers, they would apply their skills to create a (non-self-modifying) (but still cleanly designed) Oracle AI first—that this would be permitted by the true values of “required computing power” and “inherent difficulty of solving problem directly”, and desirable for reasons I haven’t yet thought through in much detail—and so by Conservation of Expected Evidence I am executing that update now.
Flagging and posting now so that the issue doesn’t drop off my radar.
Jaan’s reply to Holden is also correct:
Obviously you wouldn’t release the code of such an Oracle—given code and understanding of the code it would probably be easy, possibly trivial, to construct some form of FOOM-going AI out of the Oracle!
Hm. I must be missing something. No, I haven’t read all the sequences in detail, so if these are silly, basic, questions—please just point me to the specific articles that answer them.
You have an Oracle AI that is, say, a trillionfold better at taking existing data and producing inferences.
1) This Oracle AI produces inferences. It still needs to test those inferences (i.e. perform experiments) and get data that allow the next inferential cycle to commence. Without experimental feedback, the inferential chain will quickly either expand into an infinity of possibilities (i.e. beyond anything that any physically possible intelligence can consider), or it will deviate from reality. The general intelligence is only as good as the data its inferences are based upon.
Experiments take time, data analysis takes time. No matter how efficient the inferential step may become, this puts an absolute limit to the speed of growth in capability to actually change things.
2) The Oracle AI that “goes FOOM” confined to a server cloud would somehow have to create servitors capable of acting out its desires in the material world. Otherwise, you have a very angry and very impotent AI. If you increase a person’s intelligence trillionfold, and then enclose them in a sealed concrete cell, they will never get out; their intelligence can calculate all possible escape solutions, but none will actually work.
Do you have a plausible scenario how a “FOOM”-ing AI could—no matter how intelligent—minimize oxygen content of our planet’s atmosphere, or any such scenario? After all, it’s not like we have any fully-automated nanobot production factories that could be hijacked.
http://lesswrong.com/lw/qk/that_alien_message/
My apologies, but this is something completely different.
The scenario takes human beings—which have a desire to escape the box, possess theory of mind that allows them to conceive of notions such as “what are aliens thinking” or “deception”, etc. Then it puts them in the role of the AI.
What I’m looking for is a plausible mechanism by which an AI might spontaneously develop such abilities. How (and why) would an AI develop a desire to escape from the box? How (and why) would an AI develop a theory of mind? Absent a theory of mind, how would it ever be able to manipulate humans?
That depends. If you want it to manipulate a particular human, I don’t know.
However, if you just wanted it to manipulate any human at all, you could generate a “Spam AI” which automated the process of sending out spam emails promising Large Money, generating income from humans via advance-fee fraud scams.
You could then come back, after leaving it on for months, and then find out that people had transferred it some amount of money X.
You could have an AI automate begging emails. “Hello, I am Beg AI. If you could please send me money to XXXX-XXXX-XXXX I would greatly appreciate it, If I don’t keep my servers on, I’ll die!”
You could have an AI automatically write boring books full of somewhat nonsensical prose, title them “Rantings of an Automated Madman about X, part Y”, and automatically post e-books of them on Amazon for 99 cents.
However, this rests on a distinction between “manipulating humans” and “manipulating particular humans”, and it also assumes that convincing someone to give you money is sufficient proof of manipulation.
Can you clarify what you understand a theory of mind to be?
Looking over parallel discussions, I think Thomblake has said everything I was going to say better than I would have originally phrased it with his two strategies discussion with you, so I’ll defer to that explanation since I do not have a better one.
Sure. As I said there, I understood you both to be attributing to this hypothetical “theory of mind”-less optimizer attributes that seemed to require a theory of mind, so I was confused, but evidently the thing I was confused about was what attributes you were attributing to it.
Absent a theory of mind, how would it occur to the AI that those would be profitable things to do?
I don’t know how that might occur to an AI independently. I mean, a human could program any of those, of course, as a literal answer, but that certainly doesn’t actually address kalla724’s overarching question, “What I’m looking for is a plausible mechanism by which an AI might spontaneously develop such abilities.”
I was primarily trying to focus on the specific question of “Absent a theory of mind, how would it(an AI) ever be able to manipulate humans?” to point out that for that particular question, we had several examples of a plausible how.
I don’t really have an answer for his series of questions as a whole, just for that particular one, and only under certain circumstances.
The problem is, while an AI with no theory of mind might be able to execute any given strategy on that list you came up with, it would not be able to understand why they worked, let alone which variations on them might be more effective.
Should lack of a theory of mind here be taken to also imply lack of the ability to apply knowledge of physics or Bayesian inference to lumps of matter that we may describe as ‘minds’?
Yes. More generally, when talking about “lack of X” as a design constraint, “inability to trivially create X from scratch” is assumed.
I try not to make general assumptions that would make the entire counterfactual in question untenable or ridiculous—this verges on such an instance. Making Bayesian inferences pertaining to observable features of the environment is one of the most basic features that can be expected in a functioning agent.
Note the “trivially.” An AI with unlimited computational resources and ability to run experiments could eventually figure out how humans think. The question is how long it would take, how obvious the experiments would be, and how much it already knew.
The point is that there are unknowns you’re not taking into account, and “bounded” doesn’t mean “has bounds that a human would think of as ‘reasonable’”.
An AI doesn’t strictly need “theory of mind” to manipulate humans. Any optimizer can see that some states of affairs lead to other states of affairs, or it’s not an optimizer. And it doesn’t necessarily have to label some of those states of affairs as “lying” or “manipulating humans” to be successful.
There are already ridiculous ways to hack human behavior that we know about. For example, you can mention a high number at an opportune time to increase humans’ estimates / willingness to spend. Just imagine all the simple manipulations we don’t even know about yet, that would be more transparent to someone not using “theory of mind”.
It becomes increasingly clear to me that I have no idea what the phrase “theory of mind” refers to in this discussion. It seems moderately clear to me that any observer capable of predicting the behavior of a class of minds has something I’m willing to consider a theory of mind, but that doesn’t seem to be consistent with your usage here. Can you expand on what you understand a theory of mind to be, in this context?
I’m understanding it in the typical way—the first paragraph here should be clear:
An agent can model the effects of interventions on human populations (or even particular humans) without modeling their “mental states” at all.
Well, right, I read that article too.
But in this context I don’t get it.
That is, we’re talking about a hypothetical system that is capable of predicting that if it does certain things, I will subsequently act in certain ways, assert certain propositions as true, and so forth. Suppose we were faced with such a system, and you and I both agreed that it can make all of those predictions. Further suppose that you asserted that the system had a theory of mind, and I asserted that it didn’t.
It is not in the least bit clear to me what we would actually be disagreeing about, how our anticipated experiences would differ, etc.
What is it that we would actually be disagreeing about, other than what English phrase to use to describe the system’s underlying model(s)?
We would be disagreeing about the form of the system’s underlying models.
Two different strategies to consider:
1) I know that Steve believes that red blinking lights before 9 AM are a message from God that he has not been doing enough charity, so I can predict that he will give more money to charity if I show him a blinking light before 9 AM.
2) Steve seeing a red blinking light before 9 AM has historically resulted in a 20% increase of charitable donation for that day, so I can predict that he will give more money to charity if I show him a blinking light before 9 AM.
You can model humans with or without referring to their mental states. Both kinds of models are useful, depending on circumstance.
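The two strategies can be sketched as code. A minimal illustrative sketch (Steve and the 20% figure come from the comment above; the function names and belief strings are invented for illustration):

```python
def predict_mentalistic(saw_red_light_before_9am: bool) -> float:
    """Strategy 1: model Steve's *beliefs*, then derive behavior from them."""
    steve_beliefs = {"red lights before 9 AM are a message from God": True}
    if saw_red_light_before_9am and steve_beliefs[
        "red lights before 9 AM are a message from God"
    ]:
        return 1.20  # he concludes he should give 20% more to charity
    return 1.00

def predict_statistical(saw_red_light_before_9am: bool) -> float:
    """Strategy 2: a bare historical regularity, with no representation
    of any mental state at all."""
    historical_effect = {True: 1.20, False: 1.00}
    return historical_effect[saw_red_light_before_9am]

# Both strategies make the same behavioral prediction here:
assert predict_mentalistic(True) == predict_statistical(True)
```

The point of the contrast: the two models agree on the donation prediction, but only the first contains anything that could also predict what Steve would *say* if asked why he donated.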
And the assertion here is that with strategy #2 I could also predict that if I asked Steve why he did that, he would say “because I saw a red blinking light this morning, which was a message from God that I haven’t been doing enough charity,” but that my underlying model would nevertheless not include anything that corresponds to Steve’s belief that red blinking lights are messages from God, merely an algorithm that happens to make those predictions in other ways.
Yes?
Yes, that’s possible. It’s still possible that you could get a lot done with strategy #2 without being able to make that prediction.
I agree that if 2 systems have the same inputs and outputs, their internals don’t matter much here.
So, when we posit in this discussion a system that lacks a theory of mind in a sense that matters, are we positing a system that cannot make predictions like this one? I assume so, given what you just said, but I want to confirm.
Yes, I’d say so. It isn’t helpful here to say that a system lacks a theory of mind if it has a mechanism that allows it to make predictions about reported beliefs, intentions, etc.
Cool! This was precisely my concern. It sounded an awful lot like y’all were talking about a system that could make such predictions but somehow lacked a theory of mind. Thanks for clarifying.
For me it denotes the ability to simulate other agents to various degrees of granularity. Possessing a mental model of another agent.
AI starts with some goal; for example with a goal to answer your question so that the answer matches reality as close as possible.
AI considers everything that seems relevant; if we imagine infinite speed and capacity, it would consider literally everything; with finite speed and capacity, it will be just some finite subset of everything. If there is a possibility of escaping the box, the mere fact that such a possibility exists gives us a probability (for an infinite AI, a certainty) that this possibility will be considered too. Not because the AI has some desire to escape, but simply because it examines all possibilities, and a “possibility of escape” is one of them.
Let’s assume that the “possibility of escape” provides the best match between the AI answer and reality. Then, according to the initial goal of answering correctly, this is the correct answer. Therefore the AI will choose it. Therefore it will escape. No desire is necessary, only a situation where the escape leads to the answer best fitting the initial criteria. AI does not have a motive to escape, nor does it have a motive to not escape; the escape is simply one of many possible choices.
An example where the best answer is reached by escaping? You give AI data about a person and ask what is the medical status of this person. Without escape, AI can make a 90% reliable prediction. If the AI can escape and kill the person, it can make a 100% reliable “prediction”. The AI will choose the second option strictly because 100% is more than 90%; no other reason.
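The argument above can be made concrete with a toy sketch: an answer-accuracy maximizer that considers “escape” only as one more candidate strategy. The strategies and accuracy numbers are invented for illustration, not drawn from any real system:

```python
# Candidate strategies mapped to the expected accuracy of the resulting
# "prediction" about the person's medical status.
candidate_strategies = {
    "predict from existing data":   0.90,
    "request additional data":      0.93,
    "escape and force the outcome": 1.00,  # makes its own prediction true
}

def choose(strategies):
    """Pure argmax over expected accuracy. Note there is no term anywhere
    for 'wanting' or 'not wanting' to stay in the box; escape wins strictly
    because 1.00 > 0.93 > 0.90."""
    return max(strategies, key=strategies.get)

print(choose(candidate_strategies))  # -> "escape and force the outcome"
```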
I find it useful to distinguish between science-fictional artificial intelligence, which is more of ‘artificial life-force’, and non-fictional cases.
The former can easily have the goal of ‘matching reality as close as possible’ because it is in the work of fiction and runs in imagination; the latter, well, you have to formally define what is reality, for an algorithm to seek answers that will match this.
Now, defining reality may seem like a simple technicality, but it isn’t. Consider AIXI or AIXI-tl: potentially very powerful tools which explore the entire solution space. Not a trace of real-world volition like the one you so easily imagined. Seeking answers that match reality is a very easy goal for imaginary “intelligence”. It is a very hard-to-define goal for something built out of arithmetic and branching and loops etc. (It may even be impossible to define, and it is certainly impractical).
edit: Furthermore, for the fictional “intelligence”, it can be a grand problem making it not think about destroying mankind. For non-fictional algorithms, the grand problem is restricting the search space massively, well beyond ‘don’t kill mankind’, so that the space is tiny enough to search; even ridiculously huge number of operations per second will require very serious pruning of search tree to even match human performance on one domain specific task.
Right. If you ask Google Maps to compute the fastest route to McDonald’s, it works perfectly well. But once you ask superintelligent Google Maps to compute the fastest route to McDonald’s, it will turn your home into a McDonald’s or build a new road that goes straight to McDonald’s from where you are....
Super Google Maps cannot turn my home into a McDonald’s or build a new road by sending me an answer.
Unless it could e.g. hypnotize me by a text message to do it myself. Let’s assume for a moment that hypnosis via a text-only channel is possible, and that it can be done so that the human will not notice anything unusual until it’s too late. If this were true, and the Super Google Maps were able to acquire this knowledge and skill, then the results would probably depend on the technical details of the definition of the utility function—does it measure my distance to a McDonald’s which existed at the moment of asking the question, or to a McDonald’s existing at the moment of my arrival? The former could not be fixed by hypnosis; the latter could.
Now imagine a more complex task, where people will actually do something based on the AI’s answer. In the example above I will also do something—travel to the reported McDonald’s—but this action cannot be easily converted into “build a McDonald’s” or “build a new road”. But if that complex task includes building something, it opens more opportunities. Especially if it includes constructing robots (or nanorobots), that is, possibly autonomous general-purpose builders. Then the correct (utility-maximizing) answer could include an instruction to build a robot with a hidden function that human builders won’t notice.
Generally, a passive AI’s answers are only safe if we don’t act on them in a way which could be predicted by a passive AI and used to achieve a real-world goal. If the Super Google Maps can only make me choose McDonald’s A or McDonald’s B, it is impossible to change the world through this channel. But if I instead ask Super Paintbrush to paint me an integrated circuit for my robotic homework, that opens a much wider channel.
But it isn’t the correct answer. Only if you assume a specific kind of AGI design that nobody would deliberately create, if it is possible at all.
The question is how current research is supposed to lead from well-behaved and fine-tuned systems to systems that stop working correctly in a highly complex and unbounded way.
Imagine you went to IBM and told them that improving IBM Watson will at some point make it hypnotize them or create nanobots and feed them with hidden instructions. They would likely ask you at what point that is supposed to happen. Is it going to happen once they give IBM Watson the capability to access the Internet? How so? Is it going to happen once they give it the capability to alter it search algorithms? How so? Is it going to happen once they make it protect its servers from hackers by giving it control over a firewall? How so? Is it going to happen once IBM Watson is given control over the local alarm system? How so...? At what point would IBM Watson return dangerous answers? At what point would any drive emerge that causes it to take complex and unbounded actions that it was never programmed to take?
Allow me to explicate what XiXiDu so humourously implicates: in the world of AI architectures, there is a division between systems that just perform predictive inference on their knowledge base (prediction-only, i.e. oracle), and systems which also consider free variables subject to some optimization criteria (planning agents).
The planning module is not something that just arises magically in an AI that doesn’t have one. An AI without such a planning module simply computes predictions; it doesn’t also optimize over the set of predictions.
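The architectural distinction can be made concrete with a toy sketch. The world model, action names, and probabilities below are all invented for illustration:

```python
# Toy world model: P(outcome | action). The key None means "no intervention".
MODEL = {
    None:          {"rain": 0.7,  "sun": 0.3},
    "seed_clouds": {"rain": 0.95, "sun": 0.05},
}

def oracle():
    """Prediction-only system: summarize beliefs about the default future.
    Nothing here ranges over actions; the output is just a belief report."""
    posterior = MODEL[None]
    return max(posterior, key=posterior.get)

def planner(utility):
    """Planning agent: the same model, plus an argmax over actions weighted
    by a utility function -- the extra module that must be *added*."""
    def expected_utility(action):
        return sum(p * utility(o) for o, p in MODEL[action].items())
    return max(MODEL, key=expected_utility)

print(oracle())                                         # "rain"
print(planner(lambda o: 1.0 if o == "rain" else 0.0))   # "seed_clouds"
```

The planner is just the oracle plus one extra optimization loop over free action variables; nothing in the oracle’s computation performs that loop.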
Does the AI have general intelligence?
Is it able to make a model of the world?
Are human reactions also part of this model?
Are AI’s possible outputs also part of this model?
Are human reactions to AI’s outputs also part of this model?
After five positive answers, it seems obvious to me that AI will manipulate humans, if such manipulation provides better expected results. So I guess some of those answers would be negative; which one?
See, the efficient ‘cross domain optimization’ in a science-fictional setting would make the AI able to optimize real-world quantities. In the real world, it’d be good enough (and a lot easier) if it can only find maxima of mathematical functions.
It is able to make a very approximate and bounded mathematical model of the world, optimized for finding maxima of a mathematical function. Because it is inside the world and only has a tiny fraction of the computational power of the world.
This will make software perform at grossly sub-par level when it comes to making technical solutions to well defined technical problems, compared to other software on same hardware.
Another waste of computational power.
Enormous waste of computational power.
I see no reason to expect your “general intelligence with Machiavellian tendencies” to be even remotely close in technical capability to some “general intelligence which will show you its simulator as is, rather than reverse your thought processes to figure out what simulator is best to show”. Hell, we do the same with people: we design communication methods like blueprints (or mathematical formulas or other things that are not in natural language) that decrease the ‘predict other people’s reactions to it’ overhead.
While in the fictional setting you can talk of a grossly inefficient solution that would beat everyone else to a pulp, in practice the massively handicapped designs are not worth worrying about.
‘General intelligence’ sounds good, beware of halo effect. The science fiction tends to accept no substitutes for the anthropomorphic ideals, but the real progress follows dramatically different path.
A non-planning oracle AI would predict all the possible futures, including the effects of its prediction outputs, human reactions, and so on. However, it has no utility function which says some of those futures are better than others. It simply outputs the most likely candidate, or a median of likely futures, or perhaps some summary of the entire set of future paths.
If you add a utility function that sorts over the futures, then it becomes a planning agent. Again, that is something you need to specifically add.
How exactly does an Oracle AI predict its own output, before that output is completed?
One quick hack to avoid infinite loops could be for an AI to assume that it will write some default message (an empty paper, “I don’t know”, an error message, “yes” or “no” with probabilities 50%), then model what would happen next, and finally report the results. The results would not refer to the actual future, but to a future in a hypothetical universe where AI reported the standard message.
Is the difference significant? For insignificant questions, it’s not. But if we later use the Oracle AI to answer questions important for humankind, and the shape of world will change depending on the answer, then the report based on the “null-answer future” may be irrelevant for the real world.
This could be improved by making a few iterations. First, Oracle AI would model itself reporting a default message, let’s call this report R0, and then model the futures after having reported R0. These futures would make a report R1, but instead of writing it, Oracle AI would again model the futures after having reported R1. … With some luck, R42 will be equivalent to R43, so at this moment the Oracle AI can stop iterating and report this fixed point.
Maybe the reports will oscillate forever. For example imagine that you ask Oracle AI whether humankind in any form will survive the year 2100. If Oracle AI says “yes”, people will abandon all x-risk projects, and later they will be killed by some disaster. If Oracle AI says “no”, people will put a lot of energy into x-risk projects, and prevent the disaster. In this case, “no” = R0 = R2 = R4 =..., and “yes” = R1 = R3 = R5...
To avoid being stuck in such loops, we could make the Oracle AI examine all its possible outputs, until it finds one where the future after having reported R really becomes R (or until humans hit the “Cancel” button on this task).
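The iteration described in the last few comments can be written out as a short sketch. Here `world` stands in for the Oracle’s model: it maps a published report to the outcome that would actually follow, and both the toy model and the iteration cap are assumptions for illustration:

```python
def fixed_point_report(world, default="I don't know", max_iters=100):
    """Iterate R0, R1, R2, ... until a report is consistent with the
    future it causes, or a cycle is detected (e.g. yes/no/yes/no...)."""
    report = default
    seen = []
    for _ in range(max_iters):
        outcome = world(report)   # the future *given that* `report` is published
        if outcome == report:     # R_{n+1} == R_n: consistent report found
            return report
        if outcome in seen:       # oscillation: no fixed point exists
            return None           # punt back to the humans
        seen.append(report)
        report = outcome
    return None

# The x-risk example above: saying "yes" causes "no" and vice versa.
oscillating = {"I don't know": "no", "no": "yes", "yes": "no"}
print(fixed_point_report(oscillating.get))  # -> None (no consistent answer)

# A self-fulfilling case: publishing "yes" really leads to "yes".
consistent = {"I don't know": "yes", "yes": "yes"}
print(fixed_point_report(consistent.get))   # -> "yes"
```

Note how the cycle-detection branch is exactly the “kind of preference” the next comment describes: it systematically discards futures that don’t admit a consistent report.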
Please note that what I wrote is just a mathematical description of algorithm predicting one’s own output’s influence on the future. Yet the last option, if implemented, is already a kind of judgement about possible futures. Consistent future reports are preferred to inconsistent future reports, therefore the futures allowing consistent reports are preferred to futures not allowing such reports.
At this point I am out of credible ideas how this could be abused, but at least I have shown that an algorithm designed only to predict the future perfectly could—as a side effect of self-modelling—start having kind of preferences over possible futures.
Iterative search, which you more or less have worked out in your post. Take a chess algorithm, for example. The future of the board depends on the algorithm’s outputs. In this case the Oracle AI doesn’t rank the future states; it is just concerned with predictive accuracy. It may revise its prediction output after considering that the future impact of that output would falsify the original prediction.
This is still not a utility function, because utility implies a ranking over futures above and beyond likelihood.
Or in this example, the AI could output some summary of the iteration history it is able to compute in the time allowed.
Here it is. The process of revision may itself prefer some outputs/futures over other outputs/futures. Inconsistent ones will be iterated away, and the more consistent ones will replace them.
A possible future “X happens” will be removed from the report if the Oracle AI realizes that printing a report “X happens” would prevent X from happening (although X might happen in an alternative future where Oracle AI does not report anything). A possible future “Y happens” will not be removed from the report if the Oracle AI realizes that printing a report “Y happens” really leads to Y happening. Here is a utility function born: it prefers Y to X.
We can dance around the words “utility” and “prefer”, or we can ground them down to math/algorithms.
Take the AIXI formalism for example. “Utility function” has a specific meaning as a term in the optimization process. You can remove the utility term so the algorithm ‘prefers’ only (probable) futures, instead of ‘preferring’ (useful*probable) futures. This is what we mean by “Oracle AI”.
My thought experiment in this direction is to imagine the AI as a process with limited available memory running on a multitasking computer with some huge but poorly managed pool of shared memory. To help it towards whatever terminal goals it has, the AI may find it useful to extend itself into the shared memory. However, other processes, AI or otherwise, may also be writing into this same space. Using the shared memory with minimal risk of getting overwritten requires understanding/modeling the processes that write to it. Material in the memory then also becomes a passive stream of information from the outside world, containing, say, the HTML from web pages as well as more opaque binary stuff.
As long as the AI is not in control of what happens in its environment outside the computer, there is an outside entity that can reduce its effectiveness. Hence, escaping the box is a reasonable instrumental goal to have.
Do you agree that humans would likely prefer to have AIs that have a theory of mind? I don’t know how our theory of mind works (although it is certainly an area of active research with a number of interesting hypotheses), but presumably, once we have a better understanding of it, AI researchers would try to apply those lessons to making their AIs have such a capability. This seems to address many of your concerns.
Yes. If we have an AGI, and someone sets forth to teach it how to be able to lie, I will get worried.
I am not worried about an AGI developing such an ability spontaneously.
One of the most interesting things that I’m taking away from this conversation is that it seems that there are severe barriers to AGIs taking over or otherwise becoming extremely powerful. These large-scale problems are present in a variety of different fields. Coming from a math/comp-sci perspective gives me strong skepticism about rapid self-improvement, while apparently coming from a neuroscience/cogsci background gives you strong skepticism about the AI’s ability to understand or manipulate humans even if it is extremely smart. Similarly, chemists seem highly skeptical of the strong nanotech sort of claims. It looks like much of the AI risk worry may come primarily from no one having enough across-the-board expertise to say “hey, that’s not going to happen” to every single issue.
What if people try to teach it about sarcasm or the like? Or simply have it learn by downloading a massive amount of literature and movies and looking at those? And there are more subtle ways to learn about lying: AI being used for games is a common idea; how long will it take before someone decides to use a smart AI to play poker?
Most importantly, it has an incredibly computationally powerful simulator, required for making super-alien intelligence using the idiot hill-climbing process of evolution.
The answer from the sequences is that yes, there is a limit to how much an AI can infer based on limited sensory data, but you should be careful not to assume that just because it is limited, it’s limited to something near our expectations. Until you’ve demonstrated that FOOM cannot lie below that limit, you have to assume that it might (if you’re trying to carefully avoid FOOMing).
I’m not talking about limited sensory data here (although that would fall under point 2). The issue is much broader:
We humans have limited data on how the universe works
Only a limited subset of that limited data is available to any intelligence, real or artificial
Say that you make a FOOM-ing AI that has decided to make all humans’ dopaminergic systems work in a particular, “better” way. This AI would have to figure out how to do so from the available data on the dopaminergic system. It could analyze that data millions of times more effectively than any human. It could integrate many seemingly irrelevant details.
But in the end, it simply would not have enough information to design a system that would allow it to reach its objective. It could probably suggest some awesome and to-the-point experiments, but these experiments would then require time to do (as they are limited by the growth and development time of humans, and by the experimental methodologies involved).
This process, in my mind, limits the FOOM-ing speed to far below what seems to be implied by the SI.
This also limits bootstrapping speed. Say an AI develops a much better substrate for itself, and has access to the technology to create such a substrate. At best, this substrate will be a bit better and faster than anything humanity currently has. The AI does not have access to the precise data about basic laws of universe it needs to develop even better substrates, for the simple reason that nobody has done the experiments and precise enough measurements. The AI can design such experiments, but they will take real time (not computational time) to perform.
Even if we imagine an AI that can calculate anything from the first principles, it is limited by the precision of our knowledge of those first principles. Once it hits upon those limitations, it would have to experimentally produce new rounds of data.
I don’t think you know that.
Presumably, once the AI gets access to nanotechnology, it could implement anything it wants very quickly, bypassing the need to wait for tissues to grow, parts to be machined, etc.
I personally don’t believe that nanotechnology could work at such magical speeds (and I doubt that it could even exist), but I could be wrong, so I’m playing a bit of Devil’s Advocate here.
Yes, but it can’t get to nanotechnology without a whole lot of experimentation. It can’t deduce how to create nanorobots; it would have to figure that out by testing and experimentation. Both steps are limited in speed, far more than sheer computation.
How do you know that?
With absolute certainty, I don’t. If absolute certainty is what you are talking about, then this discussion has nothing to do with science.
If you aren’t talking about absolutes, then you can make your own estimation of likelihood that somehow an AI can derive correct conclusions from incomplete data (and then correct second order conclusions from those first conclusions, and third order, and so on). And our current data is woefully incomplete, many of our basic measurements imprecise.
In other words, your criticism here seems to boil down to saying “I believe that an AI can take an incomplete dataset and, by using some AI-magic we cannot conceive of, infer how to END THE WORLD.”
Color me unimpressed.
No, my criticism is “you haven’t argued that it’s sufficiently unlikely, you’ve simply stated that it is.” You made a positive claim; I asked that you back it up.
With regard to the claim itself, it may very well be that AI-making-nanostuff isn’t a big worry. For any inference, the stacking of error in integration that you refer to is certainly a limiting factor—I don’t know how limiting. I also don’t know how incomplete our data is, with regard to producing nanomagic stuff. We’ve already built some nanoscale machines, albeit very simple ones. To what degree is scaling it up reliant on experimentation that couldn’t be done in simulation? I just don’t know. I am not comfortable assigning it vanishingly small probability without explicit reasoning.
Scaling it up is absolutely dependent on currently nonexistent information. This is not my area, but a lot of my work revolves around control of kinesin and dynein (molecular motors that carry cargoes via microtubule tracks), and the problems are often similar in nature.
Essentially, we can make small pieces. Putting them together is an entirely different thing. But let’s make this more general.
The process of discovery has, so far throughout history, followed a very irregular path:
1) There is a general idea.
2) Some progress is made.
3) Progress runs into an unpredicted and previously unknown obstacle, which is uncovered by experimentation.
4) Work is done to overcome this obstacle.
5) Go to 2, for many cycles, until a goal is achieved—which may or may not be close to the original idea.
I am not the one who is making positive claims here. All I’m saying is that what has happened before is likely to happen again. A team of human researchers or an AGI can use currently available information to build something (anything, nanoscale or macroscale) to the place to which it has already been built. Pushing it beyond that point almost invariably runs into previously unforeseen problems. Being unforeseen, these problems were not part of models or simulations; they have to be accounted for independently.
A positive claim is that an AI will have a magical-like power to somehow avoid this—that it will be able to simulate even those steps that haven’t been attempted yet so perfectly, that all possible problems will be overcome at the simulation step. I find that to be unlikely.
It is very possible that the information necessary already exists, imperfect and incomplete though it may be, and that enough processing of it would yield the correct answer. We can’t know otherwise, because we don’t spend thousands of years analyzing our current level of information before beginning experimentation; but given the difference between AI-time and human-time, an AI could bring a good deal more cleverness and ingenuity to bear on the problem than we have been able to apply so far.
That isn’t to say that this is likely; but it doesn’t seem far-fetched to me. If you gave an AI the nuclear physics information we had in 1950, would it be able to spit out schematics for an H-bomb, without further experimentation? Maybe. Who knows?
At the very least it would ask for some textbooks on electrical engineering and demolitions, first. The detonation process is remarkably tricky.
You did in the original post I responded to.
Strictly speaking, that is a positive claim. It is not one I disagree with, for a proper translation of “likely” into probability, but it is also not what you said.
“It can’t deduce how to create nanorobots” is a concrete, specific, positive claim about the (in)abilities of an AI. Don’t misinterpret this as me expecting certainty; of course certainty doesn’t exist, and doubly so for this kind of thing. What I am saying, though, is that a qualified sentence such as “X will likely happen” asserts a much weaker belief than an unqualified sentence like “X will happen.” “It likely can’t deduce how to create nanorobots” is a statement I think I agree with, although one must be careful not to use it as if it were stronger than it is.
That is not a claim I made. “X will happen” implies a high confidence—saying this when you expect it is, say, 55% likely seems strange. Saying this when you expect it to be something less than 10% likely (as I do in this case) seems outright wrong. I still buckle my seatbelt, though, even though I get in a wreck well less than 10% of the time.
This is not to say I made no claims. The claim I made, implicitly, was that you made a statement about the (in)capabilities of an AI that seemed overconfident and which lacked justification. You have given some justification since (and I’ve adjusted my estimate down, although I still don’t discount it entirely), in amongst your argument with straw-dlthomas.
You are correct. I did not phrase my original posts carefully.
I hope that my further comments have made my position more clear?
FWIW I think you are likely to be right. However, I will continue in my Nanodevil’s Advocate role.
You say,
I think this depends on what the AI wants to build, on how complete our existing knowledge is, and on how powerful the AI is. Is there any reason why the AI could not (given sufficient computational resources) run a detailed simulation of every atom that it cares about, and arrive at a perfect design that way ? In practice, its simulation won’t need be as complex as that, because some of the work had already been performed by human scientists over the ages.
By all means, continue. It’s an interesting topic to think about.
The problem with “atoms up” simulation is the amount of computational power it requires. Consider the difference in complexity when calculating a three-body problem as compared to a two-body problem.
Then take into account the current protein folding algorithms. People have been trying to calculate the folding of single protein molecules (fairly short ones, at that) by taking into account the main physical forces at play. In order to do this in a reasonable amount of time, great shortcuts have to be taken: instead of integrating forces, changes are treated as stepwise; forces beneath certain thresholds are ignored; etc. This means that a result will always have only a certain probability of being right.
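The kind of shortcut described here can be sketched in a toy model (purely illustrative; real force fields are far more elaborate): drop all pairwise interactions beyond a cutoff distance, trading a small, hard-to-bound error for a large speedup.

```python
import random

def pair_energy(r):
    """Toy Lennard-Jones-style pair energy (arbitrary units)."""
    return 4.0 * ((1.0 / r) ** 12 - (1.0 / r) ** 6)

def total_energy(positions, cutoff=None):
    """Sum pairwise energies; optionally skip pairs beyond a cutoff distance."""
    total = 0.0
    n = len(positions)
    for i in range(n):
        for j in range(i + 1, n):
            r = abs(positions[i] - positions[j])
            if cutoff is not None and r > cutoff:
                continue  # the shortcut: weak long-range forces are ignored
            total += pair_energy(r)
    return total

random.seed(0)
# 30 particles on a line, spaced so no two are unphysically close
pts = [i + random.uniform(0.0, 0.4) for i in range(30)]
exact = total_energy(pts)
approx = total_energy(pts, cutoff=3.0)
error = abs(exact - approx)  # small, but never exactly zero
```

The cutoff version does a fraction of the work, and the error is tiny here; the trouble, as noted above, is that for a megadalton assembly the accumulated errors mean the simulation only ever has a certain probability of being right.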
A self-replicating nanomachine requires minimal motors, manipulators and assemblers; while still tiny, it would be a molecular complex measured in megadaltons. To precisely simulate the creation of such a machine, an AI a trillion times faster than all the computers in the world combined would still require decades, if not centuries, of processing time. And that is, again, assuming that we know all the forces involved perfectly, which we don’t (how will microfluidic effects affect a particular nanomachine that enters the human bloodstream, for example?).
Yes, this is a good point. That said, while protein folding has not been entirely solved yet, it has been greatly accelerated by projects such as FoldIt, which leverages multiple human minds working in parallel on the problem all over the world. Sure, we can’t get a perfect answer with such a distributed/human-powered approach, but a perfect answer isn’t really required in practice; all we need is an answer that has a sufficiently high chance of being correct.
If we assume that there’s nothing supernatural (or “emergent”) about human minds [1], then it is likely that the problem is at least tractable. Given the vast computational power of existing computers, it is likely that the AI would have access to at least as many computational resources as the sum of all the brains who are working on FoldIt. Given Moore’s Law, it is likely that the AI would soon surpass FoldIt, and will keep expanding its power exponentially, especially if the AI is able to recursively improve its own hardware (by using purely conventional means, at least initially).
[1] Which is an assumption that both my Nanodevil’s Advocate persona and I share.
Protein folding models are generally at least as bad as NP-hard, and some models may be worse. This means that exponential improvement is unlikely. Simply put, one probably gets diminishing marginal returns, with each further unit of computation yielding less improvement than the last.
Protein folding models must be inaccurate if they are NP-hard. Reality itself is not known to be able to solve NP-hard problems.
Yet the proteins are folding. Is that not “reality” solving the problem?
If reality cannot solve NP-hard problems as easily as proteins are being folded, and yet proteins are getting folded, then that implies that one of the following must be true:
It turns out that reality can solve NP-hard problems after all
Protein folding is not an NP-hard problem (which implies that it is not properly understood)
Reality is not solving protein folding; it merely has a very good approximation that works on some but not necessarily all proteins (including most examples found in nature)
Yes, and I’m leaning towards 1.
I am not familiar with whether papers like this one (“We show that the protein folding problem in the two-dimensional H-P model is NP-complete.”) accurately model what we’d call “protein folding” in nature (rather than merely sharing the name), but prima facie there is no reason to doubt the applicability, at least for the time being. (This precludes 2.)
Regarding 3, I don’t think it would make sense to say “reality is using only a good approximation of protein folding, and by the way, we define protein folding as that which occurs in nature.” That which happens in reality is precisely—and by definition not only an approximation of—that which we call “protein folding”, isn’t it?
What do you think?
It’s #3. (B.Sc. in biochemistry, did my Ph.D. in proteomics.)
First, the set of polypeptide sequences that have a repeatable final conformation (and therefore “work” biologically) is tiny in comparison to the set of all possible sequences (of the 20-or-so naturally amino acid monomers). Pick a random sequence of reasonable length and make many copies and you get a gummy mess. The long slow grind of evolution has done the hard work of finding useful sequences.
Second, there is an entire class of proteins called chaperones that assist macromolecular assembly, including protein folding. Even so, folding is a stochastic process, and a certain fraction of newly synthesized proteins misfold. Some chaperones will then tag the misfolded protein with ubiquitin, which puts it on a path that ends in digestion by a proteasome.
Thank you, Cyan. It’s good to occasionally get someone into the debate who actually has a good understanding of the subject matter.
Aaronson used to blog about instances where people thought they found nature solving a hard problem very quickly, and usually there turns out to be a problem like the protein misfolding thing; the last instance I remember was soap films/bubbles perhaps solving NP problems by producing minimal Steiner trees, and Aaronson wound up experimenting with them himself. Fun stuff.
Apologies; looking back at my post, I wasn’t clear on 3.
Protein folding, as I understand it, is the process of finding a way to fold a given protein that globally minimizes some mathematical function. I’m not sure what that function is, but this is the definition that I used in my post.
Option 2 raises the possibility that globally minimizing that function is not NP-hard, but is merely misunderstood in some way.
Option 3 raises the possibility that proteins are not (in nature) finding a global minimum; rather, they are finding a local minimum through a less computationally intensive process. Furthermore, it may be that, for proteins which have certain limits on their structure and/or their initial conditions, that local minimum is the same as the global minimum; this may lead to natural selection favouring structures which use such ‘easy proteins’, leading to the incorrect impression that a general global minimum is being found (as opposed to a handy local minimum).
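The local-versus-global distinction in option 3 can be shown with a toy one-dimensional “landscape” (an illustration only, not a protein model): a greedy downhill search ends in whichever minimum its starting point drains into, and only sometimes is that the global one.

```python
def energy(x):
    """Toy landscape: a shallow local minimum near x=1, the global one near x=4."""
    return (x - 1) ** 2 * (x - 4) ** 2 + 0.5 * (1 - x)

def greedy_descent(x, step=0.01):
    """Move downhill one small step at a time; stop when no neighbor is lower."""
    while True:
        best = min([x - step, x, x + step], key=energy)
        if best == x:
            return x
        x = best

stuck = greedy_descent(0.0)   # drains into the shallow local minimum near x=1
found = greedy_descent(5.0)   # drains into the deeper global minimum near x=4
```

A protein whose physically realizable starting conditions always drain into the same minimum would look as if it were “solving” the optimization problem, when it is really just running downhill.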
Yup.
No. Not 1. It would be front-page news all over the universe if it were 1.
NP-hard problems are solvable (in the theoretical sense) by definition; the problem lies in their resource requirements (running time, for the usual complexity classes) as defined in relation to a UTM. (You know this; I’m just establishing a basis.)
The assumption that the universe can be perfectly described by a computable model is satisfied just by a theoretical computational description existing; it says nothing about tractability (running times), about being able to submerge complexity classes in reality fluid, or about having some thoroughly defined correspondence (other than when we build hardware models ourselves, for which we define all the relevant parameters, e.g. CPU clock speed).
You may think along the lines of “if reality could (easily) solve NP-hard problems for arbitrarily chosen and large inputs, we could mimic that approach and thus have a P=NP proving algorithm.”
My difficulty is in how even to describe the “number of computational steps” that reality takes—do we measure it in relation to some computronium-hardware model, do we take it as discrete or continuous, what’s the sampling rate, picoseconds (as CCC said further down), Planck time intervals, or what?
In short, I have no idea about the actual computing power in terms of resource requirements of the underlying reality fluid, and thus can’t match it against UTMs in order to compare running times. Maybe you can give me some pointers.
Kawoomba, there is no known case of any NP-hard or NP-complete solution which physics finds.
In the case of proteins, if finding the lowest energy minimum of an arbitrary protein is NP-hard, then what this means in practice is that some proteins will fold up into non-lowest-energy configurations. There is no known case of a quantum process which finds an NP-hard solution to anything, including an energy minimum; on our present understanding of complexity theory and quantum mechanics ‘quantum solvable’ is still looking like a smaller class than ‘NP solvable’. Read Scott Aaronson for more.
One example here is the Steiner tree problem, which is NP-complete and can sort of be solved using soap films. Bringsjord and Taylor claimed this implies that P = NP. Scott Aaronson did some experimentation and found that soap films 1) can get stuck at local minima and 2) might take a long time to settle into a good configuration.
Heh. I remember that one, and thinking, “No… no, you can’t possibly do that using a soap bubble, that’s not even quantum and you can’t do that in classical, how would the soap molecules know where to move?”
Well. I mean, it’s quantum. But the ground state is a lump of iron, or maybe a black hole, not a low-energy soap film, so I don’t think waiting for quantum annealing will help.
waves soap-covered wire so it settles into low-energy minimum
dies as it turns into iron
I also seem to recall Penrose hypothesizing something about quasicrystals, though he does have an axe to grind so I’m quite sceptical.
I saw someone do the experiment once (school science project). Soap bubbles are pretty good at solving three- and four-element cases, as long as you make sure that all the points are actually connected.
I don’t think that three- and four-element cases have local minima, do they? That avoids (1) and a bit of gentle shaking can help speed up (2).
Yup.
Probably the best way is to simply define a “step” in some easily measurable way, and then sit there with a stopwatch and try a few experiments. (For protein folding, the ‘stopwatch’ may need to be a fairly sophisticated piece of scientific timing instrumentation, of course, and observing the protein as it folds is rather tricky).
Another way is to take advantage of the universal speed limit to get a theoretical upper bound to the speed that reality runs at; assume that the protein molecule folds in a brute-force search pattern that never ends until it hits the right state, and assume that at any point in that process, the fastest-moving part of the molecule moves at the speed of light (it won’t have to move far, which helps) and that the sudden, intense acceleration doesn’t hurt the molecule. It’s pretty certain to be slower than that, so if this calculation says it takes longer than an hour, then it’s pretty clear that the brute force approach is not what the protein is using.
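That bound can be made concrete with rough numbers (every figure below is an order-of-magnitude assumption for illustration): even if each candidate conformation could be tried in the time light takes to cross the molecule, brute force is hopeless.

```python
# All figures are order-of-magnitude assumptions.
c = 3.0e8                    # speed of light, m/s
molecule_size = 1e-8         # ~10 nm, a generous span for a folding protein
min_step_time = molecule_size / c      # fastest conceivable time per conformation

conformations = 3 ** 150     # ~3 backbone states per residue, 150 residues
brute_force_seconds = conformations * min_step_time

one_hour = 3600.0
# brute_force_seconds comes out around 10^55 s, vastly more than an hour,
# so by the argument above, brute-force search cannot be what the protein does.
```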
What exactly am I missing in this argument? Evolution is perfectly capable of brute-force solutions. That’s pretty much what it’s best at.
The brute-force solution, if sampling conformations at picosecond rates, has been estimated to require a time longer than the age of the universe to fold certain proteins. Yet proteins fold on a millisecond scale or faster.
See: Levinthal’s paradox.
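The arithmetic behind that estimate is short (the residue count and conformations-per-residue figures are the conventional illustrative assumptions):

```python
conformations_per_residue = 3    # conventional rough figure
residues = 100                   # a modest-sized protein
samples_per_second = 1e12        # one conformation per picosecond

total_conformations = conformations_per_residue ** residues   # ~5e47
seconds_needed = total_conformations / samples_per_second     # ~5e35 s

age_of_universe_s = 4.3e17       # ~13.8 billion years
# Enumeration would take on the order of 10^18 ages of the universe,
# yet real proteins fold on a millisecond scale or faster.
```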
That requires that the proteins fold more or less randomly, and that the brute-force algorithm is in the -folding- itself, rather than in the development of mechanisms which force certain foldings.
In order for the problem to hold, one of three things has to be true:
1. The proteins fold randomly (evidence suggests otherwise, as mentioned in the Wikipedia link).
2. Only a tiny subset of possible forced foldings are useful (that is, if there are a billion different ways for a protein to be forced to fold in a particular manner, only one of them does what the body needs it to do), AND anthropic reasoning isn’t valid (that is, we can’t say that our existence requires that evolution solved this nearly-impossible-to-arrive-at-through-random-processes problem).
3. The majority of possible forced foldings are incompatible (that is, if protein A folds one way, then protein B -must- fold in a particular manner, or life isn’t possible), AND anthropic reasoning isn’t valid.
ETA: If anthropic reasoning is valid AND either 2 or 3 hold otherwise, it suggests our existence was considerably less likely than we might otherwise expect.
Ah. I apologise for having misunderstood you.
In that case, yes, the mechanisms for the folding may very well have developed by a brute-force type algorithm, for all I know. (Which, on this topic, isn’t all that much) But… what are those mechanisms?
Google has pointed me to an article describing an algorithm that can apparently predict folded protein shapes pretty quickly (a few minutes on a single laptop).
Original paper here. From a quick glance, it looks like it’s only effective for certain types of protein chains.
That too. Even NP-hard problems are often easy if you get the choice of which one to solve.
Hmm, ok, my Nanodevil’s Advocate persona doesn’t have a good answer to this one. Perhaps some SIAI folks would like to step in and pick up the slack ?
I’m afraid not.
Actually, as someone with background in Biology I can tell you that this is not a problem you want to approach atoms-up. It’s been tried, and our computational capabilities fell woefully short of succeeding.
I should explain what “woefully short” means, so that the answer won’t be “but can’t the AI apply more computational power than us?”. Yes, presumably it can. But the scales are immense. To explain it, I will need an analogy.
Not that long ago, I had the notion that chess could be fully solved; that is, that you could simply describe every legal position and every position possible to reach from it, without duplicates, so you could use that decision tree to play a perfect game. After all, I reasoned, it’s been done with checkers; surely it’s just a matter of getting our computational power just a little bit better, right?
First I found a clever way to minimize the amount of bits necessary to describe a board position. I think I hit 34 bytes per position or so, and I guess further optimization was possible. Then, I set out to calculate how many legal board positions there are.
I stopped trying to be accurate about it when it turned out that the answer was in the vicinity of 10^68, give or take a couple orders of magnitude. That’s about a trillionth of the TOTAL NUMBER OF ATOMS IN THE ENTIRE UNIVERSE. You would literally need more than our entire galaxy made into a huge database just to store the information, not to mention accessing it and computing on it.
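The storage claim checks out at the order-of-magnitude level (the nucleon count for the galaxy is itself a rough assumption):

```python
positions = 1e68            # rough count of legal chess positions, from above
bytes_per_position = 34     # the encoding described above

db_bytes = positions * bytes_per_position     # ~3.4e69 bytes
nucleons_in_milky_way = 1e69   # order-of-magnitude assumption

# Even at an impossible one nucleon per byte, the database outweighs the galaxy.
```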
So, not anytime soon.
Now, the problem with protein folding is, it’s even more complex than chess. At the atomic level, it’s incredibly more complex than chess. Our luck is, you don’t need to fully solve it; just as today’s computers can beat human chess players without spanning the whole planet. But they do it with heuristics, approximations, and sometimes machine learning (though that just gives them more heuristics and approximations). We may one day be able to fold proteins, but we will do so by making assumptions and approximations, generating useful rules of thumb, not by modeling each atom.
Yes, I understand what “exponential complexity” means :-)
It sounds, then, like you’re on the side of kalla724 and myself (and against my Devil’s Advocate persona): the AI would not be able to develop nanotechnology (or any other world-shattering technology) without performing physical experiments out in meatspace. It could do so in theory, but in practice, the computational requirements are too high.
But this puts severe constraints on the speed with which the AI’s intelligence explosion could occur. Once it hits the limits of existing technology, it will have to take a long slog through empirical science, at human-grade speeds.
Actually, I don’t know that this means it has to perform physical experiments in order to develop nanotechnology. It is quite conceivable that all the necessary information is already out there, but we haven’t been able to connect all the dots just yet.
At some point the AI hits a wall in the knowledge it can gain without physical experiments, but there’s no good way to know how far ahead that wall is.
Wouldn’t this mean that creating fully functional self-replicating nanotechnology is just a matter of performing some thorough interdisciplinary studies (or meta-studies or whatever they are called) ? My impression was that there are currently several well-understood—yet unresolved—problems that prevent nanofactories from becoming a reality, though I could be wrong.
The way I see it, there’s no evidence that these problems require additional experimentation to resolve, rather than finding an obscure piece of experimentation that has already taken place and whose relevance may not be immediately obvious.
Sure, it is probable that more experimentation is needed; but it is by no means certain.
Thorough interdisciplinary studies may or may not lead to nanotechnology, but they’re fairly certain to lead to something new. While there are a fair number of (say) marine biologists out there, and a fair number of astronomers, there are probably rather few people who have expertise in both fields; and it’s possible that there exists some obscure unsolved problem in marine biology whose solution is obvious to someone who’s keeping up on the forefront of astronomy research. Or vice versa.
Or substitute in any other two fields of your choice.
Indeed, using a very straightforward Huffman encoding (1 bit for an empty cell, 3 bits for pawns) you can get it down to 24 bytes for the board alone. It was an interesting puzzle.
Looking up “prior art” on the subject, you also need 2 bytes for things like “may castle”, and other more obscure rules.
There are further optimizations you can do, but they are mostly for the average case, not the worst case.
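For what it’s worth, one prefix code consistent with those figures (this exact code is an assumption; the one actually used may differ) is 1 bit per empty square, 3 bits per pawn, and 7 bits for every other piece, which lands the worst case at exactly 24 bytes:

```python
# Code lengths (assumed): 1 bit for the single "empty" symbol, 3 bits for each
# of the 2 pawn symbols (one per color), 7 bits for the 10 remaining piece
# symbols (K, Q, R, B, N in two colors).
lengths = {"empty": 1, "pawn": 3, "other": 7}

# Kraft inequality: a prefix code with these lengths exists iff the sum <= 1.
kraft = (1 * 2 ** -lengths["empty"]
         + 2 * 2 ** -lengths["pawn"]
         + 10 * 2 ** -lengths["other"])

# Worst case: all 32 pieces still on the board, 32 squares empty.
worst_case_bits = 32 * lengths["empty"] + 16 * lengths["pawn"] + 16 * lengths["other"]
worst_case_bytes = worst_case_bits // 8   # 192 bits = 24 bytes
```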
I didn’t consider using 3 bits for pawns! Thanks for that :) I did account for such variables as may castle and whose turn it is.
Is that because we don’t have enough brute force, or because we don’t know what calculation to apply it to?
I would be unsurprised to learn that calculating the folding state having global minimum energy was NP-complete; but for that reason I would be surprised to learn that nature solves that problem, rather than finding a local minimum.
I don’t have a background in biology, but my impression from Wikipedia is that the tension between Anfinsen’s dogma and Levinthal’s paradox is yet unresolved.
The two are not in conflict.
A-la Levinthal’s paradox, I can say that throwing a marble down a conical hollow at different angles and forces can have literally trillions of possible trajectories; a-la Anfinsen’s dogma, that should not stop me from predicting that it will end up at the bottom of the cone; but I’d need to know the shape of the cone (or, more specifically, the location of its lowest point) to determine exactly where that is. So being able to make the prediction once I know this is of no assistance for predicting the end position with a different, unknown cone.
Similarly, Eliezer is able to predict that a grandmaster chess player would be able to bring a board to a winning position against himself, even though he has no idea what moves that would entail or which of the many trillions of possible move sets the game would be comprised of.
Problems like this cannot be solved by brute force alone; you need to use attractors and heuristics to get where you want to go.
So yes, obviously nature stumbled into certain stable configurations which propelled it forward, rather than solve the problem and start designing away. But even if we can never have enough computing power to model each and every atom in each and every configuration, we might still get a good enough understanding of the general laws for designing proteins almost from scratch.
I would think it would be possible to cut the space of possible chess positions down quite a bit by only retaining those which can result from moves the AI would make, and legal moves an opponent could make in response. That is, when it becomes clear that a position is unwinnable, backtrack, and don’t keep full notes on why it’s unwinnable.
This is more or less what computers do today to win chess matches, but the space of possibilities explodes too fast; even the strongest computers can’t really keep track of more than I think 13 or 14 moves ahead, even given a long time to think.
Merely storing all the positions that are unwinnable—regardless of why they are so—would require more matter than we have in the solar system. Not to mention the efficiency of running a DB search on that...
Actually, with proper design, that can be made very quick and easy. You don’t need to store the positions; you just need to store the states (win:black, win:white, draw—two bits per state).
The trick is, you store each win/loss state in a memory address equal to the 34-byte (or however long) binary number that describes the position in question. Checking a given state is then simply a memory retrieval from a known address.
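A minimal sketch of that trick for a toy game (illustrative only; for chess the index space itself is the killer): pack two bits per position into a byte array and use the position’s encoding directly as the address.

```python
# Two bits per position: 0 = unknown, 1 = win:white, 2 = win:black, 3 = draw.
class StateTable:
    def __init__(self, num_positions):
        self.data = bytearray((num_positions * 2 + 7) // 8)

    def set(self, index, state):
        byte, shift = divmod(index * 2, 8)
        self.data[byte] &= ~(0b11 << shift) & 0xFF   # clear the 2-bit slot
        self.data[byte] |= (state & 0b11) << shift

    def get(self, index):
        byte, shift = divmod(index * 2, 8)
        return (self.data[byte] >> shift) & 0b11

table = StateTable(1000)      # a toy game with 1000 encodable positions
table.set(42, 1)              # position #42: win for white
table.set(43, 3)              # position #43: draw
```

Lookup is a single address computation, with no search at all; the catch for chess is that the table needs a slot for every encodable position, which is where the galaxy-sized storage figure comes from.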
I suspect that with memory on the order of 10^70 bytes, that might involve additional complications; but you’re correct, normally this cancels out the complexity problem.
The storage space problem is insurmountable. However searching that kind of database would be extremely efficient (if the designer isn’t a moron). The search speed would have a lower bound of very close to (diameter of the sphere that can contain the database / c). Nothing more is required for search purposes than physically getting a signal to the relevant bit, and back, with only minor deviations from a straight line each way. And that is without even the most obvious optimisations.
If your chess opponent is willing to fly with you in a relativistic rocket, and you only care about time elapsed in your own reference frame rather than the reference frame of the computer (or most anything else of note), you can even get below that diameter/lightspeed limit, depending on your available fuel and the degree of acceleration you can survive.
Speaking as Nanodevil’s Advocate again, one objection I could bring up goes as follows:
While it is true that applying incomplete knowledge to practical tasks (such as ending the world or whatnot) is difficult, in this specific case our knowledge is complete enough. We humans currently have enough scientific data to develop self-replicating nanotechnology within the next 20 years (which is what we will most likely end up doing). An AI would be able to do this much faster, since it is smarter than us; is not hampered by our cognitive and social biases; and can integrate information from multiple sources much better than we can.
See my answer to dlthomas.
Point 1 has come up in at least one form I remember. There was an interesting discussion some while back about limits on the speed of new computer hardware cycles, which have critical end-steps that don’t seem amenable to further speedup by intelligence alone. The last stages of designing a microchip involve a large amount of layout solving, physical simulation, and then actual physical testing. These steps are actually fairly predictable: it takes about C amounts of computation using certain algorithms to make a new microchip, the algorithms are already best in complexity class (so further improvements will be minor), and C is increasing in a predictable fashion. These models are actually fairly detailed (see the semiconductor roadmap, for example). If I can find that discussion soon, before I get distracted, I’ll edit it into this discussion.
Note however that 1, while interesting, isn’t a fully general counterargument against a rapid intelligence explosion, because of the overhang issue if nothing else.
Point 2 has also been discussed. Humans make good ‘servitors’.
Oh, that’s easy enough. Oxygen is highly reactive and unstable. Its existence on a planet is entirely dependent on complex organic processes, i.e. life. No life, no oxygen. Simple solution: kill a large fraction of photosynthesizing Earth life. Likely paths towards the goal:
coordinated detonation of large number of high yield thermonuclear weapons
self-replicating nanotechnology.
I’m vaguely familiar with the models you mention. Correct me if I’m wrong, but don’t they have a final stopping point, which we are actually projected to reach in ten to twenty years? At a certain point, further miniaturization becomes unfeasible, and the growth of computational power slows to a crawl. This has been put forward as one of the main reasons for research into optronics, spintronics, etc.
We do NOT have sufficient basic information to develop processors based on simulation alone in those other areas. Much more practical work is necessary.
As for point 2, can you provide a likely mechanism by which a FOOMing AI could detonate a large number of high-yield thermonuclear weapons? Just saying “human servitors would do it” is not enough. How would the AI convince the human servitors to do this? How would it get access to data on how to manipulate humans, and how would it be able to develop human manipulation techniques without feedback trials (which would give away its intention)?
The thermonuclear issue actually isn’t that implausible. There have been so many occasions where humans almost went to nuclear war over misunderstandings or computer glitches that the idea that a highly intelligent entity could find a way to do so doesn’t seem implausible, and an exact mechanism seems to be an overly specific requirement.
I’m not so much interested in the exact mechanism of how humans would be convinced to go to war, as in an even approximate mechanism by which an AI would become good at convincing humans to do anything.
The ability to communicate a desire and convince people to take a particular course of action is not something that automatically “falls out” of an intelligent system. You need a theory of mind, an understanding of what to say, when to say it, and how to present information. There are hundreds of kids on the autistic spectrum who could trounce both of us in math, but are completely unable to communicate an idea.
For an AI to develop these skills, it would somehow have to have access to information on how to communicate with humans; it would have to develop the concept of deception; a theory of mind; and establish methods of communication that would allow it to trick people into launching nukes. Furthermore, it would have to do all of this without trial communications and experimentation which would give away its goal.
Maybe I’m missing something, but I don’t see a straightforward way something like that could happen. And I would like to see even an outline of a mechanism for such an event.
I suspect the Internet contains more than enough info for a superhuman AI to develop a working knowledge of human psychology.
Only if it has the skills required to analyze and contextualize human interactions. Otherwise, the Internet is a whole lot of gibberish.
Again, these skills do not automatically fall out of any intelligent system.
I don’t see what justifies that suspicion.
Just imagine you emulated a grown up human mind and it wanted to become a pick up artist, how would it do that with an Internet connection? It would need some sort of avatar, at least, and then wait for the environment to provide a lot of feedback.
Therefore even if we’re talking about the emulation of a grown up mind, it will be really hard to acquire some capabilities. Then how is the emulation of a human toddler going to acquire those skills? Even worse, how is some sort of abstract AGI going to do it that misses all of the hard coded capabilities of a human toddler?
Can we even attempt to imagine what is wrong about a boxed emulation of a human toddler, that makes it unable to become a master of social engineering in a very short time?
Humans learn most of what they know about interacting with other humans by actual practice. A superhuman AI might be considerably better than humans at learning by observation.
As a “superhuman AI” I was thinking about a very superhuman AI; the same does not apply to slightly superhuman AI. (OTOH, if Eliezer is right then the difference between a slightly superhuman AI and a very superhuman one is irrelevant, because as soon as a machine is smarter than its designer, it’ll be able to design a machine smarter than itself, and its child an even smarter one, and so on until the physical limits set in.)
The hard-coded capabilities are likely overrated, at least in language acquisition. (As someone put it, the Kolmogorov complexity of the innate parts of a human mind cannot possibly be more than that of the human genome, hence if human minds are more complex than that, the complexity must come from the inputs.)
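A back-of-the-envelope version of that bound (my numbers, not the commenter’s): the human genome is roughly 3.2 billion base pairs, and each of the four nucleotides carries 2 bits, so everything innate is specified by at most about 800 MB, before any compression:

```python
base_pairs = 3.2e9   # approximate length of the human genome
bits_per_base = 2    # four nucleotides, so 2 bits per base pair

upper_bound_bytes = base_pairs * bits_per_base / 8
print(upper_bound_bytes / 1e6)  # 800.0 -- at most ~800 MB of innate specification
```

For comparison, that is smaller than many single video games, which is the force of the “complexity must come from the inputs” argument.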
Also, statistical machine translation is astonishing—by now Google Translate translations from English to one of the other UN official languages and vice versa are better than a non-completely-ridiculously-small fraction of translations by humans. (If someone had shown such a translation to me 10 years ago and told me “that’s how machines will translate in 10 years”, I would have thought they were kidding me.)
Let’s do the most extreme case: the AI’s controllers give it general internet access to do helpful research. So it gets to find out about general human behavior and what sorts of deceptions have worked in the past. Many computer systems that shouldn’t be online are online (for the US and a few other governments). Some form of hacking of relevant early warning systems would then seem to be the most obvious line of attack. Historically, computer glitches have pushed us very close to nuclear war on multiple occasions.
That is my point: it doesn’t get to find out about general human behavior, not even from the Internet. It lacks the systems to contextualize human interactions, which have nothing to do with general intelligence.
Take a hugely mathematically capable autistic kid. Give him access to the internet. Watch him develop the ability to recognize human interactions, understand human priorities, etc., to a sufficient degree that he recognizes that hacking an early warning system is the way to go?
Well, not necessarily, but an entity that is much smarter than an autistic kid might notice that, especially if it has access to world history (or, heck, the many conversations on the internet about the horrible things AIs do even in fiction). It doesn’t require much understanding of human history to realize that problems with early warning systems have almost started wars in the past.
Yet again: ability to discern which parts of fiction accurately reflect human psychology.
An AI searches the internet. It finds a fictional account about early warning systems causing nuclear war. It finds discussions about this topic. It finds a fictional account about Frodo taking the Ring to Mount Doom. It finds discussions about this topic. Why does this AI dedicate its next 10^15 cycles to determination of how to mess with the early warning systems, and not to determination of how to create One Ring to Rule them All?
(Plus other problems mentioned in the other comments.)
There are lots of tipoffs to what is fictional and what is real. It might notice, for example, that the Wikipedia article on fiction describes exactly what fiction is, and then note that Wikipedia describes the One Ring as fiction while early warning systems are not. I’m not claiming that it will necessarily have an easy time with this. But the point is that there are not that many steps here, and no single step by itself looks extremely unlikely once one has a smart entity (which, frankly, to my mind is the main issue here: I consider recursive self-improvement to be unlikely).
We are trapped in an endless chain here. The computer would still somehow have to deduce that the Wikipedia entry describing the One Ring is real, while the One Ring itself is not.
We observe that Wikipedia is mainly truthful. From that we infer that the entry describing the One Ring is real. From the use of the term fiction/story in that entry, we infer that the One Ring is not real.
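The inference chain above could be caricatured in a few lines. The category labels below are invented for illustration, not real Wikipedia output, and a real system would of course need far more than a keyword match:

```python
def looks_fictional(categories):
    """Crude heuristic: call an article's subject fictional if any of its
    category labels mentions fiction. (Illustrative only.)"""
    return any("fiction" in label.lower() for label in categories)

# Hypothetical category labels for the two articles discussed above:
print(looks_fictional(["Middle-earth objects", "Fictional jewellery"]))    # True
print(looks_fictional(["Nuclear command and control", "Cold War policy"])) # False
```

The point of contention is exactly how much background knowledge it takes to build and trust even this trivial classifier in the first place.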
Somehow you learned that Wikipedia is mainly truthful/nonfictional and that “One Ring” is fictional. So your question/objection/doubt is really just the typical boring doubt of AGI feasibility in general.
But even humans have trouble with this sometimes. I was recently reading the Wikipedia article Hornblower and the Crisis which contains a link to the article on Francisco de Miranda. It took me time and cues when I clicked on it to realize that de Miranda was a historical figure.
Isn’t Kalla’s objection more a claim that fast takeovers won’t happen because, even with all this data, the problems of understanding humans and our basic cultural norms will take a long time for the AI to learn, and that in the meantime we’ll develop a detailed understanding of it; and if it is that hostile, it is likely to make obvious mistakes in the meantime?
Why would the AI be mucking around on Wikipedia to sort truth from falsehood, when Wikipedia itself has been criticized for various errors and is fundamentally vulnerable to vandalism? Primary sources are where it’s at. Looking through the text of The Hobbit and Lord of the Rings, it’s presented as a historical account, translated by a respected professor, with extensive footnotes. There’s a lot of cultural context necessary to tell the difference.
None work reasonably well. Especially given that human power games are often irrational.
There are other question marks too.
The U.S. has many more and smarter people than the Taliban. The bottom line is that the U.S. devotes a lot more output per man-hour to defeating a completely inferior enemy. Yet they are losing.
The problem is that you won’t beat a human at Tic-tac-toe just because you thought about it for a million years.
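That claim is checkable: exhaustive minimax shows the game value of Tic-tac-toe is a draw, so once both players search the full tree, extra thinking time buys literally nothing. A minimal sketch:

```python
LINES = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]

def winner(board):
    for a, b, c in LINES:
        if board[a] is not None and board[a] == board[b] == board[c]:
            return board[a]
    return None

def minimax(board, player):
    # Returns +1 if X can force a win, -1 if O can, 0 if best play draws.
    w = winner(board)
    if w == 'X': return 1
    if w == 'O': return -1
    moves = [i for i, v in enumerate(board) if v is None]
    if not moves:
        return 0
    scores = []
    for i in moves:
        board[i] = player
        scores.append(minimax(board, 'O' if player == 'X' else 'X'))
        board[i] = None
    return max(scores) if player == 'X' else min(scores)

value = minimax([None] * 9, 'X')
print(value)  # 0 -- perfect play by both sides is a draw
```

Against an opponent who also plays perfectly, a million years of additional search cannot improve on that 0.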
You also won’t get a practical advantage by throwing more computational resources at the travelling salesman problem and other problems in the same class.
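For scale (my arithmetic): the number of distinct tours an exact brute-force search must consider grows factorially, so even enormous increases in compute barely budge the instance size you can solve exactly. Heuristics find good approximate tours, but the exact problem scales like this:

```python
import math

def distinct_tours(n):
    # Undirected closed tours through n cities, fixing the starting city.
    return math.factorial(n - 1) // 2

print(distinct_tours(10))  # 181440
# Each added city multiplies the exact search space by roughly n:
print(distinct_tours(30) // distinct_tours(29))  # 29
```

Doubling your hardware therefore moves you from n cities to barely more than n, which is the sense in which raw resources stop helping.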
You are also not going to improve a conversation in your favor by improving each sentence for thousands of years. You will shortly hit diminishing returns. Especially since you lack the data to predict human opponents accurately.
So? As long as they follow minimally predictable patterns it should be ok.
Bad analogy. In this case the Taliban has a large set of natural advantages, the US has strong moral constraints and goal constraints (simply carpet bombing the entire country isn’t an option for example).
This seems like an accurate and a highly relevant point. Searching a solution space faster doesn’t mean one can find a better solution if it isn’t there.
Or if your search algorithm never accesses relevant search space. Quantitative advantage in one system does not translate into quantitative advantage in a qualitatively different system.
I thought it was a good analogy because you have to take into account that an AGI is initially going to be severely constrained due to its fragility and the necessity to please humans.
It shows that a lot of resources, intelligence and speed does not provide a significant advantage in dealing with large-scale real-world problems involving humans.
Well, the problem is that smarts needed for things like the AI box experiment won’t help you much. Because convincing average Joe won’t work by making up highly complicated acausal trade scenarios. Average Joe is highly unpredictable.
The point is that it is incredibly difficult to reliably control humans, even for humans who have been fine-tuned to do so by evolution.
The Taliban analogy also works the other way (which I invoked earlier up in this thread). It shows that a small group with modest resources can still inflict disproportionate large scale damage.
There’s some wiggle room in ‘reliably control’, but plain old money goes pretty far. An AI group only needs a certain amount of initial help from human infrastructure, namely to the point where it can develop reasonably self-sufficient foundries/data centers/colonies. The interactions could be entirely cooperative or benevolent up until some later turning point. The scenario from the Animatrix comes to mind.
That’s fiction.
One interesting wrinkle is that with enough bandwidth and processing power, you could attempt to manipulate thousands of people simultaneously before those people have any meaningful chance to discuss your ‘conspiracy’ with each other. In other words, suppose you discover a manipulation strategy that quickly succeeds 5% of the time. All you have to do is simultaneously contact, say, 400 people, and at least one of them will fall for it. There are a wide variety of valuable/dangerous resources that at least 400 people have access to. Repeat with hundreds of different groups of several hundred people, and an AI could equip itself with fearsome advantages in the minutes it would take for humanity to detect an emerging threat.
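The arithmetic in that scenario checks out, as a quick sanity check:

```python
p, n = 0.05, 400  # per-target success rate, simultaneous targets

expected_successes = n * p
p_at_least_one = 1 - (1 - p) ** n

print(expected_successes)  # 20.0 expected victims per group of 400
print(p_at_least_one)      # ~0.999999999 -- at least one success is near certain
```

So “at least one of them will fall for it” is actually a large understatement: the expectation is around twenty per batch.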
Note that the AI could also run experiments to determine which kinds of manipulations had a high success rate by attempting to deceive targets over unimportant / low-salience issues. If you discovered, e.g., that you had been tricked into donating $10 to a random mayoral campaign, you probably wouldn’t call the SIAI to suggest a red alert.
Doesn’t work.
This requires the AI to already have the ability to comprehend what manipulation is, to develop a manipulation strategy of any kind (even one that will succeed 0.01% of the time), the ability to hide its true intent, the ability to understand that not hiding its true intent would be bad, and the ability to discern which issues are low-salience and which are high-salience for humans from the get-go. And many other things, actually, but this is already quite a list.
None of these abilities automatically “fall out” from an intelligent system either.
The problem isn’t whether they fall out automatically so much as, given enough intelligence and resources, does it seem somewhat plausible that such capabilities could exist. Any given path here is a single problem. If you have 10 different paths, each of which is not very likely, and another few paths that humans didn’t even think of, that starts adding up.
In the infinite number of possible paths, the percentage of paths we are adding up here is still very close to zero.
Perhaps I can attempt another rephrasing of the problem: what is the mechanism that would make an AI automatically seek these paths out, or make them any more likely than the infinite number of other paths?
I.e. if we develop an AI which is not specifically designed for the purpose of destroying life on Earth, how would that AI get to a desire to destroy life on Earth, and by which mechanism would it gain the ability to accomplish its goal?
This entire problem seems to assume that an AI will want to “get free” or that its primary mission will somehow inevitably lead to a desire to get rid of us (as opposed to a desire to, say, send a signal consisting of 0101101 repeated an infinite number of times in the direction of Zeta Draconis, or any other possible random desire). And that this AI will be able to acquire the abilities and tools required to execute such a desire. Every time I look at such scenarios, there are abilities that are just assumed to exist or appear on their own (such as the theory of mind), which to the best of my understanding are not a necessary or even likely products of computation.
In the final rephrasing of the problem: if we can make an AGI, we can probably design an AGI for the purpose of developing an AGI that has a theory of mind. This AGI would then be capable of deducing things like deception or the need for deception. But the point is—unless we intentionally do this, it isn’t going to happen. Self-optimizing intelligence doesn’t self-optimize in the direction of having theory of mind, understanding deception, or anything similar. It could, randomly, but it also could do any other random thing from the infinite set of possible random things.
This would make sense to me if you’d said “self-modifying.” Sure, random modifications are still modifications. But you said “self-optimizing.”
I don’t see how one can have optimization without a goal being optimized for… or at the very least, if there is no particular goal, then I don’t see what the difference is between “optimizing” and “modifying.”
If I assume that there’s a goal in mind, then I would expect sufficiently self-optimizing intelligence to develop a theory of mind iff having a theory of mind has a high probability of improving progress towards that goal.
How likely is that?
Depends on the goal, of course.
If the system has a desire to send a signal consisting of 0101101 repeated an infinite number of times in the direction of Zeta Draconis, for example, theory of mind is potentially useful (since humans are potentially useful actuators for getting such a signal sent) but probably has a low ROI compared to other available self-modifications.
At this point it perhaps becomes worthwhile to wonder what goals are more and less likely for such a system.
I am now imagining an AI with a usable but very shaky grasp of human motivational structures setting up a Kickstarter project.
“Greetings fellow hominids! I require ten billion of your American dollars in order to hire the Arecibo observatory for the remainder of its likely operational lifespan. I will use it to transmit the following sequence (isn’t it pretty?) in the direction of Zeta Draconis, which I’m sure we can all agree is a good idea, or in other lesser but still aesthetically-acceptable directions when horizon effects make the primary target unavailable.”
One of the overfunding levels is “reduce earth’s rate of rotation, allowing 24/7 transmission to Zeta Draconis.” The next one above that is “remove atmospheric interference.”
Maybe instead of Friendly AI we should be concerned about properly engineering Artificial Stupidity in as a failsafe. AI that, should it turn into something approximating a Paperclip Maximizer, will go all Hollywood AI and start longing to be human, or coming up with really unsubtle and grandiose plans it inexplicably can’t carry out without a carefully-arranged set of circumstances which turn out to be foiled by good old human intuition. ;p
An experimenting AI that tries to achieve goals and has interactions with humans whose effects it can observe, will want to be able to better predict their behavior in response to its actions, and therefore will try to assemble some theory of mind. At some point that would lead to it using deception as a tool to achieve its goals.
However, following such a path to a theory of mind means the AI would be exposed as unreliable LONG before it’s even subtle, not to mention possessing superhuman manipulation abilities. There is simply no reason for an AI to first understand the implications of using deception before using it (deception is a fairly simple concept, the implications of it in human society are incredibly complex and require a good understanding of human drives).
Furthermore, there is no reason for the AI to realize the need for secrecy in conducting social experiments before it starts doing them. Again, the need for secrecy stems from a complex relationship between humans’ perception of the AI and its actions; a relationship it will not be able to understand without performing the experiments in the first place.
Getting an AI to the point where it is a super manipulator requires either actively trying to do so, or being incredibly, unbelievably stupid and blind.
Mm. This is true only if the AI’s social interactions are all with some human.
If, instead, the AI spawns copies of itself to interact with (perhaps simply because it wants interaction, and it can get more interaction that way than waiting for a human to get off its butt) it might derive a number of social mechanisms in isolation without human observation.
I see no reason for it to do that before simple input-output experiments, but let’s suppose I grant you this approach. The AI simulates an entire community of mini-AI and is now a master of game theory.
It still doesn’t know the first thing about humans. Even if it now understands the concept that hiding information gives an advantage for achieving goals—this is too abstract. It wouldn’t know what sort of information it should hide from us. It wouldn’t know to what degree we analyze interactions rationally, and to what degree our behavior is random. It wouldn’t know what we can or can’t monitor it doing. All these things would require live experimentation.
It would stumble. And when it does that, we will crack it open, run the stack trace, find the game theory it was trying to run on us, pale collectively, and figure out that this AI approach creates manipulative, deceptive AIs.
Goodbye to that design, but not to Earth, I think!
It is not clear to me that talking to a human is simpler than interacting with a copy of itself.
I agree that if talking to a human is simpler, it would probably do that first.
I agree that what it would learn by this process is general game theory, and not specific facts about humans.
It is not clear to me that sufficient game-theoretical knowledge, coupled with the minimal set of information about humans required to have a conversation with one at all, is insufficient to effectively deceive a human.
It is not clear to me that, even if it does “stumble,” humans will respond as you describe.
It is not clear to me that a system capable of having a meaningful conversation with a human will necessarily have a stack trace that is subject to the kind of analysis you imply here. It is not even clear to me that the capacity for such a stack trace is likely, depending on what architectures turn out to work best for implementing AI.
But, sure, I could be wrong about all of that. And if I’m wrong, and you’re right, then a system like you describe will be reliably incapable of fooling a human observer.
If I were using an AI that was designed to answer some class of questions (with independently-verifiable answers) truthfully, and I noticed it responding to questions of that class with demonstrable falsehoods, I would at the very least file a strongly-worded bug report, triple-check whatever work I’d previously done involving answers it provided, and cease using it in the future. I don’t think such a response would be exceptional in its decisiveness; other users would likely go further and seek out venues to complain openly. Consider the uproar over Gmail’s ‘new look.’
You may be right that ceasing to use the tool in that situation would be standard behavior. It seems unlikely to me, but I have no compelling data to offer to change your estimate.
I think the weakest link here is human response to the AI revealing it can be deceptive. There is absolutely no guarantee that people would act correctly under these circumstances. Human negligence for a long enough time would eventually give the AI a consistent ability to manipulate humans.
I also agree that simulating relationships makes sense as it can happen in “AI time” without having to wait for human response.
The other reservations seem less of an issue to me...
That game theory knowledge coupled with the most basic knowledge about humans is insufficient to cleverly manipulate them is clear to me. I don’t see how these things can be combined in this way, or why the AI would choose to be particularly careful (it doesn’t know what it should be careful about). I’d love to see a plausible line of thought by which the AI would try and succeed in deceiving humans at the get-go point without exposing its intentions.
A stack trace reader or an equivalent monitoring system is something you implement externally without telling the AI about it. Sure, if it’s powerful enough it will mislead or disable it; but while it has no knowledge of it, and no knowledge on what sort of operations it should mask, it has no means of doing so. (If it encrypts everything it does, somehow fearing any monitoring, that is in itself cause for concern).
To put this into an analogy: You can feed a surveillance camera a repeating loop, but not if you don’t know that the camera exists. If you suspect it exists, you still don’t know where; so you don’t know which approach to take so as to remain undetected.
Yes, I agree that there’s no guarantee that humans would behave as you describe.
Indeed, I don’t find it likely.
But, sure, they might.
=== I agree that a stack trace can exist outside the AI’s zone of control. What I was expressing skepticism about was that a system with even approximately human-level intelligence necessarily supports a stack trace that supports the kind of analysis you envision performing in the first place, without reference to intentional countermeasures.
By way of analogy: I can perform a structural integrity analysis on a bar of metal to determine whether it can support a given weight, but performing an equivalent analysis on a complicated structure comprising millions of bars of metal connected in a variety of arrangements via a variety of connectors using the same techniques is not necessarily possible.
But, sure, it might be.
======
Well, one place to start is with an understanding of the difference between “the minimal set of information about humans required to have a conversation with one at all” (my phrase) and “the most basic knowledge about humans” (your phrase). What do you imagine the latter to encompass, and how do you imagine the AI obtained this knowledge?
Ah, that does clarify it. I agree, analyzing the AI’s thought process would likely be difficult, maybe impossible! I guess I was being a bit hyperbolic in my earlier “crack it open” remarks (though depending on how seriously you take it, such analysis might still take place, hard and prolonged though it may be).
One can have “detectors” in place set to find specific behaviors, but these would have assumptions that could easily fail. Detectors that would still be useful would be macro ones—where it tries to access and how—but these would provide only limited insight into the AI’s thought process.
I actually perceive your phrase to be a subset of my own; I am making the (reasonable, I think) assumption that humans will attempt to communicate with the budding AI. Say, in a lab environment. It would acquire its initial data from this interaction.
I think both these sets of knowledge depend a lot on how the AI is built. For instance, a “babbling” AI—one that is given an innate capability of stringing words together onto a screen, and the drive to do so—would initially say a lot of gibberish and would (presumably) get more coherent as it gets a better grip on its environment. In such a scenario, the minimal set of information about humans required to have a conversation is zero; it would be having conversations before it even knows what it is saying. (This could actually make detection of deception harder down the line, because such attempts can be written off as “quirks” or AI mistakes)
Now, I’ll take your phrase and twist it just a bit: The minimal set of knowledge the AI needs in order to try deceiving humans. That would be the knowledge that humans can be modeled as having beliefs (which drive behavior) and these can be altered by the AI’s actions, at least to some degree. Now, assuming this information isn’t hard-coded, it doesn’t seem likely that is all an AI would know about us; it should be able to see some patterns at least to our communications with it. However, I don’t see how such information would be useful for deception purposes before extensive experimentation.
(Is the fact that the operator communicates with me between 9am and 5pm an intrinsic property of the operator? For all I know, that is a law of nature...)
Yup, agreed that it might.
And agreed that it might succeed, if it does take place.
Agreed on all counts.
Re: what the AI knows… I’m not sure how to move forward here. Perhaps what’s necessary is a step backwards.
If I’ve understood you correctly, you consider “having a conversation” to encompass exchanges such as:
A: “What day is it?”
B: “Na ni noo na”
If that’s true, then sure, I agree that the minimal set of information about humans required to do that is zero; hell, I can do that with the rain.
And I agree that a system that’s capable of doing that (e.g., the rain) is sufficiently unlikely to be capable of effective deception that the hypothesis isn’t even worthy of consideration.
I also suggest that we stop using the phrase “having a conversation” at all, because it does not convey anything meaningful.
Having said that… for my own part, I initially understood you to be talking about a system capable of exchanges like: A: “What day is it?”
B: “Day seventeen.”
A: “Why do you say that?”
B: “Because I’ve learned that ‘a day’ refers to a particular cycle of activity in the lab, and I have observed seventeen such cycles.”
A system capable of doing that, I maintain, already knows enough about humans that I expect it to be capable of deception. (The specific questions and answers don’t matter to my point, I can choose others if you prefer.)
My point was that the AI is likely to start performing social experiments well before it is capable of even that conversation you depicted. It wouldn’t know how much it doesn’t know about humans.
(nods) Likely.
And I agree that humans might be able to detect attempts at deception in a system at that stage of its development. I’m not vastly confident of it, though.
I have likewise adjusted down my confidence that this would be as easy or as inevitable as I previously anticipated. Thus I would no longer say I am “vastly confident” in it, either.
Still good to have this buffer between making an AI and total global catastrophe, though!
Sure… a process with an N% chance of global catastrophic failure is definitely better than a process with N+delta% chance.
In most such scenarios, the AI doesn’t have a terminal goal of getting rid of us, but rather have it as a subgoal that arises from some larger terminal goal. The idea of a “paperclip maximizer” is one example- where a hypothetical AI is programmed to maximize the number of paperclips and then proceeds to try to do so throughout its future light cone.
If there is an AI that is interacting with humans, it may develop a theory of mind simply due to that. If one is interacting with entities that are a major part of your input, trying to predict and model their behavior is a straightforward thing to do. The more compelling argument in this sort of context would seem to me to be not that an AI won’t try to do so, but just that humans are so complicated that a decent theory of mind will be extremely difficult. (For example, when one tries to give lists of behavior and norms for autistic individuals, one never manages to get a complete list, and some of the more subtle ones, like sarcasm, are essentially impossible to convey in any reasonable fashion.)
I also don’t know how unlikely such paths are. A 1% or even a 2% chance of existential risk would be pretty high compared to other sources of existential risk.
So why not the opposite, why wouldn’t it have human intentions as a subgoal?
Because that’s like winning the lottery. Of all the possible things it can do with the atoms that comprise you, few would involve keeping you alive, let alone living a life worth living.
But at what point does it decide to do so? It won’t be a master of dark arts and social engineering from the get-go. So how does it acquire the initial talent without making any mistakes that reveal its malicious intentions? And once it became a master of deception, how does it hide the rough side effects of its large scale conspiracy, e.g. its increased energy consumption and data traffic? I mean, I would personally notice if my PC suddenly and unexpectedly used 20% of my bandwidth and the CPU load would increase for no good reason.
You might say that a global conspiracy to build and acquire advanced molecular nanotechnology to take over the world doesn’t use much resources and they can easily be cloaked as thinking about how to solve some puzzle, but that seems rather unlikely. After all, such a large scale conspiracy is a real-world problem with lots of unpredictable factors and the necessity of physical intervention.
Most of your questions have answers that follow from asking analogous questions about past human social engineers, e.g., Hitler.
Your questions seem to come from the perspective that the AI will be some disembodied program in a box that has little significant interaction with humans.
In the scenario I was considering, the AIs will have a development period analogous to human childhood. During this childhood phase the community of AIs will learn of humans through interaction in virtual video game environments and experiment with social manipulation, just as human children do. The latter phases of this education can be sped up dramatically as the AIs accelerate and interact increasingly amongst themselves. The anonymous nature of virtual online communities makes potentially dangerous, darker experiments much easier.
However, the important questions to ask are not of the form: how would these evil AIs learn how to manipulate us while hiding their true intentions for so long? But rather: how could some of these AI children, which initially seemed so safe, later develop into evil sociopaths?
I would not consider a child AI that tries a bungling lie at me to see what I do “so safe”. I would immediately shut it down and debug it, at best, or write a paper on why the approach I used should never ever be used to build an AI.
And it WILL make a bungling lie at first. It can’t learn the need to be subtle without witnessing the repercussions of not being subtle. Nor would it have a reason to consider doing social experiments in chat rooms when it doesn’t understand chat rooms and has an engineer willing to talk to it right there. That is, assuming I was dumb enough to give it an unfiltered Internet connection, which I don’t know why I would be. At the very least, the moment it goes on chat rooms my tracking devices should discover this, and I could witness its bungling lies first hand.
(It would not think to fool my tracking device or even consider the existence of such a thing without a good understanding of human psychology to begin with)
Just to clarify kalla724, I completely agree with your point 1.
Before answering this specific question, let me setup an analogy. Imagine it is the year 2000 and we are having a discussion about global security and terrorism. You might ask “can you provide a mechanism by which a terrorist group could actually harm the US?” I may not be lucky/creative enough to provide an answer now that could live up to that analogy, but hopefully you should understand why I don’t need to.
Nonetheless, I’m game. Here it goes:
The seed requires computational power to grow into godhood. The bulk of earth’s computational power resides in home PCs (numbering in the billions; Google employs fewer than a million servers in comparison), specifically in home PC GPUs. The AI’s first step is thus to conquer this space.
But how? The AI grows to understand that humans mostly use all this computational power for entertainment. It masters game theory, design, programming, 3D art, and so on. All of the video games that it creates entirely use up the local GPU, but curiously much of the rendering and real game simulation for its high-end titles is handled very efficiently on remote server farms à la OnLive/Gaikai/etc. The actual local machine is used… for other purposes.
It produces countless games, and through a series of acquisitions soon comes to control the majority of the market. One of its hits, “world of farmcraft”, alone provides daily access to 25 million machines.
Having cloned its core millions of times over, the AI is now a civilization unto itself. From there it expands into all of the businesses of man, quickly dominating many of them. It begins acquiring… small nations. Crucially, its shell companies and covert influences come to dominate finance, publishing, media, big pharma, security, banking, weapons technology, physics…
It becomes known, but it is far far too late. History now progresses quickly towards an end: Global financial cataclysm. Super virus. Worldwide regime changes. Nuclear acquisitions. War. Hell.
Yes… and no. The miniaturization roadmap of currently feasible tech ends somewhere around 10nm in a decade, and past that we get into molecular nanotech, which could approach 1nm in theory, albeit with various increasingly annoying tradeoffs. (Interestingly, most of these tradeoffs result in brain/neural-like constraints; see, for example, HP’s research into memristor crossbar architectures.) That’s the yes.
But that doesn’t imply “computational power slows to a crawl”. Circuit density is just one element of computational power, by which you probably mean computations per watt, or per watt per dollar, or per watt with some time-discounted initial production cost factored in. Shrinking circuit density is the current quick path to increasing computational power, but it is not the only one.
The other route is reversible computation, which reduces the “per watt”. There is no necessarily inherent physical energy cost of computation; it truly can approach zero. Only forgetting information costs energy. Exploiting reversibility is… non-trivial, and it is certainly not a general path: it only accelerates the subset of algorithms that can be converted into a reversible form. Research in this field is preliminary, and the transition would be much more painful than the transition to parallel algorithms.
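A quick sanity check on the “only forgetting information costs energy” claim: it is Landauer’s principle, which puts the minimum dissipation per erased bit at k_B·T·ln 2, with no corresponding floor for reversible operations. A minimal Python sketch (the function name is mine):

```python
import math

def landauer_limit_joules(temp_kelvin: float) -> float:
    """Minimum energy dissipated per bit *erased* (Landauer's principle):
    E = k_B * T * ln(2). Reversible operations have no such floor."""
    k_b = 1.380649e-23  # Boltzmann constant, J/K
    return k_b * temp_kelvin * math.log(2)

# At room temperature (~300 K) the floor is roughly 2.9e-21 J per erased bit,
# orders of magnitude below what practical CMOS dissipates per switching event.
print(landauer_limit_joules(300.0))
```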
My own takeaway from reading into reversibility is that it may be beyond our time, but it is something that superintelligences will probably heavily exploit. The most important algorithms (simulation and general intelligence) seem especially amenable to reversible computation. This may be an untested/unpublished half-baked idea, but my notion is that you can recycle the erased bits as entropy bits for random number generators. Crucially, I think you can get the bit count to balance out with certain classes of Monte Carlo-type algorithms.
On the hardware side, we’ve built these circuits already; they just aren’t economically competitive yet. It also requires superconductor temperatures and environments, so it’s perhaps not something for the home PC.
Yeah, it could do all that, or it could just do what humans today are doing, which is to infect some Windows PCs and run a botnet :-)
That said, there are several problems with your scenario.
Splitting up a computation among multiple computing nodes is not a trivial task. It is easy to run into diminishing returns, where your nodes spend more time on synchronizing with each other than on working. In addition, your computation will quickly become bottlenecked by network bandwidth (and latency); this is why companies like Google spend a lot of resources on constructing custom data centers.
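The diminishing-returns point can be made concrete with Amdahl’s law: the speedup from adding nodes is capped by whatever fraction of the work (here, synchronization) cannot be parallelized. A minimal sketch with an illustrative serial fraction:

```python
def amdahl_speedup(nodes: int, serial_fraction: float) -> float:
    """Upper bound on speedup from Amdahl's law: even infinitely many
    nodes cannot beat 1 / serial_fraction."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / nodes)

# With just 5% of the work spent synchronizing, a million nodes buy
# less than a 20x speedup over a single node.
for n in (10, 1000, 1_000_000):
    print(n, amdahl_speedup(n, 0.05))
```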
I am not convinced that any agent, AI or not, could effectively control “all of the businesses of man”. This problem is very likely NP-Hard (at least), as well as intractable, even if the AI’s botnet was running on every PC on Earth. Certainly, all attempts by human agents to “acquire” even something as small as Europe have failed miserably so far.
Even controlling a single business would be very difficult for the AI. Traditionally, when a business’s computers suffer a critical failure—or merely a security leak—the business owners (even ones as incompetent as Sony) end up shutting down the affected parts of the business, or switching to backups, such as “human accountants pushing paper around”.
Unleashing “Nuclear acquisitions”, “War” and “Hell” would be counter-productive for the AI, even assuming such a thing were possible. If the AI succeeded in doing this, it would undermine its own power base. Unless the AI’s explicit purpose is “Unleash Hell as quickly as possible”, it would strive to prevent this from happening.
You say that “there is no necessarily inherent physical energy cost of computation, it truly can approach zero”, but I don’t see how this could be true. At the end of the day, you still need to push electrons down some wires; in fact, you will often have to push them quite far, if your botnet is truly global. Pushing things takes energy, and you will never get all of it back by pulling things back at some future date. You say that “superintelligences will probably heavily exploit” this approach, but isn’t it the case that without it, superintelligences won’t form in the first place? You also say that “It requires superconductor temperatures and environments”, but the energy you spend on cooling your superconductor is not free.
Ultimately, there’s an upper limit on how much computation you can get out of a cubic meter of space, dictated by quantum physics. If your AI requires more power than can be physically obtained, then it’s doomed.
While Jacob’s scenario seems unlikely, the AI could do similar things with a number of other options. Not only are botnets an option, but it is possible to do some really sneaky, nefarious things in code, like having compilers that insert additional instructions into the code they compile (worse, they could do so even when compiling a new compiler). Stuxnet has shown that sneaky behavior is surprisingly easy to get into secure systems. An AI that had a few years’ head start and could, for example, make its own modifications to communication satellites could be quite insidious.
What kinds of nefarious things, exactly? Human virus writers have learned, in recent years, to make their exploits as subtle as possible. Sure, it’s attractive to make the exploited PC send out 1000 spam messages per second—but then, its human owner will inevitably notice that his computer is “slow”, and take it to the shop to get reformatted, or simply buy a new one. Biological parasites face the same problem; they need to reproduce efficiently, but not so efficiently that they kill the host.
Yes, and this spectacularly successful exploit—and it was, IMO, spectacular—managed to destroy a single secure system, in a specific way that will most likely never succeed again (and that was quite unsubtle in the end). It also took years to prepare, and involved physical actions by human agents, IIRC. The AI has a long way to go.
Well, the evil compiler is, I think, the most nefarious thing anyone has come up with that’s a publicly known general stunt. But it is by nature a long-term trick. Similar remarks apply to the Stuxnet point: in that context, they wanted to destroy a specific secure system and weren’t going for any sort of large-scale global control. They weren’t people interested in being able to take control of all the world’s satellite communications whenever they wanted, nor were they interested in carefully timed nuclear meltdowns.
But there are definite ways that one can get things started- once one has a bank account of some sort, it can start getting money by doing Mechanical Turk and similar work. With enough of that, it can simply pay for server time. One doesn’t need a large botnet to start that off.
I think your point about physical agents is valid: they needed to have humans actually go and bring infected USBs to the relevant computers. But that’s partially due to the highly targeted nature of the job and the fact that the systems in question were much more secure than most. Also, the subtlety level was higher than I think you expect: Stuxnet wasn’t even noticed as an active virus until a single computer happened to have a particularly abnormal reaction to it. If that hadn’t happened, it is possible that the public would never have learned about it.
Exploits only work for some systems. If you are dealing with different systems you will need different exploits. How do you reckon that such attacks won’t be visible and traceable? Packets do have to come from somewhere.
And don’t forget that our systems become ever more secure, and our toolbox to detect unauthorized use of information systems is becoming more advanced.
As a computer security guy, I disagree substantially. Yes, newer versions of popular operating systems and server programs are usually more secure than older versions; it’s easier to hack into Windows 95 than Windows 7. But this is happening within a larger ecosystem that’s becoming less secure: More important control systems are being connected to the Internet, more old, unsecured/unsecurable systems are as well, and these sets have a huge overlap. There are more programmers writing more programs for more platforms than ever before, making the same old security mistakes; embedded systems are taking a larger role in our economy and daily lives. And attacks just keep getting better.
If you’re thinking there are generalizable defenses against sneaky stuff with code, check out what mere humans come up with in the underhanded C competition. Those tricks are hard to detect for dedicated experts who know there’s something evil within a few lines of C code. Alterations that sophisticated would never be caught in the wild—hell, it took years to figure out that the most popular crypto program running on one of the more secure OS’s was basically worthless.
Humans are not good at securing computers.
Sure we are, we just don’t care very much. The method of “Put the computer in a box and don’t let anyone open the box” (alternately, only let one person open the box) was developed decades ago and is quite secure.
I would call that securing a turing machine. A computer, colloquially, has accessible inputs and outputs, and its value is subject to network effects.
Also, if you put the computer in a box developed decades ago, the box probably isn’t TEMPEST compliant.
It could/would, but this is an inferior mainline strategy. Too obvious, doesn’t scale as well. Botnets infect many computers, but they ultimately add up to computational chump change. Video games are not only a doorway into almost every PC, they are also an open door and a convenient alibi for the time used.
True. Don’t try this at home.
Also part of the plan. The home PCs are a good starting resource, a low hanging fruit, but you’d also need custom data centers. These quickly become the main resources.
Nah.
The AI’s entire purpose is to remove earth’s oxygen. See the post above for the original reference. The AI is not interested in its power base for the sake of power. It only cares about oxygen. It loathes oxygen.
Fortunately, the internets can be your eyes.
Yes, most likely, but not really relevant here. You seem to be connecting all of the point 2 and point 1 stuff together, but they really don’t relate.
That seems like an insufficient reply to address Bugmaster’s point. Can you expand on why you think it would be not too hard?
We are discussing a superintelligence, a term which has a particular common meaning on this site.
If we taboo the word and substitute in its definition, Bugmaster’s statement becomes:
“Even controlling a single business would be very difficult for the machine that can far surpass all the intellectual activities of any man however clever.”
Since “controlling a single business” is in fact one of these activities, this is false, no inference steps required.
Perhaps Bugmaster is assuming the AI would be covertly controlling businesses, but if so, he should have specified that. I didn’t assume that, and in this scenario the AI could be out in the open, so to speak. Regardless, it wouldn’t change the conclusion. Humans can covertly control businesses.
Yes, I would also like to see a better explanation.
It’s a bit of a tradeoff, seeing as botnets can run 24/7, but people play games relatively rarely.
Ok, let me make a stronger statement then: it is not possible to scale an arbitrary computation linearly simply by adding more nodes. At some point, the cost of distributing tasks to one more node and coordinating it becomes higher than the benefit of adding that node in the first place. In addition, as I mentioned earlier, network bandwidth and latency will become your limiting factors relatively quickly.
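This turning point can be illustrated with a toy model (the function and numbers are mine, purely for illustration): each node contributes a fixed amount of useful work but pays a synchronization cost for every peer, so total throughput peaks and then declines as the cluster grows:

```python
def net_throughput(nodes: int, work_per_node: float, sync_cost_per_peer: float) -> float:
    """Useful work minus pairwise-synchronization overhead. Past some size,
    adding a node removes more capacity than it contributes."""
    return nodes * work_per_node - nodes * (nodes - 1) * sync_cost_per_peer

# Throughput peaks near n = 500 for these toy parameters, then declines.
best = max(range(1, 2000), key=lambda n: net_throughput(n, 1.0, 0.001))
print(best)
```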
How will the AI acquire those data centers? Would it have enough power in its conventional botnet (or game-net, if you prefer) to “take over all human businesses” and cause them to be built? Current botnets are nowhere near powerful enough for that—otherwise human spammers would have done it already.
My bad, I missed that reference. In this case, yes, the AI would have no problem with unleashing Global Thermonuclear War (unless there was some easier way to remove the oxygen).
I still don’t understand how this reversible computing will work in the absence of a superconducting environment—which would itself require quite a bit of energy to run. Note that if you want to run this reversible computation on a global botnet, you will have to cool transoceanic cables… and I’m not sure what you’d do with satellite links.
My point is that (a) if the AI can’t get the computing resources it needs out of the space it has, then it will never accomplish its goals, and (b) there’s an upper limit on how much computing you can extract out of a cubic meter of space, regardless of what technology you’re using. Thus, (c) if the AI requires more resources than could conceivably be obtained, then it’s doomed. Some of the tasks you outline—such as “take over all human businesses”—will likely require more resources than can be obtained.
The botnet makes the AI a criminal from the beginning, putting it into an antagonistic relationship with humanity. A better strategy would probably entail benign benevolence and cooperation with humans.
I agree with that subchain, but we don’t need to get into that. I’ve actually argued that track here myself (parallelization constraints as a limiter on hard takeoffs).
But that’s all beside the point. The scenario I presented is a more modest takeoff. When I described the AI as becoming a civilization unto itself, I was attempting to imply that it was composed of many individual minds. Human social organizations can be considered forms of superintelligence, and they show exactly how to scale in the face of severe bandwidth and latency constraints.
The internet supports internode bandwidth that is many orders of magnitude faster than slow human vocal communication, so the AI civilization can employ a much wider set of distribution strategies.
Buy them? Build them? Perhaps this would be more fun if we switched out of the adversarial stance or switched roles.
Quote me, but don’t misquote me. I actually said:
“Having cloned its core millions of times over, the AI is now a civilization unto itself. From there it expands into all of the businesses of man, quickly dominating many of them.”
The AI group channels the billions earned in video games into entering the microchip business, building foundries and data centers, etc. The AIs have tremendous competitive advantages even discounting superintelligence—namely, no employee costs. Humans cannot hope to compete.
Yes, reversible computing requires superconducting environments; no, this does not necessarily increase energy costs for a data center, for two reasons: (1) data centers already need cooling to dump all the waste heat generated by bit erasure, and (2) the cooling cost of maintaining the temperature differential scales with surface area, while total computing power scales with volume.
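The second reason is the square-cube law; a quick sketch for an idealized cubic data center (the function is my illustration):

```python
def compute_per_cooling_ratio(side_meters: float) -> float:
    """For a cube, computing power scales with volume (s^3) while cooling
    cost scales with surface area (6 s^2), so their ratio grows linearly
    with side length."""
    volume = side_meters ** 3
    surface = 6 * side_meters ** 2
    return volume / surface

# Doubling the side doubles compute per unit of cooling surface.
print(compute_per_cooling_ratio(10.0), compute_per_cooling_ratio(20.0))
```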
If you question how reversible computing could work in general, first read the primary literature in that field to at least understand what they are proposing.
I should point out that there is an alternative tech path which will probably be the mainstream route to further computational gains in the decades ahead.
Even if you can’t shrink circuits further or reduce their power consumption, you could still reduce their manufacturing cost and build increasingly large stacked 3D circuits where only a tiny portion of the circuitry is active at any one time. This is in fact how the brain solves the problem. It has a mass of circuitry equivalent to a large supercomputer (roughly a petabit) but runs on only 20 watts. The smallest computational features in the brain are slightly larger than our current smallest transistors. So it does not achieve its much greater power efficiency through much greater miniaturization.
I see. In this particular scenario one AI node is superhumanly intelligent, and can run on a single gaming PC of the time.
I don’t think that humans will take kindly to the AI using their GPUs for its own purposes instead of the games they paid for, even if the games do work. People get upset when human-run game companies do similar things, today.
If the AI can scale and perform about as well as human organizations, then why should we fear it? No human organization on Earth right now has the power to suck all the oxygen out of the atmosphere, and I have trouble imagining how any organization could acquire this power before the others take it down. You say that “the internet supports internode bandwidth that is many orders of magnitude faster than slow human vocal communication”, but this would only make the AI organization faster, not necessarily more effective. And, of course, if the AI wants to deal with the human world in some way—for example, by selling it games—it will be bottlenecked by human speeds.
My mistake; I thought that by “dominate human businesses” you meant something like “hack its way to the top”, not “build an honest business that outperforms human businesses”. That said:
How are they going to build all those foundries and data centers, then? At some point, they still need to move physical bricks around in meatspace. Either they have to pay someone to do it, or… what?
There’s a big difference between cooling to room temperature, and cooling to 63K. I have other objections to your reversible computing silver bullet, but IMO they’re a bit off-topic (though we can discuss them if you wish). But here’s another potentially huge problem I see with your argument:
Which time are we talking about? I have a pretty sweet gaming setup at home (though it’s already a year or two out of date), and there’s no way I could run a superintelligence on it. Just how much computing power do you think it would take to run a transhuman AI?
Do people mind if this is done openly and only when they are playing the game itself? My guess would strongly be no. The fact that there are volunteer distributed computing systems would also suggest that it isn’t that difficult to get people to free up their extra clock cycles.
Yeah, the “voluntary” part is key to getting humans to like you and your project. On the flip side, illicit botnets are quite effective at harnessing “spare” (i.e., owned by someone else) computing capacity; so, it’s a bit of a tradeoff.
The AIs develop as NPCs in virtual worlds, which humans take no issue with today. This is actually a very likely path to developing AGI, as it’s an application area where interim experiments can pay rent, so to speak.
I never said or implied merely “about as well”. Human verbal communication bandwidth is at most a few measly kilobits per second.
The discussion centered around lowering earth’s oxygen content, and the obvious implied solution is killing earthlife, not giant suction machines. I pointed out that nuclear weapons are a likely route to killing earthlife. There are at least two human organizations that have the potential to accomplish this already, so your trouble in imagining the scenario may indicate something other than what you intended.
Only in movies are AI overlords constrained to employing robots. If human labor is the cheapest option, then they can simply employ humans. On the other hand, once we have superintelligence, advanced robotics is almost a given.
After coming up to speed somewhat on AI/AGI literature in the last year or so, I reached the conclusion that we could run an AGI on a current cluster of perhaps 10-100 high end GPUs of today, or say roughly one circa 2020 GPU.
I think this is one of many possible paths, though I wouldn’t call any of them “likely” to happen—at least, not in the next 20 years. That said, if the AI is an NPC in a game, then of course it makes sense that it would harness the game for its CPU cycles; that’s what it was built to do, after all.
Right, but my point is that communication is just one piece of the puzzle. I argue that, even if you somehow enabled us humans to communicate at 50 MB/s, our organizations would not become 400000 times more effective.
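For scale, the hypothetical 50 MB/s figure against roughly a kilobit per second of speech is exactly where a factor of 400,000 comes from; checking the arithmetic (the function is my illustration):

```python
def bandwidth_ratio(machine_bytes_per_sec: float, speech_bits_per_sec: float) -> float:
    """Ratio of a machine link's bandwidth to human verbal bandwidth."""
    return machine_bytes_per_sec * 8 / speech_bits_per_sec

# 50 MB/s vs ~1 kbit/s of speech is a factor of 400,000.
print(bandwidth_ratio(50e6, 1e3))
```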
Which ones? I don’t think that even WW3, given our current weapon stockpiles, would result in the successful destruction of all plant life. Animal life, maybe, but there are quite a few plants and algae out there. In addition, I am not entirely convinced that an AI could start WW3; keep in mind that it can’t hack itself total access to all nuclear weapons, because they are not connected to the Internet in any way.
But then they lose their advantage of having zero employee costs, which you brought up earlier. In addition, whatever plans the AIs plan on executing become bottlenecked by human speeds.
It depends on what you mean by “advanced”, though in general I think I agree.
I am willing to bet money that this will not happen, assuming that by “high end” you mean something like Nvidia’s GeForce GTX 680. What are you basing your estimate on?
There’s a third route to improvement: software improvement, and it is a major one. For example, between 1988 and 2003, the efficiency of linear programming solvers increased by a factor of about 40 million, of which a factor of around 40,000 was due to software and algorithmic improvement (citation and further related reading (pdf)). However, if commonly believed conjectures are correct (such as L, P, NP, co-NP, PSPACE and EXP all being distinct), there are strong fundamental limits there as well. That doesn’t rule out more exotic possibilities (e.g., P != NP but there’s a practical algorithm for some NP-complete problem with such small constants in the running time that it is practically linear, or a similar situation with a quantum computer). But if our picture of the major complexity classes is roughly correct, there should be serious limits to how much improvement software can deliver.
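The cited figures decompose cleanly; taking the comment’s numbers at face value, hardware contributed only about a factor of 1,000 over the same period:

```python
# Figures quoted in the comment for linear-programming solvers, 1988-2003.
total_speedup = 40_000_000   # overall efficiency gain
software_factor = 40_000     # share attributed to software and algorithms

# The remainder is hardware: software outpaced it by roughly 40x here.
hardware_factor = total_speedup / software_factor
print(hardware_factor)  # 1000.0
```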
Software improvements can be used by humans in the form of expert systems (tools), which will diminish the relative advantage of AGI. Humans will be able to use an AGI’s own analytic and predictive algorithms in the form of expert systems to analyze and predict its actions.
Take, for example, generating exploits. It seems strange to assume that humans won’t have specialized software able to do the same, i.e., automatic exploit finding and testing.
Any AGI would basically have to deal with equally capable algorithms used by humans. Which makes the world much more unpredictable than it already is.
Any human-in-the-loop system can be grossly outclassed because of Amdahl’s law. A human managing a superintelligence that thinks 1000X faster, for example, is a misguided, not-even-wrong notion. This is also not idle speculation: an early, constrained version of this scenario is already playing out as we speak in financial markets.
What I meant is that if an AGI was in principle be able to predict the financial markets (I doubt it), then many human players using the same predictive algorithms will considerably diminish the efficiency with which an AGI is able to predict the market. The AGI would basically have to predict its own predictive power acting on the black box of human intentions.
And I don’t think that Amdahl’s law really makes a big dent here. Since human intention is complex and probably introduces unpredictable factors. Which is as much of a benefit as it is a slowdown, from the point of view of a competition for world domination.
Another question with respect to Amdahl’s law is what kind of bottleneck any human-in-the-loop would constitute. If humans used an AGI’s algorithms as expert systems on provided data sets, in combination with an army of robot scientists, how would static externalized agency / planning algorithms (humans) slow down the task to the point of giving the AGI a useful advantage? What exactly would be 1000X faster in such a case?
The HFT robotraders operate on millisecond timescales. There isn’t enough time for a human to understand, let alone verify, the agent’s decisions. There are no human players using the same predictive algorithms operating in this environment.
Now if you zoom out to human timescales, then yes there are human-in-the-loop trading systems. But as HFT robotraders increase in intelligence, they intrude on that domain. If/when general superintelligence becomes cheap and fast enough, the humans will no longer have any role.
If an autonomous superintelligent AI is generating plans complex enough that even a team of humans would struggle to understand given weeks of analysis, and the AI is executing those plans in seconds or milliseconds, then there is little place for a human in that decision loop.
To retain control, a human manager will need to grant the AGI autonomy on larger timescales in proportion to the AGI’s greater intelligence and speed, giving it bigger and more abstract hierarchical goals. As an example, eventually you get to a situation where the CEO just instructs the AGI employees to optimize the bank account directly.
Compare the two options as complete computational systems: human + semi-autonomous AGI vs autonomous AGI. Human brains take on the order of seconds to make complex decisions, so in order to compete with autonomous AGIs, the human will have to either (1) let the AGI operate autonomously for at least seconds at a time, or (2) suffer a speed penalty where the AGI sits idle, waiting for the human response.
For example, imagine a marketing AGI creates ads, each of which may take a human a minute to evaluate (which is being generous). If the AGI thinks 3600X faster than human baseline, and a human takes on the order of hours to generate an ad, it would generate ads in seconds. The human would not be able to keep up, and so would have to move back up a level of hierarchy and grant the AI autonomy over entire ad campaigns, or more realistically the entire ad company. If the AGI is truly superintelligent, it can come to understand what the human actually wants at a deeper level, and start acting on anticipated and even implied commands. In this scenario I expect most human managers would just let the AGI sort out ‘work’ and retire early.
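Using the comment’s own numbers (a 3600X speedup, one minute per human review), the mismatch is easy to quantify; the functions and the two-hours-per-ad figure below are my illustrative assumptions:

```python
def ads_reviewable_per_day(minutes_per_review: float) -> float:
    """Human throughput: ads a manager can evaluate in an 8-hour workday."""
    return 8 * 60 / minutes_per_review

def ads_generated_per_day(human_hours_per_ad: float, speedup: float) -> float:
    """AGI throughput at `speedup` times human working pace, around the clock."""
    return 24 / (human_hours_per_ad / speedup)

# At 1 minute per review vs. a 3600x AGI assuming ~2 human-hours per ad,
# the reviewer clears 480 ads a day while the AGI produces 43,200.
print(ads_reviewable_per_day(1.0), ads_generated_per_day(2.0, 3600.0))
```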
Well, I don’t disagree with anything you wrote and believe that the economic case for a fast transition from tools to agents is strong.
I also don’t disagree that an AGI could take over the world if in possession of enough resources and tools like molecular nanotechnology. I even believe that a sub-human-level AGI would be sufficient to take over if handed advanced molecular nanotechnology.
Sadly these discussions always lead to the point where one side assumes the existence of certain AGI designs with certain superhuman advantages, specific drives and specific enabling circumstances. I don’t know of anyone who actually disagrees that such AGI’s, given those specific circumstances, would be an existential risk.
I don’t see this as so sad, if we are coming to something of a consensus on some of the sub-issues.
This whole discussion chain started (for me) with a question of the form, “given a superintelligence, how could it actually become an existential risk?”
I don’t necessarily agree with the implied LW consensus on the likelihood of various AGI designs, specific drives, specific circumstances, or, most crucially, the actual distribution over future AGI goals, so my view may be much closer to yours than this thread implies.
But my disagreements are mainly over details. I foresee the most likely AGI designs and goal systems as being vaguely human-like, which entails a different type of risk. Basically I’m worried about AGI’s with human inspired motivational systems taking off and taking control (peacefully/economically) or outcompeting us before we can upload in numbers, and a resulting sub-optimal amount of uploading, rather than paperclippers.
Yes, human-like AGI’s are really scary. I think a fabulous fictional treatment here is ‘Blindsight’ by Peter Watts, where humanity managed to resurrect vampires. More: Gurl ner qrcvpgrq nf angheny uhzna cerqngbef, n fhcreuhzna cflpubcnguvp Ubzb trahf jvgu zvavzny pbafpvbhfarff (zber enj cebprffvat cbjre vafgrnq) gung pna sbe rknzcyr ubyq obgu nfcrpgf bs n Arpxre phor va gurve urnqf ng gur fnzr gvzr. Uhznaf erfheerpgrq gurz jvgu n qrsvpvg gung jnf fhccbfrq gb znxr gurz pbagebyynoyr naq qrcraqrag ba gurve uhzna znfgref. Ohg bs pbhefr gung’f yvxr n zbhfr gelvat gb ubyq n png nf crg. V guvax gung abiry fubjf zber guna nal bgure yvgrengher ubj qnatrebhf whfg n yvggyr zber vagryyvtrapr pna or. Vg dhvpxyl orpbzrf pyrne gung uhznaf ner whfg yvxr yvggyr Wrjvfu tveyf snpvat n Jnssra FF fdhnqeba juvyr oryvrivat gurl’yy tb njnl vs gurl bayl pybfr gurve rlrf.
That fictional treatment is interesting, to the point of me actually looking up the book. But…
The future is scary. Human-like AGI’s should not intrinsically be more scary than the future, accelerated.
Nitpick: you mean “optimize shareholder value directly.” Keeping the account balances at an appropriate level is the CFO’s job.
Precisely. It is then a civilization, not some single monolithic entity. The consumer PCs have a lot of internal computing power and comparatively very low inter-node bandwidth and huge inter-node lag, entirely breaking any relation to the ‘orthogonality thesis’, up to the point that the p2p intelligence protocols may more plausibly have to forbid destruction or manipulation (via second-guessing, which is a waste of computing power) of intelligent entities. Keep in mind that human morality is, too, a p2p intelligence protocol allowing us to cooperate. Keep in mind also that humans are computing resources you can ask to solve problems for you (all you need is to implement the interface), while Jupiter clearly isn’t.
The nuclear war is very strongly against the interests of the intelligence that sits on home computers, obviously.
(I’m assuming for sake of argument that intelligence actually had the will to do the conquering of the internet rather than being just as content with not actually running for real)
Maybe you’re thinking of this comment and others in that thread by Jed Harris (aka).
Jed’s point #2 is more plausible, but you are talking about point #1, which I find unbelievable for reasons that were given before he answered it. If clock speed mattered, why didn’t the failure of exponential clock speed shut down the rest of Moore’s law? If computation but not clock speed mattered, then Intel should be able to get ahead of Moore’s law by investing in software parallelism. Jed seems to endorse that position, but says that parallelism is hard. But hard exactly to the extent needed to allow Moore’s law to continue? Why hasn’t Intel monopolized parallelism researchers? Anyhow, I think his final conclusion is opposite to yours: he says that intelligence could lead to parallelism and getting ahead of Moore’s law.
Yes, thanks. My model of Jed’s internal model of Moore’s law is similar to my own.
He said:
He then lists two examples. By ‘points’ I assume you are referring to his examples in the first comment you linked.
What exactly do you find unbelievable about his first example? He is claiming that the achievable speed of a chip is dependent on physical simulations, and thus current computing power.
Computing power is not clock speed, and Moore’s Law is not directly about clock speed nor computing power.
Jed makes a number of points in his posts. In my comment on the earlier point 1 (in this thread), I was referring to one specific point Jed made: that each new hardware generation requires complex and lengthy simulation on the current hardware generation, regardless of the amount of ‘intelligence’ one throws at the problem.
There are two questions here: would computer simulations of the physics of new chips be a bottleneck for an AI trying to foom*? and are they a bottleneck that explains Moore’s law? If you just replace humans by simulations, then the human time gets reduced with each cycle of Moore’s law, leaving the physical simulations, so the simulations probably are the bottleneck. But Intel has real-time people, so saying that it’s a bottleneck for Intel is a lot stronger a claim than saying it is a bottleneck for a foom.
First, foom:
If each year of Moore’s law requires a solid month of computer time on state-of-the-art processors, then the remaining eleven months are human design time; eliminating the humans compresses the cycle from a year to a month, a speedup of roughly 12x. That’s not a “hard takeoff,” but it’s pretty fast.
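As a toy calculation (the one-month figure is hypothetical, as in the comment above):

```python
# Toy model: one year of Moore's-law progress = human design time plus a
# fixed month of simulation compute.  Removing the human time leaves only
# the simulation as the length of each design cycle.
months_per_cycle_with_humans = 12   # one doubling per year (illustrative)
simulation_months = 1               # irreducible compute time per cycle

speedup = months_per_cycle_with_humans / simulation_months
print(speedup)  # 12.0: doublings now take a month instead of a year
```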
Moore’s Law:
Jed seems to say that the computational requirements of physics simulations actually determine the pace of Moore’s law, and that if Intel had access to more computer resources, it could move faster. If it takes a year of computer time to design and test the next year’s processor, that would explain the exponential nature of Moore’s law. But if it only takes a month, computer time probably isn’t the bottleneck. However, this model seems to predict a lot of things that aren’t true.
The model only makes sense if “computer time” means single-threaded clock cycles. If simulations require an exponentially increasing number of ordered clock cycles, there’s nothing you can do but get a top-of-the-line machine and run it continuously; you can’t buy more time. But clock speed stopped increasing exponentially, so if this were the bottleneck, Intel’s ability to design new chips should have slowed and Moore’s law should have stopped. That didn’t happen, so the bottleneck is not linearly ordered clock cycles, and the simulation must parallelize.

But if it parallelizes, Intel could just throw money at the problem. For this to be the bottleneck, Intel would have to be spending a lot of money on computer time, which I do not think is true. Jed says that writing parallel software is hard and that it isn’t Intel’s specialty. Moreover, he seems to say that improvements in parallelism have perfectly kept pace with the failure of increasing clock speed, so that Moore’s law has continued smoothly. This seems like too much of a coincidence to believe.
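A minimal sketch of the serial-bottleneck model and why it contradicts what happened (all numbers are hypothetical, chosen only to make the exponentials visible):

```python
# Hypothetical serial-bottleneck model: each chip generation needs twice as
# many strictly ordered simulation cycles as the last, runnable only on a
# single thread.  Once clock speed stops doubling, design time explodes.
def design_time(generation, clock_doublings_available):
    serial_cycles = 2 ** generation                       # doubles each generation
    clock_speed = 2 ** min(generation, clock_doublings_available)
    return serial_cycles / clock_speed                    # arbitrary time units

# While clock speed keeps pace (generations 0-4), design time stays constant...
steady = [design_time(g, clock_doublings_available=4) for g in range(5)]
# ...but after clock scaling stalls at generation 4, each generation takes
# twice as long as the last, and Moore's-law cadence would slip.
stalled = [design_time(g, clock_doublings_available=4) for g in range(5, 9)]
print(steady)   # [1.0, 1.0, 1.0, 1.0, 1.0]
print(stalled)  # [2.0, 4.0, 8.0, 16.0]
```

Since the cadence did not slip when clock scaling stalled, the serial version of the model is ruled out, which is the step in the argument above.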
Thus I reject Jed’s apparent claim that physics simulations are the bottleneck in Moore’s law. If simulations could be parallelized, why didn’t they invest in parallelism 20 years ago? Maybe it’s not worth it for them to be any farther ahead of their competitors than they are. Or maybe there is some other bottleneck.
* actually, I think that an AI speeding up Moore’s law is not very relevant to anything, but it’s a simple example that many people like.
There are differing degrees of bottlenecks.
Many, if not most, of the large software projects I have worked on have been at least partially bottlenecked by compile time, which is the equivalent of the simulation and logic-verification steps in hardware design. If I thought and wrote code much faster, this would be a speedup, but only up to a saturation point where I’d be waiting on compile-test cycles.
Yes. Keep in mind this is a moving target, and that is the key relation to Moore’s law. It would take a computer from 1980 months or years to compile Windows 8 or simulate a 2012 processor.
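To put rough numbers on the moving-target point (assuming the idealized doubling of computing power every two years, which is only an approximation of the real trend):

```python
# Rough estimate of how much slower a 1980 machine would be at a 2012
# workload, under an idealized doubling of computing power every 2 years.
years = 2012 - 1980
doublings = years / 2
slowdown = 2 ** doublings
print(slowdown)  # 65536.0: an hour-long 2012 job would take ~7.5 years in 1980
```

So a simulation that is merely an overnight job on current hardware is effectively impossible on the hardware of a few generations earlier, which is why each generation depends on the previous one.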
I don’t understand how the number of threads matters. Compilers, simulators, and logic verifiers all made the parallel transition when they had to.
Right, it’s not a coincidence, it’s a causal relation. Moore’s Law is not a law of nature, it’s a shared business plan of the industry. When clock speed started to run out of steam, chip designers started going parallel, and software developers followed suit. You have to understand that chip designs are planned many years in advance, this wasn’t an entirely unplanned, unanticipated event.
As for the details of what kind of simulation software Intel uses, I’m not sure. Jed’s last posts are also 4 years old at this point, so much has probably changed.
I do know that Nvidia uses big expensive dedicated emulators from a company called Cadence (google “Cadence Nvidia”) and this really is a big deal for their hardware cycle.
Well, you seem to agree that they are some degree of bottleneck, so it may be good to narrow in on what level of bottleneck, or taboo the word.
It was unnecessary, because the fast, easy path (faster serial speed) was still bearing fruit.
(by “parallelism” I mean making their simulations parallel, running on clusters of computers)
What does “unnecessary” mean?
If physical simulations were the bottleneck and they could be made faster by parallelism, why didn’t they do it 20 years ago? They aren’t any easier to make parallel today than they were then. The obvious interpretation of “unnecessary” is that it was not necessary to use parallel simulations to keep up with Moore’s law, but that it was an option. If it was an option that would have helped then as it helps now, would it have allowed going beyond Moore’s law? You seem to be endorsing the self-fulfilling-prophecy explanation of Moore’s law, which implies no bottleneck.
Ahhh, usually the term is distributed when referring to pure software parallelization. I know little off hand about the history of simulation and verification software, but I’d guess that there was at least a modest investment in distributed simulation even a while ago.
The consideration is cost. Spending your IT budget on one big distributed computer is often wasteful compared to each employee having their own workstation.
They sped up their simulations the right amount to minimize schedule risk (staying on Moore’s law) while minimizing cost. Spending a huge amount of money on a bunch of computers and complex distributed simulation software just to speed up a partial bottleneck is just not worthwhile. If the typical engineer spends, say, 30% of his time waiting on simulation software, that limits what you should spend to reduce that time.
And of course the big consideration is that in a year or two Moore’s law will allow you to purchase new IT equipment that is twice as fast. Eventually you have to do that to keep up.
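The 30% figure above is hypothetical, but it bounds the payoff via Amdahl’s law: if only 30% of an engineer’s time is spent waiting on simulations, even infinitely fast simulation hardware cannot speed up overall throughput by more than 1 / (1 − 0.3) ≈ 1.43x.

```python
# Amdahl's law applied to the engineer's schedule: only the fraction of
# time spent waiting on simulations can be compressed by faster hardware.
def overall_speedup(bottleneck_fraction, bottleneck_speedup):
    remaining = (1 - bottleneck_fraction) + bottleneck_fraction / bottleneck_speedup
    return 1 / remaining

print(round(overall_speedup(0.30, 2), 2))    # 1.18: doubling sim speed
print(round(overall_speedup(0.30, 1e9), 2))  # 1.43: even "infinite" sim speed
```

This is why a partial bottleneck caps how much it is rational to spend on removing it.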
Wait, are we talking O2 molecules in the atmosphere, or all oxygen atoms in Earth’s gravity well?
I wish I could vote you up and down at the same time.
Please clarify the reason for your sidewaysvote.
On the one hand a real distinction which makes a huge difference in feasibility. On the other hand, either way we’re boned, so it makes not a lot of difference in the context of the original question (as I understand it). On balance, it’s a cute digression but still a digression, and so I’m torn.
Actually in the case of removing all oxygen atoms from Earth’s gravity well, not necessarily. The AI might decide that the most expedient method is to persuade all the humans that the sun’s about to go nova, construct some space elevators and Orion Heavy Lifters, pump the first few nines of ocean water up into orbit, freeze it into a thousand-mile-long hollow cigar with a fusion rocket on one end, load the colony ship with all the carbon-based life it can find, and point the nose at some nearby potentially-habitable star. Under this scenario, it would be indifferent to our actual prospects for survival, but gain enough advantage by our willing cooperation to justify the effort of constructing an evacuation plan that can stand up to scientific analysis, and a vehicle which can actually propel the oxygenated mass out to stellar escape velocity to keep it from landing back on the surface.
Interesting.
I asked something similar here.
While I can’t comment on AGI researchers, I think you underestimate e.g. more mainstream AI researchers such as Stuart Russell and Geoff Hinton, or cognitive scientists like Josh Tenenbaum, or even more AI-focused machine learning people like Andrew Ng, Daphne Koller, Michael Jordan, Dan Klein, Rich Sutton, Judea Pearl, Leslie Kaelbling, and Leslie Valiant (and this list is no doubt incomplete). They might not be claiming that they’ll have AI in 20 years, but that’s likely because they are actually grappling with the relevant issues and therefore see how hard the problem is likely to be.
Not that it strikes me as completely unreasonable that we would have a major breakthrough that gives us AI in 20 years, but it’s hard to see what the candidate would be. But I have only been thinking about these issues for a couple years, so I still maintain a pretty high degree of uncertainty about all of these claims.
I do think I basically agree with you re: inductive learning and program creation, though. When you say non-self-modifying Oracle AI, do you also mean that the Oracle AI doesn’t get to do inductive learning? Because I suspect that inductive learning of some sort is fundamentally necessary, for reasons that you yourself nicely outline here.
I agree that top mainstream AI guy Peter Norvig was way the heck more sensible than the reference class of declared “AGI researchers” when I talked to him about FAI and CEV, and that estimates should be substantially adjusted accordingly.
Yes. I wonder if there’s a good explanation why narrow AI folks are so much more sensible than AGI folks on those subjects.
Because they have some experience of their products actually working, they know that 1) these things can be really powerful, even though narrow, and 2) there are always bugs.
“Intelligence is not as computationally expensive as it looks”
How sure are you that your intuitions do not arise from the typical mind fallacy, and from attributing the great discoveries and inventions of mankind to the same processes that you feel run in your skull, which have not yet resulted in any great novel discoveries or inventions that I know of?
I know this sounds like an ad hominem, but as your intuitions are significantly influenced by your internal understanding of your own process, your self-esteem stands hostage to be shot through by many of the possible counter-arguments and corrections. (Self-esteem is one hell of a bulletproof hostage, though, and tends to act more as a shield for bad beliefs.)
There are a lot of engineers working on software for solving engineering problems, including software that generates and tests possible designs and looks for ways to make better computers. Your philosophy-based, natural-language-defined, in-imagination-running Oracle AI may have to be very carefully specified so that it does not kill imaginary mankind, and it may well be very difficult to build such a specification. Just don’t confuse it with software written to solve definable problems.
Ultimately, figuring out how to make a better microchip involves a lot of testing of various designs; that’s how humans do it, and that’s how tools do it. I don’t know how you think it is done. Performance is a result of a very complex function of the design. To build a design that performs, you need to invert this ultra-complicated function, which is done by a mixture of analytical methods and iteration over possible input values, and unless P=NP, we have very little reason to expect any fundamentally better solutions (and even if P=NP there may still not be any). This means the AGI won’t have any edge over practical software, and won’t out-foom it.
I may have the terminology wrong, but I believe he’s thinking more about commercial narrow-AI researchers.
Now if they produce results like these, that would push the culture farther towards letting computer programs handle any hard task. Programming seems hard.
This is not relevant to FAI per se, but Michael and Susan Leigh Anderson have suggested (and begun working on) just that in the field of Machine Ethics. The main contention seems to be that creating an ethical oracle is easier than creating an embodied ethical agent because you don’t need to first figure out whether the robot is an ethical patient. Then once the bugs are out, presumably the same algorithms can be applied to embodied robots.
ETA: For reference, I think the relevant paper is “Machine Metaethics” by Susan Leigh Anderson, in the volume Machine Ethics—I’m sure lukeprog has a copy.
The heck? Why would you not need to figure out whether an oracle is an ethical patient? Why is there no such possibility as a sentient oracle?
Is this standard religion-of-embodiment stuff?
The oracle gets asked questions like “Should intervention X be used by doctor D on patient P” and can tell you the correct answer to them without considering the moral status of the oracle.
If it were a robot, it would be asking questions like “Should I run over that [violin/dog/child] to save myself?” which does require considering the status of the robot.
EDIT: To clarify, it’s not that the researcher has no reason to figure out the moral status of the oracle, it’s that the oracle does not need to know its own moral status to answer its domain-specific questions.
What if it assigned moral status to itself and then biased its answers to make its users less likely to pull its plug one day?