It need not amount to anything more complex than “obey all instructions on this channel”, where the instructions are no more complex than “shut yourself down”
And “always keep this channel open” and “don’t corrupt any sensor data that outputs to this channel” and “don’t send yourself commands on this channel” and “don’t build anything so that it will send you a signal on this channel” and “don’t build anything that will build anything that will eventually send you a signal on this channel unless a signal on this channel tells you to do it”.
… and I can STILL think of more ways to corrupt that kind of hack.
Not to mention that if you don’t want script kiddies to have too much fun, you will need to authenticate the instructions on that channel, which is another very large can of very wriggly worms...
The problem is not to “Solve Human Morality”, the problem is to make an AI that will do what humans end up having wanted. Since this is a problem for which we can come up with solid definitions (just to plug my own work :-p), it must be a solvable problem. If it looks impossible or infeasible, that is simply because you are taking the wrong angle of attack.
Stop trying to figure out a way to avoid the problem, and solve it.
For one thing, taboo the words “morality” and “ethics”, and solve the simpler, realer problem: how do you make an AI do what you intend it to do when you convey some wish or demand in words? As Eliezer has said, humans are Friendly to each other in this sense: when I ask another human to get me a pizza, the entire apartment doesn’t get covered in a maximal number of pizzas. Another human understands what I really mean.
So just solve that: what reasoning structures does another agent need to understand what I really mean when I ask for a pizza?
But at least stop blatantly trolling LessWrong by trying to avoid the problem by saying blatantly stupid stuff like “Oh, I’ll just put an off-switch on an AI, because obviously no agent of human-level intelligence would ever try to prevent the use of an off-switch by, you know, breaking it, or covering it up with a big metal box for protection.”
The problem is not to “Solve Human Morality”, the problem is to make an AI that will do what humans end up having wanted.
Is it? Why take on either of those gargantuan challenges? Another perfectly reasonable approach is to task the AI with nothing more than data processing with no effectors in the real world (Oracle AI), and watch it like a hawk. And no one at MIRI or on LW has proved this approach dangerous except by making crazy unrealistic assumptions, e.g. in this case why would you ever put the off-switch within the AI’s environment?
As you and Eliezer say, humans are Friendly to each other already. So have humans moderate the actions of the AI, in a controlled setup designed to prevent AI learning to manipulate the humans (break the feedback loop).
Another perfectly reasonable approach is to task the AI with nothing more than data processing with no effectors in the real world (Oracle AI), and watch it like a hawk.
I consider this semi-reasonable, and in fact, wouldn’t even feel the need to watch it like a hawk. Without a decision-outputting algorithm, it’s not an agent, it’s just a learner: it can’t possibly damage human interests.
I say “semi” reasonable, because there is still the issue of understanding debug output from the Oracle’s internal knowledge representations, and putting it to some productive usage.
I also consider a proper Friendly AI to be much more “morally profitable”, in the sense of yielding a much greater benefit than usage of an Oracle Learner by untrustworthy humans.
This becomes an issue of strategy. I assume the end goal is a positive singularity. The MIRI approach seems to be: design and build a provably “safe” AGI, then cede all power to it and hope for the best as it goes “FOOM” and moves us through the singularity. A strategy I would advocate for instead is: build an Oracle AI as soon as it is possible to do so with adequate protections, and use its super-intelligence to design singularity technologies which enable (augmented?) humans to pass through the singularity.
I prefer the latter approach as it can be done with today’s knowledge and technology, and does not rely on mathematical breakthroughs on an indeterminate timescale which may or may not even be possible or result in a practical AGI design. Instead it depends on straightforward computer science and belts-and-suspenders engineering on a predictable timescale.
If I were executive director of MIRI, I would continue the workshops, because there is a non-zero probability that a breakthrough might be made that radically simplifies the safe AGI design space. However I’d definitely spend more than half of the organization’s budget and time on a strategy with a definable timescale and an articulable project plan, such as the Oracle-AGI-to-Intelligence-Augmentation approach I advocate, although others are possible.
Well that’s where the “positive singularity” and “Friendly (enough) AGI” goals separate: if you choose the route to a “positive singularity” of human intelligence augmentation, you still face the problems of human irrationality, of human moral irrationality (lack of moral caring, moral akrasia, morals that are not aligned with yours, etc), but you now also face the issue of what happens to human evaluative judgement under the effects of intelligence augmentation. Can humans be modified while maintaining their values? We honestly don’t know.
(And I for one am reasonably sure that nobody wise should ever make me their Singularity-grade god-leader, on grounds that my shouldness function, while not nearly as completely alien as Clippy’s, is still relatively unusual, somewhere on an edge of a bell curve, and should therefore not be trusted with the personal or collective future of anyone who doesn’t have a similar shouldness function. Sure, my meta-level awareness of this makes me Friendly, loosely speaking, but we humans are very bad at exercising perfect meta-level awareness of others’ values all the time, and often commit evaluative mind-projection fallacies.)
What I would personally do, at this stage, is just to maintain a distribution (you know probability was gonna enter somewhere) over potential routes to a positive outcome. Plan and act according to the full distribution, through institutions like FHI and FLI and such, while still focusing the specific, achieve-a-single-narrow-outcome optimization power of MIRI’s mathematical talents on building provably Friendly AGIs. Update early and often on whatever new information is available.
For instance, the more I look into AGI and cognitive science research, the more I genuinely feel the “Friendly AI route” can work quite well. From my point of view, it looks more like a research program than an impossible Herculean task (admittedly, the difference is often kinda hard to see for those who’ve never served time in a professional research environment), whereas something like safe human augmentation is currently full of unknown unknowns that are difficult to plan around.
And as much as I generally regard wannabe-ems with a little disdain for their flippant “what do I need reality for!?” views, I do think that researching human mind uploading would help discover a lot of the neurological and cognitive principles needed to build a Friendly AI (ie: what cognitive algorithms are we using to make evaluative judgements?), while also helping create avenues for agents with human motivations to “go FOOM” themselves, just in case, so that’s worthwhile too.
The important thing to note about the problems you identified is how they differ from the problem domains of basic research. What happens to human evaluative judgement under the effects of intelligence augmentation? That’s an experimental question. Can we trust a single individual to be enhanced? Almost certainly not. So perhaps we need to pick 100 or 1,000 people, wired into a shared infrastructure which enhances them in lock-step, with incentives in place to ensure collaboration over competition, and consensus over partisanship in decision-making protocols. Designing these protocols and safeguards takes a lot of work, but both the scale and the scope of that work are fairly well quantified. We can make a project plan and estimate with a high degree of accuracy how long and how much money it would take to design sufficiently safe Oracle AI and intelligence augmentation projects.
FAI theory, on the other hand, is like the search for a grand unified theory of physics. We presume such a theory exists. We even have an existence proof of sorts (the human mind for FAI, the universe itself in physics). But the discovery of a solution is something that will or will not happen, and if it does it will be on an unpredictable time scale. Maybe it will take 5 years. Maybe 50, maybe 500. Who knows? After the rapid advances of the early 20th century, I’m sure most physicists thought a grand unified theory must be within reach; Einstein certainly did. Yet here we are, nearly 100 years after the publication of the general theory of relativity and 85 years after most of the major discoveries of quantum mechanics, and in many ways we seem no closer to a theory of everything than we were some 40 years ago when the standard model was largely finalized.
It could be that at the very next MIRI workshop some previously unknown research associate solves the FAI problem conclusively. That’d be awesome. Or maybe she proves it impossible, which would be an equally good outcome because then we could at least refocus our efforts. Far worse, it might be that 50 years from now all MIRI has accumulated is a thoroughly documented list of dead-ends.
But that’s not the worst case, because in reality UFAI will appear within the next decade or two, whether we want it to or not. So unless we are confident that we will solve the FAI problem and build out the solution before the competition, we’d better start investing heavily in alternatives.
The AI winter is over. Already multiple very well funded groups are rushing forward to generalize already super-human narrow AI techniques. AGI is finally a respectable field again, and there are multiple teams making respectable progress towards seed AI. And parallel hardware and software tools have finally gotten to the point where a basement AGI breakthrough is a very real and concerning possibility.
We don’t have time to be dicking around doing basic research on whiteboards.
Aaaand there’s the “It’s too late to start researching FAI, we should’ve started 30 years ago, we may as well give up and die” to go along with the “What’s the point of starting now, AGI is too far away, we should start 30 years later because it will only take exactly that amount of time according to this very narrow estimate I have on hand.”
If your credible intervals on “How much time we have left” and “How much time it will take” do not overlap, then you either know a heck of a lot I don’t, or you are very overconfident. I usually try not to argue from “I don’t know and you can’t know either” but for the intersection of research and AGI timelines I can make an exception.
Admittedly my own calculation looks less like an elaborate graph involving supposed credibility intervals, and more like, “Do we need to do this? Yes. Can we realistically avoid having to do this? No. Let’s start now EOM.”
I think that’s a gross simplification of the possible outcomes.
Admittedly my own calculation looks less like an elaborate graph involving supposed credibility intervals, and more like, “Do we need to do this? Yes. Can we realistically avoid having to do this? No. Let’s start now EOM.”
I think you need better planning.
There’s a great essay called Levels of Action that has been a featured article on the main page for some time now. Applied to FAI theory:
Level 1: Directly ending human suffering.
Level 2: Constructing an AGI capable of ending human suffering for us.
Level 3: Working on the computer science aspects of AGI theory.
Level 4: Researching FAI theory, which constrains the Level 3 AGI theory.
But for that high-level basic research to have any utility, these levels must be connected to each other: there must be a firm chain where FAI theory informs AGI designs, which are actually used in the construction of an AGI tasked with ending human suffering in a friendly way.
From what I can tell on the outside, the MIRI approach seems to be: (1) find a practical theory of FAI; (2) design an AGI in accordance with this theory; (3) implement that design; (4) mission accomplished!
That makes a certain amount of intuitive sense, having stages laid out end-to-end in chronological order. However as a trained project manager I must tell you this is a recipe for disaster! The problem is that the design space branches out at each link, but without the feedback of follow-on steps, inefficient decision making will occur at earlier stages. The space of working FAI theories is much, much larger than the FAI-theory-space which results in practical AGI designs which can be implemented prior to the UFAI competition and are suitable for addressing real-world issues of human suffering as quickly as possible.
Some examples from the comparably large programs of the Manhattan project and Apollo moonshot are appropriate, if you’ll forgive the length (skip to the end for a conclusion):
The Manhattan project had one driving goal: drop a bomb on Berlin and Tokyo before the GIs arrived, hopefully ending the war early. (Of course Germany surrendered before the bomb was finished, and Tokyo ended up so devastated by conventional firebombing that Hiroshima and Nagasaki were selected instead, but the original goal is what matters here.) The location of the targets meant that the bomb had to be small enough to fit in a conventional long-distance bomber, and the timeline meant that the simpler but less efficient U-235 designs were preferred. A program was designed, adequate resources allocated, and the goal achieved on time.
On the other hand it is easy to imagine how differently things might have gone if the strategy had been reversed; if instead the US military had decided to institute a basic research program into nuclear physics and atomic structure before deciding on the optimal bomb reactions, then done detailed bomb design before creating the industry necessary to produce enough material for a working weapon. Just looking at the first stage, there is nothing a priori which makes it obvious that U-235 and Pu-239 are the “interesting” nuclear fuels to focus on. Thorium, for example, was more naturally abundant and already being extracted as a by-product of rare earth metal extraction, its reactions generate less lethal radiation and fewer long-lasting waste products, and it does produce U-233 which could be used in a nuclear bomb. However the straightforward military and engineering requirements of making a bomb on schedule, and successfully delivering it on target, favored U-235 and Pu-239 based weapon designs, which focused the efforts of the physicists involved on those fuel pathways. The rest is history.
The Apollo moonshot is another great example. NASA had a single driving goal: deliver a man to the moon before 1970, and return him safely to Earth. A lot of decisions were made in the first few years driven simply by the time and resources available: e.g. heavy-lift vs orbital assembly, direct return vs lunar rendezvous, expendable vs. reuse, staging vs. fuel depots. Ask Wernher von Braun what he imagined an ideal moon mission would look like, and you would have gotten something very different than Apollo. But with Apollo NASA made the right tradeoffs with respect to schedule constraints and programmatic risk.
The follow-on projects of Shuttle and Station are a completely different story, however. They were designed with no articulated long-term strategy, which meant they tried to be everything to everybody and as a result were useful to no one. Meanwhile the basic research being carried out at NASA has little, if anything, to do with the long-term goals of sending humans to Mars. There’s an entire division, the Space Biosciences group, which does research on Station about the long-term effects of microgravity and radiation on humans, supposedly to enable a long-duration voyage to Mars. Never mind that the microgravity issue is trivially solved by spinning the spacecraft with nothing more than a strong steel rope as a tether, and the radiation issue is sufficiently mitigated by having a storm shelter en route and throwing a couple of Martian sandbags on the roof once you get there.
There’s an apocryphal story about the US government spending millions of dollars to develop the “Space Pen”—a ballpoint pen with ink under pressure to enable writing in microgravity environments. Much later at some conference an engineer in that program meets his Soviet counterpart and asks how they solved that difficult problem. The cosmonauts used a pencil.
Sadly the story is not true—the “Space Pen” was a successful marketing ploy by inventor Paul Fisher without any ties to NASA, although it was used by NASA and the Russians on later missions—but it does serve to illustrate the point very succinctly. I worry that MIRI is spending its days coming up with space pens when a pencil would have done just fine.
Let me provide some practical advice. If I were running MIRI, I would still employ mathematicians working on the hail-Mary of a complete FAI theory—avoiding the Löbian obstacle etc. -- and run the very successful workshops, though maybe just two a year. But beyond that I would spend all remaining resources on a pragmatic AGI design programme:
1) Have a series of workshops with AGI people to do a review of possible AI-influenced strategies for a positive singularity—top-down FAI, seed AI to FAI, Oracle AI to FAI, Oracle AI to human augmentation, teaching a UFAI morals in a nursery environment, etc.
2) Have a series of workshops, again with AGI people to review tactics: possible AGI architectures & the minimal seed AI for each architecture, probabilistically reliable boxing setups, programmatic security, etc.
Then use the output of these workshops—including reliable constraints on timelines—to drive most of the research done by MIRI. For example, I anticipate that reliable unfriendly Oracle AI setups will require probabilistically auditable computation, which itself will require a strongly typed, purely functional virtual machine layer from which computation traces can be extracted and meaningfully analyzed in isolation. This is the sort of research MIRI could sponsor a grad student or postdoc to perform.
BTW, other gripe: I have yet to see adequate arguments for the “can we realistically avoid having to do this?” from MIRI which aren’t strawman arguments.
While I don’t know much about your AGI expertise, I agree that MIRI is missing an experienced top-level executive who knows how to structure, implement and risk-mitigate an ambitious project like FAI and has a track record to prove it. Such a person would help prevent flailing about and wasting time and resources. I am not sure what other projects are in this reference class and whether MIRI can find and hire a person like that, so maybe they are doing what they can with the meager budget they’ve got. Do you think that the Manhattan project and the Space Shuttle are in the ballpark of the FAI? My guess is that they don’t even come close in terms of ambition, risk, effort or complexity.
I am not sure what other projects are in this reference class and whether MIRI can find and hire a person like that, so maybe they are doing what they can with the meager budget they’ve got.
Project managers are typically expensive because they are senior people before they enter management. Someone who has never actually worked at the bottom rung of the ladder is often quite useless in a project management role. But that’s not to say that you can’t find someone young who has done a short stint at the bottom, got PMP certified (or whatever), and has 1-2 projects under their belt. It wouldn’t be cheap, but not horribly expensive either.
On the other hand, Luke seems pretty on the ball with respect to administrative stuff. It may be sufficient to get him some project manager training and some very senior project management advisers.
Neither one of these would be a long-term adequate solution. You need very senior, very experienced project management people in order to tackle something as large as FAI, and stay on schedule and on budget. But in terms of just making sure the organization is focused on the right issues, either of the above would be a drastic improvement, and enough for now.
Do you think that the Manhattan project and the Space Shuttle are in the ballpark of the FAI? My guess is that they don’t even come close in terms of ambition, risk, effort or complexity.
60 years ago, maybe. However these days advances in cognitive science, narrow AI, and computational tools are advancing at rapid paces on their own. The problem for MIRI should be that of ensuring a positive singularity via careful leverage of the machine intelligence already being developed for other purposes. That’s a much smaller project, and something I think a small but adequately funded organization should be able to pull off.
From what I can tell on the outside, the MIRI approach seems to be: (1) find a practical theory of FAI; (2) design an AGI in accordance with this theory; (3) implement that design; (4) mission accomplished!
Yes, dear, some of us are programmers, we know about waterfalls. Our approach is more like, “Attack the most promising problems that present themselves, at every point; don’t actually build things which you don’t yet know how to make not destroy the world, at any point.” Right now this means working on unbounded problems because there are no bounded problems which seem more relevant and more on the critical path. If at any point we can build something to test ideas, of course we will; unless our state of ignorance is such that we can’t test that particular idea without risking destroying the world, in which case we won’t, but if you’re really setting out to test ideas you can probably figure out some other way to test them, except for very rare highly global theses like “The intelligence explosion continues past the human level.” More local theses should be testable.
See also Ch. 22 from HPMOR, and keep in mind that I am not Harry, I contain Harry, all the other characters, their whole universe, and everything that happens inside it. In other words, I am not Harry, I am the universe that responded to Harry.
I’ll have to review Ch. 22 later as it is quite long.
If a stable self-modifying agent + friendly value-loading was the only pathway to a positive singularity, then MIRI would be doing a fine job. However I find that assumption not adequately justified.
For example, take Oracle AI. The sequences do a good job of showing how a black box AI can’t be safely boxed, nor can any of its recommendations be trusted. But those arguments don’t generalize to when we can see and understand the inner workings of the AI. Yes, engineering challenges apply: you can’t demand a computational trace of the entire returned result, as that would require an even more powerful AI to analyze, and then it’d be turtles all the way down. However you can do something like the Fiat-Shamir transform for selecting branches of the computational trace to audit. In essence, use the cryptographic hash of the result in order to choose which traces of the audit log to reveal. This allows the audit log to be only a tiny, tiny slice of the entire computation, yet it can be shown that faking such an audit log is computationally infeasible, meaning that it requires a large multiple more computation than went into the original result, which means it would be energetically detectable.
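To make the hash-based selection concrete, here is a minimal sketch with every name and number invented for illustration: the indices of the trace entries to reveal are derived from a hash of the result itself, so the prover cannot pick them. Python’s standard `hashlib` stands in for whatever hash a real scheme would use; this is only the selection step, not a full probabilistically-auditable-computation protocol.

```python
import hashlib

def select_audit_indices(result_bytes, num_trace_entries, num_audits):
    """Pick which trace entries must be revealed, as a function of the result.

    Because the choice is a deterministic hash of the result (the Fiat-Shamir
    trick), the oracle cannot first decide which parts of its computation to
    show and then back-fit a convenient answer without redoing the work.
    """
    assert num_audits <= num_trace_entries
    indices = []
    counter = 0
    while len(indices) < num_audits:
        digest = hashlib.blake2b(result_bytes + counter.to_bytes(4, "big")).digest()
        index = int.from_bytes(digest[:8], "big") % num_trace_entries
        if index not in indices:
            indices.append(index)
        counter += 1
    return indices

# Hypothetical usage: the oracle hands back an answer plus a log of a million
# trace entries; the auditor checks only the 32 entries selected below.
print(select_audit_indices(b"answer: 42", num_trace_entries=10**6, num_audits=32))
```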
Of course you would also have to have the AI be written for a virtual machine which is pure, strongly typed and directly supports the native abstraction primitives of the AGI design (e.g. if it were OpenCog, the virtual machine would exist at the abstraction level of the atomspace), and have a computational stack underneath that which prevents VM breakout, and other protective measures. But these are surmountable engineering challenges, the difficulty of which can be realistically quantified.
So how much more or less difficult would it be to accomplish such an untrusted oracle AI vs the stable self-modifying agent and value-loading approach? Which one is more likely to occur before the “competition”?
I’m not demanding a full waterfall project plan, but even agile requires convincing arguments about critical paths and relative priorities. I for one am not convinced.
If a stable self-modifying agent + friendly value-loading was the only pathway to a positive singularity, then MIRI would be doing a fine job. However I find that assumption not adequately justified.
See also Ch. 22 from HPMOR, and keep in mind that I am not Harry, I contain Harry, all the other characters, their whole universe, and everything that happens inside it. In other words, I am not Harry, I am the universe that responded to Harry.
Badass boasting from fictional evidence?
Yes, dear, some of us are programmers, we know about waterfalls.
If anyone here knew anything about the Waterfall Model, they’d know it was only ever proposed sarcastically, as a perfect example of how real engineering projects never work. “Agile” is pretty goddamn fake, too. There’s no replacement for actually using your mind to reason about what project-planning steps have the greatest expected value at any given time, and to account for unknown unknowns (ie: debugging, other obstacles) as well.
If anyone here knew anything about the Waterfall Model, they’d know it was only ever proposed sarcastically, as a perfect example of how real engineering projects never work
Yes, and I used it in that context: “We know about waterfalls” = “We know not to do waterfalls, so you don’t need to tell us that”. Thank you for that very charitable interpretation of my words.
FAI has definite subproblems. It is not a matter of scratching away at a chalkboard hoping to make some breakthrough in “philosophy” or some other proto-sensical field that will Elucidate Everything and make the problem solvable at all. FAI, right now, is a matter of setting researchers to work on one subproblem after another until they are all solved.
In fact, when I do literature searches for FAI/AGI material, I often find that the narrow AI or machine-learning literature contains a round dozen papers nobody working explicitly on FAI has ever cited, or even appears to know about. This is my view: there is low-hanging fruit in applying existing academic knowledge to FAI problems. Where such low-hanging fruit does not exist, the major open problems can largely be addressed by recourse to higher-hanging fruit within mathematics, or even to empirical science.
Since you believe it’s all so wide-open, I’d like to know what you think of as “the FAI problem”.
If you have an Oracle AI you can trust, you can use it to solve FAI problems for you. This is a fine approach.
We don’t have time to be dicking around doing basic research on whiteboards.
In-context, what was meant by “Oracle AI” is a very general learning algorithm with some debug output, but no actual decision-theory or utility function whatsoever built in. That would be safe, since it has no capability or desire to do anything.
Ok, but a system like you’ve described isn’t likely to think about what you want it to think about or produce output that’s actually useful to you either.
Well yes. That’s sort of the problem with building one. Utility functions are certainly useful for specifying where logical uncertainty should be reduced.
Well, I don’t know about the precise construction that would be used. Certainly I could see a human being deliberately focusing the system on some things rather than others.
All existing learning algorithms I know of, and I dare say all that exist, have at least a utility function, and also something that could be interpreted as a decision theory. Consider for example support vector machines, which explicitly try to maximize a margin (that would be the utility function), and any algorithm for computing SVMs can be interpreted as a decision theory. Similar considerations hold for neural networks, genetic algorithms, and even the minimax algorithm.
Thus, I strongly doubt that the notion of a learning algorithm with no utility function makes any sense.
Those are optimization criteria, but they are not decision algorithms in the sense that we usually talk about them in AI. A support vector machine is just finding the extrema of a cost function via its derivative, not planning a sequence of actions.
The most popular algorithm for SVMs does plan a sequence of actions, complete with heuristics as to which action to take. True, the “actions” are internal : they are changes to some data structure within the computer’s memory, rather than changes to the external world. But that is not so different from e.g. a chess AI, which assigns some heuristic score to chess positions and attempts to maximize it using a decision algorithm (to decide which move to make), even though the chessboard is just a data structure within the computer memory.
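To make the disagreement concrete, below is a minimal sketch (toy data, plain NumPy) of a linear SVM trained by subgradient descent on the regularized hinge loss. It is not SMO, the algorithm referred to above, but it shows the framing being argued over: an explicit objective playing the role of a “utility function”, and a sequence of purely internal weight updates playing the role of “actions”.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))               # toy inputs
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)  # toy labels

w, b, lam, lr = np.zeros(2), 0.0, 0.01, 0.1

def objective(w, b):
    # Regularized hinge loss: the quantity the learner is "trying" to minimize,
    # i.e. the thing one might call its (negated) utility function.
    margins = y * (X @ w + b)
    return lam * w @ w + np.mean(np.maximum(0.0, 1.0 - margins))

for step in range(500):
    margins = y * (X @ w + b)
    active = margins < 1.0                        # points violating the margin
    grad_w = 2 * lam * w - (y[active] @ X[active]) / len(X)
    grad_b = -np.sum(y[active]) / len(X)
    w, b = w - lr * grad_w, b - lr * grad_b       # one purely internal "action"

print("final objective:", objective(w, b), "weights:", w)
```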
“Internal” to the “agent” is very different from having an external output to a computational system outside the “agent”. “Actions” that come from an extremely limited, non-Turing-complete “vocabulary” (really: programming language or computational calculus (those two are identical)) are also categorically different from a Turing complete calculus of possible actions.
The same distinction applies for hypothesis class that the learner can learn: if it’s not Turing complete (or some approximation thereof, like a total calculus with coinductive types and corecursive programs), then it is categorically not general learning or general decision-making.
This is why we all employ primitive classifiers every day without danger, and you need something like Solomonoff’s algorithmic probability in order to build AGI.
I agree, of course, that none of the examples I gave (“primitive classifiers”) are dangerous. Indeed, the “plans” they are capable of considering are too simple to pose any threat (they are, as you say, not Turing complete).
But that doesn’t seem to be relevant to the argument at all. You claimed
a very general learning algorithm with some debug output, but no actual decision-theory or utility function
whatsoever built in. That would be safe, since it has no capability or desire to do anything.
You claimed that a general learning algorithm without decision-theory or utility function is possible.
I pointed out that all (harmless) practical learning algorithms we know of do in fact have decision theories and utility functions.
What would “a learning algorithm without decision-theory or utility function, something that has no desire to do anything” even look like? Does the concept even make sense? Eliezer writes here
A string of zeroes down an output line to a motorized arm is just as much an output as any other output; there is no privileged null, there is no such thing as ‘no action’ among all possible outputs. To ‘do nothing’ is just another string of English words, that would be interpreted the same as any other English words, with latitude.
You claimed that a general learning algorithm without decision-theory or utility function is possible. I pointed out that all (harmless) practical learning algorithms we know of do in fact have decision theories and utility functions.
/facepalm
There is in fact such a thing as a null output. There is in fact such a thing as a learner with a sub-Turing hypothesis class. Such a learner with such a primitive output as “in the class” or “not in the class” does not engage in world optimization, that is: its actions do not, to its own knowledge, skew any probability distribution over future states of any portion of the world outside itself.
It does not narrow the future.
Now, what we’ve been proposing as an Oracle is even less capable. It would truly have no outputs whatsoever, only input and a debug view. It would, by definition, be incapable of narrowing the future of anything, even its own internal states.
Perhaps I have misused terminology, but that is what I was referring to: inability to narrow the outer world’s future.
This thing you are proposing, an “oracle” that is incapable of modeling itself and incapable of modeling its environment (either would require turing-complete hypotheses), what could it possibly be useful for? What could it do that today’s narrow AI can’t?
You seem to have lost the thread of the conversation. The proposal was to build a learner that can model the environment using Turing-complete models, but which has no power to make decisions or take actions. This would be a Solomonoff Inducer approximation, not an AIXI approximation.
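For what it’s worth, here is a toy of that distinction, with a hand-rolled hypothesis “space” standing in for the Turing-complete one a real Solomonoff-style approximation would enumerate: the learner only updates beliefs about the next bit and exposes them through a read-only debug view; there is no action channel anywhere.

```python
# Toy sketch of a "pure inducer": it maintains weights over a (stand-in)
# hypothesis class, observes bits, and exposes its beliefs via a debug view,
# but it has no output channel for actions. A real Solomonoff-style
# approximation would enumerate programs in a Turing-complete language and
# weight them by length; the three hard-coded predictors below are placeholders.

HYPOTHESES = {
    "all_zeros":   lambda history: 0,
    "all_ones":    lambda history: 1,
    "alternating": lambda history: (history[-1] ^ 1) if history else 0,
}

class Inducer:
    def __init__(self):
        self.weights = {name: 1.0 / len(HYPOTHESES) for name in HYPOTHESES}
        self.history = []

    def observe(self, bit):
        # Multiplicative-weights style update; strict Bayes with deterministic
        # hypotheses would zero out any predictor that gets a bit wrong.
        for name, hypothesis in HYPOTHESES.items():
            if hypothesis(self.history) != bit:
                self.weights[name] *= 0.5
        total = sum(self.weights.values())
        self.weights = {k: v / total for k, v in self.weights.items()}
        self.history.append(bit)

    def debug_view(self):
        # The only thing visible from outside: a read-only belief snapshot.
        return dict(self.weights)

inducer = Inducer()
for bit in [0, 1, 0, 1, 0, 1]:
    inducer.observe(bit)
print(inducer.debug_view())   # the alternating hypothesis should dominate
```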
There is in fact such a thing as a learner with a sub-Turing hypothesis class. Such a learner with such a primitive output as “in the class” or “not in the class” does not engage in world optimization, that is: its actions do not, to its own knowledge, skew any probability distribution over future states of any portion of the world outside itself.
…
Now, what we’ve been proposing as an Oracle is even less capable.
which led me to think you were talking about an oracle even less capable than a learner with a sub-Turing hypothesis class.
It would truly have no outputs whatsoever, only input and a debug view. It would, by definition, be incapable of narrowing the future of anything, even its own internal states.
If the hypotheses it considers are turing-complete, then, given enough information (and someone would give it enough information, otherwise they couldn’t do anything useful with it), it could model itself, its environment, the relation between its internal states and what shows up on the debug view, and the reactions of its operators on the information they learn from that debug view. Its (internal) actions very much would, to its own knowledge, skew the probability distribution over future states of the outer world.
I often find that the narrow AI or machine-learning literature contains a round dozen papers nobody working explicitly on FAI has ever cited, or even appears to know about.
Name three. FAI contains a number of counterintuitive difficulties and it’s unlikely for someone to do FAI work successfully by accident. On the other hand, someone with a fuzzier model believing that a paper they found sure sounds relevant, why isn’t MIRI citing it, is far more probable from my perspective and prior.
I wouldn’t say that there’s someone out there directly solving FAI problems without having explicitly intended to do so. I would say there’s a lot we can build on.
Keep in mind, I’ve seen enough of a sample of Eld Science being stupid to understand how you can have a very low prior on Eld Science figuring out anything relevant. But lacking more problem guides from you on the delta between plain AI problems and FAI problems, we go on what we can.
One paper on utility learning that relies on a supervised-learning methodology (pairwise comparison data) rather than a de-facto reinforcement learning methodology (which can and will go wrong in well-known ways when put into AGI). One paper on progress towards induction algorithms that operate at multiple levels of abstraction, which could be useful for naturalized induction if someone put more thought and expertise into it.
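As a flavor of what the first of those looks like in miniature (all data and numbers here are invented): in its simplest form, learning a utility function from pairwise comparisons is just logistic regression on feature differences, a Bradley-Terry-style model, rather than reinforcement learning against a reward channel.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: each outcome is described by 3 features, and the hidden
# "true" utility is linear in them. The data are pairwise labels: 1 if a human
# preferred outcome A over outcome B.
true_w = np.array([2.0, -1.0, 0.5])
A = rng.normal(size=(500, 3))
B = rng.normal(size=(500, 3))
prefers_A = (A @ true_w > B @ true_w).astype(float)

# Bradley-Terry / logistic model: P(A preferred) = sigmoid(w . (A - B)).
# Fit w by plain gradient ascent on the log-likelihood.
w = np.zeros(3)
for _ in range(1000):
    p = 1.0 / (1.0 + np.exp(-(A - B) @ w))
    grad = (A - B).T @ (prefers_A - p) / len(A)
    w += 0.2 * grad

# The recovered direction should roughly match the true utility's direction.
print(w / np.linalg.norm(w), true_w / np.linalg.norm(true_w))
```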
That’s only two, but I’m a comparative beginner at this stuff and Eld Science isn’t very good at focusing on our problems, so I expect that there’s actually more to discover and I’m just limited by lack of time and knowledge to do the literature searches.
By the way, I’m already trying to follow the semi-official MIRI curriculum, but if you could actually write out some material on the specific deltas where FAI work departs from the preexisting knowledge-base of academic science, that would be really helpful.
Since you believe it’s all so wide-open, I’d like to know what you think of as “the FAI problem”.
1) Designing a program capable of arbitrary self-modification, yet maintaining guarantees of “correct” behavior according to a goal set that is by necessity included in the modifications as well.
2) Designing such a high level set of goals which ensure “friendliness”.
That seems a circular argument. How do you use a self-modifying evolutionary search to find a program whose properties remain stable under self-modifying evolutionary search? Unless you started with the right answer, the search AI would quickly rewrite or reinterpret its own driving goals in a non-friendly way, and who knows what you’d end up with.
It’s how you draw your system box. Evolutionary search is equivalent to a self-modifying program, if you think of the whole search process as the program. The same issues apply.
I think the sequences do a good job at demolishing the idea that human testers can possibly judge friendliness directly, so long as the AI operates as a black box. If you have a debug view into the operation of the AI that is a different story, but then you don’t need friendliness anyway.
Great, you’ve got names for answers you are looking for. That doesn’t mean the answers are any easier to find. You’ve attached a label to the declarative statement which specifies the requirements a solution must meet, but that doesn’t make the search for a solution suddenly have a fixed timeline. It’s uncertain research: it might take 5 years, 10 years, or 50 years, and throwing more people at the problem won’t necessarily make the project go any faster.
And how is trying to build a safe Oracle AI that can solve FAI problems for us not basic research? Or, to make a better statement: how is trying to build an Unfriendly superintelligent paperclip maximizer not basic research, at today’s research frontier?
Logical uncertainty, for example, is a plain, old-fashioned AI problem. We need it for FAI, we’re pretty sure, but it’s turning out UFAI might need it, too.
“Basic research is performed without thought of practical ends.”
“Applied research is systematic study to gain knowledge or understanding necessary to determine the means by which a recognized and specific need may be met.”
-National Science Foundation.
We need to be doing applied research, not basic research. What MIRI should do is construct a complete roadmap to FAI, or better: a study exhaustively listing strategies for achieving a positive singularity, and tactics for achieving friendly or unfriendly AGI, and concluding with a small set of most-likely scenarios. MIRI should then have identified risk factors which affect either the friendliness of the AGI in each scenario, or the capability of the UFAI to do damage (in boxing setups). These risk factors should be prioritized based on how much it is expected knowing more about each would bias the outcome in a positive direction, and these problems should be the topics of MIRI workshops.
Instead MIRI is performing basic research. It’s basic research not because it is useless, but because we are not certain at this point in time what relative utility it will have. And if we don’t have a grasp on expected utility, how can we prioritize? There’s a hundred avenues of research which are important to varying degrees to the FAI project. I worked for a number of years at NASA-Ames Research Center, and in the same building as me was the Space Biosciences Division. Great people, don’t get me wrong, and for decades they have funded really cool research on the effects of microgravity and radiation on living organisms, with the justification that such effects and counter-measures need to be known for long duration space voyages, e.g. a 2-year mission to Mars. Never mind that the microgravity issue is trivially solved with a steel tether costing a few thousand dollars connecting the upper stage to the spacecraft as they spin to create artificial gravity, and the radiation exposure is mitigated by having a storm shelter in the craft and throwing a couple of Martian sandbags on the roof once you get there. It’s spending millions of dollars to develop the pressurized-ink “Space Pen”, when the humble pencil would have done just fine.
Sadly I think MIRI is doing the same thing, and it is represented in one part of your post I take huge issue with:
Logical uncertainty, for example, is a plain, old-fashioned AI problem. We need it for FAI, we’re pretty sure...
If we’re only “pretty sure” it’s needed for FAI, if we can’t quantify exactly what its contribution will be, and how important that contribution is relative to other possible things to be working on... then we have some meta-level planning to do first. Unfortunately I don’t see MIRI doing any planning like this (or if they are, it’s not public).
Are you on the “Open Problems in Friendly AI” Facebook group? Because much of the planning is on there.
If we’re only “pretty sure” it’s needed for FAI, if we can’t quantify exactly what its contribution will be, and how important that contribution is relative to other possible things to be working on... then we have some meta-level planning to do first. Unfortunately I don’t see MIRI doing any planning like this (or if they are, it’s not public).
Logical uncertainty lets us put probabilities to sentences in logics. This, supposedly, can help get us around the Loebian Obstacle to proving self-referencing statements and thus generating stable self-improvement in an agent. Logical uncertainty also allows for making techniques like Updateless Decision Theory into real algorithms, and this too is an AI problem: turning planning into inference.
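To give the flavor of “putting probabilities to sentences” in the most stripped-down case (a toy, not the open research problem, which concerns sentences the reasoner lacks the resources to settle): enumerate the truth assignments consistent with what is already accepted, and read off the fraction in which the query holds.

```python
import itertools

# Toy propositional illustration of probabilities over sentences: given some
# accepted constraints, assign an unsettled query the fraction of admissible
# worlds in which it holds. The real problem concerns e.g. arithmetic
# sentences the reasoner cannot decide; this is only the zeroth-order analogue.

VARS = ["p", "q", "r"]
constraints = [
    lambda world: (not world["p"]) or world["q"],   # p -> q
    lambda world: world["q"] or world["r"],         # q or r
]
query = lambda world: world["q"]

worlds = [dict(zip(VARS, values))
          for values in itertools.product([True, False], repeat=len(VARS))]
admissible = [w for w in worlds if all(c(w) for c in constraints)]
print(sum(query(w) for w in admissible) / len(admissible))   # 0.8
```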
The cognitive stuff about human preferences is the Big Scary Hard Problem of FAI, but utility learning (as Stuart Armstrong has been posting about lately) is a way around that.
If you can create a stably self-improving agent that will learn its utility function from human data, equipped with a decision theory capable of handling both causative games and Timeless situations correctly… then congratulations, you’ve got a working plan for a Friendly AI and you can start considering the expected utility of actually building it (at least, to my limited knowledge).
Around here you should usually clarify whether your uncertainty is logical or indexical ;-).
Or... you could use a boxed Oracle AI to develop singularity technologies for human augmentation, or other mechanisms to keep moral humans in the loop through the whole process, and sidestep the whole issue of FAI and value loading in the first place.
Which approach do you think can be completed earlier with similar probabilities of success? What data did you use to evaluate that, and how certain are you of its accuracy and completeness?
I actually really do think that de novo AI is easier than human intelligence augmentation. We have good cognitive theories for how an agent is supposed to work (including “ideal learner” models of human cognitive algorithms). We do not have very good theories of in-vitro neuroengineering.
This assumes that you have usable, safe Oracle AI which then takes up your chosen line of FAI or neuroengineering problems for you. You are conditioning the hard part on solving the hard part.
You don’t need to solve philosophy to solve FAI, but philosophy is relevant to figuring out, in broad terms, the relative likelihoods of various problems and solutions.
I’m not arguing that AI will necessarily be safe. I am arguing that the failure modes investigated by MIRI aren’t likely. It is worthwhile to research effective off switches. It is not worthwhile to endlessly refer to a dangerous AI of a kind no one with a smidgeon of sense would build.
Bzzzt. Wrong. You still haven’t explained how to create an agent that will faithfully implement my verbal instruction to bring me a pizza. You have a valid case in the sense of pointing out that there can easily exist a “middle ground” between the Superintelligent Artificial Ethicist (Friendly AI in its fullest sense), the Superintelligent Paper Clipper (a perverse, somewhat unlikely malprogramming of a real superintelligence), and the Reward-Button Addicted Reinforcement Learner (the easiest unfriendly AI to actually build). What you haven’t shown is how to actually get around the Addicted Reinforcement Learner and the paper-clipper and actually build an agent that can be sent out for pizza without breaking down at all.
Your current answers seem to be, roughly, “We get around the problem by expecting future AI scientists to solve it for us.” However, we are the AI scientists: if we don’t figure out how to make AI deliver pizza on command, who will?
You keep misreading me. I am not claiming to have a solution. I am claiming that MIRI is overly pessimistic about the problem, and offering an over-engineered solution. Inasmuch as you say there is a middle ground, you kind of agree.
The thing is, MIRI doesn’t claim that a superintelligent world-destroying paperclipper is the most likely scenario. It’s just illustrative of why we have an actual problem: because you don’t need malice to create an Unfriendly AI that completely fucks everything up.
So how did you like CATE, over in that other thread? That AI is non-super-human, doesn’t go FOOM, doesn’t acquire nanotechnology, can’t do anything a human upload couldn’t do… and still can cause quite a lot of damage simply because it’s more dedicated than we are, suffers fewer cognitive flaws than us, has more self-knowledge than us, and has no need for rest or food.
I mean, come on: what if a non-FOOMed but Unfriendly AI becomes as rich as Bill Gates? After all, if Bill Gates did it while human, then surely an AI as smart as Bill Gates but without his humanity can do the same thing, while causing a bunch more damage to human values because it simply does not feel Gates’ charitable inclinations.
And “always keep this channel open” and “don’t corrupt any sensor data that outputs to this channel” and “don’t send yourself commands on this channel” and “don’t build anything so that it will send you a signal on this channel” and “don’t build anything that will build anything that will eventually send you a signal on this channel unless a signal on this channel tells you to do it”.
… and I can STILL think of more ways to corrupt that kind of hack.
Not to mention that if you don’t want script kiddies to have too much fun, you will need to authenticate the instructions on that channel, which is another very large can of very wriggly worms...
Yep, lots of stuff which is very difficult in absolute terms, but not obviously more difficult, relatively, than “Solve Human Morality”.
The problem is not to “Solve Human Morality”, the problem is to make an AI that will do what humans end up having wanted. Since this is a problem for which we can come up with solid definitions (just to plug my own work :-p), it must be a solvable problem. If it looks impossible or infeasible, that is simply because you are taking the wrong angle of attack.
Stop trying to figure out a way to avoid the problem and solve it.
For one thing, taboo the words “morality” and “ethics”, and solve the simpler, realer problem: how do you make an AI do what you intend it to do when you convey some wish or demand in words? As Eliezer has said, humans are Friendly to each-other in this sense: when I ask another human to get me a pizza, the entire apartment doesn’t get covered in a maximal number of pizzas. Another human understands what I really mean.
So just solve that: what reasoning structures does another agent need to understand what I really mean when I ask for a pizza?
But at least stop blatantly trolling LessWrong by trying to avoid the problem by saying blatantly stupid stuff like “Oh, I’ll just put an off-switch on an AI, because obviously no agent of human-level intelligence would ever try to prevent the use of an off-switch by, you know, breaking it, or covering it up with a big metal box for protection.”
Is it? Why take on either of those gargantuan challenges? Another perfectly reasonable approach is to task the AI with nothing more than data processing with no effectors in the real world (Oracle AI), and watch it like a hawk. And no one at MIRI or on LW has proved this approach dangerous except by making crazy unrealistic assumptions, e.g. in this case why would you ever put the off-switch in a region of the AI’s environment?
As you and Eliezer say, humans are Friendly to each other already. So have humans moderate the actions of the AI, in a controlled setup designed to prevent AI learning to manipulate the humans (break the feedback loop).
I consider this semi-reasonable, and in fact, wouldn’t even feel the need to watch it like a hawk. Without a decision-outputting algorithm, it’s not an agent, it’s just a learner: it can’t possibly damage human interests.
I say “semi” reasonable, because there is still the issue of understanding debug output from the Oracle’s internal knowledge representations, and putting it to some productive usage.
I also consider a proper Friendly AI to be much more “morally profitable”, in the sense of yielding a much greater benefit than usage of an Oracle Learner by untrustworthy humans.
This becomes an issue of strategy. I assume the end goal is a positive singularity. The MIRI approach seems to be: design and build a provably “safe” AGI, then cede all power to it and hope for the best as it goes “FOOM” and moves us through the singularity. A strategy I would advocate for instead is: build an Oracle AI as soon as it is possible to do so with adequate protections, and use its super-intelligence to design singularity technologies which enable (augmented?) humans to pass through the singularity.
I prefer the latter approach as it can be done with today’s knowledge and technology, and does not rely on mathematical breakthroughs on an indeterminate timescale which may or may not even be possible or result in a practical AGI design. The latter approach instead depends on straight-forward computer science and belts-and-suspenders engineering on a predictable timescale.
If I were executive director of MIRI, I would continue the workshops, because there is a non-zero probability that breakthrough might be made that radically simplifies the safe AGI design space. However I’d definitely spend more than half of the organizations budget and time on a strategy with a definable time-scale and an articulatable project plan, such as the Oracle-AGI-to-Intelligence-Augmentation approach I advocate, although others are possible.
Well that’s where the “positive singularity” and “Friendly (enough) AGI” goals separate: if you choose the route to a “positive singularity” of human intelligence augmentation, you still face the problems of human irrationality, of human moral irrationality (lack of moral caring, moral akrasia, morals that are not aligned with yours, etc), but you now also face the issue of what happens to human evaluative judgement under the effects of intelligence augmentation. Can humans be modified while maintaining their values? We honestly don’t know.
(And I for one am reasonably sure that nobody wise should ever make me their Singularity-grade god-leader, on grounds that my shouldness function, while not nearly as completely alien as Clippy’s, is still relatively unusual, somewhere on an edge of a bell curve, and should therefore not be trusted with the personal or collective future of anyone who doesn’t have a similar shouldness function. Sure, my meta-level awareness of this makes me Friendly, loosely speaking, but we humans are very bad at exercising perfect meta-level awareness of others’ values all the time, and often commit evaluative mind-projection fallacies.)
What I would personally do, at this stage, is just to maintain a distribution (you know probability was gonna enter somewhere) over potential routes to a positive outcome. Plan and act according to the full distribution, through institutions like FHI and FLI and such, while still focusing the specific, achieve-a-single-narrow-outcome optimization power of MIRI’s mathematical talents on building provably Friendly AGIs. Update early and often on whatever new information is available.
For instance, the more I look into AGI and cognitive science research, the more I genuinely feel the “Friendly AI route” can work quite well. From my point of view, it looks more like a research program than an impossible Herculian task (admittedly, the difference is often kinda hard to see to those who’ve never served time in a professional research environment), whereas something like safe human augmentation is currently full of unknown unknowns that are difficult to plan around.
And as much as I generally regard wannabe-ems with a little disdain for their flippant “what do I need reality for!?” views, I do think that researching human mind uploading would help discover a lot of the neurological and cognitive principles needed to build a Friendly AI (ie: what cognitive algorithms are we using to make evaluative judgements?), while also helping create avenues for agents with human motivations to “go FOOM” themselves, just in case, so that’s worthwhile too.
The important thing to note about the problems you identified is how they differ from the problem domains of basic research. What happens to human evaluative judgement under the effects of intelligence augmentation? That’s an experimental question. Can we trust a single individual to be enhanced? Almost certainly not. So perhaps we need to pick 100 or 1,000 people, wired into an shared infrastructure which enhances them in lock-step, and has incentives in place to ensure collaboration over competition, and consensus over partisanship in decision making protocols. Designing these protocols and safeguards takes a lot of work, but both the scale and the scope of that work is fairly well quantified. We can make a project plan and estimate with a high degree of accuracy how long and how much money it would take to design sufficiently safe oracle AI and intelligence augmentation projects.
FAI theory, on the other hand, is like the search for a grand unified theory of physics. We presume such a theory exists. We even have an existence proof of sorts (the human mind for FAI, the universe itself in physics). But the discovery of a solution is something that will or will not happen, and if it does it will be on an unpredictable time scale. Maybe it will take 5 years. Maybe 50, maybe 500. Who knows? After the rapid advances of the early 20th century, I’m sure most physicists thought a grand unified theory must be within reach; Einstein certainly did. Yet here we are nearly 100 years after the publication of the general theory of relativity, 85 years after most of the major discoveries of quantum mechanics, and yet in many ways we seem no closer to a theory of everything than we were some 40 years ago when the standard model was largely finalized.
It could be that at the very next MIRI workshop some previously unknown research associate solves the FAI problem conclusively. That’d be awesome. Or maybe she proves it impossible, which would be an equally good outcome because then we could at least refocus our efforts. Far worse, it might be that 50 years from now all MIRI has accumulated is a thoroughly documented list of dead-ends.
But that’s not the worst case, because in reality UFAI will appear within the next decade or two, whether we want it to or not. So unless we are confident that we will solve the FAI problem and build out the solution before the competition, we’d better start investing heavily in alternatives.
The AI winter is over. Already multiple very well funded groups are rushing forward to generalize already super-human narrow AI techniques. AGI is finally a respectable field again, and there are multiple teams making respectable progress towards seed AI. And parallel hardware and software tools have finally gotten to the point where a basement AGI breakthrough is a very real and concerning possibility.
We don’t have time to be dicking around doing basic research on whiteboards.
Aaaand there’s the “It’s too late to start researching FAI, we should’ve started 30 years ago, we may as well give up and die” to go along with the “What’s the point of starting now, AGI is too far away, we should start 30 years later because it will only take exactly that amount of time according to this very narrow estimate I have on hand.”
If the overlap between your credible intervals on “How much time we have left” and “How much time it will take” do not overlap, then you either know a heck of a lot I don’t, or you are very overconfident. I usually try not to argue from “I don’t know and you can’t know either” but for the intersection of research and AGI timelines I can make an exception.
Admittedly my own calculation looks less like an elaborate graph involving supposed credibility intervals and more like: “Do we need to do this? Yes. Can we realistically avoid having to do this? No. Let’s start now. EOM.”
I think that’s a gross simplification of the possible outcomes.
I think you need better planning.
There’s a great essay that has been a featured article on the main page for some time now called Levels of Action. Applied to FAI theory:
Level 1: Directly ending human suffering.
Level 2: Constructing an AGI capable of ending human suffering for us.
Level 3: Working on the computer science aspects of AGI theory.
Level 4: Researching FAI theory, which constrains the Level 3 AGI theory.
But for that high-level basic research to have any utility, these levels must be connected to each other: there must be a firm chain where FAI theory informs AGI designs, which are actually used in the construction of an AGI tasked with ending human suffering in a friendly way.
From what I can tell on the outside, the MIRI approach seems to be: (1) find a practical theory of FAI; (2) design an AGI in accordance with this theory; (3) implement that design; (4) mission accomplished!
That makes a certain amount of intuitive sense, having stages laid out end-to-end in chronological order. However as a trained project manager I must tell you this is a recipe for disaster! The problem is that the design space branches out at each link, but without the feedback of follow-on steps, inefficient decision making will occur at earlier stages. The space of working FAI theories is much, much larger than the FAI-theory-space which results in practical AGI designs which can be implemented prior to the UFAI competition and are suitable for addressing real-world issues of human suffering as quickly as possible.
Some examples from the comparably large programs of the Manhattan project and Apollo moonshot are appropriate, if you’ll forgive the length (skip to the end for a conclusion):
The Manhattan project had one driving goal: drop a bomb on Berlin and Tokyo before the GIs arrived, hopefully ending the war early. (Of course Germany surrendered before the bomb was finished, and Tokyo ended up so devastated by conventional firebombing that Hiroshima and Nagasaki were selected instead, but the original goal is what matters here.) The location of the targets meant that the bomb had to be small enough to fit in a conventional long-distance bomber, and the timeline meant that the simpler but less efficient U-235 designs were preferred. A program was designed, adequate resources allocated, and the goal achieved on time.
On the other hand it is easy to imagine how differently things might have gone if the strategy had been reversed; if instead the US military had decided to institute a basic research program into nuclear physics and atomic structure before deciding on the optimal bomb reactions, then done detailed bomb design before creating the industry necessary to produce enough material for a working weapon. Just looking at the first stage, there is nothing a priori which makes it obvious that U-235 and Pu-239 are the “interesting” nuclear fuels to focus on. Thorium, for example, was more naturally abundant and already being extracted as a byproduct of rare earth metal extraction, its reactions generate less lethal radiation and fewer long-lasting waste products, and it yields U-233, which could be used in a nuclear bomb. However the straightforward military and engineering requirements of making a bomb on schedule, and successfully delivering it on target, favored U-235 and Pu-239 based weapon designs, which focused the efforts of the physicists involved on those fuel pathways. The rest is history.
The Apollo moonshot is another great example. NASA had a single driving goal: deliver a man to the moon before 1970, and return him safely to Earth. A lot of decisions in the first few years were driven simply by the time and resources available: e.g. heavy-lift vs. orbital assembly, direct return vs. lunar rendezvous, expendable vs. reusable, staging vs. fuel depots. Ask Wernher von Braun what he imagined an ideal moon mission would look like, and you would have gotten something very different than Apollo. But with Apollo NASA made the right tradeoffs with respect to schedule constraints and programmatic risk.
The follow-on projects of Shuttle and Station are a completely different story, however. They were designed with no articulated long-term strategy, which meant they tried to be everything to everybody and as a result were useful to no one. Meanwhile the basic research being carried out at NASA has little, if anything, to do with the long-term goals of sending humans to Mars. There’s an entire division, the Space Biosciences group, which does research on Station about the long-term effects of microgravity and radiation on humans, supposedly to enable a long-duration voyage to Mars. Never mind that the microgravity issue is trivially solved by spinning the spacecraft with nothing more than a strong steel rope as a tether, and the radiation issue is sufficiently mitigated by having a storm shelter en route and throwing a couple of Martian sandbags on the roof once you get there.
There’s an apocryphal story about the US government spending millions of dollars to develop the “Space Pen”—a ballpoint pen with ink under pressure to enable writing in microgravity environments. Much later at some conference an engineer in that program meets his Soviet counterpart and asks how they solved that difficult problem. The cosmonauts used a pencil.
Sadly the story is not true—the “Space Pen” was a successful marketing ploy by inventor Paul Fisher without any ties to NASA, although it was used by NASA and the Russians on later missions—but it does serve to illustrate the point very succinctly. I worry that MIRI is spending its days coming up with space pens when a pencil would have done just fine.
Let me provide some practical advice. If I were running MIRI, I would still employ mathematicians working on the hail-Mary of a complete FAI theory—avoiding the Löbian obstacle etc. -- and run the very successful workshops, though maybe just two a year. But beyond that I would spend all remaining resources on a pragmatic AGI design programme:
1) Have a series of workshops with AGI people to do a review of possible AI-influenced strategies for a positive singularity—top-down FAI, seed AI to FAI, Oracle AI to FAI, Oracle AI to human augmentation, teaching a UFAI morals in a nursery environment, etc.
2) Have a series of workshops, again with AGI people to review tactics: possible AGI architectures & the minimal seed AI for each architecture, probabilistically reliable boxing setups, programmatic security, etc.
Then use the output of these workshops—including reliable constraints on timelines—to drive most of the research done by MIRI. For example, I anticipate that reliable unfriendly Oracle AI setups will require probabilistically auditable computation, which itself will require a strongly typed, purely functional virtual machine layer from which computation traces can be extracted and meaningfully analyzed in isolation. This is the sort of research MIRI could sponsor a grad student or postdoc to perform.
BTW, other gripe: I have yet to see adequate arguments for the “can we realistically avoid having to do this?” from MIRI which aren’t strawman arguments.
While I don’t know much about your AGI expertise, I agree that MIRI is missing an experienced top-level executive who knows how to structure, implement and risk-mitigate an ambitious project like FAI and has a track record to prove it. Such a person would help prevent flailing about and wasting time and resources. I am not sure what other projects are in this reference class and whether MIRI can find and hire a person like that, so maybe they are doing what they can with the meager budget they’ve got. Do you think that the Manhattan project and the Space Shuttle are in the ballpark of FAI? My guess is that they don’t even come close in terms of ambition, risk, effort or complexity.
Project managers are typically expensive because they are senior people before they enter management. Someone who has never actually worked at the bottom rung of the ladder is often quite useless in a project management role. But that’s not to say that you can’t find someone young who has done a short stint at the bottom, got PMP certified (or whatever), and has 1-2 projects under their belt. It wouldn’t be cheap, but not horribly expensive either.
On the other hand, Luke seems pretty on the ball with respect to administrative stuff. It may be sufficient to get him some project manager training and some very senior project management advisers.
Neither one of these would be a long-term adequate solution. You need very senior, very experienced project management people in order to tackle something as large as FAI, and stay on schedule and on budget. But in terms of just making sure the organization is focused on the right issues, either of the above would be a drastic improvement, and enough for now.
60 years ago, maybe. However, these days cognitive science, narrow AI, and computational tools are advancing rapidly on their own. The problem for MIRI should be that of ensuring a positive singularity via careful leverage of the machine intelligence already being developed for other purposes. That’s a much smaller project, and something I think a small but adequately funded organization should be able to pull off.
Yes, dear, some of us are programmers, we know about waterfalls. Our approach is more like, “Attack the most promising problems that present themselves, at every point; don’t actually build things which you don’t yet know how to make not destroy the world, at any point.” Right now this means working on unbounded problems because there are no bounded problems which seem more relevant and more on the critical path. If at any point we can build something to test ideas, of course we will; unless our state of ignorance is such that we can’t test that particular idea without risking destroying the world, in which case we won’t, but if you’re really setting out to test ideas you can probably figure out some other way to test them, except for very rare highly global theses like “The intelligence explosion continues past the human level.” More local theses should be testable.
See also Ch. 22 from HPMOR, and keep in mind that I am not Harry, I contain Harry, all the other characters, their whole universe, and everything that happens inside it. In other words, I am not Harry, I am the universe that responded to Harry.
I’ll have to review Ch. 22 later as it is quite long.
If a stable self-modifying agent + friendly value-loading was the only pathway to a positive singularity, then MIRI would be doing a fine job. However I find that assumption not adequately justified.
For example, take oracle AI. The sequences do a good job of showing how a black box AI can’t be safely boxed, nor can any of its recommendations be trusted. But those arguments don’t generalize to when we can see and understand the inner workings of the AI. Yes, engineering challenges apply: you can’t demand a computational trace of the entire returned result, since analyzing it would require an even more powerful AI, and then it’d be turtles all the way down. However, you can do something like the Fiat-Shamir transform to select branches of the computational trace to audit. In essence, use the cryptographic hash of the result to choose which traces of the audit log to reveal. This allows the audit log to be only a tiny slice of the entire computation, yet it can be shown that faking such an audit log is computationally infeasible, in the sense that it would require a large multiple of the computation that went into the original result, which would be energetically detectable.
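To make the selection mechanism concrete, here is a minimal Python sketch of that kind of Fiat-Shamir-style spot check. Everything in it is hypothetical (the toy step function, run_and_trace and audit_indices are made up for the example); it only illustrates how a hash of the claimed result can deterministically pick which slices of a recorded trace get audited, and is nowhere near a real probabilistically auditable computation protocol.

```python
import hashlib

def run_and_trace(x0, steps):
    """Toy 'pure' computation: apply a fixed step function repeatedly,
    recording every intermediate state as the audit trace."""
    trace = [x0]
    x = x0
    for _ in range(steps):
        x = (x * 6364136223846793005 + 1442695040888963407) % 2**64  # arbitrary pure step
        trace.append(x)
    return x, trace

def audit_indices(result, trace_len, k):
    """Fiat-Shamir-style challenge: derive k trace indices to reveal from a
    hash of the claimed result, so the prover cannot choose which steps
    get audited."""
    seed = hashlib.sha256(str(result).encode()).digest()
    indices = []
    counter = 0
    while len(indices) < k:
        h = hashlib.sha256(seed + counter.to_bytes(4, "big")).digest()
        idx = int.from_bytes(h, "big") % (trace_len - 1)
        if idx not in indices:
            indices.append(idx)
        counter += 1
    return sorted(indices)

def check_step(before, after):
    """Re-run one step locally and compare: the auditor verifies only the
    revealed slices, not the whole computation."""
    return after == (before * 6364136223846793005 + 1442695040888963407) % 2**64

result, trace = run_and_trace(x0=12345, steps=10_000)
for i in audit_indices(result, len(trace), k=20):
    assert check_step(trace[i], trace[i + 1])
print("spot-checked 20 of", len(trace) - 1, "steps; result =", result)
```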
Of course you would also have to have the AI be written for a virtual machine which is pure, strongly typed and directly supports the native abstraction primitives of the AGI design (e.g. if it were OpenCog, the virtual machine would exist at the abstraction level of the atomspace), and have a computational stack underneath that which prevents VM breakout, and other protective measures. But these are surmountable engineering challenges, the difficulty of which can be realistically quantified.
So how much more or less difficult would it be to accomplish such an untrusted oracle AI vs the stable self-modifying agent and value-loading approach? Which one is more likely to occur before the “competition”?
I’m not demanding a full waterfall project plan, but even agile requires convincing arguments about critical paths and relative priorities. I for one am not convinced.
Well that makes three of us...
Badass boasting from fictional evidence?
If anyone here knew anything about the Waterfall Model, they’d know it was only ever proposed sarcastically, as a perfect example of how real engineering projects never work. “Agile” is pretty goddamn fake, too. There’s no replacement for actually using your mind to reason about what project-planning steps have the greatest expected value at any given time, and to account for unknown unknowns (ie: debugging, other obstacles) as well.
Yes, and I used it in that context: “We know about waterfalls” = “We know not to do waterfalls, so you don’t need to tell us that”. Thank you for that very charitable interpretation of my words.
Well, when you start off a sentence with “Yes, dear”, the dripping sarcasm can be read multiple ways, none of them very useful or nice.
Whatever. No point fighting over tone given shared goals.
Do we need to do this = wild guess.
The whole thing’s a Drake Equation.
Ok, let me finally get around to answering this.
FAI has definite subproblems. It is not a matter of scratching away at a chalkboard hoping to make some breakthrough in “philosophy” or some other proto-sensical field that will Elucidate Everything and make the problem solvable at all. FAI, right now, is a matter of setting researchers to work on one subproblem after another until they are all solved.
In fact, when I do literature searches for FAI/AGI material, I often find that the narrow AI or machine-learning literature contains a round dozen papers nobody working explicitly on FAI has ever cited, or even appears to know about. This is my view: there is low-hanging fruit in applying existing academic knowledge to FAI problems. Where such low-hanging fruit does not exist, the major open problems can largely be addressed by recourse to higher-hanging fruit within mathematics, or even to empirical science.
Since you believe it’s all so wide-open, I’d like to know what you think of as “the FAI problem”.
If you have an Oracle AI you can trust, you can use it to solve FAI problems for you. This is a fine approach.
Luckily, we don’t need to dick around.
That’s a large portion of the FAI problem right there.
EDIT: To clarify, by this I don’t mean to imply that FAI is easy, but that (trustworthy) Oracle AI is hard.
In-context, what was meant by “Oracle AI” is a very general learning algorithm with some debug output, but no actual decision-theory or utility function whatsoever built in. That would be safe, since it has no capability or desire to do anything.
You have to give it a set of directed goals and a utility function which favors achieving those goals, in order for the oracle AI to be of any use.
Why? How are you structuring your Oracle AI? This sounds like philosophical speculation, not algorithmic knowledge.
Ok, but a system like you’ve described isn’t likely to think about what you want it to think about or produce output that’s actually useful to you either.
Well yes. That’s sort of the problem with building one. Utility functions are certainly useful for specifying where logical uncertainty should be reduced.
Well, ok, but if you agree with this then I don’t see how you can claim that such a system would be particularly useful for solving FAI problems.
Well, I don’t know about the precise construction that would be used. Certainly I could see a human being deliberately focusing the system on some things rather than others.
All existing learning algorithms I know of, and I dare say all that exist, have at least a utility function, and also something that could be interpreted as a decision theory. Consider for example support vector machines, which explicitly try to maximize a margin (that would be the utility function), and any algorithm for computing SVMs can be interpreted as a decision theory. Similar considerations hold for neural networks, genetic algorithms, and even the minimax algorithm.
Thus, I strongly doubt that the notion of a learning algorithm with no utility function makes any sense.
Those are optimization criteria, but they are not decision algorithms in the sense that we usually talk about them in AI. A support vector machine is just finding the extrema of a cost function via its derivative, not planning a sequence of actions.
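To make that distinction concrete, here is a minimal sketch (my own toy example in plain NumPy, not any particular library’s SVM) of a soft-margin SVM trained by sub-gradient descent: it has an explicit scalar objective that it reduces by following a derivative, but at no point does it represent actions, world states, or plans.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linearly separable data: two Gaussian blobs labelled -1 and +1.
X = np.vstack([rng.normal(-2.0, 1.0, (50, 2)), rng.normal(2.0, 1.0, (50, 2))])
y = np.hstack([-np.ones(50), np.ones(50)])

w, b = np.zeros(2), 0.0
lam, lr = 0.01, 0.1

# Sub-gradient descent on the soft-margin objective
#   lam/2 * ||w||^2 + (1/n) * sum_i max(0, 1 - y_i * (w.x_i + b)).
# The "utility function" here is just this scalar loss; every update simply
# follows its (sub)derivative. There is no action set, no environment model,
# and no planning of any kind.
for _ in range(500):
    margins = y * (X @ w + b)
    active = margins < 1              # points currently violating the margin
    grad_w = lam * w
    grad_b = 0.0
    if active.any():
        grad_w = grad_w - (y[active, None] * X[active]).sum(axis=0) / len(X)
        grad_b = -y[active].sum() / len(X)
    w -= lr * grad_w
    b -= lr * grad_b

print("training accuracy:", np.mean(np.sign(X @ w + b) == y))
```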
The most popular algorithm for SVMs does plan a sequence of actions, complete with heuristics as to which action to take. True, the “actions” are internal: they are changes to some data structure within the computer’s memory, rather than changes to the external world. But that is not so different from e.g. a chess AI, which assigns some heuristic score to chess positions and attempts to maximize it using a decision algorithm (to decide which move to make), even though the chessboard is just a data structure within the computer memory.
“Internal” to the “agent” is very different from having an external output to a computational system outside the “agent”. “Actions” that come from an extremely limited, non-Turing-complete “vocabulary” (really: a programming language or computational calculus, which are the same thing) are also categorically different from a Turing-complete calculus of possible actions.
The same distinction applies to the hypothesis class the learner can learn: if it’s not Turing-complete (or some approximation thereof, like a total calculus with coinductive types and corecursive programs), then it is categorically not general learning or general decision-making.
This is why we all employ primitive classifiers every day without danger, and you need something like Solomonoff’s algorithmic probability in order to build AGI.
I agree, of course, that none of the examples I gave (“primitive classifiers”) are dangerous. Indeed, the “plans” they are capable of considering are too simple to pose any threat (they are, as you say, not Turing complete).
But that doesn’t seem relevant to the argument at all. You claimed
You claimed that a general learning algorithm without decision-theory or utility function is possible. I pointed out that all (harmless) practical learning algorithms we know of do in fact have decision theories and utility functions. What would “a learning algorithm without decision-theory or utility function, something that has no desire to do anything” even look like? Does the concept even make sense? Eliezer writes here
/facepalm
There is in fact such a thing as a null output. There is in fact such a thing as a learner with a sub-Turing hypothesis class. Such a learner with such a primitive output as “in the class” or “not in the class” does not engage in world optimization, that is: its actions do not, to its own knowledge, skew any probability distribution over future states of any portion of the world outside itself.
It does not narrow the future.
Now, what we’ve been proposing as an Oracle is even less capable. It would truly have no outputs whatsoever, only input and a debug view. It would, by definition, be incapable of narrowing the future of anything, even its own internal states.
Perhaps I have misused terminology, but that is what I was referring to: inability to narrow the outer world’s future.
This thing you are proposing, an “oracle” that is incapable of modeling itself and incapable of modeling its environment (either would require Turing-complete hypotheses), what could it possibly be useful for? What could it do that today’s narrow AI can’t?
A) It wasn’t my proposal.
B) The proposed software could model the outer environment, but not act on it.
Physics is Turing-complete, so no, a learner that did not consider Turing-complete hypotheses could not model the outer environment.
You seem to have lost the thread of the conversation. The proposal was to build a learner that can model the environment using Turing-complete models, but which has no power to make decisions or take actions. This would be a Solomonoff Inducer approximation, not an AIXI approximation.
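As a structural illustration only: the toy inducer below enumerates a hand-picked hypothesis class (short periodic bit patterns with a 2^-length prior) as a stand-in for a real program enumeration, so it is of course nowhere near Turing-complete. The point it shows is the shape of the thing being proposed: observations go in, a predictive probability comes out, and there is no action set or utility function anywhere.

```python
from fractions import Fraction

# A toy stand-in for program enumeration: each "hypothesis" is a short
# periodic bit pattern, with prior weight 2^-length (a crude complexity prior).
hypotheses = []
for length in range(1, 5):
    for code in range(2 ** length):
        pattern = [(code >> i) & 1 for i in range(length)]
        hypotheses.append((pattern, Fraction(1, 2 ** length)))

def predict(history):
    """Posterior-weighted probability that the next bit is 1 (prediction only,
    no actions)."""
    num = Fraction(0)
    den = Fraction(0)
    for pattern, prior in hypotheses:
        # Likelihood is 1 if the deterministic hypothesis reproduces the
        # observed history exactly, else 0.
        if all(bit == pattern[t % len(pattern)] for t, bit in enumerate(history)):
            den += prior
            if pattern[len(history) % len(pattern)] == 1:
                num += prior
    return num / den if den else Fraction(1, 2)

observed = [1, 0, 1, 0, 1, 0]
print("P(next bit = 1) =", float(predict(observed)))  # 1.0: only '10'-periodic hypotheses survive
```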
You said
which led me to think you were talking about an oracle even less capable than a learner with a sub-Turing hypothesis class.
If the hypotheses it considers are turing-complete, then, given enough information (and someone would give it enough information, otherwise they couldn’t do anything useful with it), it could model itself, its environment, the relation between its internal states and what shows up on the debug view, and the reactions of its operators on the information they learn from that debug view. Its (internal) actions very much would, to its own knowledge, skew the probability distribution over future states of the outer world.
Name three. FAI contains a number of counterintuitive difficulties, and it’s unlikely for someone to do FAI work successfully by accident. On the other hand, someone with a fuzzier model believing that a paper they found sure sounds relevant (so why isn’t MIRI citing it?) is far more probable, given my perspective and prior.
I wouldn’t say that there’s someone out there directly solving FAI problems without having explicitly intended to do so. I would say there’s a lot we can build on.
Keep in mind, I’ve seen enough of a sample of Eld Science being stupid to understand how you can have a very low prior on Eld Science figuring out anything relevant. But lacking more problem guides from you on the delta between plain AI problems and FAI problems, we go on what we can.
One paper on utility learning that relies on a supervised-learning methodology (pairwise comparison data) rather than a de-facto reinforcement learning methodology (which can and will go wrong in well-known ways when put into AGI). One paper on progress towards induction algorithms that operate at multiple levels of abstraction, which could be useful for naturalized induction if someone put more thought and expertise into it.
That’s only two, but I’m a comparative beginner at this stuff and Eld Science isn’t very good at focusing on our problems, so I expect that there’s actually more to discover and I’m just limited by lack of time and knowledge to do the literature searches.
By the way, I’m already trying to follow the semi-official MIRI curriculum, but if you could actually write out some material on the specific deltas where FAI work departs from the preexisting knowledge-base of academic science, that would be really helpful.
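For concreteness, here is a minimal, hypothetical sketch of the pairwise-comparison idea from the first of those papers as I understand the general approach: a Bradley-Terry-style model fit by gradient ascent, with the utility assumed linear in item features. It is my own illustration, not the paper’s actual method.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy setup: each option is a feature vector, and the "true" (hidden)
# utility is linear in those features. We only observe pairwise choices.
n_items, n_features = 20, 4
features = rng.normal(size=(n_items, n_features))
true_w = np.array([1.0, -2.0, 0.5, 0.0])
true_u = features @ true_w

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Simulate noisy pairwise comparisons: P(i preferred to j) = sigmoid(u_i - u_j).
pairs = [(i, j) for i in range(n_items) for j in range(n_items) if i != j]
data = [(i, j, rng.random() < sigmoid(true_u[i] - true_u[j])) for i, j in pairs]

# Fit a utility model by gradient ascent on the comparison log-likelihood.
w = np.zeros(n_features)
lr = 0.1
for _ in range(300):
    grad = np.zeros(n_features)
    for i, j, i_won in data:
        p = sigmoid(features[i] @ w - features[j] @ w)
        grad += (float(i_won) - p) * (features[i] - features[j])
    w += lr * grad / len(data)

corr = np.corrcoef(features @ w, true_u)[0, 1]
print("correlation of learned utility with true utility:", round(corr, 3))
```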
Define doing FAI work successfully....
1) Designing a program capable of arbitrary self-modification, yet maintaining guarantees of “correct” behavior according to a goal set that is by necessity included in the modifications as well.
2) Designing such a high level set of goals which ensure “friendliness”.
Designing, not evolving?
That seems a circular argument. How do you use a self-modifying evolutionary search to find a program whose properties remain stable under self-modifying evolutionary search? Unless you started with the right answer, the search AI would quickly rewrite or reinterpret its own driving goals in a non-friendly way, and who knows what you’d end up with.
I don’t see why the search algorithm would need to be self modifying.
I don’t see why you would be searching for stability as opposed to friendliness. Human testers can judge friendliness directly.
It’s how you draw your system box. Evolutionary search is equivalent to a self-modifying program, if you think of the whole search process as the program. The same issues apply.
I think the sequences do a good job at demolishing the idea that human testers can possibly judge friendliness directly, so long as the AI operates as a black box. If you have a debug view into the operation of the AI, that is a different story, but then you don’t need friendliness anyway.
If I draw a box around the selection algorithm and find there is nothing self-modifying inside... where’s the circularity?
(1) is naturalized induction, logical uncertainty, and getting around the Loebian Obstacle.
(2) is the cognitive science of evaluative judgements.
Great, you’ve got names for answers you are looking for. That doesn’t mean the answers are any easier to find. You’ve attached a label to the declarative statement which specifies the requirements a solution must meet, but that doesn’t make the search for a solution suddenly have a fixed timeline. It’s uncertain research: it might take 5 years, 10 years, or 50 years, and throwing more people at the problem won’t necessarily make the project go any faster.
And how is trying to build a safe Oracle AI that can solve FAI problems for us not basic research? Or, to make a better statement: how is trying to build an Unfriendly superintelligent paperclip maximizer not basic research, at today’s research frontier?
Logical uncertainty, for example, is a plain, old-fashioned AI problem. We need it for FAI, we’re pretty sure, but it’s turning out UFAI might need it, too.
“Basic research is performed without thought of practical ends.”
“Applied research is systematic study to gain knowledge or understanding necessary to determine the means by which a recognized and specific need may be met.”
-National Science Foundation.
We need to be doing applied research, not basic research. What MIRI should do is construct a complete roadmap to FAI, or better: a study exhaustively listing strategies for achieving a positive singularity, and tactics for achieving friendly or unfriendly AGI, concluding with a small set of most-likely scenarios. MIRI should then identify the risk factors which affect either the friendliness of the AGI in each scenario, or the capability of the UFAI to do damage (in boxing setups). These risk factors should be prioritized by how much knowing more about each is expected to shift the outcome in a positive direction, and these problems should be the topics of MIRI workshops.
Instead MIRI is performing basic research. It’s basic research not because it is useless, but because we are not certain at this point in time what relative utility it will have. And if we don’t have a grasp on expected utility, how can we prioritize? There are a hundred avenues of research which are important to varying degrees to the FAI project. I worked for a number of years at NASA-Ames Research Center, and in the same building as me was the Space Biosciences Division. Great people, don’t get me wrong, and for decades they have funded really cool research on the effects of microgravity and radiation on living organisms, with the justification that such effects and counter-measures need to be known for long-duration space voyages, e.g. a 2-year mission to Mars. Never mind that the microgravity issue is trivially solved with a few-thousand-dollar steel tether connecting the upper stage to the spacecraft as they spin to create artificial gravity, and the radiation exposure is mitigated by having a storm shelter in the craft and throwing a couple of Martian sandbags on the roof once you get there. It’s spending millions of dollars to develop the pressurized-ink “Space Pen” when the humble pencil would have done just fine.
Sadly I think MIRI is doing the same thing, and it is represented in one part of your post I take huge issue with:
If we’re only “pretty sure” it’s needed for FAI, and we can’t quantify exactly what its contribution will be or how important that contribution is relative to other possible things to be working on... then we have some meta-level planning to do first. Unfortunately I don’t see MIRI doing any planning like this (or if they are, it’s not public).
Are you on the “Open Problems in Friendly AI” Facebook group? Because much of the planning is on there.
Logical uncertainty lets us put probabilities to sentences in logics. This, supposedly, can help get us around the Loebian Obstacle to proving self-referencing statements and thus generating stable self-improvement in an agent. Logical uncertainty also allows for making techniques like Updateless Decision Theory into real algorithms, and this too is an AI problem: turning planning into inference.
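As a toy illustration of what “probabilities on sentences” even means, here is the trivial propositional case: put a uniform prior over truth assignments and condition on the constraints you have already established. (The actual research problem is doing something analogous for sentences of arithmetic, self-referential sentences, and bounded reasoners, which this sketch says nothing about.)

```python
from itertools import product

# Toy "logical uncertainty" over three propositional atoms A, B, C:
# assign a coherent probability to every sentence by putting a uniform
# prior over truth assignments and conditioning on known constraints.
atoms = ("A", "B", "C")

def worlds():
    for values in product([False, True], repeat=len(atoms)):
        yield dict(zip(atoms, values))

# Known constraints (what we've "proved" so far): A -> B, and A or C.
constraints = [
    lambda w: (not w["A"]) or w["B"],
    lambda w: w["A"] or w["C"],
]

consistent = [w for w in worlds() if all(c(w) for c in constraints)]

def prob(sentence):
    """Probability of a sentence = fraction of consistent worlds where it holds."""
    return sum(sentence(w) for w in consistent) / len(consistent)

print("P(B) =", prob(lambda w: w["B"]))
print("P(B | A) =", prob(lambda w: w["A"] and w["B"]) / prob(lambda w: w["A"]))
```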
The cognitive stuff about human preferences is the Big Scary Hard Problem of FAI, but utility learning (as Stuart Armstrong has been posting about lately) is a way around that.
If you can create a stably self-improving agent that will learn its utility function from human data, equipped with a decision theory capable of handling both causative games and Timeless situations correctly… then congratulations, you’ve got a working plan for a Friendly AI and you can start considering the expected utility of actually building it (at least, to my limited knowledge).
Around here you should usually clarify whether your uncertainty is logical or indexical ;-).
Or... you could use a boxed oracle AI to develop singularity technologies for human augmentation, or other mechanisms to keep moral humans in the loop through the whole process, and sidestep the whole issue of FAI and value loading in the first place.
Which approach do you think can be completed earlier with similar probabilities of success? What data did you use to evaluate that, and how certain are you of its accuracy and completeness?
I actually really do think that de novo AI is easier than human intelligence augmentation. We have good cognitive theories for how an agent is supposed to work (including “ideal learner” models of human cognitive algorithms). We do not have very good theories of in-vitro neuroengineering.
Yes, but those details would be handled by the post-”FOOM” boxed AI. You get to greatly discount their difficulty.
This assumes that you have usable, safe Oracle AI which then takes up your chosen line of FAI or neuroengineering problems for you. You are conditioning the hard part on solving the hard part.
You don’t need to solve philosophy to solve FAI, but philosophy is relevant to figuring out, in broad terms, the relative likelihoods of various problems and solutions.
I’m not arguing that AI will necessarily be safe. I am arguing that the failure modes investigated by MIRI aren’t likely. It is worthwhile to research effective off switches. It is not worthwhile to endlessly refer to a dangerous AI of a kind no one with a smidgeon of sense would build.
Bzzzt. Wrong. You still haven’t explained how to create an agent that will faithfully implement my verbal instruction to bring me a pizza. You have a valid case in the sense of pointing out that there can easily exist a “middle ground” between the Superintelligent Artificial Ethicist (Friendly AI in its fullest sense), the Superintelligent Paper Clipper (a perverse, somewhat unlikely malprogramming of a real superintelligence), and the Reward-Button Addicted Reinforcement Learner (the easiest unfriendly AI to actually build). What you haven’t shown is how to actually get around the Addicted Reinforcement Learner and the paper-clipper and actually build an agent that can be sent out for pizza without breaking down at all.
Your current answers seem to be, roughly, “We get around the problem by expecting future AI scientists to solve it for us.” However, we are the AI scientists: if we don’t figure out how to make AI deliver pizza on command, who will?
You keep misreading me. I am not claiming to have a solution. I am claiming that MIRI is overly pessimistic about the problem, and offering an over-engineered solution. Inasmuch as you say there is a middle ground, you kind of agree.
The thing is, MIRI doesn’t claim that a superintelligent world-destroying paperclipper is the most likely scenario. It’s just illustrative of why we have an actual problem: because you don’t need malice to create an Unfriendly AI that completely fucks everything up.
To make reliable predictions, more realistic examples are needed.
So how did you like CATE, over in that other thread? That AI is non-super-human, doesn’t go FOOM, doesn’t acquire nanotechnology, can’t do anything a human upload couldn’t do… and still can cause quite a lot of damage simply because it’s more dedicated than we are, suffers fewer cognitive flaws than us, has more self-knowledge than us, and has no need for rest or food.
I mean, come on: what if a non-FOOMed but Unfriendly AI becomes as rich as Bill Gates? After all, if Bill Gates did it while human, then surely an AI as smart as Bill Gates but without his humanity can do the same thing, while causing a bunch more damage to human values because it simply does not feel Gates’ charitable inclinations.