The recommended order for the papers seems really useful.
Thanks! :-D Let me know if you want any tips/advice if and when you start on another read-through.
The old course list mentioned many more courses … Is this change mainly due to the different aims of the guides, or does it reflect an opinion in MIRI that those areas are not more likely to be useful than what a potential researcher would have studied otherwise?
Mostly different aims of the guides. I think Louie’s criterion was “subjects that seem useful or somewhat relevant to FAI research,” and was developed before MIRI pivoted towards examining the technical questions.
My criterion is “prerequisites that are directly necessary to learning and understanding our active technical research,” which is a narrower target.
(esp. there is no AI book mentioned)
This is representative of the difference—it’s quite nice to know what modern AI can do, but that doesn’t have too much relevance to the current open technical FAI problems, which are more geared towards things like putting foundations under fields where it seems possible to get “good enough to run but not good enough to be safe” heuristics. Knowing how MDPs work is useful, but it isn’t really necessary to understand our active research.
I also notice that within the subfields of Logic, Model Theory seems to be replaced by Type Theory.
Not really. Rather, that particular Model Theory textbook was rather brutal, and you only need the first two or so chapters to understand our Tiling Agents research, and it’s much easier to pick up that knowledge using an “intro to logic” textbook. The “model theory” section is still quite important, though!
if you’re interested in Type Theory in the foundational sense the Homotopy Type Theory book is probably more exciting
It may be more exciting, but the HoTT book has a bad habit of sending people down the homotopy rabbit hole. People with CS backgrounds will probably find it easier to pick up other type theories. (In fact, Church’s “simple type theory” paper may be enough instead of an entire textbook… maybe I’ll update the suggestions.)
But yeah, HoTT certainly is pretty exciting these days, and the HoTT book is a fine substitute for the one in the guide :-)
It may be more exciting, but the HoTT book has a bad habit of sending people down the homotopy rabbit hole. People with CS backgrounds will probably find it easier to pick up other type theories. (In fact, Church’s “simple type theory” paper may be enough instead of an entire textbook… maybe I’ll update the suggestions.)
Yeah, it could quite easily sidetrack people. But simple type theory simply wouldn’t do for foundations, since you can’t do much mathematics without quantifiers (or dependent types, in the case of type theory). Further, IMHO, the univalence axiom is the largest selling point of type theory as foundations. Perhaps a reading guide to the relevant bits of the HoTT book would be useful for people?
it’s quite nice to know what modern AI can do, but that doesn’t have too much relevance to the current open technical FAI problems.
So you claim. It is extremely hard to take that claim seriously when the work you are doing is so far removed from practical application and ignorant of the tradeoffs made in actual AI work. Knowing what is possible with real hardware and known algorithms, and being aware of what sort of pragmatic simplifications are made in order to make general mathematical theories computable is of prime importance. A general theory of FAI is absolutely useless if it requires simplifications—which all real AI programs do—that invalidate its underlying security assumptions. At the very least Artificial Intelligence: A Modern Approach should be on your basics list.
Changing tacks: I’m a basement AGI hacker working on singularity technology in my admittedly less than copious spare time between a real job and my family life. But I am writing code, trying to create an AGI. I am MIRI’s nightmare scenario. Yet I care enough to be clued into what MIRI is doing, and still find your work to be absolutely without value and irrelevant to me. This should be very concerning to you.
I think you may be overestimating the current state of FAI research.
Knowing what is possible with real hardware and known algorithms, and being aware of what sort of pragmatic simplifications are made in order to make general mathematical theories computable is of prime importance.
Ultimately, yes. But we aren’t anywhere near the stage of “trying to program an FAI.” We’re still mostly at the stage where we identify fields of study (decision theory, logical uncertainty, tiling agents, etc.) in which it seems possible to stumble upon heuristics that are “good enough” to specify an intelligent system before understanding the field formally on a fundamental level. (For example, it seems quite possible to develop an intelligent self-modifying system before understanding “reflective trust” in theory; but it seems unlikely that such a system will be safe.)
Our research primarily focuses on putting foundations under various fields of mathematics, so that we can understand safety in principle before trying to develop safe practical algorithms, and therefore is not much aided by knowledge of the tradeoffs made in actual AI work.
By analogy, consider Shannon putting the foundations under computer chess in 1950, by specifying an algorithm capable of playing perfect chess. This required significant knowledge of computer science and a lot of ingenuity, but it didn’t require 1950s-era knowledge of game-playing heuristics, nor knowledge of what then-modern programs could or could not do. He was still trying to figure out the idealized case, to put foundations under the field in a world where it was not yet known that a computer could play perfect chess. From that state of confusion, knowledge of 1950s game-playing heuristics wasn’t all that necessary.
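(For concreteness, here’s a minimal sketch of the sort of idealized algorithm Shannon was pointing at: exhaustive minimax over the whole game tree. The `state` interface below is hypothetical, and the procedure is hopelessly impractical for real chess; its only virtue is that it pins down exactly what “perfect play” means.)

```python
def minimax(state, maximizing):
    """Game-theoretic value of `state` for the maximizing player.

    Exhaustive and correct, but astronomically expensive for chess;
    it specifies perfect play rather than approximating it.
    `state` is assumed (hypothetically) to expose is_terminal/value/moves/apply.
    """
    if state.is_terminal():
        return state.value()  # e.g. +1 win, 0 draw, -1 loss for the maximizer
    child_values = (minimax(state.apply(move), not maximizing)
                    for move in state.moves())
    return max(child_values) if maximizing else min(child_values)


def perfect_move(state):
    """Return a move achieving the minimax value for the player to move."""
    return max(state.moves(),
               key=lambda move: minimax(state.apply(move), maximizing=False))
```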
Now, of course, it took half a century to go from Shannon’s idealized algorithm to a practical program that beat Kasparov, and that required intensive knowledge of modern practical algorithms and the relevant tradeoffs. Similarly, I do expect that we’ll get to a point in FAI research where knowledge of modern AI algorithms and the tradeoffs involved is crucial.
However, we just aren’t there yet.
It may look like all the hard work goes into the space between 1950 and Deep Blue, but in fact, the sort of foundational work done by Shannon was very important. If you tried to write a chess program capable of beating human champions before anyone understood an algorithm that would play perfect chess in the ideal case, you’d probably have a bad time.
We’re still trying to do the FAI-equivalent of designing an impractical algorithm which plays perfect chess, to put foundations under what it means to ask for safety from a system. Otherwise, “friendliness” is just this amorphous blob with good affect attached: I don’t think it’s very useful to even talk about “friendly practical systems” before first developing a formal understanding of what we’re asking for. As such, knowledge of modern AI heuristics just isn’t directly important to understanding our current research.
That said, I will again note that knowledge of modern AI capabilities is net positive, and can be an asset even when doing our brand of foundational research. I appreciate your suggestion, and I may well add “AI: A Modern Approach” to the “other tools” section. However, it isn’t currently a prereq for understanding or contributing to our active research—FAI as a field is just that far behind.
And I think you are underestimating what we know about AGI and friendliness (maybe not MIRI’s conception of it).
Regardless, one of the things you would learn from the first few chapters of AI: A Modern Approach is the principles of search, most notably that algorithms which combine forwards search (start to goal) with backwards search (goal to start) perform best. A bidirectional search which looks from both the starting point and the ending point and meets in the middle achieves a sqrt reduction in time compared to either unidirectional case (roughly 2b^(k/2) vs. b^k). You are advocating starting from first principles and searching ideaspace for a workable FAI design (forwards search). You should also consider simultaneously searching through the goal space of possible AGI designs for progressively more friendly architectures (backwards search, approximately), with progress in each search informing the other—e.g. theoretical exploration guided in the direction of implementable designs, and only considering practical designs which are formally analyzable.
If the sqrt reduction isn’t obvious, one way to prime your intuition is to observe that the space of implementable FAI designs is much smaller than the space of theoretical FAI designs, and that by ignoring practical concerns you are aiming for the larger, less constrained set, and possibly landing at a region of design space which is disjoint from implementable designs.
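(A minimal sketch of the bidirectional idea, for readers who haven’t seen it, assuming an undirected state graph exposed through a hypothetical `neighbors` function; it returns the meeting state rather than reconstructing the full path.)

```python
from collections import deque

def bidirectional_search(start, goal, neighbors):
    """Grow breadth-first frontiers from both ends until they meet.

    With branching factor b and solution depth k, each frontier only has
    to reach depth about k/2, so roughly 2*b^(k/2) states are expanded
    instead of the b^k a one-directional search would need.
    `neighbors` is a hypothetical function mapping a state to its adjacent states.
    """
    if start == goal:
        return start
    fwd_seen, bwd_seen = {start}, {goal}
    fwd_frontier, bwd_frontier = deque([start]), deque([goal])
    while fwd_frontier and bwd_frontier:
        # Always expand the smaller frontier by one full layer.
        if len(fwd_frontier) <= len(bwd_frontier):
            frontier, seen, other_seen = fwd_frontier, fwd_seen, bwd_seen
        else:
            frontier, seen, other_seen = bwd_frontier, bwd_seen, fwd_seen
        for _ in range(len(frontier)):
            state = frontier.popleft()
            for nxt in neighbors(state):
                if nxt in other_seen:
                    return nxt  # the two searches have met in the middle
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append(nxt)
    return None  # the searches exhausted the graph without meeting
```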
EDIT: To give an example, further work on decision theory is pretty much useless, as advanced decision theory is super-exponential in time complexity. It’s the first thing to become a heuristic in real world implementations, and existing decision theory is sufficient for learning new heuristics from an agent’s experience.
Regardless, one of the things you would learn from the first few chapters of AI: A Modern Approach is the principles of search
I have personally studied modern AI theory (not via this specific textbook, but via others), and I happen to know a fair amount about various search algorithms. I’m confused as to why you think that knowledge of search algorithms is important for FAI research, though.
I mean, many fields teach you basic principles that are applicable outside of that field, but this is true of evolutionary biology and physics as well as modern AI.
I don’t deny that some understanding of search algorithms is useful, I’m just confused as to why you think it’s more useful than, say, the shifts in worldview that you could get from a physics textbook.
You are advocating starting from first principles and searching ideaspace for a workable FAI design
Hmm, it appears that I failed to get my point across. We’re not currently searching for workable FAI designs. We’re searching for a formal description of “friendly behavior.” Once we have that, we can start searching for FAI designs. Before we have that, the word “Friendly” doesn’t mean anything specific.
the space of implementable FAI designs is much smaller than the space of theoretical FAI designs
Yes, to be sure! “Implementable FAI designs” compose the bullseye on the much wider “all FAI designs” target. But we’re not at the part yet where we’re creating and aiming the arrows. Rather, we’re still looking for the target!
We don’t know what “Friendly” means, in a formal sense. If we did know what it meant, we would be able to write down an impractical brute-force algorithm specifying a system which could take unbounded finite computing power, reliably undergo an intelligence explosion, and have a beneficial impact upon humanity, because brute force is powerful. We’re trying to figure out how to write the unbounded solutions not because they’re practical, but because this is how you figure out what “friendly” means in a formal sense.
(Or, in other words, the word “Friendly” is still mysterious, we’re trying to ground it out. The way you do that is by figuring out what you mean given unbounded computing power and an understanding of general intelligence. Once you have that, you can start talking about “FAI designs,” but not before.)
By contrast, we do have an understanding of “intelligence” in this very weak sense, in that we can specify things like AIXI (which would act very “intelligently” in our universe, given a pocket universe that runs hypercomputers). Clearly, there is a huge gap between the “infinite computer” understanding and understanding sufficient to build practical systems, but we don’t even have the first type of understanding of what it means to ask for a “Friendly” system.
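(For reference, and from memory, Hutter’s AIXI is the canonical example of that “infinite computer” kind of specification: at each step it sums expected future reward over all programs consistent with its interaction history, weighted by program length, and picks the action that maximizes it. Roughly:)

```latex
% a_k: action at step k; o_i, r_i: observation and reward at step i;
% m: horizon; U: a universal monotone Turing machine; \ell(q): length of program q.
a_k \;=\; \arg\max_{a_k} \sum_{o_k r_k} \cdots \max_{a_m} \sum_{o_m r_m}
  \bigl[ r_k + \cdots + r_m \bigr]
  \sum_{q \,:\, U(q,\, a_1 \ldots a_m) \,=\, o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}
```

Nothing remotely like this can be run, of course, which is exactly the sense in which we “have” a theory of intelligence but not yet one of friendliness.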
I definitely acknowledge that when you’re searching in FAI design space, it is very important to keep practicality in mind. But we aren’t searching in FAI design space, we’re searching for it.
I’m confused as to why you think that knowledge of search algorithms is important for FAI research, though.
I don’t think he meant to say that “knowledge of search algorithms is important for FAI research”, I think he meant to say “by analogy from search algorithms, you’re going to make progress faster if you research the abstract formal theory and the concrete implementation at the same time, letting progress in one guide work in the other”.
I’m personally sympathetic to your argument, that there’s no point in looking at the concrete implementations before we understand the formal specification in good enough detail to know what to look for in the concrete implementations… but on the other hand, I’m also sympathetic to the argument that if you do not also look at the concrete implementations, you may never hit the formal specifications that are actually correct.
To stretch the chess analogy, even though Shannon didn’t use any 1950s knowledge of game-playing heuristics, he presumably did use something like the knowledge of chess being a two-player game that’s played by the two taking alternating turns in moving different kinds of pieces on a board. If he didn’t have this information to ground his search, and had instead tried to come up with a general formal algorithm for winning in any game (including football, tag, and 20 questions), it seems much less likely that he would have come up with anything useful.
As a more relevant example, consider the discussion about VNM rationality. Suppose that you carry out a long research program focused on understanding how to specify Friendliness in a framework built around VNM rationality, all the while research in practical AI reveals that VNM rationality is a fundamentally flawed approach for looking at decision-making, and discovers a superior framework that’s much more suited for both AI design and Friendliness research. (I don’t expect this to necessarily happen, but I imagine that something like that could happen.) If your work on Friendliness research continues while you remain ignorant of this discovery, you’ll waste time pursuing a direction that can never produce a useful result, even on the level of an “infinite computer” understanding.
To stretch the chess analogy, even though Shannon didn’t use any 1950s knowledge of game-playing heuristics, he presumably did use something like the knowledge of chess being a two-player game that’s played by the two taking alternating turns in moving different kinds of pieces on a board.
I agree, and I think it is important to understand computation, logic, foundations of computer science, etc. in doing FAI research. Trying to do FAI theory with no knowledge of computers is surely a foolish endeavor. My point was more along the lines of “modern AI textbooks mostly contain heuristics and strategies for getting good behavior out of narrow systems, and this doesn’t seem like the appropriate place to get the relevant low-level knowledge.”
To continue abusing the chess analogy, I completely agree that Shannon needed to know things about chess, but I don’t think he needed to understand 1950s-era programming techniques (such as the formal beginnings of assembler languages and the early attempts to construct compilers). It seems to me that the field of modern AI is less like “understanding chess” and more like “understanding assembly languages” in this particular analogy.
That said, I am not trying to say that this is the only way to approach friendliness research. I currently think that it’s one of the most promising methods, but I certainly won’t discourage anyone who wants to try to do friendliness research from a completely different direction.
The only points I’m trying to make here are that (a) I think MIRI’s approach is fairly promising, and (b) within this approach, an understanding of modern AI is not a prerequisite to understanding our active research.
Are there other approaches to FAI that would make significantly more use of modern narrow AI techniques? Yes, of course. (Nick Hay and Stuart Russell are poking at some of those topics today, and we occasionally get together and chat about them.) Would it be nice if MIRI could take a number of different approaches all at the same time? Yes, of course! But there are currently only three of us. I agree that it would be nice to be in a position where we had enough resources to try many different approaches at once, but it is currently a factual point that, in order to understand our active research, you don’t need much narrow AI knowledge.
Kaj seems to have understood perfectly the point I was making, so I will simply point to his sibling comment. Thank you, Kaj.
However, I think your response reveals an even deeper disconnect. MIRI claims not to have a theory of friendliness, yet also presupposes what that theory will look like. I’m not sure what definition of friendliness you have in mind, but mine is roughly “the characteristics of an AGI which ensure it helps humanity through the singularity rather than being subsumed by it.” Such a definition would include an oracle AI → intelligence amplification approach, for example. MIRI, on the other hand, appears to be aiming towards the benevolent god model to the exclusion of everything else (“turn it on and walk away”).
I’m not going to try advocating for any particular approach—I’ve done that before with Luke without much success. What I do advocate is that you do the same thing I have done and continue to do: take the broader definition of success (surviving the singularity), look at what is required to achieve that in practice, and do whatever gets us across the finish line the fastest. This is a race, both against UFAI and against the inaction whose cost is the daily suffering of the present human condition.
When I did that analysis, I concluded that the benevolent god approach favored by MIRI has both the longest lead time and the lowest probability of success. Other approaches—whose goal states are well defined—could be achieved in the span of time it takes just to define what a benevolent god friendly AGI would look like.
I’m curious what conclusions you came to after your own assessment, assuming you did one at all.
Hmm, I’m not feeling like you’re giving me any charity here. Comments such as the following all give me an impression that you’re not open to actually engaging with my points:
So you claim.
...
Yet I care enough to be clued into what MIRI is doing, and still find your work to be absolutely without value and irrelevant to me. This should be very concerning to you.
...
one of the things you would learn from the first few chapters of AI: A Modern Approach is the principles of search,
...
further work on decision theory is pretty much useless
...
assuming you did one at all.
None of these are particularly egregious, but they are all phrased somewhat aggressively, and add up to an impression that you’re mostly just trying to vent. I’m trying to interpret you charitably here, but I don’t feel like that’s being reciprocated, and this lowers my desire to engage with your concerns (mostly by lowering my expectation that you are trying to see my viewpoint).
I also feel a bit like you’re trying to argue against other people’s points through me. For example, I do not see MIRI’s active research as a “benevolent god only” approach, and I personally place low probability on a “turn it on and walk away” scenario.
Other approaches—whose goal states are well defined—could be achieved in the span of time it takes just to define what a benevolent god friendly AGI would look like.
Analogy: let’s say that someone is trying really hard to build a system that takes observations and turns them into a very accurate world-model, and that the fate of humanity rides on the resulting world-model being very accurate. If someone claimed that they had a very good model-building heuristic, while lacking an understanding of information theory and Bayesian reasoning, then I would be quite skeptical of claims like “don’t worry, I’m very sure that it won’t get stuck at the wrong solution.” Until they have a formal understanding of what it means for a model to get stuck, of what it means to use all available information, I would not be confident in their system. (How do you evaluate a heuristic before understanding what the heuristic is intended to approximate?)
Further, it seems to me implausible that they could become very confident in their model-building heuristic without developing a formal understanding of information theory along the way.
For similar reasons, I would be quite skeptical of any system purported to be “safe” by people with no formal understanding of what “safety” meant, and it seems implausible to me that they could become confident in the system’s behavior without first developing a formal understanding of the intended behavior.
My apologies. I have been fruitlessly engaging with SIAI/MIRI representatives longer than you have been involved in the organization, in the hope of seeing sponsored work on what I see as far more useful lines of research given the time constraints we are all working with, e.g. work on AI boxing instead of utility functions and tiling agents.
I started by showing how many of the standard arguments extracted from the sequences used in favor of MIRI’s approach of direct FAI are fallacious, or at least presented unconvincingly. This didn’t work out very well for either side; in retrospect I think we mostly talked past each other.
I then argued based on timelines, showing from available tech and information and the weak inside view that UFAI could be as close as 5-20 years away, while MIRI’s own timelines did not, and to my knowledge still do not, expect practical results in that short a time horizon. The response was a citation to Stuart Armstrong’s paper showing an average expert opinion of AI being 50-70 years away… which was stunning, considering the thesis of the paper was about just how bad it is to ask experts questions like the ETA for human-level AI.
I then asked MIRI to consider hiring a project manager, i.e. a professional whose job it is to keep projects on time and on budget, to help make these decisions in coordinating and guiding research efforts. This suggestion was received about as well as the others.
And then I saw an updated course list from the machine intelligence research institute which dropped from consideration the sixty-year history of people actually writing machine intelligences, both the theory and the practice that has accumulated.
So if it seemed a bit like I was trying to argue against other people’s points through you, I’m sorry; I guess I was. I was arguing with MIRI, which you now represent.
Regarding your example, I understand what you are saying but I don’t think you are arguing against me. One way of making sure something is safe is making it unable to take actions with irreversible consequences. You don’t need to be confident in a system’s behavior, if that system’s behavior has no effectors on the real world.
I’m all for a trustless AI running as a “physical operating system” for a positive universe. But we have time to figure out how to do that post-singularity.
Thanks—I don’t really want to get into involved arguments about overall strategy on this forum, but I can address each of your points in turn. I’m afraid I only have time to sketch my justifications rather than explain them in detail, as I’m quite pressed for time these days.
I understand that these probably won’t convince you, but I hope to at least demonstrate that I have been giving these sorts of things some thought.
work on AI boxing
My conclusions:
Most types of boxing research would be premature, given how little we know about what early AGI architectures will look like.
That said, there is some boxing work that could be done nowadays. We do touch upon a lot of this stuff (or things in nearby spaces) under the “Corrigibility” banner. (See also Stuart’s “low impact” posts.)
Furthermore, some of our decision theory research does include early work on boxing problems (e.g., how do you build an oracle that does not try to manipulate the programmers into giving it easier questions? Turns out this has a lot to do with how the oracle evaluates its decisions.)
I agree that there is more work that could be done on boxing that would be of positive value, but I expect it would be more speculative than, say, the tiling work.
the weak inside view that UFAI could be as close as 5-20 years away
My thoughts: “could” is ambiguous here. What probability do you put on AGI in 5 years? My personal 95% confidence interval is 5 to 150 years (including outside view, model uncertainty, etc.) with a mean around 50 years and skewed a bit towards the front, and I am certainly not shooting for a strategy that has us lose 50% of the time, so I agree that we damn well better be on a 20-30 year track.
I then asked MIRI to consider hiring a project manager
I think MIRI made the right choice here. There are only three full-time FAI researchers at MIRI right now, and we’re good at coordinating with each other and holding ourselves to deadlines. A project manager would be drastic overkill.
And then I saw an updated course list from the machine intelligence research institute which dropped from consideration the sixty-year history of people actually writing machine intelligences
To be clear, this is no longer a course list, it’s a research guide. The fact of the matter is, modern narrow AI is not necessary to understand our active research. Is it still useful for a greater context? Yes. Is there FAI work that would depend upon modern narrow AI research? Of course! But the subject simply isn’t a prerequisite for understanding our current active research.
I do understand how this omission galled you, though. Apologies for that.
One way of making sure something is safe is making it unable to take actions with irreversible consequences.
I’m not sure what this means or how it’s safe. (I wouldn’t, for example, be comfortable constructing a machine that forcibly wireheads me, just because it believes it can reverse the process.)
I think that there’s something useful in the space of “low impact” / “domesticity,” but suggestions like “just make everything reversible” don’t seem to engage with the difficulty of the problem.
You don’t need to be confident in a system’s behavior, if that system’s behavior has no effectors on the real world.
I also don’t understand what it means to have “no effectors on the real world.”
There is no such thing as “not affecting the world.” Running a processor has electrical and gravitational effects on everything in the area. Wifi exists. This paper describes how an evolved clock re-purposed the circuits on its motherboard to magnify signals from nearby laptops to use as an oscillator, stunning everyone involved. Weak genetic programming algorithms ended up using hardware in completely unexpected ways to do things the programmers thought were impossible. So yes, we really do need to worry about strong intelligent processes using their hardware in unanticipated ways in order to affect the world. See also the power of intelligence.
(Furthermore, anything that interacts with humans is essentially using humans as effectors. There’s no such thing as “not having effectors on the real world.”)
I agree that there’s something interesting in the space of “just build AIs that don’t care about the real world” (for example, constructing an AI which is only trying to affect the platonic output of its turing machine and does not care about its physical implementation), but even this has potential security holes if you look closely. Some of our decision theory work does touch upon this sort of possibility, but there’s definitely other work to be done in this space that we aren’t looking at.
But again, suggestions like “just don’t let it have effectors” fail to engage with the difficulty of the problem.
The “cookbook with practical FAI advice” is in some sense an optimization of the general solution: adding constraints corresponding to limited resources and specific approaches, making the task harder. Like adding friction to a “calculate where the trains will collide” problem.
It seems like a good idea to have a solution to the general problem, something which provably has the properties we want it to have, before we deal with how that something survives the transition to actual AGI implementations.
Skipping this step (the theoretical foundation) would be tantamount to “this trick seems like it could work, for ill-defined values of ‘could’”.
Also, for such a cookbook to be taken seriously, and not just be more speculation, a “this derives from a general safety theorem, see the Greek alphabet soup (sterile) in the appendix” would provide a much larger incentive to take the actual guidelines seriously.
In Zen, you first must know how to grow the tree before you alpha-beta prune it.
ETA: There is a case to be made that no general solution can exist (a la Halting Problem) or is practically unattainable or cannot be ‘dumbed down’ to work for actual approaches, and that we therefore must focus on solving only specific problems. We’re not yet at that point though, IME.
further work on decision theory is pretty much useless, as advanced decision theory is super-exponential in time complexity
Where are you even getting this from?
Available options don’t come pre-enumerated or pre-simulated.
in order to understand our active research, you don’t need much narrow AI knowledge.

Thanks, that’s a good clarification. May be worth explicitly mentioning something like that in the guide, too.
See my later reply to Nate here: http://lesswrong.com/lw/l7o/miri_research_guide/bl0n