Regardless, one of the things you would learn from the first few chapters of AI: A Modern Approach is the principles of search
I have personally studied modern AI theory (not via this specific textbook, but via others), and I happen to know a fair amount about various search algorithms. I’m confused as to why you think that knowledge of search algorithms is important for FAI research, though.
I mean, many fields teach you basic principles that are applicable outside of that field, but this is true of evolutionary biology and physics as well as modern AI.
I don’t deny that some understanding of search algorithms is useful, I’m just confused as to why you think it’s more useful than, say, the shifts in worldview that you could get from a physics textbook.
You are advocating starting from first principles and searching ideaspace for a workable FAI design
Hmm, it appears that I failed to get my point across. We’re not currently searching for workable FAI designs. We’re searching for a formal description of “friendly behavior.” Once we have that, we can start searching for FAI designs. Before we have that, the word “Friendly” doesn’t mean anything specific.
the space of implementable FAI designs is much smaller than the space of theoretical FAI designs
Yes, to be sure! “Implementable FAI designs” compose the bullseye on the much wider “all FAI designs” target. But we’re not at the part yet where we’re creating and aiming the arrows. Rather, we’re still looking for the target!
We don’t know what “Friendly” means, in a formal sense. If we did know what it meant, we would be able to specify an impractical brute force algorithm specifying a system which could take unbounded finite computing power and reliably undergo an intelligence explosion and have a beneficial impact upon humanity; because brute force is powerful. We’re trying to figure out how to write the unbounded solutions not because they’re practical, but because this is how you figure out what “friendly” means in a formal sense.
(Or, in other words, the word “Friendly” is still mysterious, we’re trying to ground it out. The way you do that is by figuring out what you mean given unbounded computing power and an understanding of general intelligence. Once you have that, you can start talking about “FAI designs,” but not before.)
By contrast, we do have an understanding of “intelligence” in this very weak sense, in that we can specify things like AIXI (which act very “intelligently” in our universe given a pocket universe that runs hypercomputers). Clearly, there is a huge gap between the “infinite computer” understanding and understanding sufficient to build practical systems, but we don’t even have the first type of understanding of what it means to ask for a “Friendly” system.
I definitely acknowledge that when you’re searching in FAI design space, it is very important to keep practicality in mind. But we aren’t searching in FAI design space, we’re searching for it.
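To make the "unbounded solution" idea concrete, here is a minimal sketch of the kind of brute-force specification the chess analogy points at: exhaustive minimax over a toy game. The game itself (a take-away game with 1 to 3 stones per move) and the names `MOVES`, `game_value`, and `best_move` are illustrative assumptions, not anything drawn from MIRI's research. The only point is that once "winning" has a formal definition, an impractical but correct algorithm is easy to write down, whereas no analogous definition exists yet for "Friendly."

```python
# Hypothetical toy game (an assumption for illustration): players alternate
# removing 1, 2, or 3 stones; whoever takes the last stone wins.
MOVES = (1, 2, 3)


def game_value(stones: int) -> int:
    """Return +1 if the player to move can force a win, else -1.

    This is plain exhaustive minimax: no heuristics, no pruning.
    It is correct because "win" is formally defined, not because it is fast.
    """
    if stones == 0:
        # The previous player took the last stone, so the player to move has lost.
        return -1
    # Try every legal move; the opponent's value is the negation of ours.
    return max(-game_value(stones - m) for m in MOVES if m <= stones)


def best_move(stones: int) -> int:
    """Return a legal move that achieves the minimax value of the position."""
    legal = [m for m in MOVES if m <= stones]
    return max(legal, key=lambda m: -game_value(stones - m))
```

For example, `game_value(4)` evaluates to `-1` (the player to move loses against perfect play), while `best_move(7)` returns `3`, leaving the opponent in that losing position. The search cost grows exponentially with the number of stones, which is the sense in which such a specification is "impractical" while still pinning down exactly what is being asked for.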
I’m confused as to why you think that knowledge of search algorithms is important for FAI research, though.
I don’t think he meant to say that “knowledge of search algorithms is important for FAI research”, I think he meant to say “by analogy from search algorithms, you’re going to make progress faster if you research the abstract formal theory and the concrete implementation at the same time, letting progress in one guide work in the other”.
I’m personally sympathetic to your argument, that there’s no point in looking at the concrete implementations before we understand the formal specification in good enough detail to know what to look for in the concrete implementations… but on the other hand, I’m also sympathetic to the argument that if you do not also look at the concrete implementations, you may never hit the formal specifications that are actually correct.
To stretch the chess analogy, even though Shannon didn’t use any 1950s knowledge of game-playing heuristics, he presumably did use something like the knowledge of chess being a two-player game that’s played by the two taking alternating turns in moving different kinds of pieces on a board. If he didn’t have this information to ground his search, and had instead tried to come up with a general formal algorithm for winning in any game (including football, tag, and 20 questions), it seems much less likely that he would have come up with anything useful.
As a more relevant example, consider the discussion about VNM rationality. Suppose that you carry out a long research program focused on understanding how to specify Friendliness in a framework built around VNM rationality, all the while research in practical AI reveals that VNM rationality is a fundamentally flawed approach for looking at decision-making, and discovers a superior framework that’s much more suited for both AI design and Friendliness research. (I don’t expect this to necessarily happen, but I imagine that something like that could happen.) If your work on Friendliness research continues while you remain ignorant of this discovery, you’ll waste time pursuing a direction that can never produce a useful result, even on the level of an “infinite computer” understanding.
To stretch the chess analogy, even though Shannon didn’t use any 1950s knowledge of game-playing heuristics, he presumably did use something like the knowledge of chess being a two-player game that’s played by the two taking alternating turns in moving different kinds of pieces on a board.
I agree, and I think it is important to understand computation, logic, foundations of computer science, etc. in doing FAI research. Trying to do FAI theory with no knowledge of computers is surely a foolish endeavor. My point was more along the lines of “modern AI textbooks mostly contain heuristics and strategies for getting good behavior out of narrow systems, and this doesn’t seem like the appropriate place to get the relevant low-level knowledge.”
To continue abusing the chess analogy, I completely agree that Shannon needed to know things about chess, but I don’t think he needed to understand 1950s-era programming techniques (such as the formal beginnings of assembler languages and the early attempts to construct compilers). It seems to me that the field of modern AI is less like “understanding chess” and more like “understanding assembly languages” in this particular analogy.
That said, I am not trying to say that this is the only way to approach friendliness research. I currently think that it’s one of the most promising methods, but I certainly won’t discourage anyone who wants to try to do friendliness research from a completely different direction.
The only points I’m trying to make here are that (a) I think MIRI’s approach is fairly promising, and (b) within this approach, an understanding of modern AI is not a prerequisite to understanding our active research.
Are there other approaches to FAI that would make significantly more use of modern narrow AI techniques? Yes, of course. (Nick Hay and Stuart Russell are poking at some of those topics today, and we occasionally get together and chat about them.) Would it be nice if MIRI could take a number of different approaches all at the same time? Yes, of course! But there are currently only three of us. I agree that it would be nice to be in a position where we had enough resources to try many different approaches at once, but it is currently a factual point that, in order to understand our active research, you don’t need much narrow AI knowledge.
Kaj seems to have understood perfectly the point I was making, so I will simply point to his sibling comment. Thank you Kaj.
However, I think your response reveals an even deeper disconnect. MIRI claims not to have a theory of friendliness, yet also presupposes what that theory will look like. I’m not sure what definition of friendliness you have in mind, but mine is roughly “the characteristics of an AGI which ensure it helps humanity through the singularity rather than be subsumed by it.” Such a definition would include an oracle AI → intelligence amplification approach, for example. MIRI, on the other hand, appears to be aiming towards the benevolent god model to the exclusion of everything else (“turn it on and walk away”).
I’m not going to try advocating for any particular approach—I’ve done that before to Luke without much success. What I do advocate is that you do the same thing I have done and continue to do: take the broader definition of success (surviving the singularity), look at what is required to achieve that in practice, and do whatever gets us across the finish line the fastest. This is a race, both against UFAI and the inaction which costs the daily suffering of the present human condition.
When I did that analysis, I concluded that the benevolent god approach favored by MIRI has both the longest lead time and the lowest probability of success. Other approaches—whose goal states are well defined—could be achieved in the span of time it takes just to define what a benevolent god friendly AGI would look like.
I’m curious what conclusions you came to after your own assessment, assuming you did one at all.
Hmm, I’m not feeling like you’re giving me any charity here. Comments such as the following all give me an impression that you’re not open to actually engaging with my points:
So you claim.
...
Yet I care enough to be clued into what MIRI is doing, and still find your work to be absolutely without value and irrelevant to me. This should be very concerning to you.
...
one of the things you would learn from the first few chapters of AI: A Modern Approach is the principles of search,
...
further work on decision theory is pretty much a useless
...
assuming you did one at all.
None of these are particularly egregious, but they are all phrased somewhat aggressively, and add up to an impression that you’re mostly just trying to vent. I’m trying to interpret you charitably here, but I don’t feel like that’s being reciprocated, and this lowers my desire to engage with your concerns (mostly by lowering my expectation that you are trying to see my viewpoint).
I also feel a bit like you’re trying to argue against other people’s points through me. For example, I do not see MIRI’s active research as a “benevolent god only” approach, and I personally place low probability on a “turn it on and walk away” scenario.
Other approaches—whose goal states are well defined—could be achieved in the span of time it takes just to define what a benevolent god friendly AGI would look like.
Analogy: let’s say that someone is trying really hard to build a system that takes observations and turns them into a very accurate world-model, and that the fate of humanity rides on the resulting world-model being very accurate. If someone claimed that they had a very good model-building heuristic, while lacking an understanding of information theory and Bayesian reasoning, then I would be quite skeptical of claims like “don’t worry, I’m very sure that it won’t get stuck at the wrong solution.” Until they have a formal understanding of what it means for a model to get stuck, of what it means to use all available information, I would not be confident in their system. (How do you evaluate a heuristic before understanding what the heuristic is intended to approximate?)
Further, it seems to me implausible that they could become very confident in their model-building heuristic without developing a formal understanding of information theory along the way.
For similar reasons, I would be quite skeptical of any system purported to be “safe” by people with no formal understanding of what “safety” meant, and it seems implausible to me that they could become confident in the system’s behavior without first developing a formal understanding of the intended behavior.
My apologies. I have been fruitlessly engaging with SIAI/MIRI representatives for longer than you have been involved in the organization, in the hope of seeing sponsored work on what I see to be far more useful lines of research given the time constraints we are all working with, e.g. work on AI boxing instead of utility functions and tiling agents.
I started by showing how many of the standard arguments extracted from the sequences used in favor of MIRI’s approach of direct FAI are fallacious, or at least presented unconvincingly. This didn’t work out very well for either side; in retrospect I think we mostly talked past each other.
I then argued based on timelines, showing that, given available tech and information and the weak inside view, UFAI could be as close as 5-20 years away, and that MIRI’s own timelines did not, and to my knowledge still do not, expect practical results in that short a time horizon. The response was a citation to Stuart Armstrong’s paper showing an average expert opinion of AI being 50-70 years away… which was stunning considering the thesis of the paper was about just how bad it is to ask experts about questions like the ETA for human-level AI.
I then asked MIRI to consider hiring a project manager, i.e. a professional whose job it is to keep projects on time and on budget, to help make these decisions in coordinating and guiding research efforts. This suggestion was received about as well as the others.
And then I saw an updated course list from the machine intelligence research institute which dropped from consideration the sixty year history of people actually writing machine intelligences, both the theory and the practice that has accumulated.
So if it seemed a bit like I was trying to argue against other people’s points through you, I’m sorry; I guess I was. I was arguing with MIRI, which you now represent.
Regarding your example, I understand what you are saying but I don’t think you are arguing against me. One way of making sure something is safe is making it unable to take actions with irreversible consequences. You don’t need to be confident in a system’s behavior, if that system’s behavior has no effectors on the real world.
I’m all for a trustless AI running as a “physical operating system” for a positive universe. But we have time to figure out how to do that post-singularity.
Thanks—I don’t really want to get into involved arguments about overall strategy on this forum, but I can address each of your points in turn. I’m afraid I only have time to sketch my justifications rather than explain them in detail, as I’m quite pressed for time these days.
I understand that these probably won’t convince you, but I hope to at least demonstrate that I have been giving these sorts of things some thought.
work on AI boxing
My conclusions:
Most types of boxing research would be premature, given how little we know about what early AGI architectures will look like.
That said, there is some boxing work that could be done nowadays. We do touch upon a lot of this stuff (or things in nearby spaces) under the “Corrigibility” banner. (See also Stuart’s “low impact” posts.)
Furthermore, some of our decision theory research does include early work on boxing problems (e.g., how do you build an oracle that does not try to manipulate the programmers into giving it easier questions? Turns out this has a lot to do with how the oracle evaluates its decisions.)
I agree that there is more work that could be done on boxing that would be positive value, but I expect it would be more speculative than, say, the tiling work.
the weak inside view that UFAI could be as close as 5-20 years away
My thoughts: “could” is ambiguous here. What probability do you put on AGI in 5 years? My personal 95% confidence interval is 5 to 150 years (including outside view, model uncertainty, etc) with a mean around 50 years and skewed a bit towards the front, and I am certainly not shooting for a strategy that has us lose 50% of the time, so I agree that we damn well better be on a 20-30 year track.
I then asked MIRI to consider hiring a project manager
I think MIRI made the right choice here. There are only three full-time FAI researchers at MIRI right now, and we’re good at coordinating with each other and holding ourselves to deadlines. A project manager would be drastic overkill.
And then I saw an updated course list from the machine intelligence research institute which dropped from consideration the sixty year history of people actually writing machine intelligences
To be clear, this is no longer a course list, it’s a research guide. The fact of the matter is, modern narrow AI is not necessary to understand our active research. Is it still useful for a greater context? Yes. Is there FAI work that would depend upon modern narrow AI research? Of course! But the subject simply isn’t a prerequisite for understanding our current active research.
I do understand how this omission galled you, though. Apologies for that.
One way of making sure something is safe is making it unable to take actions with irreversible consequences.
I’m not sure what this means or how it’s safe. (I wouldn’t, for example, be comfortable constructing a machine that forcibly wireheads me, just because it believes it can reverse the process.)
I think that there’s something useful in the space of “low impact” / “domesticity,” but suggestions like “just make everything reversible” don’t seem to engage with the difficulty of the problem.
You don’t need to be confident in a system’s behavior, if that system’s behavior has no effectors on the real world.
I also don’t understand what it means to have “no effectors in the real world.”
There is no such thing as “not affecting the world.” Running a processor has electrical and gravitational effects on everything in the area. Wifi exists. This paper describes how an evolved clock re-purposed the circuits on its motherboard to magnify signals from nearby laptops to use as an oscillator, stunning everyone involved. Weak genetic programming algorithms ended up using hardware in completely unexpected ways to do things the programmers had thought were impossible. So yes, we really do need to worry about strong intelligent processes using their hardware in unanticipated ways in order to affect the world.
(Furthermore, anything that interacts with humans is essentially using humans as effectors. There’s no such thing as “not having effectors on the real world.”)
I agree that there’s something interesting in the space of “just build AIs that don’t care about the real world” (for example, constructing an AI which is only trying to affect the platonic output of its Turing machine and does not care about its physical implementation), but even this has potential security holes if you look closely. Some of our decision theory work does touch upon this sort of possibility, but there’s definitely other work to be done in this space that we aren’t looking at.
But again, suggestions like “just don’t let it have effectors” fail to engage with the difficulty of the problem.
Thanks, Kaj.
Thanks, that’s a good clarification. May be worth explicitly mentioning something like that in the guide, too.
Kaj seems to have understood perfectly the point I was making, so I will simply point to his sibling comment. Thank you Kaj.
However your response I think reveals an even deeper disconnect. MIRI claims not to have a theory of friendliness, yet also presupposes what that theory will look like. I’m not sure what definition of friendliness you have in mind, but mine is roughly “the characteristics of an AGI which ensure it helps humanity through the singularity rather than be subsumed by it.” Such a definition would include an oracle AI → intelligence amplification approach, for example. MIRI on the other hand appears to be aiming towards the benevolent god model in exclusion to everything else (“turn it on and walk away”).
I’m not going to try advocating for any particular approach—I’ve done that before to Luke without much success. What I do advocate is that you do the same thing I have done and continue to do: take the broader definition of success (surviving the singularity), look at what is required to achieve that in practice, and do whatever gets us across the finish line the fastest. This is a race, both against UFAI and the inaction which costs the daily suffering of the present human condition.
When I did that analysis, I concluded that the benevolent god approach favored by MIRI has both the longest lead time and the lowest probability of success. Other approaches—whose goal states are well defined—could be achieved in the span of time it takes just to define what a benevolent god friendly AGI would look like.
I’m curious what conclusions you came to after your own assessment, assuming you did one at all.
Hmm, I’m not feeling like you’re giving me any charity here. Comments such as the following all give me an impression that you’re not open to actually engaging with my points:
...
...
...
...
None of these are particularly egregious, but they are all phrased somewhat aggressively, and add up to an impression that you’re mostly just trying to vent. I’m trying to interpret you charitably here, but I don’t feel like that’s being reciprocated, and this lowers my desire to engage with your concerns (mostly by lowering my expectation that you are trying to see my viewpoint).
I also feel a bit like you’re trying to argue against other people’s points through me. For example, I do not see MIRI’s active research as a “benevolent god only” approach, and I personally place low probability on a “turn it on and walk away” scenario.
Analogy: let’s say that someone is trying really hard to build a system that takes observations and turns them into a very accurate world-model, and that the fate of humanity rides on the resulting world-model being very accurate. If someone claimed that they had a very good model-building heuristic, while lacking an understanding of information theory and Bayesian reasoning, then I would be quite skeptical of claims like “don’t worry, I’m very sure that it won’t get stuck at the wrong solution.” Until they have a formal understanding of what it means for a model to get stuck, of what it means to use all available information, I would not be confident in their system. (How do you evaluate a heuristic before understanding what the heuristic is intended to approximate?)
Further, it seems to me implausible that they could become very confident in their model-building heuristic without developing a formal understanding of information theory along the way.
For similar reasons, I would be quite skeptical of any system purported to be “safe” by people with no formal understanding of what “safety” meant, and it seems implausible to me that they could become confident in the system’s behavior without first developing a formal understanding of the intended behavior.
My apologies. I have been fruitlessly engaging with SIAI/MIRI representatives longer than you have been involved in the organization, in the hope of seing sponsored work on what I see to far more useful lines of research given the time constraints we are all working with, e.g. work on AI boxing instead of utility functions and tiling agents.
I started by showing how many of the standard arguments extracted from the sequences used in favor of MIRI’s approach of direct FAI are fallacious, or at least presented unconvincingly. This didn’t work out very well for either side; in retrospect I think we mostly talked past each other.
I then argued based on timelines, showing that based on available tech and information and the weak inside view that UFAI could be as close as 5-20 years away, and MIRI’s own timelines did not, and still does not to my knowledge expect practical results in that short a time horizon. The response was a citation to Stuart Armstrong’s paper showing an average expert opinon of AI being 50-70 years away… which was stunning considering the thesis of the paper was about just how bad it is to ask experts about questions like the ETA for human-level AI.
I then asked MIRI to consider hiring a project manager, i.e. a professional whose job it is to keep projects on time and on budget, to help make these decisions in coordinating and guiding research efforts. This suggestion was received about as well as the others.
And then I saw an updated course list from the machine intelligence research institute which dropped from consideration the sixty year history of people actually writing machine intelligences, both the theory and the practice that has accumulated.
So if it seemed a bit like I was trying to argue against other people’s points through you, I’m sorry I guess I was. I was arguing with MIRI, which you now represent.
Regarding your example, I understand what you are saying but I don’t think you are arguing against me. One way of making sure something is safe is making it unable to take actions with irreversible consequences. You don’t need to be confident in a system’s behavior, if that system’s behavior has no effectors on the real world.
I’m all for a trustless AI running as a “physical operating system” for a positive universe. But we have time to figure out how to do that post-singularity.
Thanks—I don’t really want to get into involved arguments about overall strategy on this forum, but I can address each of your points in turn. I’m afraid I only have time to sketch my justifications rather than explain them in detail, as I’m quite pressed for time these days.
I understand that these probably won’t convince you, but I hope to at least demonstrate that I have been giving these sorts of things some thought.
My conclusions:
Most types of boxing research would be premature, given how little we know about what early AGI architectures will look like.
That said, there is some boxing work that could be done nowadays. We do touch upon a lot of this stuff (or things in nearby spaces) under the “Corrigibility” banner. (See also Stuart’s “low impact” posts.)
Furthermore, some of our decision theory research does include early work on boxing problems (e.g., how do you build an oracle that does not try to manipulate the programmers into giving it easier questions? Turns out this has a lot to do with how the oracle evaluates its decisions.)
I agree that there is more work that could be done on boxing that would be positive value, but I expect it would be more speculative than, say, the tiling work.
My thoughts: “could” is ambiguous here. What probability do you put on AGI in 5 years? My personal 95% confidence interval is 5 to 150 years (including outside view, model uncertainty, etc) with a mean around 50 years and skewed a bit towards the front, and I am certainly not shooting for a strategy that has us lose 50% of the time, so I agree that we damn well better be on a 20-30 year track.
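The arithmetic behind that last step can be sketched roughly as follows. This is only an illustration under a loud assumption: it models the time until AGI as a log-normal distribution whose 2.5% and 97.5% quantiles sit at 5 and 150 years. The distribution family and the name `prob_agi_within` are hypothetical choices, not something stated above, and this particular model does not exactly reproduce the quoted mean of around 50 years.

```python
import math
from statistics import NormalDist

# Assumption (not from the discussion): years-until-AGI is log-normal,
# with the 95% interval endpoints at 5 and 150 years.
LOW, HIGH = 5.0, 150.0
Z = NormalDist().inv_cdf(0.975)  # standard normal 97.5% quantile

# Parameters of the underlying normal distribution of log(years).
mu = (math.log(LOW) + math.log(HIGH)) / 2
sigma = (math.log(HIGH) - math.log(LOW)) / (2 * Z)


def prob_agi_within(years: float) -> float:
    """P(AGI arrives within `years`) under the assumed log-normal model."""
    return NormalDist(mu, sigma).cdf(math.log(years))


median = math.exp(mu)
mean = math.exp(mu + sigma**2 / 2)

for horizon in (20, 30, 50):
    print(f"P(AGI within {horizon} years) = {prob_agi_within(horizon):.2f}")
print(f"median = {median:.1f} years, mean = {mean:.1f} years")
print(f"P(AGI before the mean) = {prob_agi_within(mean):.2f}")
```

Under any distribution skewed towards the front like this one, the median falls below the mean, so a plan that is only ready by the mean date arrives too late more than half the time. That is the sense in which a 20-30 year track follows from a mean of several decades.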
I think MIRI made the right choice here. There are only three full-time FAI researchers at MIRI right now, and we’re good at coordinating with each other and holding ourselves to deadlines. A project manager would be drastic overkill.
To be clear, this is no longer a course list, it’s a research guide. The fact of the matter is, modern narrow AI is not necessary to understand our active research. Is it still useful for a greater context? Yes. Is there FAI work that would depend upon modern narrow AI research? Of course! But the subject simply isn’t a prerequisite for understanding our current active research.
I do understand how this omission galled you, though. Apologies for that.
I’m not sure what this means or how it’s safe. (I wouldn’t, for example, be comfortable constructing a machine that forcibly wireheads me, just because it believes it can reverse the process.)
I think that there’s something useful in the space of “low impact” / “domesticity,” but suggestions like “just make everything reversible” don’t seem to engage with the difficulty of the problem.
I also don’t understand what it means to have “no effectors in the real world.”
There is no such thing as “not affecting the world.” Running a processor has electrical and gravitational effects on everything in the area. Wifi exists. This paper describes how an evolved clock re-purposed the circuits on its motherboard to magnify signals from nearby laptops to use as an oscillator, stunning everyone involved. Weak genetic programming algorithms ended up using hardware in completely unexpected ways to do things the programmers thought were impossible. So yes, we really do need to worry about strong intelligent processes using their hardware in unanticipated ways in order to affect the world.
See also the power of intelligence.
(Furthermore, anything that interacts with humans is essentially using humans as effectors. There’s no such thing as “not having effectors on the real world.”)
I agree that there’s something interesting in the space of “just build AIs that don’t care about the real world” (for example, constructing an AI which is only trying to affect the platonic output of its Turing machine and does not care about its physical implementation), but even this has potential security holes if you look closely. Some of our decision theory work does touch upon this sort of possibility, but there’s definitely other work to be done in this space that we aren’t looking at.
But again, suggestions like “just don’t let it have effectors” fail to engage with the difficulty of the problem.