Thanks—I don’t really want to get into involved arguments about overall strategy on this forum, but I can address each of your points in turn. I’m afraid I only have time to sketch my justifications rather than explain them in detail, as I’m quite pressed for time these days.
I understand that these probably won’t convince you, but I hope to at least demonstrate that I have been giving these sorts of things some thought.
work on AI boxing
My conclusions:
Most types of boxing research would be premature, given how little we know about what early AGI architectures will look like.
That said, there is some boxing work that could be done nowadays. We do touch upon a lot of this stuff (or things in nearby spaces) under the “Corrigibility” banner. (See also Stuart’s “low impact” posts.)
Furthermore, some of our decision theory research does include early work on boxing problems (e.g., how do you build an oracle that does not try to manipulate the programmers into giving it easier questions? Turns out this has a lot to do with how the oracle evaluates its decisions.)
I agree that there is more work that could be done on boxing that would have positive value, but I expect it would be more speculative than, say, the tiling work.
the weak inside view that UFAI could be as close as 5-20 years away
My thoughts: “could” is ambiguous here. What probability do you put on AGI in 5 years? My personal 95% confidence interval is 5 to 150 years (including outside view, model uncertainty, etc.), with a mean around 50 years and skewed a bit towards the front. I am certainly not shooting for a strategy that has us lose 50% of the time, so I agree that we damn well better be on a 20-30 year track.
I then asked MIRI to consider hiring a project manager
I think MIRI made the right choice here. There are only three full-time FAI researchers at MIRI right now, and we’re good at coordinating with each other and holding ourselves to deadlines. A project manager would be drastic overkill.
And then I saw an updated course list from the machine intelligence research institute which dropped from consideration the sixty year history of people actually writing machine intelligences
To be clear, this is no longer a course list; it’s a research guide. The fact of the matter is, modern narrow AI is not necessary to understand our active research. Is it still useful for a greater context? Yes. Is there FAI work that would depend upon modern narrow AI research? Of course! But the subject simply isn’t a prerequisite for understanding our current active research.
I do understand how this omission galled you, though. Apologies for that.
One way of making sure something is safe is making it unable to take actions with irreversible consequences.
I’m not sure what this means or how it’s safe. (I wouldn’t, for example, be comfortable constructing a machine that forcibly wireheads me, just because it believes it can reverse the process.)
I think that there’s something useful in the space of “low impact” / “domesticity,” but suggestions like “just make everything reversible” don’t seem to engage with the difficulty of the problem.
You don’t need to be confident in a system’s behavior, if that system’s behavior has no effectors on the real world.
I also don’t understand what it means to have “no effectors in the real world.”
There is no such thing as “not affecting the world.” Running a processor has electrical and gravitational effects on everything in the area. Wifi exists. This paper describes how an evolved clock re-purposed the circuits on its motherboard to magnify signals from nearby laptops to use as an oscillator, stunning everyone involved. Weak genetic programming algorithms ended up using hardware in completely unexpected ways to do things the programmers expected to be impossible. So yes, we really do need to worry about strong intelligent processes using their hardware in unanticipated ways in order to affect the world.
(Furthermore, anything that interacts with humans is essentially using humans as effectors. There’s no such thing as “not having effectors on the real world.”)
I agree that there’s something interesting in the space of “just build AIs that don’t care about the real world” (for example, constructing an AI which is only trying to affect the platonic output of its Turing machine and does not care about its physical implementation), but even this has potential security holes if you look closely. Some of our decision theory work does touch upon this sort of possibility, but there’s definitely other work to be done in this space that we aren’t looking at.
But again, suggestions like “just don’t let it have effectors” fail to engage with the difficulty of the problem.
See also the power of intelligence.