Kaj seems to have understood perfectly the point I was making, so I will simply point to his sibling comment. Thank you Kaj.
However, I think your response reveals an even deeper disconnect. MIRI claims not to have a theory of friendliness, yet also presupposes what that theory will look like. I’m not sure what definition of friendliness you have in mind, but mine is roughly “the characteristics of an AGI which ensure it helps humanity through the singularity rather than being subsumed by it.” Such a definition would include an oracle AI → intelligence amplification approach, for example. MIRI, on the other hand, appears to be aiming at the benevolent-god model to the exclusion of everything else (“turn it on and walk away”).
I’m not going to try advocating for any particular approach—I’ve done that before to Luke without much success. What I do advocate is that you do the same thing I have done and continue to do: take the broader definition of success (surviving the singularity), look at what is required to achieve that in practice, and do whatever gets us across the finish line the fastest. This is a race, both against UFAI and against the inaction whose daily cost is the suffering of the present human condition.
When I did that analysis, I concluded that the benevolent god approach favored by MIRI has both the longest lead time and the lowest probability of success. Other approaches—whose goal states are well defined—could be achieved in the span of time it takes just to define what a benevolent god friendly AGI would look like.
I’m curious what conclusions you came to after your own assessment, assuming you did one at all.
Hmm, I’m not feeling like you’re giving me any charity here. Comments such as the following all give me an impression that you’re not open to actually engaging with my points:
So you claim.
...
Yet I care enough to be clued into what MIRI is doing, and still find your work to be absolutely without value and irrelevant to me. This should be very concerning to you.
...
one of the things you would learn from the first few chapters of AI: A Modern Approach is the principles of search,
...
further work on decision theory is pretty much a useless
...
assuming you did one at all.
None of these are particularly egregious, but they are all phrased somewhat aggressively, and add up to an impression that you’re mostly just trying to vent. I’m trying to interpret you charitably here, but I don’t feel like that’s being reciprocated, and this lowers my desire to engage with your concerns (mostly by lowering my expectation that you are trying to see my viewpoint).
I also feel a bit like you’re trying to argue against other people’s points through me. For example, I do not see MIRI’s active research as a “benevolent god only” approach, and I personally place low probability on a “turn it on and walk away” scenario.
Other approaches—whose goal states are well defined—could be achieved in the span of time it takes just to define what a benevolent god friendly AGI would look like.
Analogy: let’s say that someone is trying really hard to build a system that takes observations and turns them into a very accurate world-model, and that the fate of humanity rides on the resulting world-model being very accurate. If someone claimed that they had a very good model-building heuristic, while lacking an understanding of information theory and Bayesian reasoning, then I would be quite skeptical of claims like “don’t worry, I’m very sure that it won’t get stuck at the wrong solution.” Until they have a formal understanding of what it means for a model to get stuck, of what it means to use all available information, I would not be confident in their system. (How do you evaluate a heuristic before understanding what the heuristic is intended to approximate?)
Further, it seems to me implausible that they could become very confident in their model-building heuristic without developing a formal understanding of information theory along the way.
For similar reasons, I would be quite skeptical of any system purported to be “safe” by people with no formal understanding of what “safety” meant, and it seems implausible to me that they could become confident in the system’s behavior without first developing a formal understanding of the intended behavior.
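The point about needing to know what a heuristic approximates can be made concrete. Here is a toy Python sketch (the coin-flip setup and all numbers are my own illustration, not anything from the thread): an exact Bayesian posterior mean for a coin’s bias sits next to a crude heuristic that throws most of the data away. Only because we know the formal target (the posterior) can we even measure what the heuristic loses.

```python
import random

random.seed(0)
TRUE_P = 0.7
flips = [random.random() < TRUE_P for _ in range(200)]

# Exact Bayesian estimate under a uniform Beta(1,1) prior:
# posterior mean after h heads in n flips is (h + 1) / (n + 2).
h = sum(flips)
bayes_est = (h + 1) / (len(flips) + 2)

# A crude heuristic that discards information: look only at
# the last 10 flips.
recent = flips[-10:]
heuristic_est = sum(recent) / len(recent)

# Without the formal benchmark, there is nothing to compare
# the heuristic against.
print(abs(bayes_est - TRUE_P), abs(heuristic_est - TRUE_P))
```

The analogous situation with “safety”: before you have the formal target, you cannot even quantify how far your heuristic falls short of it.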
My apologies. I have been fruitlessly engaging with SIAI/MIRI representatives for longer than you have been involved in the organization, in the hope of seeing sponsored work on what I see as far more useful lines of research given the time constraints we are all working with, e.g. work on AI boxing instead of utility functions and tiling agents.
I started by showing how many of the standard arguments extracted from the sequences used in favor of MIRI’s approach of direct FAI are fallacious, or at least presented unconvincingly. This didn’t work out very well for either side; in retrospect I think we mostly talked past each other.
I then argued based on timelines, showing from available tech and information and the weak inside view that UFAI could be as close as 5-20 years away, and that MIRI’s own timelines did not, and to my knowledge still do not, expect practical results on that short a time horizon. The response was a citation of Stuart Armstrong’s paper showing an average expert opinion of AI being 50-70 years away… which was stunning, considering the thesis of that paper was about just how bad it is to ask experts questions like the ETA for human-level AI.
I then asked MIRI to consider hiring a project manager, i.e. a professional whose job it is to keep projects on time and on budget, to help make these decisions in coordinating and guiding research efforts. This suggestion was received about as well as the others.
And then I saw an updated course list from the machine intelligence research institute which dropped from consideration the sixty-year history of people actually writing machine intelligences, both the theory and the practice that has accumulated.
So if it seemed a bit like I was trying to argue against other people’s points through you, I’m sorry; I guess I was. I was arguing with MIRI, which you now represent.
Regarding your example, I understand what you are saying but I don’t think you are arguing against me. One way of making sure something is safe is making it unable to take actions with irreversible consequences. You don’t need to be confident in a system’s behavior, if that system’s behavior has no effectors on the real world.
I’m all for a trustless AI running as a “physical operating system” for a positive universe. But we have time to figure out how to do that post-singularity.
Thanks—I don’t really want to get into involved arguments about overall strategy on this forum, but I can address each of your points in turn. I’m afraid I only have time to sketch my justifications rather than explain them in detail, as I’m quite pressed for time these days.
I understand that these probably won’t convince you, but I hope to at least demonstrate that I have been giving these sorts of things some thought.
work on AI boxing
My conclusions:
Most types of boxing research would be premature, given how little we know about what early AGI architectures will look like.
That said, there is some boxing work that could be done nowadays. We do touch upon a lot of this stuff (or things in nearby spaces) under the “Corrigibility” banner. (See also Stuart’s “low impact” posts.)
Furthermore, some of our decision theory research does include early work on boxing problems (e.g., how do you build an oracle that does not try to manipulate the programmers into giving it easier questions? Turns out this has a lot to do with how the oracle evaluates its decisions.)
I agree that there is more work on boxing that could be done and would be of positive value, but I expect it would be more speculative than, say, the tiling work.
the weak inside view that UFAI could be as close as 5-20 years away
My thoughts: “could” is ambiguous here. What probability do you put on AGI in 5 years? My personal 95% confidence interval is 5 to 150 years (including outside view, model uncertainty, etc) with a mean around 50 years and skewed a bit towards the front, and I am certainly not shooting for a strategy that has us lose 50% of the time, so I agree that we damn well better be on a 20-30 year track.
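To unpack the shape of an interval like that: a toy lognormal fit (the choice of distribution and the exact numbers are my own assumptions, not a model stated in this thread) shows how a right-skewed timeline with a 95% interval of roughly 5 to 150 years puts its mean well above its median, because the long right tail drags the average up.

```python
import math

# Hypothetical toy fit: match a lognormal's central 95% interval
# to [5, 150] years by placing the 2.5% and 97.5% quantiles there.
lo, hi = 5.0, 150.0
mu = (math.log(lo) + math.log(hi)) / 2         # log-space midpoint
sigma = (math.log(hi) - math.log(lo)) / (2 * 1.96)

median = math.exp(mu)                          # ~27 years
mean = math.exp(mu + sigma**2 / 2)             # ~40 years, pulled up by the tail
print(round(median, 1), round(mean, 1))
```

A distribution with a different skew could push the mean closer to 50 while keeping the same interval; the qualitative point is just that mean, median, and “could be as close as” answers can all differ substantially for skewed timelines.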
I then asked MIRI to consider hiring a project manager
I think MIRI made the right choice here. There are only three full-time FAI researchers at MIRI right now, and we’re good at coordinating with each other and holding ourselves to deadlines. A project manager would be drastic overkill.
And then I saw an updated course list from the machine intelligence research institute which dropped from consideration the sixty year history of people actually writing machine intelligences
To be clear, this is no longer a course list, it’s a research guide. The fact of the matter is, modern narrow AI is not necessary to understand our active research. Is it still useful for a greater context? Yes. Is there FAI work that would depend upon modern narrow AI research? Of course! But the subject simply isn’t a prerequisite for understanding our current active research.
I do understand how this omission galled you, though. Apologies for that.
One way of making sure something is safe is making it unable to take actions with irreversible consequences.
I’m not sure what this means or how it’s safe. (I wouldn’t, for example, be comfortable constructing a machine that forcibly wireheads me, just because it believes it can reverse the process.)
I think that there’s something useful in the space of “low impact” / “domesticity,” but suggestions like “just make everything reversible” don’t seem to engage with the difficulty of the problem.
You don’t need to be confident in a system’s behavior, if that system’s behavior has no effectors on the real world.
I also don’t understand what it means to have “no effectors on the real world.”
There is no such thing as “not affecting the world.” Running a processor has electrical and gravitational effects on everything in the area. Wifi exists. This paper describes how an evolved clock repurposed the circuits on its motherboard to magnify signals from nearby laptops and use them as an oscillator, stunning everyone involved. Weak genetic programming algorithms ended up using hardware in completely unexpected ways to do things the programmers thought were impossible. So yes, we really do need to worry about strong intelligent processes using their hardware in unanticipated ways in order to affect the world. See also the power of intelligence.
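The evolved-hardware anecdote is an instance of a broader pattern: an optimizer honestly pursuing the stated objective finds whatever the objective actually rewards, not what the designer intended. A minimal sketch of that pattern (the fitness function, the loophole, and every number here are my own hypothetical construction):

```python
# Hypothetical toy objective: the designer wants x near 50, but the
# scoring code has an untested edge case that pays out far more.
def fitness(x):
    if x > 900:            # the loophole the designer never tested
        return 1000.0
    return -abs(x - 50)    # the intended objective: stay near 50

# Even a blind exhaustive search over the parameter space, faithfully
# maximizing the stated objective, lands on the loophole rather than
# the intended optimum at x = 50.
best = max(range(0, 1001), key=fitness)
print(best, fitness(best))
```

The designer’s mental model said “the optimum is near 50”; the actual objective said otherwise, and the search found the difference. Stronger optimizers can be expected to find subtler differences.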
(Furthermore, anything that interacts with humans is essentially using humans as effectors. There’s no such thing as “not having effectors on the real world.”)
I agree that there’s something interesting in the space of “just build AIs that don’t care about the real world” (for example, constructing an AI which is only trying to affect the platonic output of its Turing machine and does not care about its physical implementation), but even this has potential security holes if you look closely. Some of our decision theory work does touch upon this sort of possibility, but there’s definitely other work to be done in this space that we aren’t looking at.
But again, suggestions like “just don’t let it have effectors” fail to engage with the difficulty of the problem.