Luke: I appreciate your transparency and clear communication regarding SingInst.
The main reason that I remain reluctant to donate to SingInst is that I find your answer (and the answers of other SingInst affiliates who I’ve talked with) to the question about Friendly AI subproblems to be unsatisfactory. Based on what I know at present, subproblems of the type that you mention are way too vague for it to be possible for even the best researchers to make progress on them.
My general impression is that the SingInst staff have insufficient exposure to technical research to understand how hard it is to answer questions posed at such a level of generality. I’m largely in agreement with Vladimir M’s comments on this thread.
Now, it may well be possible to further subdivide and sharpen the subproblems at hand to the point where they’re well defined enough to answer, but the fact that you seem unaware of how crucial this is is enough to make me seriously doubt SingInst’s ability to make progress on these problems.
I’m glad to see that you place high priority on talking to good researchers, but I think that the main benefit that will derive from doing so (aside from increasing awareness of AI risk) will be to shift SingInst staff members’ beliefs in the direction of the Friendly AI problem being intractable.
I find your answer… to the question about Friendly AI subproblems to be unsatisfactory. Based on what I know at present, subproblems of the type that you mention are way too vague for it to be possible for even the best researchers to make progress on them.
No doubt, a one-paragraph list of sub-problems written in English is “unsatisfactory.” That’s why we would “really like to write up explanations of these problems in all their technical detail.”
But it’s not true that the problems are too vague to make progress on them. For example, with regard to the sub-problem of designing an agent architecture capable of having preferences over the external world, recent papers by (SI research associate) Daniel Dewey, Orseau & Ring, and Hibbard each constitute progress.
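To give a rough sense of what that sub-problem asks for, here is a deliberately simplified contrast between the two kinds of agent; the notation is illustrative only and is not taken from Dewey, Orseau & Ring, or Hibbard.

% Illustrative sketch only; simplified notation, not the formalism of the cited papers.
% A standard reinforcement learner chooses actions to maximize a statistic of its
% own future percepts (the rewards r_t it expects to observe):
a^{*} = \arg\max_{a} \; \mathbb{E}\!\left[ \textstyle\sum_{t} r_{t} \mid a \right]
% An agent with preferences over the external world instead maximizes a utility
% function U defined over hypotheses w about the world itself:
a^{*} = \arg\max_{a} \; \sum_{w} U(w)\, P(w \mid \text{percepts}, a)

The difference matters because the first agent can be satisfied by controlling its own reward signal, while the second is evaluated (by its own lights) on what the world is actually like.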
My general impression is that the SingInst staff have insufficient exposure to technical research to understand how hard it is to answer questions posed at such a level of generality.
I doubt this is a problem. We are quite familiar with technical research, and we know how hard it is (in my usual example of what needs to be done to solve many of the FAI sub-problems) for “Claude Shannon to just invent information theory almost out of nothing.”
In fact, here is a paragraph I wrote months ago for a (not yet released) document called Open Problems in Friendly Artificial Intelligence:
Richard Bellman may have been right that “the very construction of a precise mathematical statement of a verbal problem is itself a problem of major difficulty” (Bellman 1961). Some of the problems in this document have not yet been stated with mathematical precision, and the need for a precise statement of the problem is part of each open problem. But there is reason for optimism. Many times, particular heroes have managed to formalize a previously fuzzy and mysterious concept: see Kolmogorov on complexity and simplicity (Kolmogorov 1965; Li & Vitányi 2008), Solomonoff on induction (Solomonoff 1964a, 1964b; Rathmanner & Hutter 2011), Von Neumann and Morgenstern on rationality (Von Neumann & Morgenstern 1944; Anand 1995), and Shannon on information (Shannon 1948; Arndt 2004).
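(As a side note on what such formalizations look like when they succeed, two of those examples can be stated in a line each; the definitions below are the standard ones, included only for concreteness.)

% Shannon's entropy: the average information content of a source X whose outcomes
% occur with probabilities p_i (Shannon 1948).
H(X) = -\sum_{i} p_i \log_2 p_i
% Kolmogorov complexity: the length of the shortest program p that makes a fixed
% universal machine U output the string x (Kolmogorov 1965).
K_U(x) = \min\{\, |p| : U(p) = x \,\}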
Also, I regularly say that “Friendly AI might be an incoherent idea, and impossible.” But as Nesov said, “Believing a problem intractable isn’t a step towards solving the problem.” Many now-solved problems once looked impossible. But anyway, this is one reason to pursue research both on Friendly AI and on “maxipok” solutions that maximize the chance of an “ok” outcome, like Oracle AI.
I’m glad to see that you place high priority on talking to good researchers, but I think that the main benefit that will derive from doing so (aside from increasing awareness of AI risk) will be to shift SingInst staff members’ beliefs in the direction of the Friendly AI problem being intractable.
Believing a problem intractable isn’t a step towards solving the problem. It might be correct to downgrade your confidence in the problem being solvable, but that isn’t in itself useful if the goal remains worth pursuing. It mostly serves as an indication of epistemic rationality (if the problem really is less tractable than believed), or perhaps as a useful strategic consideration. Noticing that the current approach is worse than an alternative (e.g. open problems are harder to communicate than expected, but what is the better alternative that would let you put this improved understanding to use?), or noticing a particular error in present beliefs, is much more useful.
Believing a problem intractable isn’t a step towards solving the problem. It might be correct to downgrade your confidence in the problem being solvable, but that isn’t in itself useful if the goal remains worth pursuing.
I agree, but it may be appropriate to be more modest in aim (e.g. by pushing for neuromorphic AI with some built-in safety precautions even if achieving this outcome is much less valuable than creating a Friendly AI would be).
e.g. by pushing for neuromorphic AI with some built-in safety precautions even if achieving this outcome is much less valuable than creating a Friendly AI would be
I believe it won’t be “less valuable”, but instead would directly cause existential catastrophe, if successful. Feasibility of solving FAI doesn’t enter into this judgment.
I believe it won’t be “less valuable”, but instead would directly cause existential catastrophe, if successful.
I meant in expected value.
As Anna mentioned in one of her Google AGI talks, there’s the possibility of an AGI being willing to trade with humans to avoid a small probability of being destroyed by humans (though I concede that it’s not at all clear how one would create an enforceable agreement). Also, a neuromorphic AI might not be so far from a WBE. Do you think that whole brain emulation would directly cause existential catastrophe?
I believe it won’t be “less valuable”, but instead would directly cause existential catastrophe, if successful.
I meant in expected value.
Huh? I didn’t mean opportunity cost, but simply that successful neuromorphic AI destroys the world. Staging a global catastrophe does have lower expected value than protecting from global catastrophe (with whatever probabilities), but also lower expected value than watching TV.
Do you think that whole brain emulation would directly cause existential catastrophe?
Indirectly, but with influence that compresses the expected time-to-catastrophe after the tech starts working from decades or centuries down to years (decades if WBE tech comes early and only slow or few uploads can be supported initially). It’s not all lost at that point, since WBEs could do some FAI research, and would be in a better position to actually implement a FAI and think longer about it, but the ease of producing a UFAI would go way up (directly, by physically faster AGI research, or by experimenting with variations on human brains or optimization processes built out of WBEs).
The main thing that distinguishes WBEs is that they are still initially human, and still have the same values. All other tech breaks values, and giving it power makes humane values lose the world.
Huh? I didn’t mean opportunity cost, but simply that successful neuromorphic AI destroys the world. Staging a global catastrophe does have lower expected value than protecting from global catastrophe (with whatever probabilities), but also lower expected value than watching TV.
I was saying that it could be that with more information we would find that
0 < EU(Friendly AI research) < EU(Pushing for relatively safe neuromorphic AI) < EU(Successful construction of a Friendly AI).
even if there’s a high chance that relatively safe neuromorphic AI would cause global catastrophe and carry no positive benefits. This could be the case if Friendly AI research is sufficiently hard. Given the current uncertainty about the difficulty of Friendly AI research, I think one would have to be extremely confident that relatively safe neuromorphic AI would cause global catastrophe in order to rule this possibility out.
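To make the arithmetic behind that ordering explicit, here is a toy calculation; the numbers are purely hypothetical and are chosen only to show how the inequality could hold, not as anyone’s actual estimates.

% Hypothetical numbers, for illustration only: catastrophe = 0, a Friendly AI = 100,
% a relatively safe neuromorphic outcome = 20 (in arbitrary units of value).
EU(\text{Friendly AI research}) = 0.01 \times 100 = 1
EU(\text{pushing for relatively safe neuromorphic AI}) = 0.10 \times 20 = 2
EU(\text{successful construction of a Friendly AI}) = 1.00 \times 100 = 100
% This gives 0 < 1 < 2 < 100, i.e. the ordering above, even though the neuromorphic
% route is assumed to end in catastrophe 90% of the time.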
Indirectly, but with influence that compresses the expected time-to-catastrophe after the tech starts working from decades or centuries down to years (decades if WBE tech comes early and only slow or few uploads can be supported initially). It’s not all lost at that point, since WBEs could do some FAI research, and would be in a better position to actually implement a FAI and think longer about it, but the ease of producing a UFAI would go way up (directly, by physically faster AGI research, or by experimenting with variations on human brains or optimization processes built out of WBEs).
Agree with this.
The main thing that distinguishes WBEs is that they are still initially human, and still have the same values. All other tech breaks values, and giving it power makes humane values lose the world.
I think that I’d rather have an uploaded crow brain have its computational power and memory substantially increased and then go FOOM than have an arbitrary powerful optimization process; just because a neuromorphic AI wouldn’t have values that are precisely human doesn’t mean it would be totally devoid of value from our point of view.
I think that I’d rather have an uploaded crow brain have its computational power and memory substantially increased and then go FOOM than have an arbitrary powerful optimization process; just because a neuromorphic AI wouldn’t have values that are precisely human doesn’t mean it would be totally devoid of value from our point of view.
I expect it would; even a human whose brain was meddled with to make it more intelligent is probably a very bad idea, unless this modified human builds a modified-human-Friendly-AI (in which case some value drift would probably be worth the protection from existential risk) or, even better, a useful FAI theory is elicited Oracle AI-style. The crucial question here is the character of FOOMing: how much of the initial value is retained.