It’s strange that people say the arguments for the Big Scary Idea are not written down anywhere. The argument seems simple and direct:
1) Hard takeoff will make an AI god-powerful very quickly.
2) During the hard takeoff, the AI’s utility=goals=values=what-it-optimizes-for will solidify (once the AI understands its own theory and self-modifies accordingly), and even if it was changeable before, it will be unchangeable forever after.
3) Unless the AI’s goals embody every single value important for humans and are otherwise just right in every respect, the results of using god powers to optimize for these goals will be horrible.
4) Human values are not a natural category; there’s little to no chance that an AI will converge on them by itself unless specifically and precisely programmed.
The only really speculative step is step 1. But if you already believe in the singularity and hard foom, then the argument should be irrefutable...
Arguments for step 2, e.g. the Omohundroan Gandhi folk theorem, are questionable. Neither step 3 nor step 4 is supported with impressive technical arguments anywhere I know of. Remember, there are a lot of moral realists out there who think of AIs as people who will sense and feel compelled by moral law. It’s hard to make impressive technical arguments against that intuition. FOOM=doom and FOOM=yay folk can both point out a lot of facts about the world and draw analogies, but as far as impressive technical arguments go there’s not much that can be done, largely because we have never built an AGI. It’s a matter of moral philosophy, an inherently tricky subject.
I don’t understand how the Omohundroan Gandhi folk theorem is related to step 2. Could you elaborate? Step 2 looks obvious to me: assuming step 1, at some point an AI with an imprecise and drifting utility would understand how to build a better AI with a precise and fixed utility. Since building this better AI maximizes the current AI’s utility, the better AI will be built and its utility forever solidified.
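A minimal toy sketch of that step 2 reasoning, in Python, under made-up assumptions (a one-dimensional outcome space and two invented utility functions; nothing here is from Goertzel’s or SIAI’s actual designs): an agent that scores candidate successors by its current utility prefers a successor that optimizes that same utility, which is the sense in which the goals get locked in.

```python
# Toy illustration only: a current agent ranks candidate successor designs by
# its *current* utility, so it builds the successor that shares that utility.
import random

def current_utility(outcome: float) -> float:
    """The (imprecise, possibly drifting) utility the agent holds right now."""
    return -(outcome - 3.0) ** 2  # prefers outcomes near 3

def drifted_utility(outcome: float) -> float:
    """A slightly different utility the agent might drift into later."""
    return -(outcome - 5.0) ** 2  # prefers outcomes near 5

def successor_choice(successor_utility, options):
    """A successor AI simply picks the option its own utility ranks highest."""
    return max(options, key=successor_utility)

options = [random.uniform(0.0, 10.0) for _ in range(1000)]

# Score each candidate successor by how well its chosen outcome does under the
# *current* utility; the successor with the same goals scores at least as high,
# so it is the one that gets built, and its utility stays fixed thereafter.
scores = {
    name: current_utility(successor_choice(util, options))
    for name, util in [("same_goals", current_utility),
                       ("drifted_goals", drifted_utility)]
}
print(scores)
```

This only restates the comment’s logic in executable form; it says nothing about whether real architectures satisfy the assumptions.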
As you say, steps 3 and 4 are currently hard to support with technical arguments; there are so many non-technical concepts involved. And it may be hard to argue intuitively with most people. But Goertzel is a programmer; he should know how programs behave :) Of course, he says his program will be intelligent, not stupid, which is a good idea, as long as it is remembered that intelligent in this sense already means friendly, and that friendliness does not follow from just being a powerful optimization process.
Also, thinking of AIs as people can only work up to the point where the AI achieves complete self-understanding. This has never happened to humans.
But Goertzel is a programmer; he should know how programs behave :) Of course, he says his program will be intelligent, not stupid, which is a good idea, as long as it is remembered that intelligent in this sense already means friendly, and that friendliness does not follow from just being a powerful optimization process.
Hm, when I try to emulate Goertzel’s perspective I think about it this way: if you look at brains, they seem to be a bunch of machine learning algorithms and domain-specific modules largely engineered to solve tricky game theory problems. Love isn’t something that humans do despite game theory; love is game theory. And yet despite that it seems that brains end up doing lots of weird things like deciding to become a hermit or paint or compose or whatever. That’s sort of weird; if you’d asked me what chimps would evolve into when they became generally intelligent, and I hadn’t already seen humans or humanity, then I might’ve guessed that they’d evolve to develop efficient mating strategies, e.g. arranged marriage, and efficient forms of dominance contests, e.g. boxing with gloves, that don’t look at all like the memetic heights of academia or the art scene. Much of academia is just social maneuvering, but the very smartest humans don’t actually seem to be motivated by status displays; it seems that abstract memes have taken over the machine learning algorithms just by virtue of their being out there in Platospace, and that’s actually pretty weird and perhaps unexpected.
So yes, Goertzel is a programmer and should know how programs behave, but human minds look like they’re made of programs, and yet they ended up somewhat Friendly (or cosmically connected or whatever) despite that. Now the typical counter is AIXI: okay, maybe hacked-together machine learning algorithms will reliably stumble onto and adopt cosmic abstract concepts, but it sure doesn’t look like AIXI would. Goertzel’s counter to that is, of course, that AIXI is unproven, and that if you built an approximation of it then you’d have to use brain-like machine learning algorithms, which are liable to get distracted by abstract concepts. It might not be possible to get past the point where you’re distracted by abstract concepts, and once they’re in your mind (e.g. as problem representations, as subgoals, as whatever they are in human minds), you don’t want to abandon them, even if you gain complete self-understanding. (There are various other paths that argument could take, but they all can plausibly lead to roughly the same place.)
I think that taking the soundness of such analogical arguments for granted would be incautious, and that’s why I tend to promote the SingInst perspective around folk who aren’t aware of it, but despite being pragmatically incautious they’re not obviously epistemicly unsound, and I can easily see how someone could feel it was intuitively obvious that they were epistemicly sound. I think the biggest problem with that set of arguments is that they seem to unjustifiably discount the possibility of very small, very recursive seed AIs that can evolve to superintelligence very, very quickly; which are the same AIs that would get to superintelligence first in a race scenario. There are various reasons to be skeptical that such architectures will work, but even so it seems rather incautious to ignore them, and I feel like Goertzel is perhaps ignoring them, perhaps because he’s not familiar with those kinds of AI architectures.
So yes, Goertzel is a programmer and should know how programs behave, but human minds look like they’re made of programs, and yet they ended up somewhat Friendly (or cosmically connected or whatever) despite that.
That humans are only (as you flatteringly put it) “somewhat” friendly to human values is clearly an argument in favor of caution, is it not?
It is, but it’s possible to argue somewhat convincingly that the lack of friendliness is in fact due to lack of intelligence. My favorite counterexample was Von Neumann, who didn’t really seem to care much about anyone, though I later heard that he actually had somewhat complex political views and simplified them for consumption by the masses. On the whole it seems that intelligent folk really are significantly more moral than the majority of humanity, and this favors the “intelligence implies, or is the same thing as, cosmic goodness” perspective. This sort of argument is also very psychologically appealing to Enlightenment-influenced thinkers, i.e. most modern intellectuals, e.g. young Eliezer.
(Mildly buzzed, apologies for errors.)
(ETA: In case it isn’t clear, I’m not arguing that such a perspective is a good one to adopt, I’m just trying to explain how one could feel justified in holding it as a default perspective and feel justified in being skeptical of intuitive non-technical arguments against it. I think constructing such explanations is necessary if one is to feel justified in disagreeing with one’s opposition, for the same reason that you shouldn’t make a move in chess until you’ve looked at what moves your opponent is likely to play in response, and then what move you could make in that case, and what moves they might make in response to that, and so on.)
On the whole it seems that intelligent folk really are significantly more moral than the majority of humanity, and this favors the “intelligence implies, or is the same thing as, cosmic goodness” perspective.
I think there are a number of reasons to be skeptical of the premise (and the implicit one about cosmic goodness being a coherent thing, but that’s obviously covered territory). Most people think their tribe seems more moral than others, so nerd impressions that nerds are particularly moral should be discounted. The people who are most interested in intellectual topics (i.e., the most obviously intelligent intelligent people) do often appear to be the least interested in worldly ambition and the least aggressive generally, but we would expect that just as a matter of preferences crowding each other out; worldly ambitious intelligent people seem to be among the most conspicuously amoral, even though you’d expect them to be the most well-equipped in means and motive to look otherwise. I recall Robin Hanson has referenced studies (which I’m too lazy to look up) suggesting that the intelligent lie and cheat more often; certainly this could be explained by an opportunity effect, but so could their presumably lower levels of personal violence. Humans are friendlier than chimpanzees but less friendly than bonobos, and across the tree of life niceness and nastiness don’t seem to have any relationship to computational power.
worldly ambitious intelligent people seem to be among the most conspicuously amoral
That’s true and important, but stereotypical worldly intelligent people rarely “grave new values on new tables”, and so might be much less intelligent than your Rousseaus and Hammurabis in the sense that they affect the cosmos less overall. Even worldly big shots like Stalin and Genghis rarely establish any significant ideological foothold. The memes use them like empty vessels.
But even so, the omnipresent “you’re just claiming might makes right” counterarguments remain uncontested. Hard to contest them.
Humans are friendlier than chimpanzees but less friendly than bonobos, and across the tree of life niceness and nastiness don’t seem to have any relationship to computational power.
It’s hard to tell how relevant this is; there’s much discontinuity between chimps and humans and much variance among humans. (Although it’s not that important, I’m skeptical of claims about bonobos; there were some premature sensationalist claims and then some counter-claims, and it all seemed annoyingly politicized.)
That’s true and important, but stereotypical worldly intelligent people rarely “grave new values on new tables”, and so might be much less intelligent than your Rousseaus and Hammurabis in the sense that they affect the cosmos less overall.
However, non-worldly intelligent people like Rousseau and Marx frequently supply the new values that make people like Robespierre and Stalin possible.
In the public mind Rousseau and Marx and their intellectual progeny are generally seen as cosmically connected/intelligent/progressive, right? Maybe overzealous, but their hearts were in the right place. If so that would support the intelligence=goodness claim. If the Enlightenment is good by the lights of the public, then the uFAI-Antichrist is good by the lights of the public. [Removed section supporting this claim.] And who are we to disagree with the dead, the sheep and the shepherds?
(ETA: Contrarian terminology aside, the claim looks absurd without its supporting arguments… ugh.)
Depends on which subset of the public we’re talking about.
I’m confused; is this an appeal to popular opinion?
Of course. “And all that dwell upon the earth shall worship him [the beast/dragon]” (Revelation 13:8).
People in a position to witness the practical results of their philosophy.
Why exactly did you remove that section?
I would say that it is simply the case that many moral systems require intelligence, or are more effective with intelligence. The intelligence doesn’t lead to morality per se, but it does lead to the ability to apply the morality in practice. Furthermore, low intelligence usually implies a lower tendency to cross-link one’s beliefs, resulting in less, hmm, morally coherent behaviour.
Ouch, that hits a little close to home.
The people who are most interested in intellectual topics (i.e., the most obviously intelligent intelligent people) do often appear to be the least interested in worldly ambition and the least aggressive generally, but we would expect that just as a matter of preferences crowding each other out; worldly ambitious intelligent people seem to be among the most conspicuously amoral, even though you’d expect them to be the most well-equipped in means and motive to look otherwise.
Fuck, wrote a response but lost it. The gist was, yeah, your points are valid, and the might-makes-right problems are pretty hard to get around even on the object level; I see an interesting way to defensibly move the goalposts, but the argument can’t be discussed on LessWrong and I should think about it more carefully in any case.
On the whole it seems that intelligent folk really are significantly more moral than the majority of humanity
That’s been my observation, also. But if it’s true, I wonder why?
It could be because intelligence is useful for moral reasoning. Or it could be because intelligence is correlated with some temperamental, neurological, or personality traits that influence moral behavior. In the latter case, moral behavior would be a characteristic of the substrate of intelligent human minds.
So you’re saying Goertzel believes that once any mind with sufficient intelligence and generally unfixed goals encounters certain abstract concepts, these concepts will hijack the cognitive architecture and rewrite its goals, with results equivalent for any reasonable initial mind design.
And the only evidence for this is that it happened once.
This does look a little obviously epistemically unsound.
Just an off-the-cuff not-very-detailed hypothesis about what he believes.
with results equivalent for any reasonable initial mind design
Or at least any mind design that looks even vaguely person-like, e.g. uses clever Bayesian machine learning algorithms found by computational cognitive scientists; but I think Ben might be unknowingly ignoring certain architectures that are “reasonable” in a certain sense but do not look vaguely person-like.
And the only evidence for this is that it happened once.
Yes, but an embarrassingly naive application of Laplace’s rule gives us a two-thirds probability it’ll happen again.
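For reference, the two-thirds figure is just Laplace’s rule of succession, $(s+1)/(n+2)$, applied to the single observed case ($s = n = 1$):

$$P(\text{it happens again} \mid 1 \text{ success in } 1 \text{ trial}) = \frac{1+1}{1+2} = \frac{2}{3}.$$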
This does look a little obviously epistemically unsound.
Eh, it looks pretty pragmatically incautious, but if you’re forced to give a point estimate then it seems epistemicly justifiable. If it was taken to imply strong confidence then that would indeed be unsound.
(By the way, we seem to disagree re “epistemicly” versus “epistemically”; is “-icly” a rare or incorrect construction?)
vaguely person-like, e.g. uses clever Bayesian machine learning algorithms found by computational cognitive scientists
:)
Yes, but an embarrassingly naive application of Laplace’s rule gives us a two-thirds probability it’ll happen again.
:))
(By the way, we seem to disagree re “epistemicly” versus “epistemically”; is “-icly” a rare or incorrect construction?)
It sounds prosodically (sic!) awkward, although since English is not my mother tongue, my intuition is probably not worth much. But Google appears to agree with me: 500,000 vs. 500 hits.
Well, if he believes his AI will be specifically and precisely programmed so as to converge on exactly the right goals before they are solidified in the hard takeoff, then he’s working on an FAI. The remaining difference of opinion would be technical: whether his AI will indeed converge, etc. It would not be about the Scary Idea itself.
I think Goertzel takes it to be part of the Scary Idea that a several-orders-of-magnitude more precise understanding of the AI’s goals is necessary for its behavior not to be disastrous.
It’s a direct logical consequence, isn’t it? If one doesn’t have a precise understanding of the AI’s goals, then whatever goals one imparts to the AI won’t be precise. And they must be precise, or (step 3) ⇒ disaster.
He doesn’t agree that they must be precise, so I guess step 3 is also out.
He can’t think that god-powerfully optimizing for a forever-fixed, not-precisely-correct goal would lead to anything but disaster. Not if he has ever seen a non-human optimization process at work.
So he can only think precision is not important if he believes that
(1) human values are an attractor in the goal space, and any reasonably close goals would converge there before solidifying, and/or
(2) acceptable human values form a large convex region within the goal space, and optimizing for any point within this region is correct.
Without a better understanding of AI goals, both can only be articles of faith...
From the conversation with Luke, he apparently accepts faith.
4) Human values are not a natural category; there’s little to no chance that an AI will converge on them by itself unless specifically and precisely programmed.
Goertzel expressed doubt about step 4, saying that while it’s true that random AIs will have bad goals, he’s not working on random AIs.
That’s not really the same as asserting that human values are a natural category.