50 upvotes and no comments? Weird.
I'll take a try, then, if no one else is willing.
Some assertions seem correct, but some seem unproven, some are normative rather than descriptive, and some are a mix.
For example, just looking at some remarks from near the beginning:
This is a very lethal problem,
Compared to what? And if you assert it without any bound on timeframe, I can confidently state only that it's certainly worse than stubbing your little toe and certainly less bad than the Heat Death of the universe.
But without fixing a relative range, the claim does not seem to carry much weight at all.
Is it more or less lethal than nuclear war, runaway bioweapons, a gamma-ray burst, etc., on a 10-year, 100-year, or 1,000-year timeframe?
it has to be solved one way or another,
According to who? And why a binary choice? Multi-multi scenarios existing in equilibrium have not been disproven yet,
i.e. where even if all worst eventualities come about, some AI faction may keep humans around in pleasant conditions, some in poor conditions, some in despicable conditions, etc., much like human-dog relations
it has to be solved at a minimum strength and difficulty level instead of various easier modes that some dream about,
You need to disprove, or point to evidence that shows, why all 'easier mode' proposals are incorrect, of which admittedly there are lots. I have not yet seen a rebuttal that is comprehensive, and making such a strong assertion without one seems unwarranted anyway.
i.e. if even one such proposal contained useful ideas then dismissing them as a class would seem very silly in retrospect.
we do not have any visible option of ‘everyone’ retreating to only solve safe weak problems instead, and failing on the first really dangerous try is fatal.
Why is it ‘fatal’? And who determines what counts as the ‘first really dangerous try’?
I highly doubt there will be anything approaching unanimous consensus across the world on defining either term. On LW, maybe, though 'first really dangerous try' sounds too wishy-washy for a decent chunk of the regulars.
Stop worrying about whether or not Eliezer has the “right” to say these things and start worrying about whether or not they’re true. You have the null string as well.
Are you sure you're responding to the right post? I was explicitly trying to determine what was true or not. In fact, that was about as straightforward and frank as I could see anyone being without being clearly rude.
Maybe you're a bit confused and mixed my post up with another?
Though nobody else in the comments seems to have said anything about 'the "right" to say these things'?
I'm trying to find a charitable interpretation of why you wrote that, but I'm drawing a blank; it really seems like you just saying that to troll.
Your method of trying to determine whether something is true or not relies too heavily on feedback from strangers. Your comment demands large amounts of intellectual labor from others ('disprove why all easier modes are incorrect'), despite the preamble of the post, while you seem unwilling to put much work in yourself.
Yes, when strong assertions are made, a lot of intellectual labor is expected if evidence is lacking or missing. Plus, I wrote it expecting it to be the first comment, so it raises a few more points than I think would be practical for the 100th comment. The preamble cannot justify points that are justified nowhere else, or else it would be a simple appeal to authority.
In the vast majority of cases, people who understand what they don't understand hedge their assertions. Since there was neither equally strong evidence nor hedging to support the corresponding claims, I was curious whether such evidence existed and Eliezer simply didn't link it, which could be for a variety of reasons. That is another factor in why I left it open-ended.
It does seem I was correct that, for some of the points, the strongest evidence is less substantial than what the claims imply.
The other way I could see a reasonable person viewing it is that, had I read everything credible on the topic, I wouldn't have phrased it that way.
Though again that seems a bit far-fetched, since I highly doubt anyone has read through the preexisting literature completely across the many dozens of topics mentioned here and still remembers every point.
In any case, it would have been strange to put a detailed and elaborate critique of a single point in the very first comment, where common courtesy is to leave things more open-ended for engagement and to allow others to chime in.
Which is why lc's response seems so bizarre, since it doesn't address any of the obvious rebuttals of my post and instead opens with a non sequitur.
See my detailed response to Daphne_W's comment: https://www.lesswrong.com/posts/uMQ3cqWDPHhjtiesc/agi-ruin-a-list-of-lethalities?commentId=vWjdiSeo2LtMj42wD#vWjdiSeo2LtMj42wD
Otherwise, even though this is strangely out of character for you, lc, I have a policy of disengaging from what appears to be low-effort trolling.
“Lethal” here means “lethal enough to kill every living human”. For example, later in the article Eliezer writes this:
When I say that alignment is difficult, I mean that in practice, using the techniques we actually have, “please don’t disassemble literally everyone with probability roughly 1”
...
According to who?
From context, "has to" means "if we humans don't solve this problem, then we will be killed by an unaligned AI". There's no person or authority out there threatening us to solve this problem "or else"; that's just the way reality seems to be. If you're asking why building a Strong unaligned AI results in everyone being killed, then I suggest reading the posts about orthogonality and instrumental convergence linked at the top of this post.
And why a binary choice?
“One way or another” is an English idiom which you can take to mean “somehow”. It doesn’t necessarily imply a binary choice.
Multi-multi scenarios existing in equilibrium have not been disproven yet,
This is addressed by #34: Just because multiple AI factions can coexist and compromise with each other doesn’t mean that any of those factions will be likely to want to keep humans around. It doesn’t seem likely that any AIs will think humans are cute and likeable in the same way that we think dogs are cute and likeable.
You need to disprove, or point to evidence that shows, why all ’easier modes’ proposals are incorrect
This is mostly addressed in #6 and #7, and the evidence given is that "nobody in this community has successfully named a 'pivotal weak act'". You could win this part of the argument by naming something an AI weak enough not to be a threat could do that would prevent all the AI research groups out there in the world from building a Strong AI.
Why is it ‘fatal’?
Because we expect a Strong AI that hasn’t been aligned to kill everyone. Once again, see the posts about orthogonality and instrumental convergence.
And who determines what counts as the ‘first really dangerous try’?
I’m not quite sure what you’re asking here? I guess Eliezer determines what he meant by writing those words. I don’t think there’s anyone at any of these AI research groups looking over proposals for models and saying “oh this model is moderately dangerous” or “oh this model is really dangerous, you shouldn’t build it”. I think at most of those groups, they only worry about the cost to train the model rather than how dangerous it will be.
If you were unaware, every example of other types of 'lethal' in the parent comment has the potential to eliminate all human life. And not in a hand-wavy sense either: truly 100%, the same death rate as the worst-case AGI outcomes.
Which means that, to a knowledgeable reader, the wording is unpersuasive, since the point is made before it has been established that there is potential for an even worse outcome than 100% extinction.
This shouldn't be too hard to do, since the topic has been discussed regularly on LW… dust specks, simulated tortures, etc.
I don't know why neither you nor Eliezer includes the obvious supporting points, or links to someone who does, up front rather than buried well past the assertion, since it seems you are trying to reinforce his points and Eliezer ostensibly wanted to write a summary for the non-expert reader in the first place.
If there's a new essay style that I didn't get the memo about, with the weak arguments at the beginning and the stronger ones near the end, then I could see why it was written that way.
For the rest of your points I see the same mistake: strong assertions without equally strong evidence to back them up.
For example, none of the posts from the regulars I've seen on LW asserts, without any hedging, that there's a 100% chance of human extinction due to any arbitrary Strong AI.
I've seen a few claims that there's a 100% chance Clippy would do so if Clippy arose first, though even those are somewhat iffy. And definitely none saying there's a 100% chance that Clippy, and only Clippy, would arise and reach an equilibrium end state.
If you know of any such posts, please provide the link.
Note that “+50 karma” here doesn’t mean 50 people upvoted the post (at the time you read it), since different votes have different karma weights. E.g., as I write this the post has +165 karma but only 50 total voters. (And 30 total comments.) So when you wrote your comment there were probably only 10-20 upvoters.
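To make the arithmetic concrete, here is a toy illustration; the vote weights and the breakdown of voters are invented for the example and are not LessWrong's actual vote-strength rules:

```python
# Toy illustration of weighted voting: total karma is a weighted sum,
# so it can far exceed the number of voters. The weights and counts
# below are made up for the example, not LessWrong's real formula.
votes = [
    # (assumed vote weight, number of voters casting it)
    (+2, 19),   # ordinary upvotes
    (+5, 27),   # strong upvotes
    (-2, 4),    # ordinary downvotes
]

total_karma = sum(weight * count for weight, count in votes)
total_voters = sum(count for _, count in votes)

print(total_karma, total_voters)  # -> 165 50
```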
Compared to what?
‘More likely than not to kill us in the next 40 years’ seems more than sufficient for treating this as an existential emergency, and AFAIK EY’s actual view is a lot doomier and nearer-term.
even if all worst eventualities come about, some AI faction may keep humans around in pleasant conditions, some in poor conditions, some in despicable conditions, etc., much like human-dog relations
Do you think a paperclip maximizer keeps humans around as pets? If not, is there something that makes paperclip maximizers relatively unlikely or unrepresentative, on your model?
i.e. if even one such proposal contained useful ideas then dismissing them as a class would seem very silly in retrospect.
Which proposal do you think is most promising?
(I think lc's objection is partly coming from a place of: 'Your comment says very little about your views of anything object-level.')
Huh, I didn't realize +50 karma could mean as few as 10 people. Thanks, that also seems to explain why I got some downvotes. There was a sudden influx of comments in the hour right after I posted, so at least it wasn't in vain.
40 years is a lot different from 10 years, and he sure isn't doing himself any favours by not clarifying. It also seems like something the community has focused quite a bit of effort on narrowing down, so it seems strange that he would elide the point.
I don't know if it's for some deep strategic purpose, but it certainly puts any serious reader into a skeptical mood.
On the idea of 'pets': Clippy perhaps might keep them; splinter AI factions almost surely would.
On the 'easy' proposals, I was expecting Eliezer to provide a few of the strongest examples of the class, develop some counterexamples, and show conclusively why they are too naive, thus credibly dismissing the entire class. Or at least to link to someone who does.
I personally don't think any 'easy alignment' proposal is likely to work, though I also wouldn't phrase the dismissal of the class so strongly.
lc's objection is bizarre if that was his intention, since he phrased his comment in a way that was clearly the least applicable to what I wrote out of every comment on this post. And he got some non-zero number of folks to show agreement, which leads me to suspect some type of weird trolling behaviour, since it doesn't seem credible that multiple folks truly believed I should have been even more direct and to the point.
If anything, I was expecting some mild criticism that I should have been more circumspect and hand-wavy.
Clippy is defined as a paperclip maximizer. Humans require lots of resources to keep them alive. Those resources could otherwise be used for making more paperclips. Therefore Clippy would definitely not keep any human pets. I’m curious why you think splinter AI factions would. Could you say a bit more about how you expect splinter AIs to arise, and why you expect them to have a tendency towards keeping pets? Is it just that having many AIs makes it more likely that one of them will have a weird utility function?
In a single-single scenario, you are correct that it would be very unlikely for Clippy to behave in such a manner.
However, in a multi-multi scenario, which is akin to an iterated prisoner's dilemma of random length with unknown starting conditions, the most likely 'winning' strategy would be some variation of tit-for-tat.
And tit-for-tat encourages perpetual cooperation as long as the parties are smart enough to avoid death spirals, again similar to human-pet relationships.
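A minimal sketch of that game-theoretic claim, using standard prisoner's-dilemma payoffs and a continuation probability chosen purely for illustration (this is a toy simulation, not a model of actual AI factions):

```python
import random

# Iterated prisoner's dilemma of random length: tit-for-tat sustains
# cooperation with itself, while cooperation collapses against a defector.
PAYOFFS = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
           ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def tit_for_tat(history):
    # Cooperate first, then copy the opponent's previous move.
    return "C" if not history else history[-1][1]

def always_defect(history):
    return "D"

def play(strategy_a, strategy_b, continue_prob=0.95):
    history, score_a, score_b = [], 0, 0
    while True:
        a = strategy_a([(x, y) for x, y in history])   # (my move, their move)
        b = strategy_b([(y, x) for x, y in history])
        pa, pb = PAYOFFS[(a, b)]
        score_a, score_b = score_a + pa, score_b + pb
        history.append((a, b))
        if random.random() > continue_prob:   # random, unknown game length
            return score_a, score_b

random.seed(0)
print(play(tit_for_tat, tit_for_tat))     # mutual cooperation throughout
print(play(tit_for_tat, always_defect))   # cooperation collapses after round one
```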
Of course it's not guaranteed that any multi-multi situation will in fact arise, but I haven't seen any convincing disproof, nor any reason why it should not be treated as the default. The most straightforward reason to treat it as the default is that light-speed limits on communication guarantee value drift for even the mightiest hypothetical AGI, eventually.
No one on LW, or in the broader academic community as far as I'm aware, has yet managed to present a foolproof argument, or even one convincing on the balance of probabilities, for why single-single outcomes are more likely than multi-multi.
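As a rough illustration of the light-speed point (the distances are standard approximate figures; whether long delays actually imply value drift is the contested claim, not something this arithmetic establishes):

```python
# Back-of-the-envelope light-speed round-trip delays, illustrating why
# far-flung parts of even a single AGI could not coordinate instantly.
distances_light_years = {
    "Proxima Centauri (nearest star, ~4.25 ly)": 4.25,
    "across the Milky Way (~100,000 ly)": 100_000,
}

for name, ly in distances_light_years.items():
    round_trip_years = 2 * ly  # a query and its reply, both at light speed
    print(f"{name}: round trip takes about {round_trip_years:,.1f} years")
```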
It is by Yudkowsky and is more of a reference post summarizing many arguments he has already provided elsewhere.
Feel free to nitpick anyway.
I write with the assumption that no one presumes Eliezer is infallible, and that everyone understands enough human psychology to know it would be very unlikely for anyone to write a long essay with dozens of points completely flawlessly and without error. Hence I wrote the first post as a helpful critique, which seems common enough on LW.
If some people truly believe that even the most straightforward and frank questioning of weak assertions, with no beating around the bush at all, is 'nitpicking', then that's on them. If you truly believe that, then that's on you.
If anyone actually expected that no critique is allowed, they'd just be blindly upvoting without regard for being Less Wrong, which would seem fairly silly, since everybody here seems able to write coherent comments and thus to understand that an appeal to authority is a fallacious argument.
But given that I got at least 7 downvotes, that may sadly be the case. Or some trolls, etc., may just downvote the first comment reflexively.
EDIT: Or the downvotes could be from folks thinking my comment was too direct, etc., though that would seem to contradict the fact that lc got 28 upvotes for saying it wasn't clear enough in being a criticism of the points?
This is by far the oddest distribution of votes I have ever seen in comment replies.
Votes on posts by the leader of a community are unavoidably influenced by status considerations for a notable fraction of the audience.
Yes, upon reflection I agree.
I had originally tried writing a point about possible social-status considerations, but edited it out as it seemed unfair in a direct reply.
It's still disappointing that anyone would do so at all instead of posting a substantive response. Ironically, if any of them had looked through my comment history, they would have realized how unlikely it was that social-status signalling would cow me. Thankfully they didn't pick an easier target.
And lc's apparent doubling down reinforces how silly it all looks.
Social signalling is usually the preserve of those without much in the way of substantive prospects, so it's unfortunate that the community has attracted a few members who feel so strongly about it as to use their downvotes on the first comment of a post, the place most likely to arouse exactly that suspicion.
Since those with productive intentions can write openly, including the majority on LW, I'm fairly convinced the portion with unproductive goals can only ever be temporary.
It happened to me too when I was a newbie. Interesting lesson.
It's quite pleasant to see through all the layers of dissimulation and sophistry. Certainly more interesting than the usual. And the time saved by not having to remember half-truths, empty flatteries, etc., enables intelligent writing in a fraction of the time.