If nuclear war occurs over alignment, then in the future people are likely to think about “alignment” much, much worse than people currently think about words like “eugenics,” and for reasons even better than the ones for which people currently dislike “eugenics.” Additionally, I don’t think it will get easier to coordinate post nuclear war, in general; I think it probably takes us closer to a post-dream-time setting, in the Hansonian sense. So, while predicting the aftermath of nuclear war is obviously super chaotic, my estimate of the % of the future light-cone utilized goes down, and if alignment caused the nuclear war, it should go down even further on models which judge alignment to be important!
This is a complex / chaotic / somewhat impossible calculation of course. But people seem to be talking about nuclear war like it’s a P(doom)-from-AI-risk reset button, and not realizing that there’s an implicit judgement about future probabilities that they are making. Nuclear war isn’t the end of history but another event whose consequences you can keep thinking about.
(Also, we aren’t gods, and EV is by fucking golly the wrong way to model this, but, different convo)
It makes me… surprised? feeling sadly that I don’t understand you?… to read you think a floor is 10% after reading Quintin Pope’s summary of disagreements with EY. His guess was 5%, and his theories seem way more clear, predictive and articulable than EY’s.
I’m unaware of any prior-to-the-end-of-the-world predictions about intelligence that (for lack of a better word) any classical EY / MIRI theory makes, despite the mountains of arguments in that theory. (Contrast with shard theory, which seems to have a functioning research program.) It makes a lot of predictions about superintelligence, but like, none about the Cambrian explosion of intelligence in which we live. I imagine this is a standard objection that you’ve heard before, Raemon, and I know you talk with a lot of people about it, so like, if there’s a standard response (sub 100k words) you can point me at it… but I think superintelligence will be built on the bits of intelligence we’ve made, and if your theory isn’t predictive about those (in the easiest time in history to make predictions about intelligence!) then it’s like, tantamount to the greatest possible amount of evidence that it’s a bad theory. I think there’s a lot of philosophy here that failed to turn into science, and at this point it’s just… philosophy, in the worst possible sense.
And EY’s “shut it all down” seems really driven by classical MIRI theory, in a lot of ways, some of which I think I’m pretty sure of and some of which I struggle to articulate. I might have to follow nostalgebraist (https://nostalgebraist.tumblr.com/post/712173910926524416/im-re-instating-this-again-31823#notes) and just stop visiting LW because like… I think so much is wrong in the discourse about AI here, I dunno.
I agree pretty strongly with your points here, especially the complete lack of good predictions from EY/MIRI about the current Cambrian explosion of intelligence, and how any sane agent using a sane updating strategy (like mixture of experts or, equivalently, Solomonoff weighting) should by now more or less discount/disavow much of their world model.
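To make the updating strategy mentioned above concrete, here is a minimal sketch of a Bayesian mixture over competing forecasters. Everything in it is an assumption for illustration: the names `theory_a` and `theory_b`, the equal prior weights, and the probabilities each one assigned to the outcomes are all made up, and a true Solomonoff-style prior would weight hypotheses by description length rather than equally.

```python
def update_weights(weights, likelihoods):
    """One Bayesian update: posterior weight is proportional to prior weight times likelihood."""
    unnormalized = {name: weights[name] * likelihoods[name] for name in weights}
    total = sum(unnormalized.values())
    return {name: value / total for name, value in unnormalized.items()}

# Two hypothetical forecasters, starting with equal weight (an assumption).
weights = {"theory_a": 0.5, "theory_b": 0.5}

# Made-up probabilities each forecaster assigned to three outcomes that then happened.
observed = [
    {"theory_a": 0.8, "theory_b": 0.2},
    {"theory_a": 0.7, "theory_b": 0.3},
    {"theory_a": 0.9, "theory_b": 0.1},
]

for likelihoods in observed:
    weights = update_weights(weights, likelihoods)

# theory_a ends near 0.988, theory_b near 0.012
print(weights)
```

The point of the sketch is only that a forecaster who repeatedly assigned low probability to what actually happened loses weight quickly, even from an even start.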
However I nonetheless agree that AI is by far the dominant x-risk. My doom probability is closer to ~5% perhaps, but the difference between 5% and 50% doesn’t cash out to much policy difference at this point.
So really my disagreement is more on alignment strategy. A problem with this site is that it overweights EY/MIRI classic old alignment literature and arguments by about 100x what it should be, and is arguably doing more harm than good by overpromoting those ideas vs alternate ideas flowing from those who actually did make reasonably good predictions, in advance, about the current Cambrian explosion.
If there was another site that was a nexus for AI/risk/alignment/etc with similar features but with most of the EY/MIRI legacy cultish stuff removed, I would naturally jump there. But it doesn’t seem to exist yet.
So really my disagreement is more on alignment strategy. A problem with this site is that it overweights EY/MIRI classic old alignment literature and arguments by about 100x what it should be
I don’t think there are many people with alignment strategies and research that they’re working on. Eliezer has a hugely important perspective; Scott Garrabrant, Paul Christiano, John Wentworth, Steve Byrnes, and more all have approaches and perspectives too that they’re working on full-time. If you’re working on this full-time and any of your particular ideas check out as plausible, I think there’s space for you to post here and get some engagement and respect (if you post in a readable style that isn’t that of obfuscatory-academia). If you’ve got work you’re doing on it full-time I think you can probably post here semi-regularly and eventually find collaborators, people you’re interested in feedback from, and eventually funding. You might not get super high karma all the time, but that’s okay; I think a few well-received posts are enough to not have to worry about a bunch of low-karma posts.
The main thing that I think makes space for a perspective here is (a) someone is seriously committed to actually working on it, and (b) they can communicate clearly and well. There are a lot of different sub-niches on LessWrong that co-exist (e.g. Zvi’s news discussion doesn’t interact with Paul’s Latent Knowledge discussion, which doesn’t (surprisingly) interact much with Flint’s writing on what knowledge isn’t, which doesn’t interact much with Kokotajlo’s writing on takeover). I think it’s fine to develop an area of research here without justifying the whole thing the whole time; I think it’s healthy for paradigms and proposals to go away and not have to engage that much with each other until they’ve made more progress. Overall I think most paradigms here have no results to show for themselves, and it is not that worth fighting over which strategy to pick, rather than working ahead on a given strategy for a year or two until you have something to report back. For instance I would mostly encourage Quintin to go and get a serious result in shard theory and bring that back (and I really like that TurnTrout and Quintin have been working seriously on exactly that) and spend less time arguing about which approach is better.
So really my disagreement is more on alignment strategy. A problem with this site is that it overweights EY/MIRI classic old alignment literature and arguments by about 100x what it should be
I don’t think there are many people with alignment strategies and research that they’re working on.
I agree that’s a problem—but causally downstream of the problem I mention. Whereas Bostrom deserves credit for raising awareness of AI-risk in academia, EY/MIRI deserves credit for awakening many young techies to the issue—but also some blame.
Whether intentionally or not, the EY/MIRI worldview aligned itself against DL and its proponents, leading to an antagonistic dynamic that you may not have experienced if you haven’t spent much time on r/MachineLearning or similar. Many people in ML truly hate anything associated with EY/MIRI/LW. Part of that is perhaps just the natural result of someone sounding an alarm that your life’s work could literally kill everyone. But it really really doesn’t help if you then look into their technical arguments and reach the conclusion that they don’t know what they are talking about.
I otherwise agree with much of your comment. I think this site is lucky to have Byrnes and Quintin, and Quintin’s recent critique is the best recent critique of the EY/MIRI position from the DL perspective.
I have not engaged much with your and Quintin’s recent arguments about how deep learning may change the basic arguments, so I want to acknowledge that I would probably shift my opinion a bunch in some direction if I did. Nonetheless, a few related points:
I do want to say that on priors the level of anger and antagonism that appears in most internet comment sections is substantially higher than what happens when the people meet in person, and I do not suspect a corresponding amount of active antagonism would happen if Nate or Eliezer or John Wentworth went to an ML conference. Perhaps stated more strongly: I think 99% of internet ‘hate’ is performative only.
You write “But it really really doesn’t help if you then look into their technical arguments and reach the conclusion that they don’t know what they are talking about.” I would respect any ML researchers making this claim more if they wrote a thoughtful rebuttal to AGI: A List of Lethalities (or really literally any substantive piece of Eliezer’s on the subject that they cared to: There’s No Fire Alarm, Security Mindset, Rocket Alignment, etc). I think Eliezer not knowing what he’s talking about would make rebutting him easier. As far as I’m aware literally zero significant ML researchers have written such a thing: not Dario, not Demis, not Sutskever, not LeCun, nor basically anyone senior in their orgs. Eliezer has thought quite a lot and put forth some quite serious arguments that seemed shockingly prescient to me, and I dunno, it seems maximally inconvenient for all the people earning multi-million-dollar annual salaries in this new field of ML to seriously engage with a good-faith and prescient outsider with thoughtful arguments that their work risks extinction. If they’re dismissing him as “not getting it” yet don’t seriously engage with the arguments or make a positive case for how alignment can be solved, I think I ought to default to thinking of them as not morally serious in their statements. Relatedly I am pretty deeply disappointed by the speed at which intellectuals like Pinker and Cowen come up with reasons to dismiss and avoid engaging with the arguments when the alternative is to seriously grapple with an extinction-level threat.
I am not compelled by the idea that if you haven’t restated your arguments to fit in with the new paradigm that’s shown up, then you must be out of the loop and wrong. Rather than “your arguments don’t seem perfectly suited to our new paradigm, look at all of these little holes I’ve found” I would be far more compelled by “here is a positive proposal for how to align a system that we build, with active reason to suspect it is aligned” or similar. Paul is the only person I know to propose specific algorithms for how to align systems and Eliezer has engaged seriously on Paul’s terms and found many holes in the proposal that Paul agreed with. I expect Eliezer would do the same if anyone working in the major labs did the same.
I understand that you and Quintin have criticisms (looking through bits of Quintin’s post it seems interesting, as do your claims here), as do Paul and others who all agree on the basics that this is an extinction-level threat. I think it is more productive for Eliezer to critique positive proposals than it is to update his arguments identifying the problem, especially when defending them from criticism from people who still think the extinction risk from misalignment is at least 5% and thus a top priority for civilization right now. If there were a leading ML practitioner arguing that ML was not an extinction-level threat and who was engaged with Eliezer’s arguments, I would consider it more worthwhile for Eliezer to respond. Meanwhile I think people working in alignment research should prefer to get on with the work at hand, and that LessWrong is clearly the best forum to get engagement from people who understand the problem that people are actually trying to solve (and to find collaborators/funders/etc).
As far as I’m aware literally zero significant ML researchers have written such a thing: not Dario, not Demis, not Sutskever, not LeCun, nor basically anyone senior in their orgs.
I just want to point out that seems like a ridiculous standard. Quintin’s recent critique is not that dissimilar to the one I would write (and I already have spent some time trying to point out the various flaws in the EY/MIRI world model), and I expect that you would get many of the same objections if you elicited a number of thoughtful DL researchers. But few if any have been motivated—what’s the point?
Here’s my critique in simplified form: the mainstream AI futurists (Moravec, Kurzweil, etc.) predicted that AGI would be brain-like and thus close to a virtual brain emulation. Thus they were not so concerned about doom, because brain-like AGI seems like a more natural extension of humanity (Moravec’s book is named ‘Mind Children’ for a reason), and an easier transition to manage.
In most ways that matter, Moravec/Kurzweil were correct, and EY was wrong. That really shouldn’t even be up for debate at this point. The approach that worked, DL, is essentially reverse engineering the brain. This is in part due to how the successful techniques all ended up being either directly inspired by neuroscience and the now proven universal learning & scaling hypotheses[1] (deep and/or recurrent ANNs in general, sparse coding, normalization, ReLUs, etc.) OR indirect recapitulations of neural circuitry (transformer ‘attention’ equivalence to fast weight memory, etc.).
But in even simpler form: if you take a first, already trained NN A, run it on a bunch of data, and capture all its outputs, then train a second NN B on the resulting input-output dataset, the result is that B becomes a distilled copy, a distillation, of A.
This is in fact how we train large scale AI systems. They are trained on human thoughts.
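For readers who want the A-to-B distillation setup spelled out, here is a toy sketch. It is only an illustration under loud assumptions: the "networks" are one-parameter-pair logistic models rather than real NNs, the teacher's weights (2.0 and -1.0) are made up, and the function names `teacher` and `distill` are hypothetical.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def teacher(x):
    # Stand-in for the already trained model A; these weights are invented.
    return sigmoid(2.0 * x - 1.0)

def distill(inputs, steps=3000, lr=0.5):
    # Student B starts from zero weights and only ever sees A's outputs.
    w, b = 0.0, 0.0
    targets = [teacher(x) for x in inputs]  # capture A's outputs on the data
    n = len(inputs)
    for _ in range(steps):
        grad_w = 0.0
        grad_b = 0.0
        for x, t in zip(inputs, targets):
            # Gradient of cross-entropy against A's soft output, w.r.t. the logit.
            err = sigmoid(w * x + b) - t
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

random.seed(0)
inputs = [random.uniform(-3.0, 3.0) for _ in range(200)]
w, b = distill(inputs)

# Largest disagreement between student B and teacher A on the training inputs.
max_gap = max(abs(sigmoid(w * x + b) - teacher(x)) for x in inputs)
print(w, b, max_gap)
```

In this toy case B has the same form as A, so it can recover A almost exactly; with real networks and real data the student only approximates the teacher on the distribution it was shown.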
[1] The universal learning hypothesis is that the brain (and thus DL) uses simple universal learning algorithms, and all circuit content is learned automatically, which leads to the scaling hypothesis: intelligence comes from scaling up simple architectures and learning algorithms with massive compute, not from continually and explicitly “rewriting your source code” à la EY’s model.
Can I ask what your epistemic state here is exactly? Here are some options:
The arguments Eliezer put forward do not clearly apply to Deep Learning and therefore we don’t have any positive reason to believe that alignment will be an issue in ML
The arguments Eliezer put forward never made sense in the first place and therefore we do not have to worry about the alignment problem
The arguments Eliezer put forward captured a bunch of important things about the alignment problem but due to some differences in how we get to build ML systems we actually know of a promising route to aligning the systems
The arguments Eliezer put forward are basically accurate but with concepts that feel slightly odd for thinking about machine learning, and due to machine learning advances we have a concrete (and important) research route that seems worth investing in that Eliezer’s conceptual landscape doesn’t notice and that he is pushing against
The arguments Eliezer put forward do not clearly apply to Deep Learning
Yes but
and therefore we don’t have any positive reason to believe that alignment will be an issue in ML
does not follow.
The arguments Eliezer put forward never made sense in the first place
Yes (for some of the arguments), but again:
and therefore we do not have to worry about the alignment problem
does not follow.
The arguments Eliezer put forward captured a bunch of important things about the alignment problem but due to some differences in how we get to build ML systems we actually know of a promising route to aligning the systems
Yes, such as the various more neuroscience/DL-inspired approaches (Byrnes, simboxes, shard theory, etc.), or others a bit harder to categorize like davidad’s approach, or external empowerment.
But also I should point out that RLHF may work better for longer than most here anticipate, simply because if you distill the (curated) thoughts of mostly aligned humans you may just get mostly aligned agents.
I’m not sure if it’s worth us having more back-and-forth, so I’ll say my general feelings right now:
I think it’s of course healthy and fine to have a bunch of major disagreements with Eliezer
I would avoid building “hate” toward him or building resentment as those things are generally not healthy for people to cultivate in themselves toward people who have not done evil things, as I think it will probably cause them to make worse choices by their own judgment
By default, do not count on anyone doing the hard work of making another forum for serious discussion of this subject, especially one that’s so open to harsh criticism and has high standards for comments (I know LessWrong could be better in lots of ways but c’mon have you seen Reddit/Facebook/Twitter?)
There is definitely a bunch of space on this forum for people like yourself to develop different research proposals and find thoughtful collaborators and get input from smart people who care about the problem you’re trying to solve (I think Shard Theory is such an example here)
I wish you every luck in doing so and am happy to know if there are ways to further support you trying to solve the alignment problem (of course I have limits on my time/resources and how much I can help out different people)
I would avoid building “hate” toward him or building resentment as those things are generally not healthy for people to cultivate in themselves toward people who have not done evil things, as I think it will probably cause them to make worse choices by their own judgment
Of course—my use of the word hate here is merely in reporting impressions from other ML/DL forums and the schism between the communities.
I obviously generally agree with EY on many things, and to the extent I critique his positions here it’s simply a straightforward result of some people here assuming their correctness a priori.
Also, can I just remind you that for most of LessWrong’s history the top-karma post was Holden’s critique of SingInst where he recommended against funding SingInst and argued in favor of Tool AI as the solution. Recently Eliezer’s List-of-Lethalities became the top-karma post, but less than a month later Paul’s response-and-critique post became the top-karma post where he argued that the problem is much more tractable than Eliezer thinks, and generally advocates a very different research strategy for dealing with alignment.
Eliezer is the primary person responsible for noticing and causing people to work on the alignment problem, due to his superior foresight and writing skill, and also founded this site, so most people here have read his perspective and understand it somewhat, but any notion that dissent isn’t welcomed here (which I am perhaps over-reading into your comment) seems kind of obviously not the case.
It makes me… surprised? feeling sadly that I don’t understand you?… to read you think a floor is 10% after reading Quintin Pope’s summary of disagreements with EY. His guess was 5%, and his theories seem way more clear, predictive and articulable than EY’s.
The main answer here is I hadn’t read Quintin’s post in full detail and didn’t know that. I’ll want to read it in more detail but mostly expect to update my statement to “5%”. Thank you for pointing it out.
(I was aware of Scott Aaronson being like 3%, but honestly hadn’t been very impressed with his reasoning and understanding and was explicitly not counting him. Sorry Scott).
I have more thoughts on where my own P(Doom) comes from, and how I relate to all this, but I think basically I should write a top level post about it and take some time to get it well articulated. I think I already said this, but a quick recap: I don’t think you need particularly Yudkowskian views to think an international shutdown treaty is a good idea. My own P(Doom) is somewhat confused but I put it at >50% odds. A major reason is the additional disjunctive worries of “you don’t just need the first superintelligence to go well, you need a world with lots of strong-but-narrow AIs interacting to go well, or a multipolar takeoff to go well.”
Sooner or later you definitely need something about as strict (well, more actually) as the global control Eliezer advocates here, since compute costs go down, compute itself goes up, and AI models become more accessible and more powerful. Even if alignment is easy I don’t see how you can expect to survive an AI-heavy world without a level of control and international alignment that feels draconian by today’s standards.
(I don’t know yet if Quintin argues against all these points, but will give it a read. I haven’t been keeping up with everything because there’s a lot to read, but it seems important to be familiar with his take)
But for right now maybe I most want to say “Yeah man, this is very intense and sad. It sounds like I disagree with your epistemic state but I don’t think your epistemic state is crazy.”
If nuclear war occurs over alignment, then in the future people are likely to think about “alignment” much much worse than people currently think about words like “eugenics,” for reasons actually even better than the ones people currently dislike “eugenics.” Additionally, I don’t think it will get easier to coordinate post nuclear war, in general; I think it probably takes us closer to a post-dream-time setting, in the Hansonian sense. So—obviously predicting the aftermath of nuclear war is super chaotic, but my estimate of % of future light-cone utilized does down—and if alignment caused the nuclear war, it should go down even further on models which judge alignment to be important!
This is a complex / chaotic / somewhat impossible calculation of course. But people seem to be talking about nuclear war like it’s a P(doom)-from-AI-risk reset button, and not realizing that there’s an implicit judgement about future probabilities that they are making. Nuclear war isn’t the end of history but another event whose consequences you can keep thinking about.
(Also, we aren’t gods, and EV is by fucking golly the wrong way to model this, but, different convo)
It makes me… surprised? feeling sadly that I don’t understand you?… to read you think a floor is 10% after reading Quintin Pope’s summary of disagreements with EY. His guess was 5%, and his theories seem way more clear, predictive and articulable than EY’s.
I’m unaware of any prior-to-the-end-of-the-world predictions about intelligence that—for lack of a better word—any classical EY / MIRI theory makes, despite the mountains of arguments in that theory. (Contrast with shard theory, which seems to have a functioning research program.) It makes a lot of predictions about superintelligence, but like, none about the Cambrain explosion of intelligence in which we live. I imagine this is a standard objection that you’ve heard before, Raemon, and I know you talk with a lot of people about it, so like, if there’s a standard response (sub 100k words) you point me at it.… but I think superintelligence will be built on the bits of intelligence we’ve made, and if your theory isn’t predictive about those (in the easiest time in history to make predictions about intelligence!) then it’s like, tantamount to the greatest possible amount of evidence that it’s a bad theory. I think there’s a lot of philosophy here that failed to turn into science, and at this point it’s just… philosophy, in the worst possible sense.
And EY’s “shut it all down” seems really driven by classical MIRI theory, in a lot of ways some of which I think I’m pretty sure of and some of which I struggle to articulate. I might have to follow the Nostalgebrist (https://nostalgebraist.tumblr.com/post/712173910926524416/im-re-instating-this-again-31823#notes) and just stop visiting LW because like… I think so much is wrong in the discourse about AI here, I dunno.
I agree pretty strongly with your points here especially the complete lack of good predictions from EY/MIRI about the current Cambrian explosion of intelligence and how any sane agent using a sane updating strategy (like mixture of experts or equivalently solomonof weighting) should more or less now discount/disavow much of their world model.
However I nonetheless agree that AI is by far the dominant x-risk. My doom probability is closer to ~5% perhaps, but the difference between 5% and 50% doesn’t cash out to much policy difference at this point.
So really my disagreement is more on alignment strategy. A problem with this site is that it overweights EY/MIRI classic old alignment literature and arguments by about 100x what it should be, and is arguably doing more harm than good by overpromoting those ideas vs alternate ideas flowing from those who actually did make reasonably good predictions about the current cambrian explosion—in advance.
If there was another site that was a nexus for AI/risk/alignment/etc with similar features but with most of the EY/MIRI legacy cultish stuff removed, I would naturally jump there. But it doesn’t seem to exist yet.
I don’t think there are many people with alignment strategies and research that they’re working on. Eliezer has a hugely important perspective, Scott Garrabrant, Paul Christiano, John Wentworth, Steve Byrnes, and more, all have approaches and perspectives too that they’re working full-time on. I think if you’re working on this full-time and any of your particular ideas check out as plausible I think there’s space for you to post here and get some engagement respect (if you post in a readable style that isn’t that of obfuscatory-academia). If you’ve got work you’re doing on it full-time I think you can probably post here semi-regularly and eventually find collaborators and people you’re interested in feedback from and eventually funding. You might not get super high karma all the time, but that’s okay, I think a few well-received posts is enough to not have to worry about a bunch of low-karma posts.
The main thing that I think makes space for a perspective here is (a) someone is seriously committed to actually working on it, and (b) they can communicate clearly and well. There’s a lot of different sub-niches on LessWrong that co-exist (e.g. Zvi’s news discussion doesn’t interact with Paul’s Latent Knowledge discussion doesn’t (surprisingly) interact much with Flint’s writing on what knowledge isn’t which doesn’t interact much with Kokotajlo’s writing on takeover). I think it’s fine to develop an area of research here without justifying the whole thing the whole time, I think that’s healthy for paradigms and proposals to go away and not have to engage that much with each other until they’ve made more progress. Overall I think most paradigms here have no results to show for themselves and it is not that worth fighting over which strategy to pick, rather than working ahead on a given strategy for a year or two until you have something to report back. For instance I would mostly encourage Quintin to go and get a serious result in shard theory and bring that back (and I really like that TurnTrout and Quintin have been working seriously on exactly that) and spend less time arguing about which approach is better.
I agree that’s a problem—but causally downstream of the problem I mention. Whereas Bostrom deserves credit for raising awareness of AI-risk in academia, EY/MIRI deserves credit for awakening many young techies to the issue—but also some blame.
Whether intentionally or not, the EY/MIRI worldview aligned itself against DL and its proponents, leading to an antagonistic dynamic that you may not have experienced if you haven’t spent much time on r/MachineLearning or similar. Many people in ML truly hate anything associated with EY/MIRI/LW. Part of that is perhaps just the natural result of someone sounding an alarm that your life’s work could literally kill everyone. But it really really doesn’t help if you then look into their technical arguments and reach the conclusion that they don’t know what they are talking about.
I otherwise agree with much of your comment. I think this site is lucky to have Byrnes and Quintin, and Quintin’s recent critique is the best recent critique of the EY/MIRI position from the DL perspective.
I have not engaged much with your and Quintin’s recent arguments about how deep learning may change the basic arguments, so I want to acknowledge that I would probably shift my opinion a bunch in some direction if I did. Nonetheless, a few related points:
I do want to say that on-priors the level of anger and antagonism that appears on most internet comment sections is substantially higher than what happens when the people meet in-person, and do not suspect a corresponding about of active antagonism would happen if Nate or Eliezer or John Wentworth went to an ML conference. Perhaps stated more strongly: I think 99% of internet ‘hate’ is performative only.
You write “But it really really doesn’t help if you then look into their technical arguments and reach the conclusion that they don’t know what they are talking about.” I would respect any ML researchers making this claim more if they wrote a thoughtful rebuttal to AGI: A List of Lethalities (or really literally any substantive piece of Eliezer’s on the subject that they cared to — There’s No Fire Alarm, Security Mindset, Rocket Alignment, etc). I think Eliezer not knowing what he’s talking about would make rebutting him easier. As far as I’m aware literally zero significant ML researchers have written such a thing, Not Dario, not Demis, not Sutskever, not LeCun, nor basically anyone senior in their orgs. Eliezer has thought quite a lot and put forth some quite serious argument that seemed shockingly prescient to me, and I dunno, it seems maximally inconvenient for all the people earning multi-million-dollar annual salaries in this new field of ML to seriously to engage with a good-faith and prescient outsider with thoughtful arguments that their work risks extinction. If they’re dismissing him as “not getting it” yet don’t seriously engage with the arguments or make a positive case for how alignment can be solved, I think I ought to default to thinking of them as not morally serious in their statements. Relatedly I am pretty deeply disappointed by the speed at which intellectuals like Pinker and Cowen quickly come up with reasons to dismiss and avoid engaging with the arguments when the alternative is to seriously grapple with an extinction-level threat.
I am not compelled by the idea that if you haven’t restated your arguments to fit in with the new paradigm that’s shown up, then you must be out of the loop and wrong. Rather than “your arguments don’t seem perfectly suited to our new paradigm, look at all of these little holes I’ve found” I would be far more compelled by “here is a positive proposal for how to align a system that we build, with active reason to suspect it is aligned” or similar. Paul is the only person I know to propose specific algorithms for how to align systems and Eliezer has engaged seriously on Paul’s terms and found many holes in the proposal that Paul agreed with. I expect Eliezer would do the same if anyone working in the major labs did the same.
I understand that you and Quintin have criticisms (looking through bits of Quintin’s post it seems interesting, as do your claims here) as does Paul and others who all agree on the basics that this is an extinction-level threat, I think it is more productive for Eliezer to critique positive proposals than it is to update his arguments identifying the problem, especially when defending them from criticism from people who still think the extinction risk from misalignment is at least 5% and thus a top priority for civilization right now. If there was a leading ML practitioner arguing that ML was not an extinction-level threat and who was engaged with Eliezer’s arguments, I would consider it more worthwhile for Eliezer to respond. Meanwhile I think people working in alignment research should prefer to get on with the work at-hand, and that LessWrong is clearly the best forum to get engagement from people who understand what the problem is that is trying to actually be solved (and to find collaborators/funders/etc).
I just want to point out that seems like a ridiculous standard. Quintin’s recent critique is not that dissimilar to the one I would write (and I already have spent some time trying to point out the various flaws in the EY/MIRI world model), and I expect that you would get many of the same objections if you elicited a number of thoughtful DL researchers. But few if any have been motivated—what’s the point?
Here’s my critique in simplified form: the mainstream AI futurists (Moravec, Kurzweil, etc.) predicted that AGI would be brain-like and thus close to a virtual brain emulation. Thus they were not so concerned about doom, because brain-like AGI seems like a more natural extension of humanity (Moravec’s book is named ‘Mind Children’ for a reason), and an easier transition to manage.
In most ways that matter, Moravec/Kurzweil were correct, and EY was wrong. That really shouldn’t even be up for debate at this point. The approach that worked, DL, is essentially reverse engineering the brain. This is in part because the successful techniques all ended up either being directly inspired by neuroscience and the now proven universal learning & scaling hypotheses[1] (deep and/or recurrent ANNs in general, sparse coding, normalization, ReLUs, etc.) or indirectly recapitulating neural circuitry (transformer ‘attention’ equivalence to fast weight memory, etc.).
But in even simpler form: if you take an already trained NN A, run it on a bunch of data and capture all its outputs, then train a second NN B on the resulting input-output dataset, the result is that B becomes a distilled copy of A.
This is in fact how we train large scale AI systems. They are trained on human thoughts.
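To make the distillation step above concrete, here is a minimal toy sketch, not anyone's actual training pipeline. It assumes both "networks" are tiny logistic models, that the teacher's weights (`TEACHER_W`, `TEACHER_B`) are arbitrary made-up values, and that the student is fit by plain full-batch gradient descent on the teacher's soft outputs. All names here are hypothetical and chosen for illustration only.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x):
    """Output of a one-layer logistic model on input vector x."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def distill(teacher, inputs, epochs=500, lr=1.0):
    """Fit a student using only (input, teacher output) pairs."""
    targets = [teacher(x) for x in inputs]  # capture the teacher's outputs
    dim = len(inputs[0])
    weights, bias = [0.0] * dim, 0.0  # student starts with no knowledge
    n = len(inputs)
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, t in zip(inputs, targets):
            # Gradient of cross-entropy against a soft target, w.r.t. the logit.
            err = predict(weights, bias, x) - t
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Hypothetical "already trained" teacher A (weights are made up).
TEACHER_W, TEACHER_B = [2.0, -1.0], 0.5


def teacher(x):
    return predict(TEACHER_W, TEACHER_B, x)


random.seed(0)
train_inputs = [[random.uniform(-2, 2), random.uniform(-2, 2)] for _ in range(300)]
student_w, student_b = distill(teacher, train_inputs)

# Compare teacher and student on inputs the student never saw.
test_inputs = [[random.uniform(-2, 2), random.uniform(-2, 2)] for _ in range(100)]
max_gap = max(abs(teacher(x) - predict(student_w, student_b, x)) for x in test_inputs)
print("largest teacher/student disagreement:", max_gap)
```

The student never sees the teacher's weights, only its input-output behavior, which is the sense of "distilled copy" used above. Whether this analogy carries over to training large models on human-written text is the claim under debate, and this toy does not settle it.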
The universal learning hypothesis is that the brain (and thus DL) uses simple universal learning algorithms, and all circuit content is learned automatically. This leads to the scaling hypothesis: intelligence comes from scaling up simple architectures and learning algorithms with massive compute, not from continually and explicitly “rewriting your source code” à la EY’s model.
Can I ask what your epistemic state here is exactly? Here are some options:
1. The arguments Eliezer put forward do not clearly apply to Deep Learning and therefore we don’t have any positive reason to believe that alignment will be an issue in ML
2. The arguments Eliezer put forward never made sense in the first place and therefore we do not have to worry about the alignment problem
3. The arguments Eliezer put forward captured a bunch of important things about the alignment problem but due to some differences in how we get to build ML systems we actually know of a promising route to aligning the systems
4. The arguments Eliezer put forward are basically accurate but with concepts that feel slightly odd for thinking about machine learning, and due to machine learning advances we have a concrete (and important) research route that seems worth investing in that Eliezer’s conceptual landscape doesn’t notice and that he is pushing against
1. Yes, but the “and therefore” part does not follow.
2. Yes (for some of the arguments), but again the “and therefore” part does not follow.
3. Yes, such as the various more neuroscience/DL inspired approaches (Byrnes, simboxes, shard theory, etc.), or others a bit harder to categorize like davidad’s approach, or external empowerment.
But also I should point out that RLHF may work better for longer than most here anticipate, simply because if you distill the (curated) thoughts of mostly aligned humans you may just get mostly aligned agents.
Thanks!
I’m not sure if it’s worth us having more back-and-forth, so I’ll say my general feelings right now:
I think it’s of course healthy and fine to have a bunch of major disagreements with Eliezer
I would avoid building “hate” toward him or building resentment as those things are generally not healthy for people to cultivate in themselves toward people who have not done evil things, as I think it will probably cause them to make worse choices by their own judgment
By default do not count on anyone doing the hard work of making another forum for serious discussion of this subject, especially one that’s so open to harsh criticism and has high standards for comments (I know LessWrong could be better in lots of ways but c’mon have you seen Reddit/Facebook/Twitter?)
There is definitely a bunch of space on this forum for people like yourself to develop different research proposals and find thoughtful collaborators and get input from smart people who care about the problem you’re trying to solve (I think Shard Theory is such an example here)
I wish you every luck in doing so and am happy to know if there are ways to further support you trying to solve the alignment problem (of course I have limits on my time/resources and how much I can help out different people)
Of course—my use of the word hate here is merely in reporting impressions from other ML/DL forums and the schism between the communities.
I obviously generally agree with EY on many things, and to the extent I critique his positions here it’s simply a straightforward result of some people here assuming their correctness a priori.
Okay! Good to know we concur on this. Was a bit worried, so thought I’d mention it.
Also, can I just remind you that for most of LessWrong’s history the top-karma post was Holden’s critique of SingInst where he recommended against funding SingInst and argued in favor of Tool AI as the solution. Recently Eliezer’s List-of-Lethalities became the top-karma post, but less than a month later Paul’s response-and-critique post became the top-karma post where he argued that the problem is much more tractable than Eliezer thinks, and generally advocates a very different research strategy for dealing with alignment.
Eliezer is the primary person responsible for noticing and causing people to work on the alignment problem, due to his superior foresight and writing skill, and also founded this site, so most people here have read his perspective and understand it somewhat, but any notion that dissent isn’t welcomed here (which I am perhaps over-reading into your comment) seems kind of obviously not the case.
The main answer here is I hadn’t read Quintin’s post in full detail and didn’t know that. I’ll want to read it in more detail but mostly expect to update my statement to “5%”. Thank you for pointing it out.
(I was aware of Scott Aaronson being like 3%, but honestly hadn’t been very impressed with his reasoning and understanding and was explicitly not counting him. Sorry Scott).
I have more thoughts on where my own P(Doom) comes from, and how I relate to all this, but I think basically I should write a top level post about it and take some time to get it well articulated. I think I already said, but a quick recap: I don’t think you need particularly Yudkowskian views to think an international shutdown treaty is a good idea. My own P(Doom) is somewhat confused but I put it at >50%. A major reason is the additional disjunctive worries of “you don’t just need the first superintelligence to go well, you need a world with lots of strong-but-narrow AIs interacting to go well, or a multipolar takeoff to go well.”
Sooner or later you definitely need something about as strict (well, more actually) as the global control Eliezer advocates here, since compute costs go down, compute itself goes up, and AI models become more accessible and more powerful. Even if alignment is easy I don’t see how you can expect to survive an AI-heavy world without a level of control and international alignment that feels draconian by today’s standards.
(I don’t know yet if Quintin argues against all these points, but will give it a read. I haven’t been keeping up with everything because there’s a lot to read, but it seems important to be familiar with his take)
But for right now maybe I most want to say “Yeah man this is very intense and sad. It sounds like I disagree with your epistemic state but I don’t think your epistemic state is crazy.”
I hope you do, since these might reveal cruxes about AI safety, and I might agree or disagree with the post you write.
I don’t blame you if you leave LW, though I do want to mention that Eliezer is mostly the problem here, rather than a broader problem of LW.
That stated, LW probably needs to disaffiliate from Eliezer fast, because Eliezer is the source of the extreme rhetoric.