I agree pretty strongly with your points here, especially the complete lack of good predictions from EY/MIRI about the current Cambrian explosion of intelligence, and how any sane agent using a sane updating strategy (like a mixture of experts, or equivalently Solomonoff weighting) should by now more or less discount/disavow much of their world model.
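As a toy illustration of the kind of weighting I mean (maintain weights over rival world-models and multiply each by the probability it assigned to what actually happened; all numbers below are invented for illustration):

```python
import numpy as np

# Two rival world-models start with equal prior weight.
priors = np.array([0.5, 0.5])

# Probability each model assigned to five observed outcomes (made-up values):
# row 0 = a model that kept mispredicting, row 1 = a model that predicted well.
likelihoods = np.array([
    [0.1, 0.2, 0.1, 0.3, 0.2],
    [0.6, 0.7, 0.5, 0.8, 0.6],
])

posterior = priors * likelihoods.prod(axis=1)
posterior /= posterior.sum()
print(posterior)  # the mispredicting model's weight collapses toward zero
```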
I nonetheless agree that AI is by far the dominant x-risk. My doom probability is perhaps closer to ~5%, but the difference between 5% and 50% doesn’t cash out to much policy difference at this point.
So really my disagreement is more on alignment strategy. A problem with this site is that it overweights EY/MIRI classic old alignment literature and arguments by about 100x what it should be, and is arguably doing more harm than good by overpromoting those ideas vs. alternative ideas flowing from those who actually did make reasonably good predictions about the current Cambrian explosion—in advance.
If there was another site that was a nexus for AI/risk/alignment/etc with similar features but with most of the EY/MIRI legacy cultish stuff removed, I would naturally jump there. But it doesn’t seem to exist yet.
So really my disagreement is more on alignment strategy. A problem with this site is that it overweights EY/MIRI classic old alignment literature and arguments by about 100x what it should be
I don’t think there are many people with alignment strategies and research that they’re working on. Eliezer has a hugely important perspective; Scott Garrabrant, Paul Christiano, John Wentworth, Steve Byrnes, and more all have approaches and perspectives that they’re working on full-time. If you’re working on this full-time and any of your particular ideas check out as plausible, I think there’s space for you to post here and get some engagement and respect (if you post in a readable style that isn’t that of obfuscatory academia). If you’ve got work you’re doing on it full-time, I think you can probably post here semi-regularly and eventually find collaborators, people you’re interested in feedback from, and eventually funding. You might not get super high karma all the time, but that’s okay; I think a few well-received posts is enough to not have to worry about a bunch of low-karma posts.
The main thing that I think makes space for a perspective here is (a) someone is seriously committed to actually working on it, and (b) they can communicate clearly and well. There are a lot of different sub-niches on LessWrong that co-exist (e.g. Zvi’s news discussion doesn’t interact with Paul’s Eliciting Latent Knowledge discussion, which (surprisingly) doesn’t interact much with Flint’s writing on what knowledge isn’t, which doesn’t interact much with Kokotajlo’s writing on takeover). I think it’s fine to develop an area of research here without justifying the whole thing the whole time; it’s healthy for paradigms and proposals to go away and not have to engage that much with each other until they’ve made more progress. Overall I think most paradigms here have no results to show for themselves, and it is not that worth fighting over which strategy to pick rather than working ahead on a given strategy for a year or two until you have something to report back. For instance, I would mostly encourage Quintin to go and get a serious result in shard theory and bring that back (and I really like that TurnTrout and Quintin have been working seriously on exactly that) and spend less time arguing about which approach is better.
So really my disagreement is more on alignment strategy. A problem with this site is that it overweights EY/MIRI classic old alignment literature and arguments by about 100x what it should be
I don’t think there are many people with alignment strategies and research that they’re working on.
I agree that’s a problem—but causally downstream of the problem I mention. Whereas Bostrom deserves credit for raising awareness of AI-risk in academia, EY/MIRI deserves credit for awakening many young techies to the issue—but also some blame.
Whether intentionally or not, the EY/MIRI worldview aligned itself against DL and its proponents, leading to an antagonistic dynamic that you may not have experienced if you haven’t spent much time on r/MachineLearning or similar. Many people in ML truly hate anything associated with EY/MIRI/LW. Part of that is perhaps just the natural result of someone sounding an alarm that your life’s work could literally kill everyone. But it really really doesn’t help if you then look into their technical arguments and reach the conclusion that they don’t know what they are talking about.
I otherwise agree with much of your comment. I think this site is lucky to have Byrnes and Quintin, and Quintin’s recent critique is the best recent critique of the EY/MIRI position from the DL perspective.
I have not engaged much with your and Quintin’s recent arguments about how deep learning may change the basic arguments, so I want to acknowledge that I would probably shift my opinion a bunch in some direction if I did. Nonetheless, a few related points:
I do want to say that, on priors, the level of anger and antagonism that appears in most internet comment sections is substantially higher than what happens when the people meet in person, and I do not suspect a corresponding amount of active antagonism would occur if Nate or Eliezer or John Wentworth went to an ML conference. Perhaps stated more strongly: I think 99% of internet ‘hate’ is performative only.
You write “But it really really doesn’t help if you then look into their technical arguments and reach the conclusion that they don’t know what they are talking about.” I would respect any ML researchers making this claim more if they wrote a thoughtful rebuttal to AGI Ruin: A List of Lethalities (or really literally any substantive piece of Eliezer’s on the subject that they cared to: There’s No Fire Alarm, Security Mindset, Rocket Alignment, etc.). I think Eliezer not knowing what he’s talking about would make rebutting him easier. As far as I’m aware literally zero significant ML researchers have written such a thing: not Dario, not Demis, not Sutskever, not LeCun, nor basically anyone senior in their orgs. Eliezer has thought quite a lot and put forth some quite serious arguments that seemed shockingly prescient to me, and, I dunno, it seems maximally inconvenient for all the people earning multi-million-dollar annual salaries in this new field of ML to seriously engage with a good-faith and prescient outsider with thoughtful arguments that their work risks extinction. If they dismiss him as “not getting it” yet don’t seriously engage with the arguments or make a positive case for how alignment can be solved, I think I ought to default to thinking of them as not morally serious in their statements. Relatedly, I am pretty deeply disappointed by the speed with which intellectuals like Pinker and Cowen come up with reasons to dismiss and avoid engaging with the arguments when the alternative is to seriously grapple with an extinction-level threat.
I am not compelled by the idea that if you haven’t restated your arguments to fit the new paradigm that’s shown up, then you must be out of the loop and wrong. Rather than “your arguments don’t seem perfectly suited to our new paradigm, look at all of these little holes I’ve found”, I would be far more compelled by “here is a positive proposal for how to align a system that we build, with active reason to suspect it is aligned” or similar. Paul is the only person I know of to propose specific algorithms for how to align systems, and Eliezer has engaged seriously on Paul’s terms and found many holes in the proposal that Paul agreed with. I expect Eliezer would do the same if anyone working at the major labs made a comparable proposal.
I understand that you and Quintin have criticisms (looking through bits of Quintin’s post it seems interesting, as do your claims here), as do Paul and others who all agree on the basics that this is an extinction-level threat. But I think it is more productive for Eliezer to critique positive proposals than to keep updating his arguments identifying the problem, especially when defending them from criticism by people who still think the extinction risk from misalignment is at least 5% and thus a top priority for civilization right now. If there were a leading ML practitioner arguing that ML is not an extinction-level threat and engaging with Eliezer’s arguments, I would consider it more worthwhile for Eliezer to respond. Meanwhile, I think people working on alignment research should prefer to get on with the work at hand, and LessWrong is clearly the best forum to get engagement from people who understand the problem that is actually being solved (and to find collaborators/funders/etc.).
As far as I’m aware literally zero significant ML researchers have written such a thing: not Dario, not Demis, not Sutskever, not LeCun, nor basically anyone senior in their orgs.
I just want to point out that this seems like a ridiculous standard. Quintin’s recent critique is not that dissimilar to the one I would write (and I have already spent some time trying to point out the various flaws in the EY/MIRI world model), and I expect you would get many of the same objections if you elicited them from a number of thoughtful DL researchers. But few if any have been motivated to write them up—what’s the point?
Here’s my critique in simplified form: the mainstream AI futurists (Moravec, Kurzweil, etc.) predicted that AGI would be brain-like and thus close to a virtual brain emulation. So they were not so concerned about doom, because brain-like AGI seems like a more natural extension of humanity (Moravec’s book is named ‘Mind Children’ for a reason) and an easier transition to manage.
In most ways that matter, Moravec/Kurzweil were correct, and EY was wrong. That really shouldn’t even be up for debate at this point. The approach that worked—DL—is essentially reverse engineering the brain. This is in part because the successful techniques all either were directly inspired by neuroscience and the now-proven universal learning & scaling hypotheses[1] (deep and/or recurrent ANNs in general, sparse coding, normalization, ReLUs, etc.) or indirectly recapitulated neural circuitry (the equivalence of transformer ‘attention’ to fast weight memory, etc.).
But in even simpler form: if you take an already-trained NN A, run it on a bunch of data, and capture all its outputs, then train a second NN B on the resulting input/output dataset, B becomes a distilled copy (a distillation) of A.
This is in fact how we train large scale AI systems. They are trained on human thoughts.
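A minimal sketch of that distillation loop (model sizes, data, and hyperparameters below are just illustrative placeholders, not any real system):

```python
import torch
import torch.nn as nn

# "A": an already-trained teacher network (assumed pre-trained; random here for the sketch).
teacher = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
# "B": the student network we distill into.
student = nn.Sequential(nn.Linear(32, 32), nn.ReLU(), nn.Linear(32, 10))

opt = torch.optim.Adam(student.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for step in range(1000):
    x = torch.randn(128, 32)            # stand-in for "a bunch of data"
    with torch.no_grad():
        target = teacher(x)             # capture A's outputs
    loss = loss_fn(student(x), target)  # train B on the (input, output) pairs
    opt.zero_grad()
    loss.backward()
    opt.step()
# After training, B approximates A on this data distribution: a distilled copy.
```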
The universal learning hypothesis is that the brain (and thus DL) uses simple universal learning algorithms and that all circuit content is learned automatically, which leads to the scaling hypothesis: intelligence comes from scaling up simple architectures and learning algorithms with massive compute, not from continually and explicitly “rewriting your source code” à la EY’s model.
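(One common way the scaling hypothesis gets made quantitative, to gloss it roughly: empirically, loss falls approximately as a power law in scale, e.g. in training compute, with constants fit per setup rather than universal:)

```latex
\[
L(C) \approx \left(\frac{C_0}{C}\right)^{\alpha},
\qquad C = \text{training compute},\quad C_0,\ \alpha > 0 \ \text{fitted empirically.}
\]
```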
Can I ask what your epistemic state here is exactly? Here are some options:
The arguments Eliezer put forward do not clearly apply to Deep Learning and therefore we don’t have any positive reason to believe that alignment will be an issue in ML
The arguments Eliezer put forward never made sense in the first place and therefore we do not have to worry about the alignment problem
The arguments Eliezer put forward captured a bunch of important things about the alignment problem but due to some differences in how we get to build ML systems we actually know of a promising route to aligning the systems
The arguments Eliezer put forward are basically accurate but with concepts that feel slightly odd for thinking about machine learning, and due to machine learning advances we have a concrete (and important) research route that seems worth investing in that Eliezer’s conceptual landscape doesn’t notice and that he is pushing against
The arguments Eliezer put forward do not clearly apply to Deep Learning
Yes but
and therefore we don’t have any positive reason to believe that alignment will be an issue in ML
does not follow.
The arguments Eliezer put forward never made sense in the first place
Yes (for some of the arguments), but again:
and therefore we do not have to worry about the alignment problem
does not follow.
The arguments Eliezer put forward captured a bunch of important things about the alignment problem but due to some differences in how we get to build ML systems we actually know of a promising route to aligning the systems
Yes—such as the various more neuroscience/DL-inspired approaches (Byrnes, simboxes, shard theory, etc.), or others a bit harder to categorize like davidad’s approach, or external empowerment.
But also I should point out that RLHF may work better for longer than most here anticipate, simply because if you distill the (curated) thoughts of mostly aligned humans you may just get mostly aligned agents.
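For readers who want the mechanics spelled out, the learned-reward step at the heart of RLHF looks roughly like this (a toy sketch with invented shapes and data, not any lab’s actual pipeline):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy reward model scored on fixed-size "response embeddings" (shapes are invented).
reward_model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(reward_model.parameters(), lr=1e-3)

for step in range(500):
    # Stand-ins for embeddings of a human-preferred response and a rejected one.
    chosen, rejected = torch.randn(32, 16), torch.randn(32, 16)
    margin = reward_model(chosen) - reward_model(rejected)
    loss = -F.logsigmoid(margin).mean()  # Bradley-Terry style preference loss
    opt.zero_grad()
    loss.backward()
    opt.step()
# A policy model is then fine-tuned (e.g. with PPO) to score highly under this reward,
# which is how curated human judgments get distilled into the agent's behavior.
```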
Thanks!
I’m not sure if it’s worth us having more back-and-forth, so I’ll say my general feelings right now:
I think it’s of course healthy and fine to have a bunch of major disagreements with Eliezer
I would avoid building “hate” toward him or building resentment; those are generally not healthy things for people to cultivate in themselves toward people who have not done evil things, and I think doing so will probably cause them to make worse choices by their own judgment
By default, do not count on anyone doing the hard work of making another forum for serious discussion of this subject, especially one that’s so open to harsh criticism and has high standards for comments (I know LessWrong could be better in lots of ways, but c’mon, have you seen Reddit/Facebook/Twitter?)
There is definitely a bunch of space on this forum for people like yourself to develop different research proposals and find thoughtful collaborators and get input from smart people who care about the problem you’re trying to solve (I think Shard Theory is such an example here)
I wish you every luck in doing so and am happy to know if there are ways to further support you trying to solve the alignment problem (of course I have limits on my time/resources and how much I can help out different people)
I would avoid building “hate” toward him or building resentment; those are generally not healthy things for people to cultivate in themselves toward people who have not done evil things, and I think doing so will probably cause them to make worse choices by their own judgment
Of course—my use of the word ‘hate’ here is merely reporting impressions from other ML/DL forums and the schism between the communities.
I obviously generally agree with EY on many things, and to the extent I critique his positions here it’s simply a straightforward result of some people here assuming their correctness a priori.
Okay! Good to know we concur on this. Was a bit worried, so thought I’d mention it.
Also, can I just remind you that for most of LessWrong’s history the top-karma post was Holden’s critique of SingInst, where he recommended against funding SingInst and argued in favor of Tool AI as the solution. Recently Eliezer’s List-of-Lethalities became the top-karma post, but less than a month later Paul’s response-and-critique post became the top-karma post, where he argued that the problem is much more tractable than Eliezer thinks and generally advocated a very different research strategy for dealing with alignment.
Eliezer is the primary person responsible for noticing the alignment problem and causing people to work on it, due to his superior foresight and writing skill, and he also founded this site, so most people here have read his perspective and understand it somewhat. But any notion that dissent isn’t welcome here (which I am perhaps over-reading into your comment) seems kind of obviously false.