(Note: I feel nervous posting this under my own name, in part because my Dad is considering transitioning at the moment and I worry he’d read it as implying some hurtful thing I don’t mean, but I do want to declare the conflict of interest that I work at CFAR or MIRI).
The large majority of folks described in the OP as experiencing psychosis are transgender. Given the extremely high base rate of mental illness in this demographic, my guess is this is more explanatorily relevant than the fact that they interacted with rationalist institutions or memes.
I do think the memes around here can be unusually destabilizing. I have personally experienced significant psychological distress thinking about s-risk scenarios, for example, and it feels easy to imagine how this distress could have morphed into something serious if I’d started with worse mental health.
But if we’re exploring analogies between what happened at Leverage and these rationalist social circles, it strikes me as relevant to ask why each of these folks was experiencing poor mental health. My impression from reading Zoe’s writeup is that she thinks her poor mental health resulted from memes/policies/conversations that were at best accidentally mindfucky, and often intentionally abusive and manipulative.
In contrast, my impression of what happened in these rationalist social circles is more like “friends or colleagues earnestly introduced people (who happened to be drawn from a population with unusually high rates of mental illness) to upsetting plausible ideas.”
As I understand it you’re saying:

At Leverage people were mainly harmed by people threatening them, whether intentionally or not. By contrast, in the MIRI/CFAR social cluster, people were mainly harmed by plausible upsetting ideas. (Implausible ideas that aren’t also threats couldn’t harm someone, because there’s no perceived incentive to believe them.)
An example of a threat is Roko’s Basilisk. An example of an upsetting plausible idea was the idea in early 2020 that there was going to be a huge pandemic soon. Serious attempts were made to suppress the former meme and promote the latter.
If someone threatens me I am likely to become upset. If someone informs me about something bad, I am also likely to become upset. Psychotic breaks are often a way of getting upset about one’s prior situation. People who transition genders are also usually responding to something in their prior situation that they were upset about.
Sometimes people get upset in productive ways. When Justin Shovelain called me to tell me that there was going to be a giant pandemic, I called up some friends and talked through self-quarantine thresholds, resulting in this blog post. Later, some friends and I did some other things to help out people in danger from COVID, because we continued to be upset about the problem. Zack Davis’s LessWrong posts on epistemology also seem like a productive way to get upset.
Sometimes people get upset in unproductive ways. Once, a psychotic friend peed on their couch “in order to make things worse” (their words). People getting upset in unproductive ways is an important common unsolved problem.
The rate at which people are getting upset unproductively is an interesting metric but a poor target: while it does track how bad problems are, pushing it down also tends to reduce the flow of information about problems. That means optimizing it can work against the rate at which problems are getting solved, and therefore against the rate at which things are getting better.
The large majority of folks described in the OP as experiencing psychosis are transgender.
That would be, arguably, 3 of the 4 cases of psychosis I knew about (if Zack Davis is included as transgender), and not the case of jail time I knew about. So 60% total. [EDIT: See PhoenixFriend’s comment; there were 4 cases who weren’t talking with Michael and who probably also weren’t trans (although that’s unknown). Obviously my own knowledge is limited to my own social circle, and people, including me, weren’t accounting for this in statistical inference.]
My impression from reading Zoe’s writeup is that she thinks her poor mental health resulted from memes/policies/conversations that were at best accidentally mindfucky, and often intentionally abusive and manipulative.
In contrast, my impression of what happened in these rationalist social circles is more like “friends or colleagues earnestly introduced people (who happened to be drawn from a population with unusually high rates of mental illness) to upsetting plausible ideas.”
These don’t seem like mutually exclusive categories? Like, “upsetting plausible ideas” would be “memes” and “conversations” that could include things like AI probably coming soon, high amounts of secrecy being necessary, and the possibility of “mental objects” being transferred between people, right?
Even people not at the organizations themselves were an important cause; everyone was in a similar social context and responding to it. E.g., a lot of what Michael Vassar said was in response to, and critical of, ideas that institutional people had.
It seems like something strange is happening here: some ideas that differed from the mainstream get labeled “memes”, while others, some of which run counter to that first set of ideas and some of which run counter to mainstream understanding, get labeled “upsetting plausible ideas”, with more of the causal attribution going to the second class.
If a certain scene is a “cult”, and people who “exit the cult” have higher rates of psychosis than people who don’t even attempt to “exit the cult”, then this is consistent with the observations so far. This could happen in part due to the ontological shift necessary to “exit the cult”, and also because exiting would increase social isolation (increasing social dependence on a small number of people), which is a known risk factor.
Both Zoe and I were at one time “in a cult” and at a later time “out of the cult” with some in-between stage of “believing what we were in was a cult”, where both being “in the cult” and “coming to believe what we were in was a cult” involved “memes” and “upsetting plausible ideas”, which doesn’t seem like enough to differentiate.
Overall this doesn’t immediately match my subjective experience and seems like it’s confusing a lot of things.
[EDIT: The case I’m making here is even stronger given PhoenixFriend’s comment.]
exiting would increase social isolation (increasing social dependence on a small number of people), which is a known risk factor
If exiting makes you socially isolated, it means that (before exiting) all/most of your contacts were within the group.
That suggests that the safest way to exit is to gradually start meeting new people outside the group and start spending more time with them and less time with other group members, until the majority of your social life happens outside the group, which is when you should quit.
Cults typically try to prevent you from doing this, to keep the exit costly and dangerous. One method is to monitor you and your communications all the time. (For example, Jehovah’s Witnesses are always out there in pairs, because they have a sacred duty to snitch on each other.) Another way is to keep you at the group compound, where you simply can’t meet non-members. Yet another is to establish a duty to regularly confess what you did and who you talked to, and to chastise you for spending time with unbelievers. Another method is simply to keep you so busy all day long that you have no time left to interact with strangers.
To reverse this: a healthy group will give you enough private free time. (With the emphasis on all three words: “free”, “private”, and “enough”.)
Both Zoe and I were at one time “in a cult”
We know that Zoe had little free time, she had to spend a lot of time reporting her thoughts to her supervisors, and she was pressured to abandon her hobbies and not socialize.
2–6hr long group debugging sessions in which we as a sub-faction (Alignment Group) would attempt to articulate a “demon” which had infiltrated our psyches from one of the rival groups, its nature and effects, and get it out of our systems using debugging tools.
it was suggested I cancel my intended trip to Europe to show my commitment, which I did.
There was no vacation policy, which seemed good, but in reality panned out in my having no definitively free, personal time that couldn’t be infringed upon by expectations of project prioritization.
One day, I was debugging with a supervisor and we got to the topic of my desire to perform as an actor. [...] he thought that wanting to do acting [...] was “honestly sociopathic.”
We were kept extremely busy. [...] Here are four screenshots of my calendar, showing an average month in my last 6 months at Leverage.
I was regularly left with the feeling that I was low status, uncommitted, and kind of useless for wanting to socialize on the weekends or in the evenings.
Also, the group belief that if you meet outsiders they may “mentally invade you”, the rival groups (does this refer to rationalists and EAs? not sure) will “infiltrate” you with “demons”, and ordinary people will intentionally or subconsciously “leave objects” in you… does not sound like it would exactly encourage you to make friends outside the group, to put it mildly.
Now, do you insist that your experience in MIRI/CFAR was of the same kind? -- Like, what was your schedule, approximately? Did you have free weekends? Were you criticized for socializing with people outside MIRI/CFAR, especially with “rival groups”? Did you have to debug your thoughts and exorcise the mental invasions left by your interaction with nonmembers? If possible, please be specific.
Were you criticized for socializing with people outside MIRI/CFAR, especially with “rival groups”?
As a datapoint, while working at MIRI I started dating someone working at OpenAI, and never felt any pressure from MIRI people to drop the relationship (and he was welcomed at the MIRI events we held, and so on), despite Eliezer’s tweets discussed here representing a pretty widespread belief at MIRI. (He wasn’t one of the founders, and I think people at MIRI saw a clear difference between “founding OpenAI” and “working at OpenAI given that it was founded”, so idk if they would agree with the frame that OpenAI was a ‘rival group’.)
That suggests that the safest way to exit is to gradually start meeting new people outside the group, start spending more time with them and less time with other group member, until the majority of your social life happens outside the group, which is when you should quit.
This is what I did; it was just still a pretty small social group, and getting it and “quitting” were part of the same process.
(does this refer to rationalists and EAs? not sure)
I think it was other subgroups at Leverage, at least primarily. So “mental objects” would be a consideration in favor of making friends outside of the group. Unless one is worried about spreading mental objects to outsiders.
Now, do you insist that your experience in MIRI/CFAR was of the same kind?
Most of this is answered in the post; e.g. I made it clear that the over-scheduling issue was not a problem for me at MIRI, which is an important difference. I was certainly spending a lot of time outside of work doing psychological work, and I noted friendships, including one with a housemate, formed around a shared interest in such work (Zoe notes that a lot of things on her schedule were internal psychological work). There wasn’t active prevention of talking to people outside the community, but people commonly end up not doing so anyway, influenced by soft social pressure (e.g. looking down on people as “normies”). Zoe is also saying a lot of the pressure at Leverage was soft/nonexplicit, e.g. “being looked down on” for taking normal weekends.
I do remember Nate Soares, who was executive director at the time, telling me that “work-life balance is overrated/not really necessary”, and if I’d been more sensitive to this I might have spent a lot more time on work. (I’m not even sure he’s “wrong”, in that the way “normal people” do this has a lot of problems and integrating different domains of life can sometimes help; it still could have been taken as encouragement in the direction of working on weekends, etc.)
Just want to register that this comment seemed overly aggressive to me on a first read, even though I probably have many sympathies in your direction (that Leverage is importantly disanalogous to MIRI/CFAR).
The following recent Twitter thread by Eliezer is interesting in the context of the discussion of whether “upsetting but plausible ideas” are coming from central or non-central community actors, and Eliezer’s description of Michael Vassar as “causing psychotic breaks”:
if you actually knew how deep neural networks were solving your important mission-critical problems, you’d never stop screaming
(no, I don’t know how they’re doing it either, I just know that you’d update in a predictable net direction if you found out)
(in reply to “My model of Eliezer is not so different from his constantly screaming, silently to himself, at all times, pausing only to scream non-silently to others, so he doesn’t have to predictably update in the future.”:)
This state of affairs sounds indistinguishable from coherent Bayesian thought inside a world like this one, so I suppose that’s confirmation, yes.
A few takeaways from this:
Obviously, Eliezer is saying that there is a plausible but extremely upsetting idea that could be learned by studying neural networks sufficiently competently. [EDIT: Maybe I’m wrong that this indicates neural nets being powerful, and it’s just indicating that they’re unreliable for mission-critical applications? Both interpretations seem plausible...]
This statement, itself, is plausible and upsetting, though presumably less upsetting than if one actually knew the thing that could be learned about neural networks.
Someone who was “constantly screaming” would be considered, by those around them, to be having a psychotic break (or an even worse mental health problem), and be almost certain to be psychiatrically incarcerated.
Eliezer is, to all appearances, trying to convey these upsetting ideas on Twitter.
It follows that, to the extent that Eliezer is not “causing psychotic breaks”, it’s only because he’s insufficiently capable of causing people to believe “upsetting but plausible ideas” that he thinks are true, i.e. because he’s failing (or perhaps not-really-trying, only pretending to try) to actually convey them.
This does not seem like the obvious reading of the thread to me.
Obviously, Eliezer is saying that there is a plausible but extremely upsetting idea that could be learned by studying neural networks sufficiently competently.
I think Eliezer is saying that if you understood on a gut level how messy deep networks are, you’d realize how doomed prosaic alignment is. And that would be horrible news. And that might make you scream, although perhaps not constantly.
After all, Eliezer is known to use… dashes… of colorful imagery. Do you really think he is literally constantly screaming silently to himself? No? Then he was probably also being hyperbolic about how he truly thinks a person would respond to understanding a deep network in great detail.
That’s why I feel that your interpretation is grasping really hard at straws. This is a standard “we’re doomed by inadequate AI alignment” thread from Eliezer.
Even if it’s an exaggeration, Eliezer is using that exaggeration to indicate an extremely high level of fear, off the charts compared with what people are normally used to, as a result of really taking in the information. Such a level of fear is not clearly lower than the level of fear experienced by the psychotic people in question, who experienced e.g. serious sleep loss due to fear.
I strong-upvoted both of Jessica’s comments in this thread despite disagreeing with her interpretation in the strongest possible terms. I did so because I think it is important to note that, for every “common-sense” interpretation of a community leader’s words, there will be some small minority who interpret it in some other (possibly more damaging) way. While I think (importantly) this does not make it the community leader’s responsibility to manage their words in such a way that no misinterpretation is possible (which I think is simply unfeasible), I am nonetheless in favor of people sharing their (non-standard) interpretations, given the variation in potential responses.
As Eliezer once said (I’m paraphrasing from memory here, so the following may not be word-for-word accurate, but I am >95% confident I’m not misremembering the thrust of what he said), “The question I have to ask myself is, will this drive more than 5% of my readers insane?”
EDIT: I have located the text of the original comment. I note (with some vindication) that once again, it seems that Eliezer was sensitive to this concern way ahead of when it actually became a thing.
Hm, I thought that the upsetting thing is how neural networks work in general. Like the ones that can correctly classify pictures 99% of the time… and then you slightly adjust a few pixels in such a way that a human sees no difference, but the neural network suddenly makes a completely absurd claim with high certainty.
And, if you are using neural networks to solve important problems, and become aware of this, then you realize that despite them doing a great job in 99% of situations and a random stupid thing in the remaining 1%, there is actually no limit to how insanely wrong they can get, and that it can happen in circumstances that would seem perfectly harmless to you. That the underlying logic is just… inhuman.
(To make an analogy, imagine that you hire a human to translate from French to English. The human is pretty good but not perfect, which means that he gets 99% right. In the remaining 1% he either translates the word incorrectly or says that he doesn’t know. These two options are the only results you expect. -- Now instead of a human, you hire a robot. He also translates 99% correctly and 1% incorrectly or with no output. But in addition to this, if you give him a specifically designed input, he will say a complete absurdity. Like, he would translate “UN CHAT” as “A CAT”, but when you strategically add a few dots and make it “ỤN ĊHAṬ”, he will suddenly insist that it means “CENTRUM FOR APPLIED RATIONALITY” and will assign a 99.9999999% certainty to this translation. Note that this is not its usual reaction to dots; the input papers usually contain some impurities or random dots, and the algorithm has always successfully ignored them… until now. -- The answer is not just wrong but absurdly wrong, it happened in a situation where you felt quite sure nothing could go wrong, and the robot didn’t even feel uncertain.)
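To put rough numbers on that mechanism, here is a minimal, self-contained sketch (a toy linear classifier in numpy, not any real network; all the specific numbers are illustrative) of how a per-pixel change far too small to notice can add up, across enough pixels, to an absurdly confident wrong answer:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 1_000_000                                     # pixels in a 1000x1000 "image"
w = rng.choice([-1.0, 1.0], size=d) / np.sqrt(d)  # weights of a toy linear classifier

def p_cat(x):
    """Sigmoid of the logit w.x, read as the toy model's confidence in 'cat'."""
    z = float(w @ x)
    return 1.0 / (1.0 + np.exp(-z))

x_clean = rng.normal(0.0, 1.0, size=d)  # an ordinary input; pixel values are O(1)
x_adv = x_clean + 0.03 * np.sign(w)     # nudge every pixel by just 0.03

# Each pixel changes by a few percent of a typical pixel value, but the logit
# shifts by 0.03 * sqrt(d) = 30, so the toy model becomes essentially certain.
print(f"clean confidence:         {p_cat(x_clean):.6f}")
print(f"adversarial confidence:   {p_cat(x_adv):.6f}")
print(f"largest per-pixel change: {np.max(np.abs(x_adv - x_clean)):.2f}")
```

Real image classifiers are nonlinear, but this accumulation of many tiny, weight-aligned nudges is essentially the standard account of adversarial examples.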
Obviously, Eliezer is saying that there is a plausible but extremely upsetting idea that could be learned by studying neural networks sufficiently competently.
So, I think that you got this part wrong (and that putting “obviously” in front of it makes this weirdly ironic in the given context), and that the following conclusions are therefore also wrong.
Eliezer is simply saying (not “constantly screaming”) “do not trust neural networks, they randomly make big errors”. That message, even if perceived 100% correctly, should not cause a psychotic break in the average listener.
I think it’s important that the errors are not random; I think you mean something more like “they make large opaque errors.”

Given what else Eliezer has said, it’s reasonable to infer that the screaming is due to the possibility of everyone dying because neural-network-based AIs are powerful but unalignable, not merely that your AI application might fail unexpectedly.
It’s really strange to think the idea isn’t upsetting when Eliezer says understanding it would cause “constant screaming”. Even if that’s an exaggeration, really??????? Maybe ask someone who doesn’t read LW regularly whether Eliezer is saying the idea you could get by knowing how neural nets work is upsetting; I think they would agree with me.
He specified “mission-critical”. An AI’s ability to take over other machines in the network, take over the internet, manufacture grey goo, etc. (choose your favorite doomsday scenario), is not really related to how mission-critical its original task was. (In fact, someone’s AI to choose the best photo filters to match the current mood on Instagram to maximize “likes” seems both more likely to have arbitrary network access and less likely to have careful oversight than a self-driving car AI.) Therefore I do think his comment was about the likelihood of failure in the critical task, and not about alignment.
I think he meant something like this: The neural net, used e.g. to recognize cars on the road, makes most of its deductions based on accidental correlations and shortcuts in the training data—things like “it was sunny in all the pictures of trucks”, or “if it recognizes the exact shape and orientation of the car’s mirror, then it knows which model of car it is, and deduces the rest of the car’s shape and position from that, rather than by observing the rest of the car”. (Actually they’d be lower-level and less human-legible than this. It’s like someone parsing tables out of Wikipedia pages’ HTML, but instead of matching th/tr/td elements, it just counts “<” characters, and God help us if one of the elements has an extra < due to holding a link or something.) If you understood just how fragile and divorced from reality the shortcuts were, while you were sitting in such a car rushing down the highway, you would scream.
(The counterargument to screaming, it seems to me, is that it’s relying on 100 different fragile accidental correlations, any 70 of which are sufficient—and it’s unlikely that more than 10 of them will break at once, especially if the neural net gets updated every few months, so the ensemble is robust even though the parts are not. I expect one could develop confidence in this by measuring just how overdetermined the “this is a car” deductions are, and how much they vary. But that requires careful measurement and calculation, and many people might not get past the intuitive “JFC my life depends on the equivalent of 100 of those reckless HTML-parsing shortcuts, I’m going to die”. And I expect there are plenty of applications where the ensemble really is fragile and has a >10% chance of serious failure within a few months.)

(NB. I’ve never worked on neural nets.)
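For what it’s worth, the ensemble arithmetic in that counterargument can be checked with a short calculation. A sketch, under the (illustrative and questionable) assumption that each of the 100 cues breaks independently with some fixed probability before the next retraining:

```python
from math import comb

def p_ensemble_failure(n=100, needed=70, p_break=0.10):
    """P(fewer than `needed` of the n cues survive), i.e. more than
    n - needed cues break, assuming independent per-cue breakage."""
    max_broken = n - needed  # up to n - needed broken cues is still survivable
    p_ok = sum(comb(n, k) * p_break**k * (1 - p_break)**(n - k)
               for k in range(max_broken + 1))
    return 1.0 - p_ok

for p in (0.05, 0.10, 0.20, 0.25):
    print(f"per-cue break probability {p:.2f} -> "
          f"ensemble failure probability {p_ensemble_failure(p_break=p):.2e}")
```

With rare, independent breakage the failure probability is astronomically small, which is the reassuring half of the argument; it climbs steeply as the per-cue break probability rises, and correlated breakage (the realistic case) would erode the safety margin much faster.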
Ok, I see how this is plausible. I do think that the reply to Zvi adds some context where Zvi is basically saying “Eliezer is always screaming, taking pauses to scream at others”, and the thing Eliezer is usually expressing fear about is AI killing everyone. I see how it could go either way though.