Reputation is lazily evaluated

When evaluating the reputation of your organization, community, or project, many people flock to surveys in which randomly selected people are asked what they think of your thing, or what their attitudes towards your organization, community, or project are.
If you do this, you will very reliably get back data that looks like people are indifferent to you and your projects, and your results will probably be dominated by extremely shallow things like “do the words in your name invoke positive or negative associations”.
People largely only form opinions of you or your projects when they have some reason to do so, like trying to figure out whether to buy your product, join your social movement, or vote for you in an election. You basically never care what people think about you while they are engaged in activities completely unrelated to you; you care about what people will do when they have to take an action that is related to your goals. But the former is exactly what you are measuring in attitude surveys.
As an example of this (used here for illustrative purposes, and what caused me to form strong opinions on this, but not intended as the central point of this post): Many leaders in the Effective Altruism community ran various surveys after the collapse of FTX, trying to understand what the reputation of “Effective Altruism” was. The results were basically always the same: People mostly didn’t know what EA was, and had vaguely positive associations with the term when asked. The people who had recently become familiar with it (of whom there weren’t that many) did lower their opinions of EA, but the vast majority of people did not (because they mostly didn’t know what it was).
As far as I can tell, these surveys left most EA leaders thinking that the reputational effects of FTX were limited. After all, most people never heard about EA in the context of FTX, and seemed to mostly have positive associations with the term, and the average like or dislike in surveys barely budged. In reflections at the time, conclusions looked like this:
The fact that most people don’t really care much about EA is both a blessing and a curse. But either way, it’s a fact of life; and even as we internally try to learn what lessons we can from FTX, we should keep in mind that people outside EA mostly can’t be bothered to pay attention.
An incident rate in the single digit percents means that most community builders will have at least one example of someone raising FTX-related concerns—but our guess is that negative brand-related reactions are more likely to come from things like EA’s perceived affiliation with tech or earning to give than FTX.
We have some uncertainty about how well these results generalize outside the sample populations. E.g. we have heard claims that people who work in policy were unusually spooked by FTX. That seems plausible to us, though Ben would guess that policy EAs similarly overestimate the extent to which people outside EA care about EA drama.
Or this:
Yes, my best understanding is still that people mostly don’t know what EA is, the small fraction that do mostly have a mildly positive opinion, and that neither of these points were affected much by FTX.[1]
This, I think, was an extremely costly mistake to make. Since then, practically all metrics of the EA community’s health and growth have sharply declined, and the extremely large and negative reputational effects have become clear.
Most programmers are familiar with the idea of a “lazily evaluated variable”—a value that isn’t computed until the exact moment you try to use it. Instead of calculating the value upfront, the system maintains just enough information to be able to calculate it when needed. If you never end up using that value, you never pay the computational cost of calculating it. Similarly, most people don’t form meaningful opinions about organizations or projects until the moment they need to make a decision that involves that organization. Just as a lazy variable suddenly gets evaluated when you first try to read its value, people’s real opinions about projects don’t materialize until they’re in a position where that opinion matters—like when deciding whether to donate, join, or support the project’s initiatives.
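To make the programming metaphor concrete, here is a minimal sketch in Python (purely illustrative; the Project class and its opinion attribute are invented for this example):

```python
from functools import cached_property

class Project:
    """Toy model where the 'opinion' of a project is only computed on demand."""

    @cached_property
    def opinion(self) -> str:
        # The costly part: reading reviews, asking friends, searching online, etc.
        print("...performing the expensive evaluation now...")
        return "a considered opinion"

p = Project()
# Nothing has been computed yet; a survey at this point only sees a placeholder.
print(p.opinion)  # first access triggers the evaluation
print(p.opinion)  # subsequent accesses reuse the cached result
```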
Reputation is lazily evaluated. People conserve their mental energy, time, and social capital by not forming detailed opinions about things until those opinions become relevant to their decisions. When surveys try to force early evaluation of these “lazy” opinions, they get something more like a placeholder value than the actual opinion that would form in a real decision-making context.
This computation is not purely cognitive. As people encounter a product, organization or community that they are considering doing something with, they will ask their friends whether they have any opinions, perform online searches, and generally seek out information to help them with whatever decision they are facing. This is part of the reason why this metaphorical computation is costly and put off until it’s necessary.
So when you are trying to understand what people think of you, or how people’s opinions of you are changing, pay much more attention to the attitudes of people who have recently put in the effort to learn about you, or who were facing some decision related to you, and who are therefore more representative of where people tend to end up when they are in a similar position. These will be much better indicators of your actual latent reputation than what happens when you ask people on a survey.
For the EA surveys, these indicators looked very bleak:
“Results demonstrated that FTX had decreased satisfaction by 0.5-1 points on a 10-point scale within the EA community”
“Among those aware of EA, attitudes remain positive and actually maybe increased post-FTX —though they were lower (d = −1.5, with large uncertainty) among those who were additionally aware of FTX.”
“Most respondents reported continuing to trust EA organizations, though over 30% said they had substantially lost trust in EA public figures or leadership.”
If various people in EA had paid attention to these, instead of to the approximately meaningless placeholder variables that you get when you ask people what they think of you without actually getting them to perform the costly computation associated with forming an opinion of you, I think they would have made substantially better predictions.
I don’t like the fact that this essay is a mix of an insightful generic argument and a contentious specific empirical claim that I don’t think you support strongly; it feels like the rhetorical strength of the former lends credence to the latter in a way that isn’t very truth-tracking.
I’m not claiming you did anything wrong here, I just don’t like something about this dynamic.
I do think the EA example is quite good on an illustrative level. It really strikes me as a rare case where we have an enormous pile of public empirical evidence (which is linked in the post) and it also seems by now really quite clear from a common-sense perspective.
I don’t think it makes sense to call this point “contentious”. I think it’s about as clear as these cases go. At least off the top of my head I can’t think of an example that would have been clearer (maybe if you had some social movement that more fully collapsed and where you could do a retrospective root cause analysis, but it’s extremely rare to have as clear of a natural experiment as the FTX one). I do think it’s political in our local social environment, and so is harder to talk about, so I agree that on that dimension a different example would be better.
I do think it would be good/nice to add an additional datapoint, but I also think this would risk being misleading. The point about reputation being lazily evaluated is mostly true from common-sense observations and logical reasoning, and the EA point is mostly trying to provide evidence for “yes, this is a real mistake that real people make”. Even if you dispute EA’s reputation having gotten worse, I think the quotes from people above are still invalid and would mislead people (and I had this model before we observed the empirical evidence, and am writing it up because people told me they found it helpful for thinking through the FTX stuff as it was happening).
If I had a lot more time, I think the best thing to do would be to draw on some literature on polling errors or marketing, since the voting situation seems quite analogous. This might even get us some estimates of how strong the correlation between unevaluated and evaluated attitudes is, and how much they diverge for different levels of investment, if there is any measurable correlation, and that would be cool.
I am persuaded by neither the common sense nor the empirical evidence for the point about EA. To be clear (as I’ve said to you privately), I’m not at all trying to imply that I specifically disagree with you, I’m just saying that the evidence you’ve provided doesn’t persuade me of your claims.
Yeah, makes sense. I don’t think I am providing a full paper trail of evidence one can easily travel along, but I would take bets you would come to agree with it if you did spend the effort to look into it.
This is good. Please consider making it a top level post.
It ought to be a top-level post on the EA forum as well.
(Someone is welcome to link post, but indeed I am somewhat hoping to avoid posting over there as much, as I find it reliably stressful in mostly unproductive ways)
There’s another important effect here: the laggy time course of public opinion. I saw more popular press articles about EA than I ever have, linking SBF to it, but with a large lag after the events. So the early surveys showing a small effect happened before public conversation really bounced around the idea that SBF’s crimes were motivated by EA utilitarian logic. The first time many people would remember hearing about EA would be from those later articles and discussions.
The effect probably amplified considerably over time as that hypothesis bounced through public discourse.
The original point stands, but this lag is making the effect look much larger in this case.
This lag effect might amplify a lot more when big budget movies about SBF/FTX come out.
Edit 2: after checking, I now believe the data strongly suggest FTX had a large negative effect on EA community metrics. (I still agree with Buck: “I don’t like the fact that this essay is a mix of an insightful generic argument and a contentious specific empirical claim that I don’t think you support strongly; it feels like the rhetorical strength of the former lends credence to the latter in a way that isn’t very truth-tracking.” And I disagree with habryka’s claims that the effect of FTX is obvious.)
I want more evidence on your claim that FTX had a major effect on EA reputation. Or: why do you believe it?
Edit: relevant thing habryka said that I didn’t quote above:
For the EA surveys, these indicators looked very bleak:
“Results demonstrated that FTX had decreased satisfaction by 0.5-1 points on a 10-point scale within the EA community”
“Among those aware of EA, attitudes remain positive and actually maybe increased post-FTX —though they were lower (d = −1.5, with large uncertainty) among those who were additionally aware of FTX.”
“Most respondents reported continuing to trust EA organizations, though over 30% said they had substantially lost trust in EA public figures or leadership.”
Practically all growth metrics are down (and have indeed turned negative on most measures), a substantial fraction of core contributors are distancing themselves from the EA affiliation, surveys among EA community builders report EA-affiliation as a major recurring obstacle[1], and many of the leaders who previously thought it wasn’t a big deal now concede that it was/is a huge deal.
Also, informally, recruiting for things like EA Fund managers, or getting funding for EA Funds has become substantially harder. EA leadership positions appear to be filled by less competent people, and in most conversations I have with various people who have been around for a while, people seem to both express much less personal excitement or interest in identifying or championing anything EA-related, and report the same for most other people.
Related to the concepts in my essay, when measured, the reputational differentials also seem to reliably point towards people updating negatively on EA as they learn more about it (which shows up in the quotes you mentioned, and which more recently shows up in the latest Pulse survey, though I mostly consider that survey uninformative for roughly the reasons outlined in this post).
[1] As reported to me by someone I trust working in the space recently. I don’t have a link at hand.
Hey! Sorry for the silence, I was feeling a bit stressed by this whole thread, and so I wanted to step away and think about this before responding. I’ve decided to revert the dashboard back to its original state & have republished the stale data. I did some quick/light data checks but prioritised getting this out fast. For transparency: I’ve also added stronger context warnings and I took down the form to access our raw data in sheet form but intend to add it back once we’ve fixed the data. It’s still on our stack to Actually Fix this at some point but we’re still figuring out the timing on that.
On reflection, I think I probably made the wrong call here (although I still feel a bit sad / misunderstood but 🤷🏻♀️). It was a unilateral + lightly held call I made in the middle of my work day — like truly I spent 5 min deciding this & maybe another ~15 updating the thing / leaving a comment. I think if I had a better model for what people wanted from the data, I would have made a different call. I’ve updated on “huh, people really care about not deleting data from the internet!” — although I get that the reaction here might be especially strong because it’s about CEA (vs the general case). Sorry, I made a mistake.
Future facing thoughts: I generally hold myself to a higher standard for accuracy when putting data on the internet, but I also do value not bottlenecking people in investigating questions that feel important to me (e.g. qs about EA growth rates), so to be clear I’m prioritizing the latter goal right now. I still in general stand by, “what even is the point of my job if I don’t stand by the data I communicate to others?” :) I want people to be able to trust that the work they see me put out in the world has been red-teamed & critiqued before publication.
Although I’m sad this caused an unintended kerfuffle, it’s a positive update for me that “huh wow, people actually care a lot that this project is kept alive!”. This honestly wasn’t obvious to me — this is a low traffic website that I worked on a while ago, and don’t hear about much. Oli says somewhere that he’s seen it linked to “many other times” in the past year, but TBH no one has flagged that to me (I’ve been busy with other projects). I’m still glad that we made this thing in the first place and am glad people find the data interesting / valuable (for general CEA transparency reasons, as an input to these broader questions about EA, etc.). I’ll probably prioritize maintenance on this higher in the future.
Now that the data is back up I’m going to go back to ignoring this thread!
[musing] Actually another mistake here which I wish I had just said in the first comment: I didn’t have a strong enough TAP for, if someone says a negative thing about your org (or something that could be interpreted negatively), you should have a high bar for not taking away data (meaning more broadly than numbers) that they were using to form that perception, even if you think the data is wrong for reasons they’re not tracking. You can like, try and clarify the misconception (ideally, given time & energy constraints etc.), and you can try harder to avoid putting wrong things out there, but don’t just take it away—it’s not on the reader to treat you charitably and it kind of doesn’t matter what your motives were.
I think I mostly agree with something like that / I do think people should hold orgs to high standards here. I didn’t pay enough attention to this and regret it. Sorry! (I’m back to ignoring this thread lol but just felt like sharing a reflection 🤷🏻♀️)
Thank you! I appreciate the quick oops here, and agree it was a mistake (but fixing it as quickly as you did I think basically made up for all the costs, and I greatly appreciate it).
Just to clarify, I don’t want to make a strong statement that it’s worth updating the data and maintaining the dashboard. By my lights it would be good enough to just have a static snapshot of it forever. The thing that seemed so costly to me was breaking old links and getting rid of data that you did think was correct.
Thanks again!
I suspect fixing this would need to involve creating something new which doesn’t have the structural problems in EA which produced this, and would involve talking to people who are non-sensationalist EA detractors but who are involved with similarly motivated projects. I’d start here and skip past the ones that are arguing “EA good” to find the ones that are “EA bad, because [list of reasons ea principles are good, and implication that EA is bad because it fails at its stated principles]”
I suspect, even without seeking that out, the spirit of EA that made it ever partly good has already and will further metastasize into genpop.
Hi! A quick note: I created the CEA Dashboard which is the 2nd link you reference. The data here hadn’t been updated since August 2024, and so was quite out of date at the time of your comment. I’ve now taken this dashboard down, since I think it’s overall more confusing than helpful for grokking the state of CEA’s work. We still intend to come back and update it within a few months.
Just to be clear on why / what’s going on:
I stopped updating the dashboard in August because I started getting busy with some other projects, and my manager & I decided to deprioritize this. (There are some manual steps needed to keep the data live).
I’ve now seen several people refer to that dashboard as a reference for how CEA is doing in ways I think are pretty misleading.
We (CEA) still intend to come back and fix this, and this is a good nudge to prioritize it.
Thanks!
Oh, huh, that seems very sad. Why would you do that? Please leave up the data that we have. I think it’s generally bad form to break links that people relied on. The data was accurate as far as I can tell until August 2024, and you linked to it yourself a bunch over the years, don’t just break all of those links.
I am pretty up-to-date with other EA metrics and I don’t really see how this would be misleading. You had a disclaimer at the top that I think gave all the relevant context. Let people make their own inferences, or add more context, but please don’t just take things down.
Unfortunately, archive.org doesn’t seem to have worked for that URL, so we can’t even rely on that to show the relevant data trends.
Edit: I’ll be honest, after thinking about it for longer, the only reason I can think of why you would take down the data is because it makes CEA and EA look less on an upwards trajectory. But this seems so crazy. How can I trust data coming out of CEA if you have a policy of retracting data that doesn’t align with the story you want to tell about CEA and EA? The whole point of sharing raw data is to allow other people to come to their own conclusions. This really seems like such a dumb move from a trust perspective.
I also believe that the data making EA+CEA looks bad is the causal reason why it was taken down. However, I want to add some slight nuance.
I want to contrast a model whereby Angelina Li did this while explicitly trying to stop CEA from looking bad, versus a model whereby she senses that something bad might be happening, she might be held responsible (e.g. within her organization / community), and is executing a move that she’s learned is ‘responsible’ from the culture around her.
I think many people have learned to believe the reasoning step “If people believe bad things about my team I think are mistaken with the information I’ve given them, then I am responsible for not misinforming people, so I should take the information away, because it is irresponsible to cause people to have false beliefs”. I think many well-intentioned people will say something like this, and that this is probably because of two reasons (borrowing from The Gervais Principle):
This is a useful argument for powerful sociopaths to use when they are trying to suppress negative information about themselves.
The clueless people below them in the hierarchy need to rationalize why they are following the orders of the sociopaths to prevent people from accessing information. The idea that they are ‘acting responsibly’ is much more palatable than the idea that they are trying to control people, so they willingly spread it and act in accordance with it.
A broader model I have is that there are many such inference-steps floating around the culture that well-intentioned people can accept as received wisdom, and they got there because sociopaths needed a cover for their bad behavior and the clueless people wanted reasons to feel good about their behavior; and that each of these adversarially optimized inference-steps need to be fought and destroyed.
I agree, and I am a bit disturbed that it needs to be said.
At normal, non-EA organizations—and not only particularly villainous ones, either!—it is understood that you need to avoid sharing any information that reflects poorly on the organization, unless it’s required by law or contract or something. The purpose of public-facing communications is to burnish the org’s reputation. This is so obvious that they do not actually spell it out to employees.
Of COURSE any organization that has recently taken down unflattering information is doing it to maintain its reputation.
I’m sorry, but this is how “our people” get taken for a ride. Be more cynical, including about people you like.
I think many people have learned to believe the reasoning step “If people believe bad things about my team I think are mistaken with the information I’ve given them, then I am responsible for not misinforming people, so I should take the information away, because it is irresponsible to cause people to have false beliefs”. I think many well-intentioned people will say something like this, and that this is probably because of two reasons (borrowing from The Gervais Principle):
(Comment not specific to the particulars of this issue but noted as a general policy:) I think that as a general rule, if you are hypothesizing reasons for why somebody might say a thing, you should always also include the hypothesis that “people say a thing because they actually believe in it”. This is especially so if you are hypothesizing bad reasons for why people might say it.
It’s very annoying when someone hypothesizes various psychological reasons for your behavior and beliefs but never even considers as a possibility the idea that maybe you might have good reasons to believe in it. Compare e.g. “rationalists seem to believe that superintelligence is imminent; I think this is probably because that lets them avoid taking responsibility for their current problems if AI will make those irrelevant anyway, or possibly because they come from religious backgrounds and can’t get over their subconscious longing for a god-like figure”.
I feel more responsibility to be the person holding/tracking the earnest hypothesis in a 1-1 context, or if I am the only one speaking; in larger group contexts I tend to mostly ask “Is there a hypothesis here that isn’t or likely won’t be tracked unless I speak up” and then I mostly focus on adding hypotheses to track (or adding evidence that nobody else is adding).
(Did Ben indicate he didn’t consider it? My guess is he considered it, but thinks it’s not that likely and doesn’t have amazingly interesting things to say on it.
I think having a norm of explicitly saying “I considered whether you were saying the truth but I don’t believe it” seems like an OK norm, but not obviously a great one. In this case Ben also responded to a comment of mine which already said this, and so I really don’t see a reason for repeating it.)
(I read “this is probably because of two reasons” as implying that the list of reasons is considered to be exhaustive, such that any reasons besides those two have negligible probability.)
I gave my strongest hypothesis for why it looks to me that many many people believe it’s responsible to take down information that makes your org look bad. I don’t think alternative stories have negligible probability, nor does what I wrote imply that, though it is logically consistent with that.
There are many widespread anti-informative behaviors that people engage in for poor reasons, like saying that their spouse is the best spouse in the world, or telling customers that their business is the best business in the industry, or saying exclusively glowing things about people in reference letters, that are best explained by the incentives on the person to present themselves in the best light; at the same time, it is respectful to a person, while in dialogue with them, to keep track of the version of them who is trying their best to have true beliefs and honestly inform others around them, in order to help them become that person (and notice the delta between their current behavior and what they hopefully aspire to).
Seeing orgs in the self-identified-EA space take down information that makes them look bad is (to me) not that dissimilar to the other things I listed.
I think it’s good to discuss norms about how appropriate it is to bring up cynical hypotheses about someone during a discussion in which they’re present. In this case I think raising this hypothesis was worthwhile for the discussion, and I didn’t cut off any way for the person in question to continue to show themselves to be broadly acting in good faith, so I think it went fine. Li replied to Habryka, and left a thoughtful pair of comments retracting and apologizing, which reflected well on them in my eyes.
I don’t think alternative stories have negligible probability
Okay! Good clarification.
I think it’s good to discuss norms about how appropriate it is to bring up cynical hypotheses about someone during a discussion in which they’re present.
To clarify, my comment wasn’t specific to the case where the person is present. There are obvious reasons why the consideration should get extra weight when the person is present, but there’s also a reason to give it extra weight if none of the people discussed are present—namely that they won’t be able to correct any incorrect claims if they’re not around.
so I think it went fine
Agree.
(As I mentioned in the original comment, the point I made was not specific to the details of this case, but noted as a general policy. But yes, in this specific case it went fine.)
Quick thoughts on this:
“The data was accurate as far as I can tell until August 2024”
I’ve heard a few reports over the last few weeks that made me unsure whether the pre-Aug data was actually correct. I haven’t had time to dig into this.
In one case (e.g. with the EA.org data) we have a known problem with the historical data that I haven’t had time to fix, which probably means the reported downward trend in views is misleading. Again I haven’t had time to scope the magnitude of this etc.
I’m going to check internally to see if we can just get this back up in a week or two (It was already high on our stack, so this just nudges up timelines a bit). I will update this thread once I have a plan to share.
I’m probably going to drop responding to “was this a bad call” and prioritize “just get the dashboard back up soon”.
More thoughts here, but TL;DR I’ve decided to revert the dashboard back to its original state & have republished the stale data. (Just flagging for readers who wanted to dig into the metrics.)
Hey! I just saw your edited text and wanted to jot down a response:
Edit: I’ll be honest, after thinking about it for longer, the only reason I can think of why you would take down the data is because it makes CEA and EA look less on an upwards trajectory. But this seems so crazy. How can I trust data coming out of CEA if you have a policy of retracting data that doesn’t align with the story you want to tell about CEA and EA? The whole point of sharing raw data is to allow other people to come to their own conclusions. This really seems like such a dumb move from a trust perspective.
I’m sorry this feels bad to you. I care about being truth seeking and care about the empirical question of “what’s happening with EA growth?”. Part of my motivation in getting this dashboard published in the first place was to contribute to the epistemic commons on this question.
I also disagree that CEA retracts data that doesn’t align with “the right story on growth”. E.g. here’s a post I wrote in mid 2023 where the bottom line conclusion was that growth in meta EA projects was down in 2023 v 2022. It also publishes data on several cases where CEA programs grew slower in 2023 or shrank. TBH I also think of this as CEA contributing to the epistemic commons here — it took us a long time to coordinate and then get permission from people to publish this. And I’m glad we did it!
On the specific call here, I’m not really sure what else to tell you re: my motivations other than what I’ve already said. I’m going to commit to not responding further to protect my attention, but I thought I’d respond at least once :)
I would currently be quite surprised if you had taken the same action if I had instead been making an inference that reflects positively on CEA or EA. I might of course be wrong, but you did do it right after I wrote something critical of EA and CEA, and did not do it the many other times it was linked in the past year. Sadly, your institution has a long history of being pretty shady with data and public comms this way, and so my priors are not very positively inclined.
I continue to think that it would make sense to at least leave up the data that CEA did feel comfortable linking in the last 1.5 years. By my norms, invalidating links like this, especially if the underlying page happens to be unscrapeable by the Internet Archive, is really very bad form.
I did really appreciate your mid 2023 post!
I spent 8 years working in strategy departments for ad agencies. If you’re interested in the science behind brand tracking, I recommend you check out the Ehrenberg-Bass Institute’s work on Category Entry Points: https://marketingscience.info/research-services/identifying-and-prioritising-category-entry-points/