Key takeaways from our EA and alignment research surveys
Many thanks to Spencer Greenberg, Lucius Caviola, Josh Lewis, John Bargh, Ben Pace, Diogo de Lucena, and Philip Gubbins for their valuable ideas and feedback at each stage of this project—as well as the ~375 EAs + alignment researchers who provided the data that made this project possible.
Background
Last month, AE Studio launched two surveys: one for alignment researchers, and another for the broader EA community.
We got some surprisingly interesting results, and we’re excited to share them here.
We set out to better explore and compare various population-level dynamics within and across both groups. We examined everything from demographics and personality traits to community views on specific EA/alignment-related topics. We took on this project because it seemed to be largely unexplored and rife with potentially-very-high-value insights. In this post, we’ll present what we think are the most important findings from this project.
We’re also publicly releasing a tool we built for analyzing both datasets. The tool has some handy features, including customizable filtering of the datasets, distribution comparisons within and across the datasets, automatic classification/regression experiments, LLM-powered custom queries, and more. We’re excited for the wider community to use the tool to explore these questions further in whatever manner they desire. There are many open questions about the current psychological and intellectual make-up of both communities that we haven’t tackled here, and we hope others will use the dataset to explore them further.
(Note: if you want to see all results, navigate to the tool, select the analysis type of interest, and click ‘Select All.’ If you have additional questions not covered by the existing analyses, the GPT-4 integration at the bottom of the page should ideally help answer them. The code running the tool and the raw anonymized data are both also publicly available.)
We incentivized participation by offering to donate $40 per eligible[1] respondent. Strong participation in both surveys enabled us to donate over $10,000 to AI safety orgs as well as a number of other high-impact organizations (see here[2] for the exact breakdown across the two surveys). Thanks again to all of those who participated in both surveys!
Three miscellaneous points on the goals and structure of this post before diving in:
Our goal here is to share the most impactful takeaways rather than simply regurgitating every conceivable result. This is largely why we are also releasing the data analysis tool, where anyone interested can explore the dataset and the results at whatever level of detail they please.
This post collectively represents what we at AE found to be the most relevant and interesting findings from these experiments. We sorted the TL;DR below by perceived importance of findings. We are personally excited about pursuing neglected approaches to alignment, but we have attempted to be as deliberate as possible throughout this write-up in striking the balance between presenting the results as straightforwardly as possible and sharing our views about implications of certain results where we thought it was appropriate.
This project was descriptive and exploratory in nature. Our goal was to cast a wide psychometric net in order to get a broad sense of the psychological and intellectual make-up of both communities. We used standard frequentist statistical analyses to probe for significance where appropriate, but we still think it is important for us and others to run more tightly scoped follow-up experiments that replicate and further sharpen the key results presented here.
Seven key results and implications
Here we present each key result, ordered by perceived relevance, as well as what we think are the fundamental implications of that result. We hyperlink each result to the associated sections in this post for easier navigation.
(Please note that there are also a bunch of miscellaneous results that people have found interesting that are not included in this list or in the main body of the piece.)
Alignment researchers don’t think the field is poised to solve alignment
Result: Alignment researchers generally do not believe that current alignment research is on track to solve alignment and do not think that current research agendas exhibit strong coverage of the full space of plausible alignment approaches. However, alignment researchers did prove impressively accurate at predicting the research community’s overall views on the relative promise of various technical alignment research directions (as systematized by Shallow review of live agendas in alignment & safety).
Implications: Alignment researchers’ general models of the field are well-calibrated, but the fact that they don’t think the field is on track to solve alignment suggests that additional approaches should be pursued beyond what is currently being undertaken, a view that was also echoed repeatedly throughout alignment researchers’ free responses. We think this result lends additional credence to pursuing neglected approaches to alignment.
Capabilities and alignment research not viewed as mutually exclusive
Result: Alignment researchers generally disagree with statements like ‘alignment research that has some probability of advancing capabilities should not be done’ and ‘advancing AI capabilities and doing alignment research are mutually exclusive goals.’ Interestingly, researchers also erroneously predicted that the community would generally view safety and capabilities work as incompatible.
Implications: This finding merits a more precise follow-up and discussion to better understand what exactly alignment researchers believe the relationship is and ideally should be between AI alignment and capabilities research—especially given that roughly two-thirds of alignment researchers also seem to support pausing or slowing AI development. Our general interpretation of this cluster of findings is that alignment researchers believe that capabilities research is proceeding so quickly and aggressively that the probability of alignment research being a meaningful contributor to further capabilities speed-ups is actually low—despite mispredicting that other alignment researchers would view this probability as higher. This alignment-versus-capabilities position is potentially quite action-guiding for policy efforts as well as technical alignment work.
Overestimating the perceived value of intelligence, underestimating ‘softer’ skills
Result: We find in both the EA and alignment communities—but more dramatically in the EA sample—that respondents significantly overestimate (e.g., by a factor of ~2.5 for EAs) how much high intelligence is actually valued in the community. EAs also tend to underestimate how much the community actually values ‘softer’ skills like having a strong work ethic, ability to collaborate, and people skills.
Implications: Those in charge of hiring/funding/bringing people into these communities should consider (at least as a datapoint) what skills and traits are actually most valued within that community. They should probably treat high intelligence as something more like a necessary-but-not-unilaterally-sufficient trait rather than the ultimate criterion. We agree that softer skills like a strong work ethic and a strong ability to collaborate can render highly intelligent individuals dramatically more effective at driving results[3].
EAs have lukewarm views about longtermism
Result: EAs (actively involved across 10+ cause areas) generally seem to think that AI risk and x-risk are less promising cause areas than ones like global health and development and animal welfare—despite the EA community’s own strong predictions that EAs would consider AI risk and x-risk to be the most promising cause areas. EAs also predicted the EA community would view its own shift towards longtermism positively, but the community’s actual views on its own longtermist shift skew slightly negatively.
Implications: It is important to caveat this finding with the fact that, overall, EAs still view AI risk and x-risk as promising in an absolute sense, despite seeming to consider more ‘classic’ EA cause areas as relatively more promising overall. We think this result merits follow-up and independent replication and invites further discussion within the EA community about the optimal allocation of time, resources, attention, etc. between classic cause areas and longtermist ones.
Alignment researchers think AGI >5 years away
Result: Alignment researchers generally do not expect there to be AGI within the next five years—but erroneously predict that the alignment research community does generally expect this.
Implications: Perceived timelines will naturally calibrate the speed and intensity of research being undertaken. If most AI safety researchers think they have >5 years to attempt to solve alignment, it might be worth funding and pursuing additional ‘expedited’ research agendas in case AGI comes sooner than this.
Moral foundations of EAs and alignment researchers
Result: EAs and alignment researchers have reasonably distinct moral foundations. We tested a model of moral foundations that uses three factors: traditionalism, compassion, and liberty. While both communities place low value in traditionalism, EAs seem to value compassion significantly more than alignment researchers. By contrast, both communities are fairly normally distributed in valuing liberty, but alignment researchers tend to skew towards liberty and EAs tend to skew away from it.
Implications: EAs may be more receptive than alignment researchers to work with more straightforwardly humanitarian outcomes (as is indeed demonstrated elsewhere in our results). More broadly, the roughly normal distribution of both populations on the moral foundation of liberty suggests that this value is either considered orthogonal[4] to these communities’ guiding philosophies (which seems less likely to us) or otherwise underexplored in relation to them (which seems more likely to us).
Personality traits and demographics of EAs and alignment researchers
Result: Both EAs and alignment researchers score significantly higher than the general population in neuroticism, openness, conscientiousness, and extraversion (ordered here by the magnitude of the delta). EAs score significantly higher than alignment researchers in both agreeableness and conscientiousness. Males outnumber females 2 to 1 in EA and 9 to 1 in alignment. Both communities lean left politically and exhibit a diversity of other (albeit nonconservative) political views, but EAs appear to be significantly more progressive overall.
Implications: Both communities’ heightened sensitivity to negative emotion and risk aversion may be part of what motivates interest in the causes associated with EA/alignment—but these traits may also prevent bold and potentially risky work from being pursued where it might be necessary to do so. Alignment researchers should also probably put explicit effort into recruiting highly qualified female researchers, especially given that current female alignment researchers generally do seem to have meaningfully different views on foundational questions about alignment.
Survey contents and motivation
We launched two surveys: one for technical alignment researchers, and another for EA community members (who are explicitly not involved in technical alignment efforts). Both surveys largely shared the same structure.
First, we asked for general demographic information, including the extent to which the respondent has engaged with the associated community, as well as the nature of the role they currently play in their community.
Next, we had respondents answer a series of Likert scale questions from a set of well-validated psychometric scales, including the Five Factor Model (‘Big Five’), an updated version of the Moral Foundations Questionnaire (MFQ), and a number of other miscellaneous scales (probing things like risk-taking, delay discounting, self-control, and communal orientation). We included these questions because we think it is important to better understand the dominant cognitive and behavioral traits at play in the EA/alignment communities, especially with an eye towards how these mechanisms might help uncover what otherwise-promising research directions are currently being neglected.
In the final part of each survey, we asked people to respond on five-point Likert scales (strongly disagree, somewhat disagree, …, strongly agree) to statements related to specific topics in EA/alignment. These items were first framed in the general form ‘I think X’ (e.g., I think that effective altruism is a force for good in the world) and subsequently framed in the general form ‘I think the community believes X’ (e.g., I think the EA community as a whole believes that effective altruism is a force for good in the world).
Our motivation in this final section was two-fold: (1) we can straightforwardly understand the distribution of both communities’ views on a given relevant topic, but also (2) we can compare this ground truth distribution against individuals’ predictions of the community’s views in order to probe for false-consensus-effect-style results. Interestingly, we indeed found that both communities significantly mispredict their own views on key questions.
Who took these surveys?
Approximately 250 EAs and 125 alignment researchers. We recruited virtually all of these participants by simply posting on LW and the EA Forum, where we asked each community to fill out their associated survey via a simple Google Form.
We found that each sample includes people working across a wide diversity of research orgs and cause areas at varying levels of seniority. For instance, 18% of the alignment sample self-identifies as actively leading or helping to lead an alignment org, and significant numbers of EAs were sampled from virtually every cause area we listed (see plots below).
Here is the full list of the alignment orgs who had at least one researcher complete the survey (and who also elected to share what org they are working for): OpenAI, Meta, Anthropic, FHI, CMU, Redwood Research, Dalhousie University, AI Safety Camp, Astera Institute, Atlas Computing Institute, Model Evaluation and Threat Research (METR, formerly ARC Evals), Apart Research, Astra Fellowship, AI Standards Lab, Confirm Solutions Inc., PAISRI, MATS, FOCAL, EffiSciences, FAR AI, aintelope, Constellation, Causal Incentives Working Group, Formalizing Boundaries, AISC.
Of note, the majority of alignment researchers are under 30, while the majority of EAs are over 30. Males outnumber females approximately 2 to 1 in EA—but almost 9 to 1 in alignment. While this gender distribution is not unfamiliar in engineering spaces, it certainly seems worth explicitly highlighting, especially to the degree that male and female alignment researchers do seem to exhibit meaningfully different views about the core aims of alignment research (including, critically, the very question of whether alignment research explicitly requires an engineering-style background).
Overall, we find that approximately 55% of alignment researchers identify as politically progressive to some extent, while approximately 80% of EAs identify in the same way. While there appear to be a negligible number of self-identified conservatives in either community (n=4 in alignment, n=2 in EA), there does appear to be a diversity of other political views at play in both samples (including a significant number of highly unique written-in affiliations/leanings across both samples that we somewhat crudely lumped under ‘Other’). It is worth noting that the lack of self-identified conservatives could fuel problems similar to those that have been well documented in academia, especially to the degree that policy advocacy is becoming an increasingly prominent cause area of both communities.
Roughly 65% of EA respondents and 40% of alignment researchers have been involved in the space for 2 or more years. EA respondents demonstrate significant diversity in the cause area in which they are actively involved, and the alignment dataset is shown to include researchers at various stages of their careers, including a significant sample of researchers who are actively leading alignment organizations.
(As with each part of this write-up, there are numerous additional results in this section to explore that are not explicitly called out here. We also want to call out that we generally opted to keep both samples intact in subsequent analyses and found that adopting additional exclusion criteria for either population does not statistically affect the key results reported here; the community can easily further filter either dataset however they see fit using the data analysis tool.)
Community views on specific topics (ground truth vs. predictions)
We asked each community to rate the extent to which they agreed with a number of specific claims in the general form, ‘I think X’ (e.g., I think EA is a force for good in the world). Later on, we asked respondents to predict how their community in general would respond to these same questions in the general form, ‘I think the EA/alignment community as a whole believes X’ (e.g., I think the EA community as a whole believes that EA is a force for good in the world). In this way, we position ourselves to be able to address two important questions simultaneously:
What do the ground truth distributions of views on specific field-level topics look like within the EA and alignment communities?
How do these ground truth distributions compare to the community’s prediction of these distributions? In slightly less statistical language—how well does each community actually know itself?
Cause area prioritization (ground truth vs. predictions)
We asked each community to rate the extent to which they considered a large number of relevant cause areas/research directions to be promising—and proceeded to compare these distributions to each community’s predictions of how others would respond in general.
EA community
For the sake of demonstrating this section’s key results as clearly as possible, we translate each available Likert scale option to a number of ‘points’ (‘very unpromising’ = −2, ‘somewhat unpromising’ = −1, …, ‘very promising’ = +2) and proceed to tally the total actual and predicted points allotted to each cause area/research direction. Presented with the core topics of effective altruism, here is how the EA community sample’s ground truth and predicted point allotments look:
We find that EAs generally believe that global health and development, farmed/wild animal welfare, and cause prioritization/effective giving are the most promising cause areas, but EAs themselves predicted that EAs would consider AI risk and general existential risk to be the most promising (predicted mean = 4.43, actual mean = 3.84; U = 14264, p ≈ 0). The magnitude of the misprediction here, particularly with respect to AI risk, was quite surprising to us (potentially by definition, given the nature of the result). To be clear, most EAs do think AI risk is ‘somewhat promising,’ but they overwhelmingly predicted that the community would consider AI risk ‘very promising.’ EAs’ generally lukewarm feelings towards longtermist causes are demonstrated in a few places in our results.
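The point-tallying scheme described above can be sketched in a few lines of Python. This is a minimal illustration rather than the analysis code used for the survey: the responses are toy data, the helper name `tally_points` is hypothetical, and the midpoint label `neutral` is an assumed stand-in for the scale's middle option, which is not spelled out above.

```python
# Point values follow the scheme described in the text.
# "neutral" is an ASSUMED label for the unnamed midpoint of the scale.
LIKERT_POINTS = {
    "very unpromising": -2,
    "somewhat unpromising": -1,
    "neutral": 0,
    "somewhat promising": 1,
    "very promising": 2,
}


def tally_points(responses):
    """Sum Likert points per cause area.

    `responses` maps each cause area to the list of Likert labels it received.
    (Hypothetical helper, not part of the released tool.)
    """
    return {
        cause: sum(LIKERT_POINTS[label] for label in labels)
        for cause, labels in responses.items()
    }


# Toy data for illustration only; these are NOT the survey's responses.
actual = {
    "global health": ["very promising", "somewhat promising", "very promising"],
    "AI risk": ["somewhat promising", "neutral", "somewhat promising"],
}
predicted = {
    "global health": ["somewhat promising", "somewhat promising", "neutral"],
    "AI risk": ["very promising", "very promising", "somewhat promising"],
}

actual_points = tally_points(actual)
predicted_points = tally_points(predicted)

# A positive gap means the community was predicted to rate the cause
# more favorably than it actually did.
gaps = {
    cause: predicted_points[cause] - actual_points[cause]
    for cause in actual_points
}
print(actual_points, predicted_points, gaps)
```

Summing points rewards both the number of respondents and the strength of their views, so totals are only comparable across cause areas that received the same number of ratings.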
Interestingly, the causes that currently receive the most funding align more closely with the EA community’s predictions than with the ground-truth distributions. It seems this mismatch may therefore be more straightforwardly understood as key funders like Open Philanthropy viewing x-risk as significantly more important than the general EA community does, with EAs reflecting this preference in their perceptions of the community writ large.
(We personally consider it important to note here that we certainly don’t think funding alignment should be deprioritized, and that AI-related risks clearly qualify as essential to address under the ITN framework. We are excited that Open Phil plans to double its Global Catastrophic Risk (GCR) funding over the next few years. We ourselves wish that orders of magnitude more AI safety orgs, individual researchers, and for-profit AI-safety-driven businesses were being funded—and we suspect far more will be funded as AI development accelerates and the mainstream comes to care far more about making sure AI is built safely.[5])
Alignment community
By contrast, the alignment community proved impressively accurate at predicting their own views on the relative promise of various alignment research directions as captured by the rough factor structure presented in Shallow review of live agendas in alignment & safety:
This result indicates that alignment researchers are most excited about evals and interpretability work, followed by various prosaic alignment approaches (eliminating deception, finetuning/model edits, goal robustness, etc.), are relatively less excited about ‘make the AI solve it’ approaches (the most prominent example being superalignment), and are even less excited about more theoretical approaches, including provably safe architectures, corrigibility, and the like. This result also clearly demonstrates that alignment researchers are well-calibrated in understanding that the community has this general prioritization.
As an organization that is particularly interested in pursuing neglected approaches (which would likely all fall into the unpopular ‘theory work’ bin), we certainly think it is worth cautioning (as many others did in free response questions) that this result only tells us what the idiosyncratic set of current alignment researchers think about what should be pursued within the general constraints of the Shallow review framework. We do not think it is valid to conclude from results like this that people should stop doing theory work and all become mechanistic interpretability researchers.
The prioritization here should also be tempered with the parallel findings that alignment researchers generally think (1) that current alignment research (i.e., everything encompassed by the Shallow review framework) is not on track to solve alignment before we get AGI, and (2) that the current research landscape does not demonstrate strong coverage of the space of plausible approaches:
Taken together, these results reinforce to us that additional neglected approaches to alignment are very much worth identifying and pursuing. We suspect that alignment researchers are most excited about evals and interpretability work because they feel they can make more direct, tangible, measurable, and prestigious[6] progress in them in the short-term—but that these approaches appear to be something of a local optimum in the current research landscape rather than the global best strategy that will solve alignment.
Other interesting field-level distributions (ground truth vs. predictions)
In addition to cause/research area prioritization, we asked both communities to share the extent to which they agreed with a number of claims specific to their respective communities. All of these distributions are available here; in this section, we will only highlight and comment on what we think are the most relevant couple of results for each community.
EA community
Dovetailing with the earlier EA cause area finding, we also find that EAs are fairly heterogeneous with a slight negative skew towards the related claims that longtermist causes should be the primary focus of EA and that EA’s shift towards longtermism was positive (for both, only ~25% agree to some extent)—but the community predicted a strongly positive skew (for both, that >40% would agree to some extent).
We also find in both datasets (but most dramatically in the EA community sample, plotted below) that respondents vastly overestimate (≈2.5x) how much high intelligence is actually valued, and underestimate how much other traits like a strong work ethic, ability to collaborate, and people skills are valued. One plausible interpretation of this finding is that EAs/alignment researchers actually believe that high intelligence is necessary but not sufficient for being impactful, but perceive other EAs/alignment researchers as thinking high intelligence is basically sufficient. Getting the community aligned on these questions seems of very high practical importance for hiring/grantmaking criteria and decision-making.
Finally—and not entirely unrelatedly—we highlight the finding that EAs have diverse views on the most important areas for upskilling (options pulled directly from 80000 Hours’ skills list). While generally well-calibrated, the community appears to overestimate the predicted value of upskilling in ‘harder’ skills like research and coding, while underestimating the predicted value of ‘softer’ skills like communicating ideas and being good with people. Overall, EAs think (and predict correctly) that gaining expertise relevant to a top problem is the most valuable area to upskill.
Alignment community
We asked alignment researchers multiple questions to evaluate the extent to which they generally view capabilities research and alignment research as compatible.[7] Interestingly, researchers predicted that the community would view progress in capabilities and alignment as fundamentally incompatible, but the community actually skews fairly strongly in the opposite direction, i.e., towards thinking that capabilities and alignment are decidedly not mutually exclusive. As described earlier, our general interpretation of this cluster of findings is that alignment researchers believe that capabilities research is proceeding so hastily that the probability of alignment research being a meaningful contributor to further capabilities speed-ups is actually low, despite mispredicting that other alignment researchers would view this probability as higher.
We find this mismatch particularly interesting for our own alignment agenda and intend to follow up on the implications of this specific development in later work.
Another relevant misprediction of note relates to AGI timelines. Most alignment researchers do not actively expect there to be AGI in the next five years—but incorrectly predict that other alignment researchers do expect this in general. In other words, this distribution’s skew was systematically mispredicted. Similar distributions can be seen for the related item, ‘I expect there will be superintelligent AI in the next five years.’
Finally, we share here that the majority of alignment researchers (>55%) agree to some extent that alignment should be a more multidisciplinary field, despite community expectations of a more lukewarm response to this question.
Personality, values, moral foundations
Background on the Big Five
There are many different models of personality (≈ ‘broad patterns of behavior and cognition over time’). The Five Factor Model, or ‘Big Five,’ is widely considered to be the most scientifically rigorous personality model (though it certainly isn’t without its own criticisms). It was developed by performing factor analyses on participants’ ratings over thousands of self-descriptions, and has been generally replicated cross-culturally and over time. Big Five scores for a given individual are also demonstrated to remain fairly consistent over the lifespan. For these reasons, we used this model to measure personality traits in both the EA and the alignment samples.
(We show later on that Big Five + Moral Foundations scores can be used to predict alignment-specific views of researchers significantly above chance level, demonstrating that these tools are picking up on some predictive signal.)
The five factors/traits are as follows:
Openness: Creativity and willingness to explore new experiences. Lower scores indicate a preference for routine and tradition, while higher scores denote a readiness to engage with new ideas and experiences.
Conscientiousness: Organization, thoroughness, and responsibility. Individuals with lower scores might tend towards spontaneity and flexibility, whereas those with higher scores demonstrate meticulousness and reliability.
Extraversion: Outgoingness, energy, and sociability. Lower scores are characteristic of introverted, reflective, and reserved individuals, while higher scores are indicative of sociability, enthusiasm, and assertiveness.
Agreeableness: Cooperativeness, compassion, and friendliness. Lower scores may suggest a more competitive or skeptical approach to social interactions, whereas higher scores reflect a predisposition towards empathy and cooperation.
Neuroticism: Tendency and sensitivity towards negative emotionality. Lower scores suggest emotional stability and resilience, whereas higher scores indicate a greater susceptibility to stress and mood swings.
Personality similarities and differences
In general, the results of the Big Five assessment we administered indicate that both EAs and alignment researchers tend to be fairly extraverted, moderately neurotic, intellectually open-minded, generally industrious, and generally quite compassionate. Compared to the general population, both EAs and alignment researchers are significantly more extraverted, conscientious, neurotic, and open. Only EAs are significantly more agreeable than the general population—alignment researchers score slightly lower in agreeableness than the general population mean (but not significantly so).
This result is not the first to demonstrate that the psychological combination of intellectualism (≈ openness), competence (≈ conscientiousness), and compassion (≈ agreeableness) corresponds intuitively to the core philosophies of effective altruism/AI alignment.
It is also somewhat unsurprising that two key differentiators between both communities and the general population appear to be (1) significantly higher sensitivity to negative emotion and (2) significantly higher openness. It seems clear that individuals attracted to EA/alignment are particularly calibrated towards avoidance of negative long-term outcomes, which seems to be reflected not only in both communities’ higher neuroticism scores, but also in our measurements of fairly tepid attitudes towards risk-taking in general (particularly in the alignment community). Additionally, higher openness should certainly be expected in communities organized around ideas, rationality, and intellectual exchange. However, it also seems likely that EAs and alignment researchers may score significantly higher in intellect (often described as ‘truth-oriented’)—one of the two aspects/constituent factors of trait openness—than openness to experience (often described as ‘beauty-oriented’). Pinning down this asymmetry more precisely seems like one interesting direction for follow-up work.
Though it was out of scope for this report, we are also excited about better understanding the extent to which there might be ‘neglected’ personalities in both EA and alignment—i.e., whether there are certain trait configurations that are typically associated with research/organizational success that are currently underrepresented in either community. To give one example hypothesis, it may be the case that consistently deprioritizing openness to experience (beauty-orientedness) in favor of intellect (truth-orientedness) may lead to organizational and research environments that prevent the most effective and resonant possible creative/intellectual work from being done. We are also interested in better understanding whether there is a clear relationship between ‘neglected’ personalities and neglected approaches to alignment—that is, to what degree including (or not including) specific kinds of thinkers in alignment would have a predictable impact on research directions.
In spite of significant trait similarities across the two communities, we also find that EA respondents on average are more conscientious (t=2.7768, p=0.0058) and more agreeable (t=3.0674, p=0.0023) than the alignment community respondents, while alignment researchers tend to be slightly (though not statistically significantly) higher in openness. It is possible that EAs are more broadly people-oriented (or otherwise select for this) given their prioritization of explicitly-people-(or-animal)-related causes. It is also possible that the relative concreteness of EA cause areas, as compared to the often-theoretical world of technical AI safety research, may lend itself to slightly more day-to-day, industrious types.
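For readers who want to reproduce this kind of trait comparison on the released data, here is a minimal sketch of a two-sample t statistic. It assumes Welch's unequal-variance variant, which is our assumption: the write-up does not say which t-test was run. The scores are toy values and the function name `welch_t` is hypothetical. Turning the statistic into a p-value requires a t-distribution CDF, which the Python standard library does not provide, so that step is left to a statistics package.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Return Welch's t statistic and its approximate degrees of freedom.

    Uses sample variances (n - 1 denominator) and the
    Welch-Satterthwaite approximation for the degrees of freedom.
    """
    var_a = variance(a) / len(a)  # squared standard error of mean(a)
    var_b = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(var_a + var_b)
    df = (var_a + var_b) ** 2 / (
        var_a ** 2 / (len(a) - 1) + var_b ** 2 / (len(b) - 1)
    )
    return t, df


# Toy trait scores for illustration only; NOT the survey data.
ea_scores = [4, 5, 3, 4, 4]
alignment_scores = [3, 2, 3, 4, 3]

t_stat, dof = welch_t(ea_scores, alignment_scores)
print(f"t = {t_stat:.4f}, df = {dof:.1f}")
```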
These differences are mostly being driven by significantly different distributions on key self-reports related to each trait, for instance:
EAs and alignment researchers have significantly different moral foundations
Moral foundations theory posits that the latent variables underlying moral judgments are modularized to some extent and are validly captured (like the Big Five) via factor analysis/dimensionality reduction techniques. We directly operationalize this paper in our implementation of the Moral Foundations Questionnaire (MFQ), which finds three clear factors underlying the original model:
Traditionalism: Values social stability, respect for authority, and community traditions, emphasizing loyalty and social norms. Lower scores may lean towards change and flexibility, whereas higher scores uphold authority and tradition.
Compassion: Centers on empathy, care for the vulnerable, and fairness, advocating for treating individuals based on need rather than status. Lower scores might place less emphasis on individual care, while higher scores demonstrate deep empathy and fairness.
Liberty: Prioritizes individual freedom and autonomy, resisting excessive governmental control and supporting the right to personal wealth. Lower scores may be more accepting of government intervention, while higher scores champion personal freedom and autonomy.
We find in general that both EAs and alignment researchers score low on traditionalism, high on compassion, and are distributed roughly normally on liberty. However, EAs are found to score significantly higher in compassion (U=8349, p≈0), and alignment researchers are found to score significantly higher in liberty (U=16035, p≈0). Note that Likert items (strongly disagree, somewhat disagree, …, strongly agree) are represented numerically below, where 1 = strongly disagree, and so on.
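The U statistics above come from a rank-based (Mann-Whitney) comparison, which suits ordinal Likert data. The following is a minimal sketch of how such a statistic can be computed. The responses are hypothetical, the 5-point coding is an assumption, and the helper names are ours rather than anything from the analysis tool.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # ranks are 1-indexed
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(a, b):
    """U for sample `a`: count of (a, b) pairs with a > b, ties counted as 0.5."""
    ranks = average_ranks(list(a) + list(b))
    rank_sum_a = sum(ranks[: len(a)])
    return rank_sum_a - len(a) * (len(a) + 1) / 2


# Hypothetical Likert responses, 1 = strongly disagree ... 5 = strongly agree
ea = [5, 4, 5, 3, 4, 5]
alignment = [3, 4, 2, 3, 4, 3]
u_ea = mann_whitney_u(ea, alignment)
u_alignment = mann_whitney_u(alignment, ea)
```

The two U values always sum to the product of the sample sizes, so a U far from half that product indicates that one group tends to give higher responses than the other.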
Considering each of these three results in turn:
It is not very surprising that EAs and alignment researchers are low in traditionalism, which is typically associated with conservatism and more deontological/rule-based ethical systems. Worrying about issues like rogue AI and wild animal suffering might indeed be considered the epitome of ‘untraditional’ ethics. This result naturally pairs with the finding that there are virtually no conservative EAs/alignment researchers, which may have important implications for viewpoint diversity and neglected approaches in both communities.
Both alignment researchers and EAs clearly value compassion from a moral perspective, but EAs seem especially passionate—and more homogenous—in this respect. For instance:
Interestingly, both alignment researchers and EAs are generally-normally-distributed on liberty as a moral foundation, with alignment researchers demonstrating a slight positive skew (towards liberty) and EAs demonstrating a slight negative skew (away from liberty). A clear example of this dynamic can be seen here:
It is worth noting that while the philosophy of effective altruism/AI safety has a clear expected relationship to traditionalism (boo!) and compassion (yay!), it seems plausibly agnostic to liberty as a moral value, potentially explaining the generally-normally-distributed nature of both populations. This finding invites further reflection within both communities on how liberty as a moral foundation relates to their work. For example, the implementation details of an AI development pause seemingly have a clear relationship to liberty (as we actually demonstrate quantitatively later on). Given that alignment researchers seem to care both about liberty and AI x-risk, it would be interesting for follow-up work to better understand, for example, how researchers would react to a government-enforced pause.
Free responses from alignment survey
On the alignment survey, we asked respondents three questions that they could optionally write in responses to:
What, if anything, do you think is neglected in the current alignment research landscape? Why do you think it is neglected?
How would you characterize the typical alignment researcher? What are the key ways, if any, that you perceive the typical alignment researcher as unique from the typical layperson, the typical researcher, and/or the typical EA/rationalist type?
Do you have any other insights about the current state of alignment research that you’d like to share that seems relevant to the contents of this survey?
Given the quantity of the feedback and the fact that we ourselves have strong priors about these questions, we elected to simply aggregate responses for each question and pass them to an LLM to synthesize a coherent and comprehensive overview.
Here is that output (note: it is ~60% the length of this post), along with the anonymized text of the respondents.
Our four biggest takeaways from the free responses (consider this an opinionated TL;DR):
The field is seen as suffering from discoordination and a lack of consensus on research strategies, compounded by a community described as small, insular, and overly influenced by a few thought leaders. It is important to highlight the significant worries about the lack of self-correction mechanisms and objective measures of research impact, which suggests the need for further introspection on how the community evaluates progress and success. Both of these concerns appear to us as potentially highly impactful neglected ‘meta-approaches’ that would be highly worthwhile to fund and/or pursue further.
There were numerous specific calls for interdisciplinary involvement in alignment, including multiple calls for collaboration with cognitive psychologists and behavioral scientists. We were excited to see that brain-like AGI was highlighted as one neglected approach that was construed as both accessible and potentially-high-impact for new promising researchers entering the space.
The alignment community perceives itself to be distinguished by its members’ high intellectual capacity and mathematical ability, specialized technical knowledge, high agency, pragmatic altruism, and excellent epistemic practices. Distinct from typical EA/rationalist types, they’re noted for their STEM background, practical engagement with technical AI issues, and a combination of ambition with intrinsic motivation. They also believe they are perceived as less experienced and sometimes less realistic than their peers in cognitive sciences or typical ML researchers.
The community also shared concerns about the ambiguous standards defining alignment researchers, potentially skewing the field towards rewarding effective communication over substantive research progress. Critiques also extend to the research direction and quality, with some arguing that emphasis on intelligence may overlook creativity and diverse contributions (a finding we replicate in more quantitative terms elsewhere).
Concluding thoughts
Thanks again to both communities for their participation in these surveys, which has enabled all of the analysis presented here, as well as over $10k in donations to a set of very high impact orgs. We want to emphasize that we perceive this write-up to be a first pass on both datasets rather than the last word, and we’d like to strongly encourage those who are interested to explore the data analysis tool we built alongside this project (as well as the full, anonymized datasets). We suspect that there are other interesting results to be found that we have not yet uncovered and are very excited to see what else the community can unearth (please do share any results you find and we will add them to this post!).
One practical thought: we were most surprised by the community misprediction/false consensus effect results. Accordingly, we think it is probably worth probing alignment between (1) group X’s perception of group X’s views ‘as a whole’ and (2) group X’s actual views fairly regularly, akin to calibration training in forecasting. Group-level self-misperceptions are a clear coordination problem that should likely be explicitly minimized through some kind of active training or reporting process. (A more precise future tool might enable users to predict the full shape of the distribution to avoid noise in varying statistical interpretations of (1) above.)
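One simple way to track this kind of group-level misperception is to compare the mean of respondents' own views with the mean of their predictions about the community on the same scale. The sketch below assumes paired 1-5 Likert responses and uses invented numbers. The function name is hypothetical, and comparing means is only one of several reasonable summaries.

```python
from statistics import mean


def misprediction_gap(own_views, predicted_community_views):
    """Mean predicted community view minus mean actual view (same Likert scale)."""
    return mean(predicted_community_views) - mean(own_views)


# Hypothetical 1-5 Likert responses from the same six respondents
own_views = [4, 4, 3, 5, 4, 3]   # "I agree with statement S"
predicted = [2, 3, 2, 3, 2, 3]   # "The community as a whole agrees with S"
gap = misprediction_gap(own_views, predicted)
```

A negative gap means the community underestimates its own agreement with the statement, and a positive gap means it overestimates it. Repeating the measurement over time would show whether the gap shrinks.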
To end on a positive note, we highlight one final significant community misprediction from the alignment survey:
This demonstrates that alignment researchers are significantly more optimistic than they anticipated about having made significant alignment progress before AGI is developed. In other words: alignment researchers currently don’t think that other alignment researchers are particularly hopeful about making progress, but they actually are! (Or at the very least, are explicitly not pessimistic.) So we’d like to strongly encourage researchers to go out and continue doing the hard work with this understanding in mind, particularly with respect to the more underexplored areas of the alignment research landscape.
Thanks very much for your engagement with this project, and we are looking forward to seeing what other interesting results the community can discover.
Appendix: other interesting miscellaneous findings (in no particular order)
Using temperament to predict alignment positions
An interesting (though not particularly actionable) classification result:
We show that respondents’ trait-level scores from the psychometric instruments deployed in both surveys can be used to predict alignment researchers’ positions on the various alignment-specific questions significantly above chance level using a simple Random Forest Classifier (with balanced class weights). Feature importances reveal that many such predictions are based on seemingly sensible features—for instance, for the statement, “I currently support pausing or dramatically slowing AI development,” the feature with the single highest importance is one’s liberty moral foundation score, which makes a good deal of sense. For the “promise seen in controlling the AI (deception, model edits, value alignment, goal robustness)” question, the single feature with the highest importance is, quite intriguingly, one’s own self-control score on the Brief Self-Control Scale.
The purpose of this analysis is to demonstrate that, while undoubtedly imperfect, these psychometric tools can indeed be used to help predict real-world psychological variables in sensible and interesting ways—which in turn can yield interesting practical implications for field-building, pursuing novel approaches, and the like.
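To make "balanced class weights" and "above chance level" more concrete, here is a minimal sketch using only the standard library. The weighting formula is the one scikit-learn applies for `class_weight="balanced"`. Using balanced accuracy as the chance-level yardstick is an assumption of this sketch, since the post does not say which metric was used, and the labels are invented.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall; a constant guess scores 1 / n_classes."""
    recalls = []
    for c in set(y_true):
        idx = [i for i, y in enumerate(y_true) if y == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


# Hypothetical labels: agree / neutral / disagree with a survey statement
y_true = ["agree"] * 6 + ["neutral"] * 2 + ["disagree"] * 2
weights = balanced_class_weights(y_true)

# A classifier that always predicts the majority class
always_agree = ["agree"] * len(y_true)
baseline = balanced_accuracy(y_true, always_agree)
```

Rare answers receive larger weights, so a classifier cannot score well just by predicting the most common response. Under this metric, the majority-class guesser scores one third on a three-class question, which is the bar a useful model has to clear.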
Gender differences in alignment
We show here that female alignment researchers are slightly less likely to think of alignment as fundamentally related to control rather than coexistence, more likely to think that alignment should be more multidisciplinary, and slightly less likely to think that alignment researchers require a CS, math, physics, engineering, or similar background. Given that female researchers seem to have meaningfully different views on key questions about the nature of alignment research and are dramatically outnumbered by males (9 to 1), it may be worth explicitly attempting to recruit a larger number of well-qualified female alignment researchers into the fold.
EAs and alignment researchers exhibit very low future discounting rates
As additional convergent evidence supporting the they-are-who-they-say-they-are conclusion, both EAs and alignment researchers demonstrate very low future discounting rates as measured using a subset of questions from the Monetary Choice Questionnaire. (This tool basically can be thought of as a more quantitative version of the famous marshmallow test and has been shown to correlate with a number of real-world variables.) Having very low discounting rates makes quite a lot of sense for rationalist longtermist thinkers.
One particularly interesting finding related to this metric is that k-value correlates weakly but significantly (r=0.19, p=0.03) with support for pursuing theory work in alignment. One clear interpretation of this result might be that those who discount the future more aggressively (and who might have a diminished sense of the urgency of alignment research as a result) also think it is more promising to pursue alignment approaches that are less immediately practical (i.e., theory work).
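For readers unfamiliar with the k-value: Monetary Choice Questionnaire items are conventionally scored with a hyperbolic discounting model, in which a reward A delayed by D days has present value A / (1 + kD). The sketch below assumes that standard model (the post does not describe its exact scoring procedure) and uses an illustrative choice item. The function names are ours.

```python
def indifference_k(immediate, delayed, delay_days):
    """Discount rate k at which `immediate` now equals `delayed` after the delay,
    under hyperbolic discounting: immediate = delayed / (1 + k * delay_days)."""
    return (delayed / immediate - 1) / delay_days


def discounted_value(amount, delay_days, k):
    """Present subjective value of `amount` received after `delay_days`."""
    return amount / (1 + k * delay_days)


# Illustrative choice item: $55 today versus $75 in 61 days
k_item = indifference_k(55, 75, 61)

# A respondent whose k is ten times lower values the delayed $75 above $55,
# so they would be expected to wait for the larger reward
patient_value = discounted_value(75, 61, k_item / 10)
```

A respondent who picks the delayed reward on an item is revealing a k below that item's indifference value. Lower k means less discounting of the future, which is the pattern both communities show.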
EAs and alignment researchers aren’t huge risk-takers
We show that both EAs and alignment researchers are generally normally distributed with a slight negative skew on risk-taking as captured by the General Risk Propensity Scale, with fewer than 15% of individuals in either community displaying a strong risk-taking temperament (≥4 on the scale above). Example item-level responses that drive this effect are shown below the scale-level plot.
EAs are almost-perfectly-normally-distributed on some key EA questions
These plots show that EAs are almost perfectly normally distributed on (1) the extent to which they have a positive view of effective altruism’s overall shift towards longtermist causes, and (2) the extent to which they think the FTX crisis was a reflection of deeper problems with EA. These both may be questions that therefore require further adjudication within the community given the strong diversity of opinions on these fairly foundational issues.
Alignment researchers support a pause
It is very clear that alignment researchers generally support pausing or dramatically slowing AI development (>60% agreement), which naturally pairs with the finding that alignment researchers do not think we are currently on track to solve alignment before we get AGI.
Alignment org leaders are highly optimistic by temperament
In blue are respondents who actively lead alignment orgs, and in red are all other alignment researchers. We probed trait optimism (i.e., not optimism about alignment specifically) in the survey using items like “I see myself as someone who is an optimist,” “...who has a ‘glass-half-full’ mentality,” etc., and found an interesting pocket of extremely optimistic alignment org leaders! This finding suggests an important (if somewhat obvious) motivating factor of good leaders: genuinely believing that effortfully pushing forward impactful work is likely to yield very positive outcomes.
[Any additional interesting results found by the community will be added here!]
- ^
We defined this as currently-grant-funded alignment researchers and EAs actively involved for >5h/week in a specific cause area.
- ^
Donations from alignment survey:
37 part- or full-time researchers chose AI Safety Camp (https://aisafety.camp/), totaling $1480 for this org.
26 part- or full-time researchers chose SERI MATS (https://www.matsprogram.org/), totaling $1040 for this org.
11 part- or full-time researchers chose FAR AI (https://far.ai/), totaling $440 for this org.
8 part- or full-time researchers chose CAIS (https://www.safe.ai/), totaling $320 for this org.
6 part- or full-time researchers chose FHI (https://www.fhi.ox.ac.uk/), totaling $240 for this org.
5 part- or full-time researchers chose Catalyze Impact (https://www.catalyze-impact.org/), totaling $200 for this org.
Donations from EA survey:
33 actively involved EAs chose GiveWell top charities fund, totaling $1320 for this org.
32 actively involved EAs chose Animal welfare fund, totaling $1280 for this org.
31 actively involved EAs chose Wild Animal Initiative, totaling $1240 for this org.
17 actively involved EAs chose Long term future fund, totaling $680 for this org.
10 actively involved EAs chose Lead Exposure Elimination Project, totaling $400 for this org.
7 actively involved EAs chose Good Food Institute, totaling $280 for this org.
6 actively involved EAs chose Faunalytics, totaling $240 for this org.
5 actively involved EAs chose The Humane League, totaling $200 for this org.
5 actively involved EAs chose Charity entrepreneurship, totaling $200 for this org.
4 actively involved EAs chose Against Malaria Foundation, totaling $160 for this org.
4 actively involved EAs chose StrongMinds, totaling $160 for this org.
3 actively involved EAs chose Nuclear Threat Initiative Biosecurity Program, totaling $120 for this org.
2 actively involved EAs chose Johns Hopkins Center For Health Security, totaling $80 for this org.
2 actively involved EAs chose Suvita, totaling $80 for this org.
2 actively involved EAs chose Malaria Consortium SMC programme, totaling $80 for this org.
1 actively involved EA chose New Incentives, totaling $40 for this org.
Across both surveys, we are donating $10,280 to a diverse set of effective organizations.
- ^
It might be worthwhile to explore and pioneer structures to help individuals for whom these skills come less naturally work on them further—and/or surround these individuals with excellent people to bring out the best in them. This may be particularly necessary for extracting and implementing some very promising underexplored approaches from, eg, more disagreeable but brilliant individuals who might not otherwise implement them.
- ^
That is, knowing that someone is an alignment researcher/in the EA community doesn’t meaningfully help predict how much they will value liberty, but it does meaningfully help predict how much they will value both compassion and traditionalism.
- ^
We are also incidentally hopeful that these results may actually have implications for increased funding towards some neglected cause areas that could indirectly wind up benefiting alignment, by, for example, leading to a funding environment in which causes like cluster headaches and consciousness research and the best of human morality are prioritized, and that this in turn may be a part of the hodgepodge that solves alignment.
- ^
“Prestige is like a powerful magnet that warps even your beliefs about what you enjoy. It causes you to work not on what you like, but what you’d like to like.
That’s what leads people to try to write novels, for example. They like reading novels. They notice that people who write them win Nobel prizes. What could be more wonderful, they think, than to be a novelist? But liking the idea of being a novelist is not enough; you have to like the actual work of novel-writing if you’re going to be good at it; you have to like making up elaborate lies.
Prestige is just fossilized inspiration. If you do anything well enough, you’ll make it prestigious. Plenty of things we now consider prestigious were anything but at first. Jazz comes to mind — though almost any established art form would do. So just do what you like, and let prestige take care of itself.
Prestige is especially dangerous to the ambitious. If you want to make ambitious people waste their time on errands, the way to do it is to bait the hook with prestige. That’s the recipe for getting people to give talks, write forewords, serve on committees, be department heads, and so on. It might be a good rule simply to avoid any prestigious task. If it didn’t suck, they wouldn’t have had to make it prestigious.
Similarly, if you admire two kinds of work equally, but one is more prestigious, you should probably choose the other. Your opinions about what’s admirable are always going to be slightly influenced by prestige, so if the two seem equal to you, you probably have more genuine admiration for the less prestigious one.”—https://paulgraham.com/love.html
- ^
It is worth noting that two respondents noted that they thought these questions were phrased in an unclear way, which may be a potential source of noise in these results.
Thank you so much for conducting this survey! I want to share some information on behalf of MATS:
In comparison to the AIS survey gender ratio of 9 M:F, MATS Winter 2023-24 scholars and mentors were 4 M:F and 12 M:F, respectively. Our Winter 2023-24 applicants were 4.6 M:F, whereas our Summer 2024 applicants were 2.6 M:F, closer to the EA survey ratio of 2 M:F. This data seems to indicate a large recent change in gender ratios of people entering the AIS field. Did you find that your AIS survey respondents with more AIS experience were significantly more male than newer entrants to the field?
MATS Summer 2024 applicants and interested mentors similarly prioritized research to “understand existing models”, such as interpretability and evaluations, over research to “control the AI” or “make the AI solve it”, such as scalable oversight and control/red-teaming, over “theory work”, such as agent foundations and cooperative AI (note that some cooperative AI work is primarily empirical).
The forthcoming summary of our “AI safety talent needs” interview series generally agrees with this survey’s findings regarding the importance of “soft skills” and “work ethic” in impactful new AIS contributors. Watch this space!
In addition to supporting core established AIS research paradigms, MATS would like to encourage the development of new paradigms. For better or worse, the current AIS funding landscape seems to have a high bar for speculative research into new paradigms. Has AE Studio considered sponsoring significant bounties or impact markets for scoping promising new AIS research directions?
Did survey respondents mention how they proposed making AIS more multidisciplinary? Which established research fields are more needed in the AIS community?
Did EAs consider AIS exclusively a longtermist cause area, or did they anticipate near-term catastrophic risk from AGI?
Thank you for the kind donation to MATS as a result of this survey!
Thanks for all these additional datapoints! I’ll try to respond to all of your questions in turn:
Overall, there don’t appear to be major differences when filtering for amount of alignment experience. When filtering for greater than vs. less than 6 months of experience, it does appear that the ratio looks more like ~5 M:F; at greater than vs. less than 1 year of experience, it looks like ~8 M:F; the others still look like ~9 M:F. Perhaps the changes you see over the past two years at MATS are too recent to be reflected fully in this data, but it does seem like a generally positive signal that you see this ratio changing (given what we discuss in the post).
We definitely want to do everything we can to support increased exploration of neglected approaches—if you have specific ideas here, we’d love to hear them and discuss more! Maybe we can follow up offline on this.
We don’t appear to have gotten many practical proposals for how to make AIS more multidisciplinary, but there were a number of specific disciplines mentioned in the free responses, including cognitive psychology, neuroscience, game theory, behavioral science, ethics/law/sociology, and philosophy (epistemology was specifically brought up across multiple respondents). One respondent wrote, “AI alignment is dominated by computer scientists who don’t know much about human nature, and could benefit from more behavioral science expertise and game theory,” which I think captures the sentiment of many of the related responses most succinctly (however accurate this statement actually is!). Ultimately, encouraging and funding research at the intersection of these underexplored areas and alignment is likely the only thing that will actually lead to a more multidisciplinary research environment.
Unfortunately, I don’t think we asked the EA sample about AIS in a way that would allow us to answer this question using the data we have. This would be a really interesting follow-up direction. I will paste in below the ground truth distribution of EAs’ views on the relative promise of these approaches as additional context (eg, we see that the ‘AI risk’ and ‘Existential risk (general)’ distributions have very similar shapes), but I don’t think we can confidently say much about whether these risks were being conceptualized as short- or long-term.
It’s also important to highlight that in the alignment sample (from the other survey), researchers generally indicate that they do not think we’re going to get AGI in the next five years. Again, this doesn’t clarify if they think there are x-risks that could emerge in the nearer term from less-general-but-still-very-advanced AI, but it does provide an additional datapoint that if we are considering AI x-risks to be largely mediated by the advent of AGI, alignment researchers don’t seem to expect this as a whole in the very short term:
You might be interested in this breakdown of gender differences in the research interests of the 719 applicants to the MATS Summer 2024 and Winter 2024-25 Programs who shared their gender. The plot shows the difference between the percentage of male applicants who indicated interest in specific research directions and the percentage of female applicants who indicated interest in the same.
The most male-dominated research interest is mech interp, possibly due to the high male representation in software engineering (~80%), physics (~80%), and mathematics (~60%). The most female-dominated research interest is AI governance, possibly due to the high female representation in the humanities (~60%). Interestingly, cooperative AI was a female-dominated research interest, which seems to match the result from your survey where female respondents were less in favor of “controlling” AIs relative to men and more in favor of “coexistence” with AIs.
Epistemic status: just speculation, from a not very concrete memory, written hastily on mobile after a quick skim of the post.
My guess is that these results should be taken with a large grain of salt, but if I’m wrong, I’d be interested in hearing more about why.
Specifically, I think the “alignment researcher” population and “org leader” populations here are probably a far departure from what people envision when they hear these terms. I also expect other populations reported on to have a directionally similar skew to what I speculate below.
An anecdote for why I expect that (some aspects may be off):
I started the survey, based off the description that it’d be decently short. I found it long, involved, and asking various questions (marked as required) that I really wasn’t interested in answering (nor interested in the results of). IIRC it also had various ways in which the question phrasing was lacking. I accordingly abandoned it, while seeing there was still a long way to go to completion.
One additional factor for my abandoning it was that I couldn’t imagine it drawing a useful response population anyway; the sample mentioned above is a significant surprise to me (even with my skepticism around the makeup of that population). Beyond the reasons I already described, I felt that it being done by a for-profit org that is a newcomer and probably largely unknown would dissuade a lot of people from responding (and/or providing fully candid answers to some questions).
All in all, I expect that the respondent population skews heavily toward those who place a lower value on their time and are less involved. I expect this to generally be a more junior group, often not fully employed in these roles, with eg the average age and funding level of the orgs that are being led particularly low (and some of the orgs being more informal).
That’s a very legitimate and useful population to survey; I just think it also isn’t at all what people typically think of when hearing these terms.
I could be wrong about all of this! But my guess is it’s directionally useful for understanding this post.
Here is the full list of the alignment orgs who had at least one researcher complete the survey (and who also elected to share what org they are working for): OpenAI, Meta, Anthropic, FHI, CMU, Redwood Research, Dalhousie University, AI Safety Camp, Astera Institute, Atlas Computing Institute, Model Evaluation and Threat Research (METR, formerly ARC Evals), Apart Research, Astra Fellowship, AI Standards Lab, Confirm Solutions Inc., PAISRI, MATS, FOCAL, EffiSciences, FAR AI, aintelope, Constellation, Causal Incentives Working Group, Formalizing Boundaries, AISC.
~80% of the alignment sample is currently receiving funding of some form to pursue their work, and ~75% have been doing this work for >1 year. Seems to me like this is basically the population we were intending to sample.
Your expectation while taking the survey about whether we were going to be able to get a good sample does not say much about whether we did end up getting a good sample. Things that better tell us whether or not we got a good sample are, eg, the quality/distribution of the represented orgs and the quantity of actively-funded technical alignment researchers (both described above).
Note that the survey took people ~15 minutes to complete and resulted in a $40 donation being made to a high-impact organization, which puts our valuation of an hour of their time at ~$160 (roughly equivalent to the hourly rate of someone who makes ~$330k annually). Assuming this population would generally donate a portion of their income to high-impact charities/organizations by default, taking the survey actually seems to probably have been worth everyone’s time in terms of EV.
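The arithmetic behind that estimate, written out as a quick sanity check. The full-time schedule of 40 hours per week for 52 weeks is an assumption made here to convert the hourly figure into an annual one.

```python
DONATION_USD = 40          # donation per completed survey (from the post)
MINUTES_TO_COMPLETE = 15   # approximate completion time (from the reply above)
HOURS_PER_YEAR = 40 * 52   # assumption: full-time, 40 h/week for 52 weeks

# Implied value of one hour of a respondent's time
hourly_value = DONATION_USD * 60 / MINUTES_TO_COMPLETE

# Annual salary that corresponds to that hourly rate under the assumption above
annual_equivalent = hourly_value * HOURS_PER_YEAR
```

This gives $160 per hour and roughly $333k per year, consistent with the ~$330k figure quoted above.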
There’s a lot of overlap between alignment researchers and the EA community, so I’m wondering how that was handled.
It feels like it would be hard to find a good way of handling it: if you include everyone who indicated an affiliation with EA on the alignment survey it’d tilt the survey towards alignment people, in contrast if you exclude them then it seems likely it’d tilt the survey away from alignment people since people will be unlikely to fill in both surveys.
Regarding the support for various cause areas, I’m pretty sure that you’ll find the support for AI Safety/Long-Termism/X-risk is higher among those most involved in EA than among those least involved. Part of this may be because of the number of jobs available in this cause area.
Agree that there is inherent/unavoidable overlap. As noted in the post, we were generally cautious about excluding participants from either sample for reasons you mention and also found that the key results we present here are robust to these kinds of changes in the filtration of either dataset (you can see and explore this for yourself here).
With this being said, we did ask respondents in both the EA and the alignment survey to indicate the extent to which they are involved in alignment. Note the significance of the difference here:
From alignment survey:
From EA survey:
This question/result serves both as a good filtering criterion for cleanly separating out EAs from alignment researchers and also gives pretty strong evidence that we are drawing on completely different samples across these surveys (likely because we sourced the data for each survey through completely distinct channels).
Interesting, I just tried to test this. It is a bit hard to find a variable in the EA dataset that would cleanly correspond to higher vs. lower overall involvement, but we can filter by the number of years one has been involved in EA, and there is no level-of-experience threshold I could find where there are statistically significant differences in EAs’ views on how promising AI x-risk is. (Note that years of experience in EA may not be the best proxy for what you are asking, but is likely the best we’ve got to tackle this specific question.)
Blue is >1 year experience, red is <1 year experience:
Blue is >2 years experience, red is <2 years experience:
Can you estimate dark triad scores from the Big Five survey data?
How much higher was the scoring on neuroticism than the general population?
How many alignment researchers do you think there are total? What % do you think this survey hit that you wanted it to hit?