Some blindspots in rationality and effective altruism
Lighter reading here.
Update: appreciating Scott Alexander’s humble descriptions in his recent Criticism of Criticism of Criticism post. A clarification I need to make is that the blindspot-brightspot distinctions below are not about prescribing ‘EAs’ to be eg. less individualistic (although there is an implicit preference, with non-elaborated-on reasoning, which Scott also seems to have in the other direction). The distinctions are attempts at categorising which aspects of the environment people involved in our broader community incline to focus on more (‘brightspots’) relative to other communities, as well as what corresponding representational assumptions we are making in our mental models, descriptions, and explanations. The distinctions also form a basis for prescribing the community to not just make hand-wavy gestures of ‘we should be open to criticism’, but to actually zone in on different aspects that other communities notice and could complement our sensemaking in, if we manage to build epistemic bridges to their perspectives. Ie. listen in a way where we do not keep misinterpreting what they are saying within our default frames of thinking (criticism is not useful if we keep talking past each other). I highlighted where we are falling short and other communities could contribute value.
Last month, Julia Galef interviewed Vitalik Buterin. Their responses to Glen Weyl’s critiques of the EA community struck me as missing perspectives he had tried to raise.
So I emailed Julia to share my thoughts. Cleaned-up text below:
Personally, I thought Vitalik’s and your commentary on Glen Weyl’s characterisation of the EA and rationality community missed something important.
Glen spent a lot of time interacting with people from the black community and other cultural niches and asking for their perspectives. He said that he learned more from that than from the theoretical work he did before.
To me, Glen’s criticism came across as unnuanced (eg. EAs also donate to GiveDirectly, and it’s not like we force people to take what we give them). I also resonate with the view that critiques of rationality and EA often seem unfair and devoid of reason. They lack specific examples and arguments relating to what the community actually does, and come across as a priori judgements of our community being cold, reductionist and weird. It’s also frustrating that such critiques can constrain EA efforts to improve the lives of others.
But Glen’s criticism hit an important point: our community wields usefully biased styles of thinking to comprehend the world and impact the lives of beneficiaries far removed from us. But we overlook the perspectives held by persons we affect, perspectives which are adapted to the contexts they live in (with ‘adapted’ I roughly mean that their perspectives are useful for navigating their surrounding environment in ways that allow them to reach opportunities and avoid hazards).
It’s hard though to discern these perspectives without hanging out and talking.
Most rationalists, me included, have spent little time travelling overseas and immersing themselves in local cultures different from theirs. It’s also hard for Glen (or Tyler) to convey perspectives unfamiliar to listeners in x minutes.
If it’s helpful, I could try and share a summary with you of views and styles of thinking that the rationality community is not much in touch with. I read about 150 psychology papers in my spare time to try and form a better understanding of our blindspots (and complementary ‘brightspots’).
Julia graciously replied that she was interested in a summary of what I think EAs might be missing, or how in particular our views on philanthropy might be biased due to lack of exposure to other cultures/communities.
I emailed her a summary of my upcoming sequence (edit: instead writing up new AGI-misalignment arguments that very much fall in attentional blindspot 1 to 6) – about a tool I’m developing to map group blindspots. It was tough to condense 75 pages of technical explanation clearly into 7 pages, so bear with me (comment)! I refined the text further below:
Here are brightspots (vs. blindspots) that EAs and rationalists might hold in common, ie. areas we notice (vs. miss) that map to relevant aspects of reality.
Common brightspots:
We often focus on analysing how an abstract thing will function. This focus is narrow (compared to the portion of hypothetical space that humans are able to perceive and meta-learn from). This is because we are mapping the territory at the intersection of several views.
I think we especially focus on viewing...
future (vs. past),
far (vs. near vs. centrally present),
precisely (vs. imprecisely) sorted
structures (vs. processes) of
independent individuals (vs. the interdependent collective) that are
externally (vs. internally) present.
Different-minded groups can illuminate aspects we miss in viewing our brightspots:
1. We reference future possibilities
We seem to neglect past case studies or historical accounts somewhat in predicting the effects our philanthropic actions have on beneficiaries. Some conservative cultures attach more value to passing on and studying past accounts and practices. Our community has a progressive leaning. We seem more open to letting go of past learnings and to reimagine the future. As a result, we also reinvent the wheel more (comment).
Complicating factors:
– Over the years, EA orgs fixated more on upholding established paradigms. Community building practices have especially become more conservative. Unconventional entrepreneurs receive less support.
– I’m confusing different meanings of conservatism. Is this about being closed to unfamiliar people or things, a need to maintain purity and order, deferring to authority or tradition, preventing the loss of what you acquired in the past, updating less on novel inferences, referencing past experiences in planning..?
Update: I rarely hear people at events discuss historical trends relevant to EA work, but have seen ideas posted by EA-dedicated staff now-and-then:
– History of philanthropy: OpenPhil’s research, and Future Perfect’s podcast series
(eg. on Rockefeller funding forced sterilisation of Indians residing in slums).
– Mercy for Animals building on grassroots advocacy work over earlier decades.
– Intellectual history: FHI on why prior generations missed existential risks.
2. We represent far across the distances we perceive
I. Far across more far-fetched scenarios, sequenced over time.
eg. existential risk ends society’s long-term trajectory (vs. current civic issue)
II. Far across places stitched into space.
eg. global poverty (vs. local homelessness)
III. Separated from the bounded entities that neighbour us:
inanimate objects, goal-directed beings, social partners.
eg. pandemic containment (vs. on-the-ground fieldwork)
IV. Decoupled from the context of individual identities we sort entities into:
things, agents, persons.
eg. LW on game-theoretic agents (vs. activists on rescuing a caged animal)
Since we aim to impact the lives of persons far removed from us, we get loose or no feedback on how our actions affected our supposed beneficiaries. Instead, we rely on feedback from our social circle to correct our beliefs. We talk with trusted collaborators who understand what we aim to do. But people close to us share similar backgrounds and use similar mental styles to map and navigate their environment.
Worryingly, contexts to which EAs were exposed in the past (W.E.I.R.D. academia, coding, engineering, etc), and later generalised arguments from, are very dissimilar to the contexts in which their supposed beneficiaries reside (villages in low-income countries, non-human animals in factory farms, cultural and ethnic groups who will be affected by technology developments).
Many of us seem motivated to focus and act on beliefs for doing good by a Calvinistic sense of responsibility that was impressed on us through social interactions since our childhood. We later generalised it to other hypothetical humans, out of a need to have a coherent ontology, as well as to assess value and pursue our derived goals consistently. Such underlying motivations are different from actually caring, and therefore drive a subtle wedge between the information we seek and act on, and what’s actually true, relevant, and helpful for the persons we claim to be trying to improve the lives of.
Going by projects I’ve coordinated, EAs often push for removing paper conflicts of interest over attaining actual skin in the game (comment).
3. We sort entities precisely into identities
When rationalists detachedly and coolly model the external world (vs. say nature-loving hippies who feel warm and close to their surroundings), they are guided by an aesthetic preference, is my sense:
We distill individual identities into abstract types that are elegant and ordered, based on primary features held in common (your interview on beauty in physics highlighted these aesthetics). Distillation allows us to block out context-specific features that appear messy and noisy.
Perhaps, we perceive contexts as more messy because we sort more precisely. Rationalists have, on average, more pronounced autistic traits. People diagnosed with autism tend to precisely sort entities they notice around them within a narrower confidence band of identity. They need tighter coherence amongst instances to feel certain about them being the same. So their threshold for perceiving ambiguity is lower: a smaller deviation in an entity’s features (superficial or essentialised) will trigger ambiguity as to which identity that entity belongs to.
Upon entering a new context, they fixate on sorting out all these special instances. They get overwhelmed, missing the proverbial forest for the trees. But if they’re able to select primary features individuals hold in common, they can extract a signal from the noise. By distilling individuals as general elements of an ordered system, they can draw neat lines of inference between them.
When we describe that representation of the world to a non-STEM group, it may come across as clinical, segregated, barren, and impoverished of meaning (look into how this post or Brave New World is worded). They struggle with ambiguity of a different kind – about which real-life concrete features embody such an abstract concept (what the hell does a ‘game-theoretic agent’ mean?).
Decoupling conflicts with their motivation to read into concrete nuances. They want to get close to, and be part of, a context that is rich, alive and interwoven.
Glen’s personal quote: ❝I work to imagine, build and communicate a pluralistic future for social technology truer to the richness of our diversely shared lives.
EAs and rationalists tend to be context-blind. We are more likely to miss subtle social cues. We are also more naively confident about our models fitting uniformly across various contexts (vs. say the peasants Luria interviewed who were hesitant to speculate about places they hadn’t seen before).
One pattern: early EA thinkers proposing explicit arguments that were elegant and ordered. Their analysis of causes and interventions was idealised – only refined later by new entrants who considered concrete applications.
In hindsight, judgements read as simplistic and naive in similar repeating ways (relying on one metric, study, or paradigm and failing to factor in mean reversion or model error there; fixating on the individual and ignoring societal interactions; assuming validity across contexts):
Eliezer Yudkowsky’s portrayal of a single self-recursively improving AGI (later disputed by some applied ML researchers)
Will MacAskill’s claim that you can do 100x more good by giving to low-income countries
Toby Ord’s analysis of DCP2: ‘the best of these interventions is estimated to be 1,400 times as cost-effective as the least good’
ACE researchers’ estimate of 1.4 animals saved per vegan leaflet
CEA staff recommending community building models derived from Oxford settings to all local organisers (fortunately opened up to criticism and adapted)
Current EA arguments still tend to build on mutually exclusive categorisations (unlike this email :), generalise across large physical spaces and timespans (comment), and assume underlying structures of causation that are static. Authors figure out general scenarios and assess the relative likelihood of each, yet often don’t disentangle the concrete meanings and implications of their statements nor scope out the external validity of the models they use in their writing (granted, the latter are much harder to convey). Posts usually don’t cover variations across concrete contexts, the relations and overlap between various plausible perspectives, or the changes in underlying dynamics much.
RadicalxChange, on the other hand, emphasises combining relevant perspectives in their modelling, and co-creating solutions with stakeholders who are working from different contexts. I could make a case say for GiveWell doing the former (eg. cluster thinking), but not much of the latter.
4. Our views are built out of structures
We (more the EAs, see comment) perceive the world as consisting of locatable parts:
A. We represent + recognise an observation to be a fixed cluster (a structure).
eg. an observation’s recurrence, a place, a body, a stable identity
B. We predict + update on the possible location of this structure.
eg. how likely it ends up present within some linear sequence, geographic surface, physical boundaries, or levels of feature abstraction
Our structure-based view is a reflection of our Westernised culture. English sentence descriptions centre around the subject, object, and adjectives. Westerners also often perceive active causation as the inherent causal feature of something or someone (eg. a mechanical function, a growth gene, or an aggressive personality).
Conversely, some traditional cultures foster a process-based view. Things are perceived as impermanent and ever-changing (see (action) verb-based Native American languages, or this wacky interpretation of Dzogchen philosophy).
A. They represent + recognise an observed change to be a trajectory in presence (a process).
eg. a transition, movement, interaction, relation
B. They predict + update on whenever, wherever, and so forth, this process may initiate again.
Update: Some readers said they were confused by this distinction, or its relevance.
→ See cases of focussing on (interpreting and forecasting) processes vs. structures.
5. We view individuals as independent
Since we prefer sorting different things into ordered and mutually exclusive categories, we aren’t as aware of the relations between them (besides maybe endorsing this as a fact of reality upon reflection). In our attempts to carve nature at its joints, we neglect the ways persons and things are causally interdependent. That is, we do this even more strongly than broader Western individualist society (vs. say Asian collectivist cultures).
Quoting from the paper Culture and the Self:
❝Western view of the individual as an independent, self-contained, autonomous entity who
(a) comprises a unique configuration of internal attributes (eg., traits, abilities, motives, and values) and
(b) behaves primarily as a consequence of these internal attributes
❝Experiencing interdependence entails seeing oneself as part of an encompassing social relationship and recognizing that one’s behavior is determined, contingent on, and, to a large extent organized by what the actor perceives to be the thoughts, feelings, and actions of others in the relationship.
The RadicalxChange community recognises the interdependence of people as part of a collective whole. They attract members who often take up a more interdependent culture or mindset (social scientists, artists, African-Americans, women). To be fair, this might also be because RxC engages minority groups more, who in turn feel empowered to contribute to mechanisms that can overcome systemic exploitation. Update: minorities are less represented than I thought.
Revisiting Glen’s critiques, I interpret one basis to be our neglect of social interconnectedness:
❝We’re not going to pay that much attention to getting feedback from the people whose lives that affects or being in conversation with them.
❝cloistering themselves into a room
❝You have to think of the things you’re doing as speech acts and not purely intellectual acts… They condition a certain sort of a society.
❝a lot of the overly field experiment driven, effective ways of charitable giving that didn’t think about broader social structures and effects of that sort of stuff
❝the actual power that people derive in a decentralized way almost always comes from their ability to act collectively. It’s almost never possible on your own to exercise much power.
Update: Climate change is a case I think where noticing causal interdependencies can be especially insightful (if you don’t rely on a priori notions of interconnectedness).
– Carbon emissions are not ‘one global thing’ but downstream from eg. farming animals intensively, cutting trees to expand farmland and to supply firewood for cooking, burning biomass that releases air pollutants, and subsidising polluting industries over investing in lower-cost clean alternatives. These activities harm local residents too through respiratory illness, self-reinforcing poverty cycles, estrangement from their surroundings, etc. Since these localised harms intertwine with global emissions, overlapping interventions can address both.
– Carbon emissions are upstream from gases released into the atmosphere trapping heat, inducing anomalous weather patterns that further harm citizens worldwide, and also suck up their representatives’ attention and coordinated use of resources to mitigate other human-originated threats.
(comment on how this is not simply about flow-through effects)
6. We view things as being external to us
In conversations, EAs and rationalists often attempt to convey a more impartial or objective view of the external world. This leads us to disregard personal interpretations when those are actually relevant (eg. for supporting a collaborator, or considering a friend’s needs and constraints when advising them on their career options).
In terms of individual persons, we could do worse though. Though people can be socially awkward, they do check in and consider how their conversation partner is feeling. Some of us are also really into introspection techniques and self-awareness (eg. rationalists writing about meditation experiences, Qualia Research Institute).
But since we focus more on the individual than the interdependent collective, we’re particularly unaware of the cultural feeling of our community, as well as any broader social repercussions of our actions.
Examples:
cases where 80,000 Hours staff didn’t catch on to the effects that their general career recommendations were having on the broader EA community (they’ve now more clearly specified the scope of their goals and whom they can serve)
criticism around EAs neglecting the effects of what they say and do on societal norms (eg. Rob Reich’s criticism of GiveWell)
LessWrong members who felt unwelcome and lonely, prompting Project Hufflepuff’s start (effective animal advocacy meetups though are more warm and cozy in my experience; one organiser shared more or less with me that outsiders see them as cutesy Hufflepuffs)
Update: Cases of people working well with different social perspectives:
– Past non-violent resistance movements led by Gandhi & Martin Luther King
– Tsai Ing-wen transparently engaging, relating & empowering Taiwanese citizens
– Therapists and social workers who uncover relationship contexts & dynamics
– Human-centric computer tool designers (eg. at Apple, early internet innovators)
– Acumen Fund incubating a mosquito bednet factory & localised voice surveys
More speculative brightspot (vs. blindspot) trade-offs:
I updated and added to the sections below:
7. (Interpret vs.) forecast
A. Interpret: recognise + represent aspects
eg. classical archaeologists focus on differentiated recognition of artefacts,
linguistic anthropologists on representing differentiated social contexts
B. Forecast: sample + predict aspects
eg. development economists focus more on calibrated sampling of metrics,
global prio scholars on calibrating their predictions of distilled scenarios
Our forecasting style focuses on calibrating likelihoods of a minimal set of aspects we deem to be primary or most important. eg. AI existential safety researchers who seek to improve the accuracy of their AGI timeline forecasts, rather than seek out other complementary interpretations relevant to developing beneficial AI.
A ‘reader of technological landscapes’ like Kevin Kelly can tell various trends or possibilities that others can’t (like an urbanite can’t tell the rich signs of natural landscapes). Similarly, venture capitalists focus less on fine-tuning their predictions of start-up exit scenarios and more on reading into the markers of performance (eg. founders ‘living in the future’ who confidently pursue their alternative vision for future products) that competing VCs and tech corps neglect.
Even to judge the hits and misses, there’s a trade-off between calibrating likelihoods of a clean reference class (B), and differentiating which aspects to reference (A).
Dialectic between the views:
A: VCs anticipated tech trends before others did.
B: Those VCs were overconfident; it rarely turned out as they specifically claimed.
A: But they identified new frontiers and gaps to invest in, which gave them an edge.
B: An impartial sample of VC investors shows they got sub-par returns on average.
A: A subpopulation of some geographical & intellectual origin got high returns.
B: Those are spurious correlations you cherry-picked after the fact.
8. Gain vs. loss focus
To attain more positive valence vs. remove more negative valence.
Personal motivation matters because it guides how people frame their focus, pursue strategies, and assess aims. For example, x-risk leaders and s-risk managers may be predisposed to and socially reinforced to set goals differently, going by my stereotyped impressions:
Eliezer and Nick – to eagerly leap towards attaining their ideal for a more positive world state (powering over obstacles, towards the possible presence of a gain). Each originally led the start-up phase of a techno-optimistic institute. But they realised they needed to go pioneer research to prevent AGI misalignment in order to not sadly miss out on utopia.
Center on Long-Term Risk – to vigilantly maintain their responsibility to prevent a more negative world state (contain the intrusion of any potential loss, to control its absence). Founders originally met through discussions of moral philosophy, infused with German Weltschmerz. Managing secure, incremental charity operations is their core competence (update: an ops employee said CLR’s processes are more loose and exploratory than I let on). But they’re prepared to promote innovative start-up practices in order to relieve the universe from pessimistic scenarios of suffering.
9. (Sensory groundedness vs.) representational stability of beliefs
Markers of perception:
0. Present I. Time-bound II. Stitched III. Enclosed IV. Categorised
observation’s recurrence vs. sorted identity
observed transition vs. analogised relation
0. Arising I. Follows II. Moves III. Crosses IV. Links
10. (Non-duality)
Both the internal {now, here, this, my} and the external {then, there, that, them} arise in awareness and are part of material reality.
If you got this far, I’m interested to hear your thoughts! Do grab a moment to call so we can chat about and clarify the concepts. Takes some back and forth.
I’ve found myself doubting this claim, so I’ve read the post in question. As far as I can tell, it’s a reasonable summary of the fast takeoff position that many people still hold today. If all you meant to say was that there was disagreement, then fine—but saying ‘later overturned’ makes it sound like there is consensus, not that people still have the same disagreement they’ve had 13 years ago. (And your characterization in the paragraph I’ll quote below also gives that impression.)
Sorry, I get how the bullet point example gave that impression. I’m keeping the summary brief, so let me see what I can do.
I think the culprit is ‘overturned’. That makes it sound like their counterarguments were a done deal or something. I’ll reword that to ‘rebutted and reframed in finer detail’.
Note though that ‘some applied ML researchers’ hardly sounds like consensus. I did not mean to convey that, but I’m glad you picked it up.
Perhaps, your impression from your circle is different from mine in terms of what proportion of AIS researchers prioritise work on the fast takeoff scenario?
Yeah, I think overturned is the word I took issue with. How about ‘disputed’? That seems to be the term that remains agnostic about whether there is something wrong with the original argument or not.
My impression is that gradual takeoff has gone from a minority to a majority position on LessWrong, primarily due to Paul Christiano, but not an overwhelming majority. (I don’t know how it differs among Alignment Researchers.)
I believe the only data I’ve seen on this was in a thread where people were asked to make predictions about AI stuff, including takeoff speed and timelines, using the new interactive prediction feature. (I can’t find this post—maybe someone else remembers what it was called?) I believe that was roughly compatible with the sizeable minority summary, but I could be wrong.
Seems good. Let me adjust!
This roughly corresponds with my impression actually.
I know a group that has surveyed researchers that have permission to post on the AI Alignment Forum, but they haven’t posted an analysis of the survey’s answers yet.
To disentangle what I had in mind when I wrote ‘later overturned by some applied ML researchers’:
Some applied ML researchers in the AI x-safety research community like Paul Christiano, Andrew Critch, David Krueger, and Ben Garfinkel have made solid arguments towards the conclusion that Eliezer’s past portrayal of a single self-recursively improving AGI had serious flaws.
In the post though, I was sloppy in writing about this particular example, in a way that served to support the broader claims I was making.
On 3, I’d like to see EA take sensitivity analysis more seriously.
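For readers unfamiliar with the term, here is a minimal sketch of a one-at-a-time sensitivity analysis on a toy cost-effectiveness model. Everything in it is an assumption made up for illustration: the model `cost_per_outcome`, the parameter names, the baseline values, and the ranges are hypothetical and do not come from any real charity evaluation.

```python
def cost_per_outcome(params):
    """Toy model (hypothetical): cost per outcome = cost / (uptake * effect)."""
    return params["cost_per_person"] / (params["uptake"] * params["effect_per_user"])


def one_at_a_time(model, baseline, ranges):
    """Vary each parameter over its (low, high) range, holding the rest at baseline.

    Returns (parameter, output swing) pairs, largest swing first.
    """
    swings = {}
    for name, (low, high) in ranges.items():
        outputs = [model({**baseline, name: value}) for value in (low, high)]
        swings[name] = max(outputs) - min(outputs)
    return sorted(swings.items(), key=lambda item: item[1], reverse=True)


# Hypothetical baseline and plausible ranges, invented for this sketch.
baseline = {"cost_per_person": 10.0, "uptake": 0.5, "effect_per_user": 0.2}
ranges = {
    "cost_per_person": (8.0, 12.0),
    "uptake": (0.25, 0.75),
    "effect_per_user": (0.15, 0.25),
}

ranking = one_at_a_time(cost_per_outcome, baseline, ranges)
for name, swing in ranking:
    print(f"{name}: output swing of {swing:.1f}")
```

In this toy setup the estimate swings most with `uptake`, which is the kind of signal that would tell an analyst which single assumption most deserves a closer look. One-at-a-time variation ignores interactions between parameters, so it is only a first pass.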
This resonates, based on my very limited grasp of statistics.
My impression is that sensitivity analysis aims more at reliably uncovering epistemic uncertainty (whereas Guesstimate as a tool seems to be designed more for working out aleatory uncertainty).
Quote from interesting data science article on Silver-Taleb debate:
This is my go-to figure when thinking about aleatoric vs epistemic uncertainty.
Edit: In the context of the figure. The aleatoric uncertainty is high in the left cluster because the uncertainty of where a new data point will be is high and is not reduced by the number of training examples. The epistemic uncertainty is high in regions where there is insufficient data or knowledge to produce an accurate estimate of the output, this would go down with more training data in these regions.
Looks cool, thanks! Checking if I understood it correctly:
- is x like the input data?
- could y correspond to something like the supervised (continuous) labels of a neural network, which inputs are matched to?
- does epistemic uncertainty here refer to that inputs for x could be much different from the current training dataset if sampled again (where new samples could turn out to be outside of the current distribution)?
Thanks, I realised that I provided zero context for the figure. I added some.
Yes. The example is about estimating y given x where x is assumed to be known.
Not quite, we are still thinking of uncertainty only as applied to y. Epistemic uncertainty here refers to regions where the knowledge and data are insufficient to give a good estimate of y given x from these regions.
To compare it with your dice example, consider x to be some quality of the die such that you think dice with similar x will give similar rolls y. Then aleatoric uncertainty is high for dice where you are uncertain about the values of new rolls even after having rolled several similar dice, and where rolling more similar dice will not help. Epistemic uncertainty, in contrast, is high for dice with qualities you haven’t seen enough of.
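A rough sketch of the dice framing, under simplifying assumptions of my own: a single hypothetical biased die, where we only track whether a roll shows a six. The function names and the bias value `p_six=0.3` are invented for illustration. The epistemic part is approximated by the standard error of the estimated probability, the aleatoric part by the spread of a single future roll.

```python
import random


def roll_shows_six(p_six, rng):
    """Simulate one roll of a hypothetical biased die; True if it shows a six."""
    return rng.random() < p_six


def estimate_uncertainties(n_rolls, p_six=0.3, seed=0):
    """Return (p_hat, epistemic, aleatoric) after observing n_rolls rolls."""
    rng = random.Random(seed)
    sixes = sum(roll_shows_six(p_six, rng) for _ in range(n_rolls))
    p_hat = sixes / n_rolls
    # Epistemic: standard error of our estimate of P(six); shrinks with more rolls.
    epistemic = (p_hat * (1 - p_hat) / n_rolls) ** 0.5
    # Aleatoric: standard deviation of one future roll's outcome; more rolls
    # sharpen the estimate of this spread but do not make it smaller.
    aleatoric = (p_hat * (1 - p_hat)) ** 0.5
    return p_hat, epistemic, aleatoric


for n in (100, 10_000):
    p_hat, epistemic, aleatoric = estimate_uncertainties(n)
    print(f"n={n}: p_hat={p_hat:.3f} epistemic={epistemic:.4f} aleatoric={aleatoric:.3f}")
```

With more rolls the epistemic term falls towards zero while the aleatoric term settles near a fixed value. The sketch does not cover the harder case of a die with qualities you have not seen before, where there is no estimate of `p_six` to start from.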
Thank you! That was clarifying especially the explanation of epistemic uncertainty for y.
1. I’ve been thinking about epistemic uncertainty more in terms of ‘possible alternative qualities present’, where
you don’t know the probability of a certain quality being present for x (e.g. what’s the chance of the die having an extended three-sided base?).
or might not even be aware of some of the possible qualities that x might have (e.g. you don’t know a triangular prism die can exist).
2. Your take on epistemic uncertainty for that figure seems to be
you know of x’s possible quality dimensions (e.g. relative lengths and angles of sides at corners).
but given a set configuration of x (e.g. triangular prism with equilateral triangle sides = 1, rectangular lengths = 2 ), you don’t know yet the probabilities of outcomes for y (what’s the probability of landing face up for base1, base2, rect1, rect2, rect3?).
Both seem to fit the definition of epistemic uncertainty. Do correct me here!
Edit: Rough difference in focus:
1. Recognition and Representation
vs.
2. Sampling and Prediction
Good point, my example with the figure is lacking in regards to 1 simply because we are assuming that x is known completely and that the observed y are true instances of what we want to measure. And from this I realize that I am confused about when some uncertainties should be called aleatoric or epistemic.
When I think I can correctly point out epistemic uncertainty:
If the y that are observed are not the ones that we actually want then I’d call this uncertainty epistemic. This could be if we are using tired undergrads to count the number of pips of each rolled die and they miscount for some fraction of the dice.
If you haven’t seen similar x before then you have epistemic uncertainty because you have uncertainty about which model or model parameters to use when estimating y. (This is the one I wrote about previously and the one shown in the figure)
My confusion from 1:
If the conditions of the experiment change, eg. our undergrads start to pull dice from another bag with an entirely different distribution p(y|x), then we have insufficient knowledge to estimate y and I would call this epistemic uncertainty.
If x is lacking some information needed to make good estimates of y. Say x is the color of the die: when we have thrown enough dice from our experimental distribution we get a good estimate of p(y|x) and our uncertainty doesn’t decrease with more rolls, which makes me think that it is aleatoric uncertainty. But on the other hand x is not sufficient to spot when we have a new type of die (see previous point), and if we knew more about the dice we could make better estimates, which makes me think that it is epistemic uncertainty.
You bring up a good point in 1 and I agree that this feels like it should be epistemic uncertainty, but at some point the boundary between inherent uncertainty in the process and uncertainty from knowing too little about the process becomes vague to me and I can’t really tell when a process is aleatoric or epistemic.
I also noticed I was confused. Feels like we’re at least disentangling cases and making better distinctions here.
BTW, just realised that a problem with my triangular prism example is that theoretically no rectangular side can face up parallel to the floor at any time (just two at 60º angles).
This is interesting. This seems to ask the question ‘Is a change in the quality of x like colour actually causal to outcomes y?’ Difficulty here is that you can never fully be certain empirically, just get closer to [change in roll probability] for [limit number of rolls → infinity] = 0.
To disentangle the confusion I looked around at a few different definitions of the concepts. The definitions were mostly the same kind of vague statement of the type:
Aleatoric uncertainty is from inherent stochasticity and does not reduce with more data.
Epistemic uncertainty is from lack of knowledge and/or data and can be further reduced by improving the model with more knowledge and/or data.
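A minimal sketch of how these two definitions can come apart, assuming a fair six-sided die (the function name, seed and roll counts are illustrative choices, not from the discussion above). The entropy of the outcome distribution stands in for the aleatoric part and stays roughly constant, while the standard error of our estimate of p(y = 6) stands in for the epistemic part and shrinks as rolls accumulate.

```python
import math
import random

def summarise_rolls(n_rolls, seed=0):
    """Roll a fair six-sided die n_rolls times and summarise two kinds of uncertainty."""
    rng = random.Random(seed)
    rolls = [rng.randint(1, 6) for _ in range(n_rolls)]
    # Empirical estimate of p(y) for each face.
    probs = [rolls.count(face) / n_rolls for face in range(1, 7)]
    # Aleatoric stand-in: entropy (in bits) of the outcome distribution itself.
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    # Epistemic stand-in: standard error of our estimate of p(y = 6).
    p_six = probs[5]
    std_error = math.sqrt(p_six * (1 - p_six) / n_rolls)
    return entropy, std_error

for n in (100, 10_000):
    entropy, std_error = summarise_rolls(n)
    print(f"n={n}: entropy={entropy:.3f} bits, std error of p(6)={std_error:.4f}")
```

Where the line falls still depends on what is treated as knowable: if the die's physical state were part of x, some of the entropy above would move into the epistemic column.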
However, I found some useful tidbits
With this my updated view is that our confusion is probably because there is a free parameter in where to draw the line between aleatoric and epistemic uncertainty.
This seems reasonable as more information can always lead to better estimates (at least down to considering wavefunctions I suppose) but in most cases having this kind of information and using it is infeasible and thus having the distinction between aleatoric and epistemic depend on the problem at hand seems reasonable.
This is clarifying, thank you!
Good catch
Yes, I think you are right. Usually when modeling you can learn correlations that are useful for predictions, but if the correlations are spurious they might disappear when the distribution changes. As such, to know whether p(y|x) changes from only observing x, we would probably need all causal relationships to y to be captured in x?
I found it immensely refreshing to see valid criticisms of EA. I very much related to the note that many criticisms of EA come off as vague or misinformed. I really appreciated that this post called out specific instances of what you saw as significant issues, and also engaged with the areas where particular EA aligned groups have already taken steps to address the criticisms you mention.
I think I disagree on the degree to which EA folks expect results to be universal and generalizable (this is in response to your note at the end of point 3). As a concrete example, I think GiveWell/EAers in general would be unlikely to agree that GiveDirectly style charity would have similarly sized benefits in the developed world (even if scaled to equivalent size given normal incomes in the nation) without RCTs suggesting as much. I expect that the evidence from other nations would be taken into account, but the consensus would be that experiments should be conducted before immediately concluding large benefits would result.
I appreciate your thoughtful comment too, Dan.
You’re right I think that I overstated EA’s tendency to assume generalisability, particularly when it comes to testing interventions in global health and poverty (though much less so when it comes to research in other cause areas). Eva Vivalt’s interview with 80K, and more recent EA Global sessions discussing the limitations of the randomista approach are examples. Some incubated charity interventions by GiveWell also seemed to take a targeted regional approach (e.g. No Lean Season). Also, Ben Kuhn’s ‘local context plus high standards theory’ for Wave. So point taken!
I still worry about EA-driven field experiments relying too much, too quickly on filtering experimental observations through quantitative metrics exported from Western academia. In their local implementation, these metrics may either not track the aspects we had in mind, or just not reflect what actually exists and/or is relevant to people’s local context there. I haven’t heard yet about EA founders who started out by doing open qualitative fieldwork on the ground (but happy to hear examples!).
I assume generalisability of metrics would be less of a problem for medical interventions like anti-malaria nets and deworming tablets. But here’s an interesting claim I just came across:
Fair points!
I don’t know if I’d consider JPAL directly EA, but they at least claim to conduct regular qualitative fieldwork before/after/during their formal interventions (source from Poor Economics, I’ve sadly forgotten the exact point but they mention it several times). Similarly, GiveDirectly regularly meets with program participants for both structured polls and unstructured focus groups if I recall correctly. Regardless, I agree with the concrete point that this is an important thing to do and EA/rationality folks are less inclined to collect unstructured qualitative feedback than its importance deserves.
Interesting, I didn’t know GiveDirectly ran unstructured focus groups, nor that JPAL does qualitative interviews at various stages of testing interventions. Adds a bit more nuance to my thoughts, thanks!
One of GiveDirectly’s blog posts on survey and focus group results, by the way.
https://www.givedirectly.org/what-its-like-to-receive-a-basic-income/
Could you elaborate on this? Which wheels are you thinking of?
EAs invented neither effectiveness nor altruism, as Buddha, Quakers, Gandhi, Elizabeth Fry, Equiano and many others can attest!
EAs tend to be slow/behind the curve on coms and behavioural science and Implementation Science, and social science/realpolitik in general. But they do learn over time.
This is a good question hmm. Now I’m trying to come up with specific concrete cases, I actually feel less confident of this claim.
Examples that did come to mind:
I recall reading somewhere about early LessWrong authors reinventing concepts that were already worked out before in philosophic disciplines (particularly in decision theory?). Can’t find any post on this though.
More subtly, we use a lot of jargon. Some terms were basically imported from academic research (say into cognitive biases) and given a shiny new nerdy name that appeals to our incrowd. In the case of CFAR, I think they were very deliberate about renaming some concepts, also to make them more intuitive for workshop participants (eg. implementation intentions → trigger action plans/patterns, pre-mortem → Murphyjitsu).
(After thinking about this, I had a call with someone who is doing academic research on Buddhist religion. They independently mentioned LW posts on ‘noticing’, which basically is a new name for a meditation technique that has been practiced for millennia.)
Renaming is not reinventing of course, but the new terms do make it harder to refer back to sources from established research literature. Further, some smart amateur blog authors like to synthesise and intellectually innovate upon existing research (eg. see Scott Alexander’s speculative posts, or my post above ^^).
The lack of referencing while building up innovations can cause us to misinterpret and write stuff that poorly reflects previous specialist research. We’re building up our own separated literature database.
A particular example is Robin Hanson’s ‘near-far mode’, which he introduced to the community from a concise and well-articulated review paper about psychological distance, and which spawned a lot of subsequent posts about implications for thinking in the community (but with little referencing to other academic studies or analyses).
E.g. Hanson’s idea that people are hypocritical when they signal high-construal values but are more honest when they think concretely – a psychology researcher who seems rigorously minded said to me that he dug into Hanson’s claim but that conclusions from other studies don’t support this.
My impression from local/regional/national EA community building is that many organisers (including me) either tried to work out how to run their group from first principles, or consulted with other more experienced organisers. We could also have checked for good practices from and consulted with other established youth movements. I have seen plenty of write-ups that go through the former, but little or none of the latter.
Definitely give me counter-examples!
See Eliezer’s Sequences and Mainstream Academia and scroll down for my comment there. Also https://www.greaterwrong.com/posts/XkNXsi6bsxxFaL5FL/ai-cooperation-is-already-studied-in-academia-as-program (According to these sources, AFAWK, at least some of the decision theory ideas developed on LW were not worked out already in academia.)
I hadn’t read about these specific cases yet, thanks! I appreciate your nuances here
The way I’ve tended to think about these sorts of questions is to see a difference between the global portfolio of approaches, and our personal portfolio of approaches.
A lot of the criticisms of EA as being too narrow, and neglecting certain types of evidence or ways of thinking make far more sense if we see EA as hoping to become the single dominant approach to charitable giving (and perhaps everything else), rather than as a particular community which consists of particular (fairly similar) individuals who are pushing particular approaches to doing good that they see as being ignored by other people.
Yeah, seems awesome for us to figure out where we fit within that global portfolio! Especially in policy efforts, that could enable us to build a more accurate and broadly reflective consensus to help centralised institutions improve on larger-scale decisions they make (see a general case for not channeling our current efforts towards making EA the dominant approach to decision-making).
To clarify, I hope this post helps readers become more aware of their brightspots (vs. blindspots) that they might hold in common with like-minded collaborators – ie. areas they notice (vs. miss) that map to relevant aspects of the underlying territory.
I’m trying to encourage myself and the friends I collaborate with to build up an understanding of alternative approaches that outside groups take up (ie. to map and navigate their surrounding environment), and where those approaches might complement ours. Not necessarily for us to take up more simultaneous mental styles or to widen our mental focus or areas of specialisation. But to be able to hold outside groups’ views so we get roughly where they are coming from, can communicate from their perspective, and form mutually beneficial partnerships.
More fundamentally, as human apes, our senses are exposed to an environment that is much more complex than just us. So we don’t have the capacity to process our surroundings fully, nor to perceive all the relevant underlying aspects at once. To map the environment we are embedded in, we need robust constraints for encoding moment-to-moment observations, through layers of inductive biases, into stable representations.
Different-minded groups end up with different maps. But in order to learn from outside critics of EA, we need to be able to line up our map better with theirs.
Let me throw an excerpt from an intro draft on the tool I’m developing. Curious for your thoughts!
Yeah, I really like this idea—at least in principle. The idea of looking for value agreement and where do our maps (that likely are verbally extremely different) match is something that I think we don’t do nearly enough.
To get at what worries me about some of the ‘EA needs to consider other viewpoints’ discourse (and not at all about what you just wrote), let me describe two positions:
EA needs to get better at communicating with non EA people, and seeing the ways that they have important information, and often know things we do not, even if they speak in ways that we find hard to match up with concepts like ‘bayesian updates’ or ‘expected value’ or even ‘cost effectiveness’.
EA needs to become less elitist, nerdy, jargon laden and weird so that it can have a bigger impact on the broader world.
I fully embrace 1, subject to constraints about how sometimes it is too expensive to translate an idea into a discourse we are good at understanding, how sometimes we have weird infohazard type edge cases and the like.
2 though strikes me as extremely dangerous.
To make a metaphor: Coffee is not the only type of good drink, it is bitter and filled with psychoactive substances that give some people heart palpitations. That does not mean it would be a good idea to dilute coffee with apple juice so that it can appeal to people who don’t like the taste of coffee and are caffeine sensitive.
The EA community is the EA community, and it currently works (to some extent), and it currently is doing important and influential work. Part of what makes it work as a community is the unifying effect of having our own weird cultural touchstones and documents. The barrier of exclusivity created by the jargon and the elitism, and the fact that it is one of the few spaces where the majority of people are explicit utilitarians, is part of what makes it able to succeed (to the extent it does).
My intuition is that an EA without all of these features wouldn’t be a more accessible and open community that is able to do more good in the world. My intuition is an EA without those features would be a dead community where everyone has gone on to other interests and that therefore does no good at all.
Obviously there is a middle ground: shifts in the culture of the community that improve our Pareto frontier of openness and accessibility while maintaining community cohesion and appeal.
However, I don’t think this worry is what you actually were talking about. I think you really were focusing on us having cognitive blindspots, which is obviously true, and important.
Well-written! Most of this definitely resonates for me.
Quick thoughts:
Some of the jargon I’ve heard sounded plain silly from a making-intellectual-progress-perspective (not just implicit aggrandising). Makes it harder to share our reasoning, even to each other, in a comprehensible, high-fidelity way. I like Rob Wiblin’s guide on jargon.
Perhaps we put too much emphasis on making explicit communication comprehensible. Might be more fruitful to find ways to recognise how particular communities are set up to be good at understanding or making progress in particular problem niches, even if we struggle to comprehend what they’re specifically saying or doing.
(I was skeptical about the claim ‘majority of people are explicit utilitarians’ – i.e. utilitarian not just consequentialist or some pluralistic mix of moral views – but EA Survey responses seem to back it up: ~70% utilitarian)
I left out nuances to keep the blindspot summary short and readable. But I should have specifically prefaced what fell outside the scope of my writing. Not doing so made claims come across more extreme than I meant for the more literal/explicit readers amongst us :)
So for you who still happens to read this, here’s where I was coming from:
To describe blindspots broadly across the entire rationality and EA community.
In actual fact I see both communities more as loose clusters of interacting and affiliated people. Each gathered group somewhat diverges in how it attracts members who are predisposed towards focussing on (and reinforce each other to express) certain aspects as perceived within certain views.
I pointed out how a few groups diverge in the summary above (e.g. effective animal advocacy vs. LW decision theorists, thriving vs suffering-focussed EAs), but left out many others. Responding to Christian Kl’s earlier comment, I think how the ‘CFAR alumni’ cluster frames aspects meaningfully diverges from the larger/overlapping ‘long-time LessWrong fans’ cluster.
Previously, I suggested that EA staff could coordinate work more through non-EA-branded groups with distinct yet complementary scopes and purposes, so the general overarching tone of this post runs counter to that.
To aggregate common views within which our members seemed to most often frame problems (as expressed to others involved in the community they knew also aimed to work on those problems), and to contrast those with the foci held by other purposeful human communities out there.
Naturally, what an individual human focusses on in any given moment depends on their changing emotional/mental makeup and the context they find themselves (incl. the role they then identify as having) in. I’m not e.g. claiming that when someone who aspires to be a rational researcher at work focusses on brushing their teeth at home while glancing at their romantic partner, they must nevertheless be thinking real abstract and elegant thoughts.
But for me, the exercise of mapping our ingroup’s brightspots onto each listed dimension – relative to the focus of outside groups – has provided some overview. The dimensions are from a perceptual framework I gradually put together and that is somewhat internally coherent (but predictably overwhelms anyone whom I explain it to, and leaves them wondering how it’s useful; hence this more pragmatic post).
I hope though no reader ends up using this as a personality test – say for identifying their or their friend’s (supposedly stable) character traits to predict their resulting future behaviour (or god forbid, to explain away any confusion or disagreement they sense about what an unfamiliar stranger says).
To keep each blindspot explanation simple and to the point:
If I already mix in a lot of ‘on one hand in this group...but on the other hand in this situation’, the reader will gloss over the core argument. I appreciate people’s comments with nuanced counterexamples though. Keeps me intellectually honest.
Hope that clarifies the post’s argumentation style somewhat.
I had those three starting points at the back of my mind while writing in March. So sorry I didn’t include them.
I don’t think our rationality at the moment is very structure-based. Kahneman’s view of cognitive biases and heuristics was very structure-based.
Reasoning with CFAR techniques on the other hand is more process-based. If I listen to my felt sense because I trained a lot of focusing or do double crux I’m not acting in a structure-based way.
That’s basically the straw Vulcan accusation. We build pillow forts.
I don’t think we do. When discussing for example FDA decisions we don’t see Fauci as a person who’s independent of the system in which he operates. There’s the moral maze discourse which is also about individuals being strongly influenced by the system in which they are operating. Inadequate Equilibria is also not about the faults of individuals but how individuals are limited by the systems in which they operate.
We certainly care more about the future than many other communities, but we also care about the past. Progress studies and the surrounding debates are very much focused on the past. We do have Petrov day which is about remembering a past event and reminding us that there was a history of the world being at risk.
I now see how the ‘who’ part of the sentence can come across as me saying that rationalists only detachedly and coolly model the external world. I do not think that is the case based on interacting with plenty of self-ascribed rationalists myself (including making a pillow fort and hanging out with them in it myself). I do think rationalists do this mental move a lot more than other people I know.
Instead, I meant this as an action rationalists choose to take more often.
I just edited that sentence to ‘When rationalists detachedly and coolly model...’
That’s a very different claim than the one in the OP. The one in the OP is about lack of diversity of mental moves and not just a claim about engaging in coolly modeling being a mental move that rationalists are capable of doing well and do frequently.
This seems to presume that a certain literal interpretation of that text is the only one that could be intended or interpreted. I don’t think this is worth discussing further, so leaving it at that.
I like this distinction, and actually agree! Have talked with a CFAR (ex-)staff member about this, who confirmed my impression that CFAR has been a compensating factor in the community along most of the perceptual/cognitive dimensions I listed. Where you and I may disagree though is that I still think the default tendency for many rationalists is to construct problems as structure-based.
Good nuances here that we don’t just see individuals as independent of the system they’re operating in. So there is some sense of interconnectivity there. I think you’re only partially capturing what I mean with interdependence though. See other comment for my attempt to convey this.
I agree with all these points. We seem to be on one line here.
There are certainly rationalists whose approach to problems is structure-based. We have a diversity of approaches.
One interesting example here is the question of diet. You find people who do argue the structure-based approach, where it’s about CICO (Calories-In-Calories-Out). Then you have other people who take a more process-oriented perspective out of cybernetics.
You often have people who believe in CICO who don’t get that there is another way to look at the issue, but historically, for example, Eliezer did argue the cybernetics paradigm.
When I look at my local rationality community, CFAR has a huge influence on it. I would guess that within EA you find more people who can only handle the structure-based approach. I would claim that’s more because those EAs have too little exposure to the rationality community than it’s due to flaws in the rationality community.
There are probably two separate issues here. One is about modeling yourself as independent and the other is about modeling other people as independent actors.
I think generally we do a decent job at modeling how other people are constrained in the choices they make by their environment, but model ourselves more as independent actors. But then I’m uncertain whether anybody really looks at the way their own decision making depends on other people.
re: Processes vs Structure
Your concrete examples made me update somewhat towards process thinking being more common in AI alignment and local rationality communities than I was giving credit for. I also appreciate you highlighting that the rationality community has a diversity of approaches, and that we’re not some homogenous blob (as my post might imply).
A CFAR staff member also pointed me to MIRI’s logical induction paper (with input from Critch) as one of MIRI’s most cited papers (i.e. at least somewhat representative of how outside people might view MIRI’s work) that’s clearly based on an algorithmic process.
Eliezer’s AI Foom post (the one I linked to above) can be read as an explanation of how an individual agent is constructed out of a reinitiating process.
Also, there has been some interest in decision and allocation mechanisms like e.g. quadratic voting and funding (both promoted by RxC, latter also by Vitalik Buterin) which seems kinda deliberately process-oriented.
This also resonates, and I hadn’t explicitly made that distinction yet! Particularly, how EA researchers have traditionally represented cause areas/problems/interventions to work on (after going through e.g. Importance-Tractability-Neglectedness analysis) seems quite structure-based (80K’s methods for testing personal fit less so, however).
IMO CEA-hosted grantmakers also started from a place where they dissected and assessed the possible promising/unpromising traits of a person, the country or professional hub they operate from, and their project idea (based on first principles say, or track record). This was particularly the case with the EA Community Building Grant in early days. But seems to be changing somewhat in how EA Funds grantmakers are offering smaller grants in dialog with possible applicants, and assessing viability for the career aspirant to continue or for a project to expand further as they go.
I previously made a case for funding my entrepreneurial endeavour that instead relied on funding processes which more directly elicit and act on feedback. And expanded on that in a grant application to the EA Infrastructure Fund:
I’m not claiming btw that process-based representations are inherently superior for the social good or something. Just that the value of that kind of thinking is overlooked in some of the work we do.
E.g. In this 80K interview, they made a good case for why adhering to enacting some bureaucratic process can be bad. I also thought they overlooked a point – you can make other, similarly compelling arguments for why rewarding that some previously assessed outcome or end state came into existence or was reached can be bad.
re: Independent vs. Interdependent
Both resonate for me.
And seeing yourself as more independent than you see others does seem very human (or at least seems like what I’d do :P). Wondering though whether there’s any rigorous research on this in East-Asian cultures, given the different tendency for people living there to construe the personal and agentic self as more interdependent.
I like your distinction of viewing yourself vs. another as an independent agent.
Some other person-oriented ways of representing that didn’t make it into the post:
Identifying an in(ter)dependent personal self with respect to the outside world.
Identifying an in(ter)dependent relation attributable to ‘you’ and/or to an identified social partner.
Identifying a collective of related people as in(ter)dependent with respect to the outside world.
and so on…
Worked those out using the perceptual framework I created, inspired by and roughly matching up with the categories of a paper that seems kinda interesting.
Returning to the simplified two-sided distinction I made above, my sense is still that you’re not capturing it.
There’s a nuanced difference between the way most of your examples are framed, and the framing I’m trying to convey. Struggling to, but here’s another attempt!
Your examples:
Each of those examples describes an individual agent as independent vs. ‘not independent’, i.e. dependent.
Dependence is unlike interdependence (in terms of what you subjectively perceive in the moment).
Interdependence involves holding both/all in mind at the same time, and representing how the enduring existence of each is conditional upon both themselves and the other/s. If that sounds wishy-washy and unintuitive, then hope you get my struggle to explain it.
You could sketch out a causal diagram, where one arrow shows the agent affecting the system, and a parallel arrow shows the system affecting the agent back. That translates as “A independently wills a change in S; S depends on A” in framing 1, “S independently causes a change in A; A depends on S” in framing 2.
Then, when you mentally situate framing 2 next to framing 1, that might look like you actually modelled the interdependence of the two parts.
That control loop seems deceptively like interdependence, but it’s not by the richer meaning I’m pointing to with the word. It’s what these speakers are trying to point out when they talk vaguely about system complexity vs. holistic complexity.
Cartesian frames seem like a mathematically precise way of depicting interdependence. Though this also implicitly imports an assumption of self-centricity: a model in which compute is allocated towards predicting the future as represented within a dualistic ontology (consisting of the embedded self and the environment outside of the self).
Brian Christian’s example is interesting. It seems to suggest that focusing on the process or outcome are the only possible directions to focus. Leslie Cameron-Bandler et al argue in one example in the Emprint Method that in good parenting the focus is not on the past (and the process or outcome of the past) but on the future. (It’s generally a good book for people who want to understand what ways there are to think and make decisions)
I spoke imprecisely above when I linked Fauci and the FDA. Fauci leads the NIAID. He has some influence on it but he’s also largely influenced by it. That seems to me interdependence.
I will listen to the talk later and maybe write more then.
Let me google the Emprint Method. The idea of focus on past vs. future in rewarding/encouraging makes intuitive sense to me though.
I haven’t actually heard of Fauci or discussions around him, but appreciate the clarification! Note again, I’m talking about a way you perceive interdependence (not to point to the elements needed for two states to be objectively described as interdependent).
Thanks for engaging here!
To clarify the independent vs. interdependent distinction
Julia suggested that EA thought about negative flow-through effects is an example of interdependent thinking. IMO EAs still tend to take an independent view on that. Even I did a bad job above of describing causal interdependencies in climate change (since I still placed the causal sources in a linear ‘this leads to this leads to that’ sequence).
So let me try to clarify again, at the risk of going metaphysical:
EAs do seem to pay more attention to causal dependencies than I was letting on, but in a particular way:
When EA researchers estimate impacts of specific flow-through effects, they often seem to have in mind some hypothetical individual who takes actions, which incrementally lead to consequences in the future. Going meta on that, they may philosophise about how an untested approach can have unforeseen and/or irreversible consequences, or about cluelessness (not knowing how the resulting impacts spread out across the future will average out). Do correct me if you have a different impression!
An alternate style of thinking involves holding multiple actors / causal sources in mind to simulate how they conditionally interact. This is useful for identifying root causes for problems, which I don’t recall EA researchers doing much of (e.g. the sociological/economic factors that originally made commercial farmers industrialise their livestock production).
To illustrate the difference, I think gene-environment interactions provide a neat case:
Independent ‘this or that’ thinking:
Hold one factor constant (e.g. take the different environments in which adopted twins grew up as a representative sample) to predict the other (e.g. attribute 50% of variation of a general human trait to their genes).
Interdependent ‘this and that’ thinking:
Assume that factors will interplay, and therefore probabilities are not strictly independent.
Test nonlinear factors together to predict outcomes.
e.g. on/off gene for aggression × childhood trauma × teenagers playing violent video games
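A toy sketch of that contrast, assuming two made-up outcome models with binary on/off factors (the function names and coefficients are hypothetical, not empirical estimates). In the additive model the gene's effect is the same whatever the environment, so "how much is due to the gene" has one answer. In the interactive model the gene's effect depends entirely on the environment, so that question has no single answer.

```python
def additive(gene, trauma):
    # Independent framing: each factor contributes a fixed amount on its own.
    return 1.0 * gene + 1.0 * trauma

def interactive(gene, trauma):
    # Interdependent framing: the factors only matter in combination.
    return 2.0 * gene * trauma

def gene_effect(model, trauma):
    """Change in outcome from switching the gene on, holding trauma fixed."""
    return model(1, trauma) - model(0, trauma)

for model in (additive, interactive):
    effects = {trauma: gene_effect(model, trauma) for trauma in (0, 1)}
    print(model.__name__, effects)
# additive {0: 1.0, 1: 1.0}
# interactive {0: 0.0, 1: 2.0}
```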
Cartesian frames seem an apt theoretical analogy
“A represents a set of possible ways the agent can be, E represents a set of possible ways the environment can be, and ⋅ : A × E → W is an evaluation function that returns a possible world given an element of A and an element of E”
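As a rough illustration of the quoted definition, here is a tiny frame written out in code. The option sets and the "meet"/"miss" worlds are hypothetical, picked so that neither side alone settles the outcome; only the shape (a set A, a set E, and an evaluation function from A × E to W) comes from the quote.

```python
from itertools import product

# Hypothetical option sets, chosen only for illustration.
agent_options = ["cafe", "park"]        # A: ways the agent can be
environment_options = ["cafe", "park"]  # E: ways the environment (a friend) can be

def evaluate(a, e):
    """Evaluation function A x E -> W: returns the resulting possible world."""
    return "meet" if a == e else "miss"

# The whole frame as a lookup table over every (agent, environment) pair.
frame = {(a, e): evaluate(a, e) for a, e in product(agent_options, environment_options)}

def worlds_given_agent(a):
    """Worlds still possible once only the agent's choice is fixed."""
    return {frame[(a, e)] for e in environment_options}

def worlds_given_environment(e):
    """Worlds still possible once only the environment is fixed."""
    return {frame[(a, e)] for a in agent_options}

for a in agent_options:
    print(f"agent={a}: possible worlds {sorted(worlds_given_agent(a))}")
```

In this particular frame, fixing either the agent or the environment alone leaves both worlds open; the world is pinned down only by the pair.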
Under the interdependent framing, the environment affords certain options perceivable by the agent, which they choose between.
A notion of Free Will loses its relevancy under this framing. Changes in the world were caused neither by the settings of the outside environment nor the embedded agent ‘willing’ an action, but rather as contingent on both.
You might counter: isn’t the agent’s body constituted of atomic particles that act and react deterministically over time, making free will an illusion?
Yes, and somehow in parts interacting across parts, they come to view the constitution of a greater whole, an agent, that makes choices.
None of these (admittedly confusing) framings have to be inconsistent with each other.
Overlap between ‘interdependent thinking’ and ‘context’ and ‘collective thinking’.
When individuals with their own distinct traits are constrained in the possible ways they can interact by surrounding others (i.e. by their context), they will behave predictably within those constraints:
e.g. when EAs stick to certain styles of analysis that they know comrades will grasp and admire when gathered at a conference or writing a post for others to read.
Analysis of the kind ‘this individual agent with x skills and y preferences will take/desist from actions that are more likely to lead to z outcomes’ falls flat here.
e.g. to paraphrase Critch’s Production Web scenario, the severity of which typical AI Safety analysis tends to overlook:
Take a future board that buys a particular ‘CEO AI service’ to ensure their company will be successful. The CEO AI elicits the trustees’ inherent categorical preferences, but what they express at any given moment is guided by their recent interactions with influential others (e.g. the need to survive tougher competition from other CEO AIs). A CEO AI that plans company actions based on the preferences board members state at any given point in time will by default not account for its actions bringing into existence processes that actually change the preferences board members state. That is, unless safety-minded AI developers design a management service that accounts for this circuitous dynamic, and boards are self-aware enough to buy the less-profit-optimised service that won’t undermine their personal integrity.
The risk emerges from how the AI developers and company’s board introduce assumptions of structure:
i.e. that you can design an AI to optimise for end states based on its human masters’ identified intrinsic preferences. Such an AI would fail to use available compute to determine whether a chosen instrumental action reinforces a process through which ‘stuff’ contingently gets flagged in human attention, expressed to the AI, received as inputs, and derived as ‘stable preferences’.
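A rough toy simulation of that feedback loop, under loudly stated assumptions: this is my own simplification, not Critch's model. The linear update, the constants `BASELINE` and `GAIN`, and the idea that a 'drift-aware' service simply anchors on the first elicitation (taken before its own actions influenced the board) are all hypothetical.

```python
from typing import List

BASELINE = 0.2  # assumed appetite for aggressive growth absent any pressure
GAIN = 1.5      # assumed strength with which competitive pressure shifts statements

def stated_preference(pressure: float) -> float:
    """What the board expresses right now, clamped to at most 1.0."""
    return min(1.0, BASELINE + GAIN * pressure)

def run(rounds: int, account_for_drift: bool) -> List[float]:
    """Return the board's stated preference in each round."""
    pressure = 0.0
    history: List[float] = []
    for _ in range(rounds):
        pref = stated_preference(pressure)
        history.append(pref)
        if account_for_drift:
            # Drift-aware service: act on the first elicitation, taken
            # before the service's own actions had influenced the board.
            action = history[0]
        else:
            # Naive service: treat whatever was just elicited as stable.
            action = pref
        # More aggressive actions intensify competition, which feeds back
        # into what the board expresses in the next round.
        pressure = action
    return history

if __name__ == "__main__":
    print("naive:      ", [round(p, 2) for p in run(6, account_for_drift=False)])
    print("drift-aware:", [round(p, 2) for p in run(6, account_for_drift=True)])
```

In this sketch the naive service ratchets the board's stated preference up to the cap, because each round it optimises for a statement that its previous action helped produce. The drift-aware variant still shifts what the board says, but does not chase that shift.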
Two people asked me to clarify this claim:
Copying over my responses:
re: Conflicts of interest:
My impression has been that a few people appraising my project work looked for ways to e.g. reduce Goodharting, or the risk that I might pay myself too much from the project budget. Also, EA initiators sometimes post a fundraiser write-up for an official project with an official plan, which somewhat hides that they’re actually seeking funding for their own salaries to do that work (the former looks less like a personal conflict of interest *on paper*).
re: Skin in the game:
Bigger picture, the effects of our interventions aren’t going to affect us in a visceral and directly noticeable way (silly example: we’re not going to slip and fall from some defect in the malaria nets we fund). That seems hard to overcome given the loose feedback from far-away interventions, but I think it’s problematic that EAs also seem to underemphasise skin in the game for in-between steps where direct feedback is available. For example, EAs sometimes seem too ready to pontificate (me included) about how particular projects should be run or what a particular position involves, rather than rely on the opinions/directions of an experienced practitioner who would actually suffer the consequences of failing (or even be filtered out of their role) if they took actions that had negative practical effects for them. Or they might dissuade someone from initiating an EA project/service that seems risky to them in theory, rather than guide the initiator to test it out locally to constrain or cap the damage.
This interview with Jacqueline Novogratz from Acumen Fund covers some practical approaches to attain skin in the game.