EA & LW Forums Weekly Summary (21 Aug – 27 Aug '22)
This is also posted on the EA forum: see here.
Supported by Rethink Priorities
Sunday August 21st—Saturday August 27th
The amount of content on the EA and LW forums has been accelerating. This is awesome, but makes it tricky to keep up with! This series aims to help by summarizing popular (>40 karma) posts each week. It also includes announcements and ideas from Twitter that this audience might find interesting. This will be a regular series published weekly—let me know in the comments if you have any feedback on what could make it more useful!
If you’d like to receive these summaries via email, you can subscribe here.
Methodology
This series originated from a task I did as Peter Wildeford’s executive research assistant at Rethink Priorities, to summarize his weekly readings. If your post is in the ‘Didn’t Summarize’ list, please don’t take that as a judgment on its quality—it’s likely just a topic less relevant to his work. I’ve also left out technical AI posts because I don’t have the background knowledge to do them justice.
My methodology has been to use this and this link to find the posts with >40 karma in a week for the EA forum and LW forum respectively, read / skim each, and summarize those that seem relevant to Peter. Those that meet the karma threshold as of Sunday each week are considered (sometimes I might summarize a very popular later-in-the-week post in the following week’s summary, if it doesn’t meet the bar until then). For Twitter, I skim through the following lists: AI, EA, Forecasting, National Security (mainly nuclear), Science (mainly biosec).
I’m going through a large volume of posts so it’s totally possible I’ll get stuff wrong. If I’ve misrepresented your post, or you’d like a summary edited, please let me know (via comment or DM).
EA Forum
Philosophy and Methodologies
Critque’s of MacAskill’s ‘Is it Good to Make Happy People?’
Discusses population asymmetry: the view that a new life of suffering is bad, but a new life of happiness is neutral or only weakly positive. The post focuses mainly on describing these viewpoints and noting that they have many proponents, rather than on specific arguments for them. It mentions that they weren’t well covered in Will’s book and could affect the conclusions there.
Presents evidence that people’s intuitions tend towards needing significantly more happy people than equivalently suffering people for a tradeoff to be ‘worth it’ (3:1 to 100:1 depending on question specifics), and that therefore a big future (which would likely have more absolute suffering, even if not proportionally) could be bad.
EAs Underestimate Uncertainty in Cause Prioritization
Argues that EAs work across too narrow a distribution of causes given our uncertainty in which are best, and that standard prioritizations are interpreted as more robust than they really are.
As an example, they mention that 80K states “some of their scores could easily be wrong by a couple of points” and this scale of uncertainty could put factory farming on par with AI.
The Repugnant Conclusion Isn’t
The repugnant conclusion (Parfit, 1984) is the argument that enough lives ‘barely worth living’ are better than a much smaller set of super duper awesome lives. In one description of it, Parfit said the barely worth it lives had ‘nothing bad in them’ (but not much good either).
The post argues that this actually makes those lives pretty awesome and non-repugnant, because ‘nothing bad’ is a high bar.
A Critical Review of GiveWell’s 2022 Cost-effectiveness Model
NB: longer article—only skimmed it so I may have missed some pieces.
Suggestions for cost-effectiveness modeling in EA by a health economist, with GiveWell as a case study. The author believes the overall approach to be good, with the following major critiques:
Extremely severe: no uncertainty modeling—we don’t know how likely they think their recommendations are to be wrong (see the sketch at the end of this summary for what such modeling could look like)
Severe: opaque inputs—it’s hard to trace back where inputs to the model come from, or to update them over time
Moderate: the model architecture could use best practice to be easier to read / understand (eg. separating intervention and moral inputs)
A number of minor issues are also discussed, and the author also does their own CEAs on several top charities and compares them to GiveWell’s in depth (looking cell by cell). By doing this they find several errors / inconsistencies (eg. GiveWell assumes every malaria death prevented by the Malaria Consortium also indirectly prevents 0.5 deaths, but hadn’t applied the same logic to AMF, thereby significantly undercounting AMF’s relative impact). Overall the feedback is positive, and far more of the model is error-free than not.
A rep from GiveWell has gotten in touch in the comments to get more detail and consider what to do about the feedback.
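To illustrate what the uncertainty-modeling critique is asking for (this sketch is mine, not the author's, and uses made-up numbers rather than GiveWell's), one standard approach is Monte Carlo simulation: draw each uncertain input from a distribution instead of a point estimate, and report an interval for the bottom line.

# A minimal Monte Carlo sketch of uncertainty propagation through a toy
# cost-per-death-averted model. All numbers are hypothetical, not GiveWell's.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Each uncertain input is a distribution rather than a point estimate.
cost_per_net = rng.normal(5.0, 0.5, n)                               # USD per net (hypothetical)
nets_per_direct_death_averted = rng.lognormal(np.log(600), 0.4, n)   # hypothetical
indirect_deaths_per_direct = rng.uniform(0.3, 0.7, n)                # cf. the 0.5 figure above

# Cost per (direct + indirect) death averted, as a distribution.
cost_per_death_averted = (cost_per_net * nets_per_direct_death_averted
                          / (1 + indirect_deaths_per_direct))

print(f"median: ${np.median(cost_per_death_averted):,.0f}")
print(f"90% interval: ${np.percentile(cost_per_death_averted, 5):,.0f}"
      f" to ${np.percentile(cost_per_death_averted, 95):,.0f}")

Reporting an interval like this alongside the point estimate is what would let readers judge how likely a recommendation is to be wrong.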
Object-Level Interventions / Reviews
Protest Movements: How Effective Are They?
Summary of six months of research by Social Change Lab on the impacts and outcomes of particularly influential protests / protest movements. The full report is available here. Future research will look at what makes certain movements more effective, and more generally at whether social movement organizations could be more effective than currently well-funded avenues to change.
Headline results include that protest movements often lead to small changes in public opinion & policy and moderate changes in public discourse, with large variance between specific cases. Public opinion shifts of 2-10%, voting shifts of 1-6%, and increased discourse of >10x were observed in natural experiments.
Animal Welfare Fund: April 2022 grant recommendations
Summary & review of grants made in March / April 2022.
Climate Change & Longtermism: new book-length report
Executive summary of an in-depth report on climate change from an LT perspective that was put together as part of What We Owe the Future research.
It states emissions and warming are likely going to be lower than once thought, due to political movement and revisions to the amount of recoverable fossil fuels available to burn. The author gives a 5% chance to >4 degrees of warming, which is not a level that poses an LT risk.
The greatest LT risk is from reaching ‘tipping points’ (runaway feedback loops). There is high uncertainty in these mechanisms, though they’re unlikely to kick in under 4 degrees, and we know Earth has been >17 degrees hotter in the past and still supported life. If those feedback loops cause significant damage to certain locations, that could in turn cause instability and war. Due to this, he concludes that climate change is still an important LT area—though not as important as some other global catastrophic risks (eg. biorisk), which outsize it on both neglectedness and scale.
[Cause Exploration Prizes] The importance of Intercausal Impacts
When interventions are considered only in their primary cause area (eg. global health, animal welfare, existential risk), their impact can be under- or over-counted by excluding effects in the other cause areas.
Food systems transformation (plant- and cell-based meat) has competitive positive effects on climate change and biosecurity in addition to its primary area of animal welfare, so it should be rated higher / receive more resources.
Driving Education on EA Topics Through Khan Academy
The social content strategy manager for Khan Academy is looking for ideas on EA concepts that would be easy to convey in ~1-minute short videos. These would be used to create a series focused on getting young learners interested in EA / improving the world.
EA Giving Tuesday will likely hibernate in 2022, unless handed off to another EA organization
EA Giving Tuesday uses Facebook’s Giving Tuesday matching to try to get matched funds to effective orgs. Rethink Charity is stopping support due to lack of capacity; the project will hibernate if not handed off by Sept 30th this year. $411K was matched in 2021, and it is an operationally complex project to run.
Community & Media
The EA Community Might be Neglecting the Value of Influencing People
Argues that too many resources currently go into attracting people into the EA community, compared to guiding people towards doing more EA-endorsed actions (without the EA affiliation).
For example, the latter could look like influencing existing politicians to care more about preventing pandemics, hosting a talk at a university on the dangers of gain-of-function research, or spreading EA epistemics like scale / neglectedness / tractability frameworks. This allows more scaling of impact as we can influence those who likely wouldn’t join EA.
Perhaps the highest leverage meta-skill: an EA guide to hiring
A practical guide to hiring. Short summary:
Know the tasks you’re hiring for, ie. hire around a role, not a job title.
Write a good job ad, with a clear title and engaging language, and a specific deadline.
Post it everywhere relevant, not just one place, and ask specific people for referrals.
Use structure: application → test task → interview. Get the task as close to the real job as possible, and drop any tasks or questions that aren’t distinguishing (ie. that always get the same answer).
Consider using a hiring agency.
They also include a linked list of other good resources on hiring.
Announcing a new introduction to effective altruism
New intro to EA page is ready to be shared with general audiences, aiming at being the authoritative introductory source.
When doing stuff for fun, don’t worry about it also being productive / good / socially approved—lean into the feeling of “mwahaha I found a way to give myself hedons!”
Mockup image of a newspaper with important stuff on the front page (eg. 15K children died just like every day, and we’re at risk of nuclear war).
Effective altruism’s billionaires aren’t taxed enough. But they’re trying.
Linkpost and summary for Vox’s article which defends EA billionaires as trying to redistribute their income: “If you’re such an effective altruist, how come you’re so rich?”
What We Owe the Future is an NYT bestseller
Reached #7 on hardcover non-fiction list.
Gaps and opportunities in the EA talent & recruiting landscape
Current methods of talent <-> job matching:
Individual orgs hiring people
Candidates listing themselves on directories
Orgs & groups matchmaking or referring people who talk to them
Dedicated hiring orgs—new in EA, ramping up
Gaps:
Strategic clarity on biggest hiring needs
A one-stop-shop CRM of talent for both orgs & candidates (vs. lots of different groups for niche areas); there are good & bad aspects to this level of centralization
Building talent pipelines of strong candidates → particularly mid-career proto-EAs
EAs with the skills to be in-house recruiters or external headhunters
Didn’t Summarize
Either because they’re not on the target topic set, because the title is already a solid summary, or because they’re link-posted from LW (and summarized on that list).
Translating the Precipice into Czech—my experience and recommendations
Paper is published! 100,000 Lumens to Treat Seasonal Affective Disorder
I’m interviewing Marcus Davis, co-founder of Rethink Priorities — what should I ask him?
Common misconceptions about OpenAI (linkpost, summarized on LW list)
AI Strategy Nearcasting (cross-posted, summarized on LW list)
LW Forum
AI Impacts / New Capabilities
What’s the least impressive thing GPT-4 won’t be able to do
Asks for predictions in the forum thread on what GPT-4 won’t be able to do. Top voted ones include:
Play a novel, complicated board game well, given only written instructions.
Understand and manipulate causal relationships framed differently from its training data.
Give good suggestions on improving itself.
AI art isn’t “about to shake things up”. It’s already here.
AI art is cheap and good enough quality to be used for commercial purposes now. Gives the example of Midjourney being 400x cheaper and often better quality for illustrating cards in a card game, as compared to a human artist.
Fictional piece, first person view on a post-AGI utopia.
AI Meta & Methodologies
AGI Timelines are Mostly Not Strategically Relevant to Alignment
5-year vs. >100-year timelines are both long enough for training / hiring new researchers, and for foundational research to pay off. Because of this, where on this scale timelines fall doesn’t matter for choices on whether to invest in those things.
Beliefs and disagreements about automating alignment research
Summarizes views of others on whether we can use AI to automate alignment research safely.
States three levels: 1) assisting humans (already here), 2) original contributions (arguably here, a little), and 3) building its own aligned successor (not here). There is lots of disagreement on which are possible or desirable.
Views of specific researchers: (note these are summarized views of summarized views, so might not be great representations of that expert’s opinions)
Nate Soares—building an AI to help with alignment is no easier than building an aligned AI. It would need enough intelligence to already be dangerous.
John Wentworth—assisting humans is fine (eg. via Google autocomplete), but we can’t have AI do the hard parts. We don’t know how close we are to alignment either, because we are still unclear on the problem.
Evan Hubinger—GPT-3 shows we can have programs imitate the process that creates their training data, without goal-directed behavior. This could be used to safely produce new alignment research if we can ensure it doesn’t pick up goals.
Ethan Perez—Unclear how easy / hard this is vs. doing alignment ourselves, and if an AI capable of helping would already be dangerous / deceptive. But we should try—build tools that can have powerful AI plugged in when available.
Richard Ngo—AI helping with alignment is essential long-term. But for now do regular research so we can automate once we know how.
Post quality AI research on arXiv instead of just LW and the Alignment Forum: it’s easy, it’ll show up on Google Scholar, and it’s likely to be read more broadly.
Some conceptual alignment research projects
A list of 26 research outputs Richard Ngo would like to see. (Each expected to be pretty time-consuming).
OpenAI’s roadmap / approach to alignment, cross-posted from their blog. They explain their approach as iterative and empirical—attempting to align real highly capable AI systems, learning and refining methods as AI develops (in addition to tackling problems they assume will be on the path to AGI).
Their primary approach is “engineering a scalable training signal for very smart AI systems that is aligned with human intent” via the following three pillars:
Training AI systems using human feedback—eg. their approach creating InstructGPT, which is 100x smaller than GPT-3 but often preferred to models not trained to follow implicit intent. (Note: it still fails sometimes, eg. by lying or by not refusing harmful tasks.)
Training AI systems to assist human evaluation—make it easier for humans to assess other AIs’ performance on complicated tasks (eg. for a human evaluating AI book summaries, an assistant evaluation AI can provide related online links to help check accuracy).
Training AI systems to do alignment research—train AIs to develop alignment research, and humans to review it (an easier task). No models sufficiently capable to contribute yet.
They also cover some limitations / arguments against this approach.
Common Misconceptions about OpenAI
OpenAI’s post on accurate and inaccurate common conceptions about it.
Accurate:
OpenAI is looking to directly build a safe AGI
The majority of researchers work on the capabilities team (100/145)
The majority did not join explicitly to reduce existential risk (exception—the 30-person alignment team is pretty driven by this)
There isn’t much interpretability research since Anthropic split off
Inaccurate (I’ll phrase as the accurate versions—the inaccurate ones were the opposite):
OpenAI has teams focused on both practical alignment of models it’s deployed, and researching how to align AGIs beyond human supervision—not just the former.
No alignment researchers (other than interpretability ones) moved from OpenAI → Anthropic. It still has an alignment team.
OpenAI is not obligated to make a profit.
OpenAI is aware of race dynamics, and will assist rather than compete with another value-aligned project that is closer to building AGI, if that project has a better-than-even chance of success within 2 years.
OpenAI has a governance team and cares about existential risk from AI.
AI Strategy Nearcasting
Advocates nearcasting, which is forecasting with the assumption of “a world relatively similar to today’s”. Eg. “what should we do if TAI is just around the corner?”
Benefits include:
Gives a simpler jumping off point to start suggesting concrete actions. Ie. if we know an action we’d suggest if TAI were to be developed in a world like today’s, we can ask ‘do we expect differences in the future that will change this suggestion? Which ones?’
Focuses us on near-term TAI worlds—which are most dangerous / urgent.
Allows comparing predictions over time—comparing nearcasts in a given year to those a few years ago gives a feedback loop, showing changing predictions / conclusions and how / why they changed.
In a future post, Holden will be laying out more detail on an example nearcast scenario for TAI and his predictions and recommended actions on it.
Not AI Related
Bulleted list of survey advice from Katja Grace, based on surveys she’s done. A further-summarized version below:
Test your surveys by having people take them & narrate their thoughts as they do.
Wording matters a lot to results
Even if you don’t intend it & avoid known issues like desirability bias.
If you do intend it, there are heaps of ways to get the result you want.
Ask people what they know about already (otherwise your summary will bias them), don’t change wording between surveys if doing over time analysis, and avoid sequences of related questions (can lead people to a particular answer).
Qualtrics is expensive. GuidedTrack isn’t and seems good.
Surveys are under-rated—do more.
Didn’t Summarize
Either because they’re not part of the target topic set, had very technical content, title is already a solid summary, or they were already summarized on the EA forum list.
Paper is published! 100,000 lumens to treat seasonal affective disorder
Finding Goals in the World Model
Taking the parameters which seem to matter and rotating them until they don’t
This Week on Twitter
AI
Stable Diffusion was launched publicly on 22nd Aug—an open source text-to-image model. Image generation is competitive with DALL-E 2, even on consumer-grade GPUs, and the initial training only cost ~$600K. Relational understanding (eg. ‘red cube on top of green cube’) is still shaky. Figma already incorporated it, a few days after release. #stablediffusion will pull up lots of related tweets. (A minimal usage sketch follows this list.) Along the same lines, NeRF models can make high-quality images from multiple viewpoints using separate static images as input, and are apparently progressing very quickly (link).
OpenAI roadmap—OpenAI shared their current roadmap for aligning AI (link) See also the LW post of the same. (link)
Assessing AI moral capacities—DeepMind linked a new paper suggesting a framework for assessing AI moral capacities from a developmental psychology viewpoint. (link)
Adversarial inputs research—Anthropic found that language models respond best to human-feedback reinforcement learning, compared to no additional feedback or to rejection sampling. (link)
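Returning to Stable Diffusion: as a concrete illustration of how accessible it already is (my own sketch, not from the tweets; it assumes the Hugging Face diffusers library and the CompVis/stable-diffusion-v1-4 checkpoint), generating an image locally takes only a few lines.

# Sketch: text-to-image generation with Stable Diffusion via Hugging Face diffusers.
# Assumes torch + diffusers are installed and the model weights are downloadable.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")  # fits on a single consumer-grade GPU at half precision

# Relational prompts like this are exactly where outputs are still shaky.
image = pipe("a red cube on top of a green cube").images[0]
image.save("cubes.png")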
EA
Summary of HLI’s critique of GiveWell’s deworming estimates, and a good response to it by GiveWell; the tweet thread describes the robust back and forth.
Vox article defending EA billionaires—same topic as the post in the EA forum section above (link)
Forecasting
“Starlink V2, launching next year, will transmit direct to mobile phones, eliminating dead zones worldwide” could be good for anti-censorship (link)
Ajeya updated her TAI timeline predictions (median 2040). (link)
Science
CLTR highlights that the UK government has committed an £800 million investment to create a new research funding body, the Advanced Research and Invention Agency (ARIA), with a high-risk / high-reward philosophy and only light governance.
US soon to roll out variant-specific covid booster, first in world. (link)
The EA forum created a page to book 30-minute 1-1s with biosecurity professionals, for those interested in using their career on catastrophic biorisks. It’s currently in beta. (link)