If the balance of opinion of scientists and policymakers (or those who had briefly heard arguments) was that AI catastrophic risk is high, and that this should be a huge social priority, then you could do a lot of things. For example, you could get budgets of tens of billions of dollars for interpretability research, the way governments already provide tens of billions of dollars of subsidies to strengthen their chip industries. Top AI people would be applying to do safety research in huge numbers. People like Bill Gates and Elon Musk who nominally take AI risk seriously would be doing stuff about it, and Musk could have gotten more traction when he tried to make his case to government.
My perception based on many areas of experience is that policymakers and your AI expert survey respondents on the whole think that these risks are too speculative and not compelling enough to outweigh the gains from advancing AI rapidly (your survey respondents state those are much more likely than the harms). In particular, there is much more enthusiasm for the positive gains from AI than your payoff matrix suggests (particularly among AI researchers), and more mutual fear (e.g. the CCP does not want to be overthrown and subjected to trials for crimes against humanity as has happened to some other regimes, and the rest of the world does not want to live under oppressive CCP dictatorship indefinitely).
But you’re proposing that people worried about AI disaster should leapfrog smaller asks of putting a substantial portion of the effort going into accelerating AI into risk mitigation, which we haven’t been able to achieve because of low buy-in on the case for risk, to far more costly and demanding asks (on policymakers’ views, which prioritize subsidizing AI capabilities and geopolitical competition already). But if you can’t get the smaller, more cost-effective asks because you don’t have buy-in on your risk model, you’re going to achieve even less by focusing on more extravagant demands with much lower cost-effectiveness that require massive shifts to make a difference (adding $1B to AI safety annual spending is a big multiplier from the current baseline, removing $1B from semiconductor spending is a minuscule proportional decrease).
When your view is the minority view you have to invest in scientific testing to evaluate your view and make the truth more credible, and better communication. You can’t get around failure to convince the world of a problem by just making more extravagant and politically costly demands about how to solve it. It’s like climate activists in 1950 responding to difficulties passing funds for renewable energy R&D or a carbon tax by proposing that the sale of automobiles be banned immediately. It took a lot of scientific data, solidification of scientific consensus, and communication/movement-building over time to get current measures on climate change, and the most effective measures actually passed have been ones that minimized pain to the public (and opposition), like supporting the development of better solar energy.
Another analogy in biology: if you’re worried about engineered pandemics and it’s a struggle to fund extremely cost-effective low-hanging fruit in pandemic prevention, it’s not a better strategy to try to ban all general-purpose biomedical technology research.
I think this comment is overstating the case for policymakers and the electorate actually believing that investing in AI is good for the world. I think the answer currently is “we don’t know what policymakers and the electorate actually want in relation to AI” as well as “the relationship of policymakers and the electorate is in the middle of shifting quite rapidly, so past actions are not that predictive of future actions”.
I really only have anecdata to go on (though I don’t think anyone has much better), but my sense from doing informal polls of e.g. Uber drivers, people on Twitter, and perusing a bunch of Subreddits (which, to be clear, is a terrible sample) is that indeed a pretty substantial fraction of the world is now quite afraid of the consequences of AI, both in a “this change is happening far too quickly and we would like it to slow down” sense, and in a “yeah, I am actually worried about killer robots killing everyone” sense. I think both of these positions are quite compatible with pushing for a broad slowdown. There is also a very broad and growing “anti-tech” movement that is more broadly interested in giving fewer resources to the tech sector, whose aims are at least for a long while compatible with slowing down AGI progress.
My current guess is that policies that are primarily aimed at slowing down and/or heavily regulating AI research are actually pretty popular among the electorate, and I also expect them to be reasonably popular among policymakers, though I also expect their preferences to lag behind the electorate for a while. But again, I really think we don’t know, and nobody has run even any basic surveys on the topic yet.
I agree there is some weak public sentiment in this direction (with the fear of AI takeover being weaker). Privacy protections and redistribution don’t particularly favor measures to avoid AI apocalypse.
But the sentiment looks weak compared to e.g. climate change and nuclear war, where fossil fuel production and nuclear arsenals continue, although there are significant policy actions taken in hopes of avoiding those problems. The sticking point is policymakers and the scientific community. At the end of the Obama administration the President asked scientific advisors what to make of Bostrom’s Superintelligence, and concluded not to pay attention to it because it was not an immediate threat. If policymakers and their advisors and academia and the media think such public concerns are confused, wrongheaded, and not politically powerful they won’t work to satisfy them against more pressing concerns like economic growth and national security. This is a lot worse than the situation for climate change, which is why it seems better regulation requires that the expert and elite debate play out differently, or the hope that later circumstances such as dramatic AI progress drastically change views (in favor of AI safety, not the central importance of racing to AI).
But the sentiment looks weak compared to e.g. climate change and nuclear war, where fossil fuel production and nuclear arsenals continue,
That seems correct to me, but on the other hand, I think the public sentiment against things like GMOs was also weaker than the one that we currently have against climate change, and GMOs got slowed down regardless. Also I’m not sure how strong the sentiment against nuclear power was relative to the one against climate change, but in any case, nuclear power got hindered quite a bit too.
I think one important aspect where fossil fuels are different from GMOs and nuclear power is that fossil fuel usage is firmly entrenched across the economy and it’s difficult, costly, and slow to replace it. Whereas GMOs were a novel thing and governments could just decide to regulate them and slow them down without incurring major immediate costs. As for nuclear power, it was somewhat entrenched in that there were many existing plants, but society could make the choice to drastically reduce the progress of building new ones—which it did.
Nuclear arsenals don’t quite fit this model—in principle, one could have stopped expanding them, but they did keep growing for quite a bit, despite public opposition. Then again, there was an arms race dynamic there. And eventually, nuclear arsenals got cut down in size too.
I think AI is in a sense comparable to nuclear power and GMOs in that there are existing narrow AI applications that would be hard and costly to get rid of, but more general and powerful AI is clearly not yet entrenched due to not having been developed yet. On the other hand, AI labs have a lot of money and there are lots of companies that have significant investments in AI R&D, so that’s some level of entrenchment.
Whether nuclear weapons are comparable to AI depends on whether you buy the arguments in the OP for them being different… but seems also relevant that AI arms race arguments are often framed as the US vs. China. That seems reasonable enough, given that the West could probably find consensus on AI as it has found on other matters of regulation, Russia does not seem to be in a shape to compete, and the rest of the world isn’t really on the leading edge of AI development. And now it seems like China might not even particularly care about AI [1, 2].
I’ll shill here and say that Rethink Priorities is pretty good at running polls of the electorate if anyone wants to know what a representative sample of Americans think about a particular issue such as this one. No need to poll Uber drivers or Twitter when you can do the real thing!
I’d very much like to see this done with standard high-quality polling techniques, e.g. while airing counterarguments (like support for expensive programs that looks like a majority but collapses if higher taxes to pay for them are mentioned). In particular, how the public would react given different views coming from computer scientists/government commissions/panels.
It might be worth testing quite carefully for robustness—to ask multiple different questions probing the same issue, and see whether responses converge. My sense is that people’s stated opinions about risks from artificial intelligence, and existential risks more generally, could vary substantially depending on framing. Most haven’t thought a lot about these issues, which likely contributes. I think a problem with some studies on these issues is that researchers over-generalise from highly framing-dependent survey responses.
That makes a lot of sense. We can definitely test a lot of different framings. I think the difficulty with a lot of these kinds of issues is that they are low-salience, so people tend not to have pre-existing opinions and instead generate an opinion on the spot. We have a lot of experience polling on low-salience issues, though, because we’ve done a lot of polling on animal farming policy, which has similar framing effects.
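As a minimal sketch of what such a framing-robustness check could look like (the sample sizes, support counts, and the simple two-proportion z-test here are illustrative assumptions, not results from any actual poll):

```python
from math import sqrt

def two_proportion_z(support_a, n_a, support_b, n_b):
    """z-statistic for whether support rates differ between two question framings."""
    p_a, p_b = support_a / n_a, support_b / n_b
    pooled = (support_a + support_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se

# Hypothetical split-sample result: the same slowdown question under two framings.
z = two_proportion_z(support_a=620, n_a=1000, support_b=480, n_b=1000)
print(f"z = {z:.2f}")  # |z| well above ~2 flags a framing effect worth investigating
```

If several differently framed questions about the same underlying policy give statistically indistinguishable support rates, that is some evidence the stated opinion is robust rather than an artifact of wording.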
I would definitely vote in favor of a grant to do this on the LTFF, as well as the SFF, and might even be interested in backstopping it with my personal funds or Lightcone funds.
I found this thread interesting and useful, but I feel a key point has been omitted thus far (from what I’ve read):
Public, elite, and policymaker beliefs and attitudes related to AI risk aren’t just a variable we (members of the EA/longtermist/AI safety communities) have to bear in mind and operate in light of, but instead also a variable we can intervene on.
And so far I’d say we have (often for very good reasons) done significantly less to intervene on that variable than we could’ve or than we could going forward.
So it seems plausible that actually these people are fairly convincible if exposed to better efforts to really explain the arguments in a compelling way.
We’ve definitely done a significant amount of this kind of work, but I think we’ve often (a) deliberately held back on doing so or on conveying key parts of the arguments, due to reasonable downside risk concerns, and (b) not prioritized this. And I think there’s significantly more we could do if we wanted to, especially after a period of actively building capacity for this.
Important caveats / wet blankets:
I think there are indeed strong arguments against trying to shift relevant beliefs and attitudes in a more favorable direction, including not just costs and plausibly low upside but also multiple major plausible downside risks.[1]
So I wouldn’t want anyone to take major steps in this direction without checking in with multiple people working on AI safety/governance first.
And it’s not at all obvious to me we should be doing more of that sort of work. (Though I think whether, how, & when we should is an important question and I’m aware of and excited about a couple small research projects that are happening on that.)
All I really want to convey in this comment is what I said in my first paragraph: we may be able to significantly push beliefs and opinions in favorable directions relative to where they are now or would be in the future by default.
I think I would have totally agreed in 2016. One update since then is that I think progress scales with resources way less than I used to think it did. In many historical cases, a core component of progress was driven by a small number of people (which is reflected in citation counts and in who is actually taught in textbooks), and introducing lots of funding and scaling too fast can disrupt that by increasing the amount of fake work.
$1B in safety well-spent is clearly more impactful than $1B less in semiconductors; it’s just that “well-spent” is doing a lot of work: someone with a lot of money is going to have lots of people trying to manipulate their information environment to take their stuff.
Reducing especially dangerous tech progress seems more promising than reducing tech broadly; however, since these are dual-use techs, creating knowledge about which techs are dangerous can accelerate development in those sectors (especially the more vice signalling / conflict orientation is going on). This suggests that perhaps an effective way to apply this strategy is to recruit especially productive researchers (identified using asymmetric info) to labs where they work on something less dangerous.
In gain of function research and nuclear research, progress requires large expensive laboratories; AI theory progress doesn’t require that, although large scale training does (though, to a lesser extent than GOF or nuclear).
There are plenty of movements out there (ethics & inclusion, digital democracy, privacy, etc.) who are against current directions of AI developments, and they don’t need the AGI risk argument to be convinced that current corporate scale-up of AI models is harmful.
Working with them, redirecting AI developments away from more power-consolidating/general AI may not be that much harder than investing in supposedly “risk-mitigating” safety research.
Do you think there is a large risk of AI systems killing or subjugating humanity autonomously related to scale-up of AI models?
A movement pursuing antidiscrimination or privacy protections for applications of AI that thinks the risk of AI autonomously destroying humanity is nonsense seems like it will mainly demand things like the EU privacy regulations, not bans on using $10B of GPUs instead of $10M in a model. It also seems like it wouldn’t pursue measures targeted at the kind of disaster it denies, and might actively discourage them (this sometimes happens already). With a threat model of privacy violations restrictions on model size would be a huge lift and the remedy wouldn’t fit the diagnosis in a way that made sense to policymakers. So I wouldn’t expect privacy advocates to bring them about based on their past track record, particularly in China where privacy and digital democracy have not had great success.
If it in fact is true that there is a large risk of almost everyone alive today being killed or subjugated by AI, then establishing that as scientific consensus seems like it would supercharge a response dwarfing current efforts for things like privacy rules, which would aim to avert that problem rather than deny it and might manage such huge asks, including in places like China. On the other hand, if the risk is actually small, then it won’t be possible to scientifically demonstrate high risk, and it would play a lesser role in AI policy.
I don’t see a world where it’s both true the risk is large and knowledge of that is not central to prospects for success with such huge political lifts.
A movement pursuing antidiscrimination or privacy protections for applications of AI that thinks the risk of AI autonomously destroying humanity is nonsense seems like it will mainly demand things like the EU privacy regulations, not bans on using $10B of GPUs instead of $10M in a model.
I can imagine there being movements that fit this description, in which case I would not focus on talking with them or talking about them.
But I have not been in touch with any movements matching this description. Perhaps you could share specific examples of actions from specific movements you have in mind?
For the movements I have in mind (and am talking with), the description does not match at all:
AI ethics and inclusion movements go a lot further than stopping people from building AI that, e.g., makes discriminatory classifications/recommendations associated with marginalised communities – they want Western corporations to stop consolidating power through AI development and deployment while pushing those marginalised communities further out of the loop (rendering them voiceless).
Digital democracy groups and human-centric AI movements go a lot further than wanting to regulate AI – they want to relegate AI models to humble roles in the background, where they can assist and interface between humans building consensus and making decisions in the foreground.
Privacy and data ownership movements go a lot further than wanting current EU regulations – they do not want models to be trained on their own data, or to store and exploit that data in model parameters, without their permission.
Suggest reading writings by people in those movements. Let me also copy over excerpts from people active in the areas of AI ethics & inclusion and digital democracy:
“We also advocate for a re-alignment of research goals: Where much effort has been allocated to making models (and their training data) bigger and to achieving ever higher scores on leaderboards often featuring artificial tasks, we believe there is more to be gained by focusing on understanding how machines are achieving the tasks in question and how they will form part of socio-technical systems.” from paper co-authored by Timnit Gebru.
“Rationalists are like most of the ideological groups I interact with. They are allies in important projects, such as limiting the race for massive investments in “AI” capabilities and engaging in governance experimentation. In other projects, such as limiting the social power/hubris of SV and diversifying it along a variety of dimensions they are more likely adversaries or at least unlikely allies.” from post by Glen Weyl.
Do you think there is a large risk of AI systems killing or subjugating humanity autonomously related to scale-up of AI models?
Yes, I do. And the movements I am in touch with are against corporate R&D labs scaling up AI models in the careless ways they’ve been doing so far.
Are you taking a stance here of “those outside movements have different explicit goals than us AI Safety researchers, and therefore cannot become goal-aligned with our efforts”?
In that case, I would disagree here.
Theoretically, I disagree with the ontological and meta-ethical assumptions that these claims would be based on. While the objective goals expressed here are disjunctive, the underlying values are additive (I do not expect this statement to make sense for you; please skip to the next point).
Practically, movements with various explicit goals are already against corporations (that are selected by markets to extract value from local communities) centrally scaling up the training of increasingly autonomous/power-seeking models. Some examples:
re: AI ethics and inclusion: The Stochastic Parrots paper co-authored by Timnit Gebru (before Google AI managers let her go, so to speak), describes various reasons for slowing down the scaled training of (language) transformer models. These reasons include the environmental costs of compute, neglecting to curate training data carefully, and the failure to co-design with stakeholders affected. All reasons to not scale up AI models fast.
Note that I have not read any writings from Gebru saying that “AGI risk” is not a thing. Rather, she raises the question of why people are diverting resources to AGI-related research while assuming that the development of general AI is inevitable and beyond our control.
re: Digital democracy and human-centric AI: The How AI Fails Us paper (see p5) argues against the validity of and further investment in the “centralization of capital and decision-making capacity under the direction of a small group of engineers of AI systems” where “the machine is independent from human input and oversight” and with “the target of “achieving general intelligence””. How does this not match up with arguing against “subjugating humanity autonomously [with the centralised] scale-up of AI models”?
People like Divya Siddarth, Glen Weyl, Audrey Tang, Jaron Lanier and Daron Acemoglu have repeatedly expressed their concerns about how current automation of work through AI models threatens the empowerment of humans in their work, creativity, and collective choice-making.
Weyl is also skeptical about the monolithic conception of “AGI” surpassing humans across some metrics. I disagree, in that generally capable self-learning/self-modifying machinery is physically possible. I agree, in that monolithic, oversimplified representations of AGI have allowed AI Safety researchers to make unsound presumptive claims about how they expect that machinery could be “aligned” in “principle”.
As an example, you mentioned how governments could invest tens of billions of dollars in interpretability research. I touched on reasons here why interpretability research does not and cannot contribute to long-term AGI safety. Based on that, government-funded interpretability research would distract smart AI researchers from actually contributing, and lend false confidence to AGI researchers that AGI could be interpreted sufficiently. Ie. this is “align-washing” the harmful activities of AI corporations, analogous to green-washing the harmful activities of fossil-fuel corporations.
As another example, your idea of Von Neumann probes with error correcting codes, referred to by Christiano here, cannot soundly work for AGI code (as self-learning new code for processing inputs into outputs, and as introducing errors through interactions with the environment that cannot be detected and corrected). This is overdetermined. An ex-Pentagon engineer has spelled out the reasons to me. See a one-page summary by me here.
re: Privacy and data ownership: If privacy and data ownership movements take their own claims seriously (and some do), they would push for banning the training of ML models on human-generated data or any sensor-based surveillance that can be used to track humans’ activities.
With a threat model of privacy violations restrictions on model size would be a huge lift and the remedy wouldn’t fit the diagnosis in a way that made sense to policymakers. So I wouldn’t expect privacy advocates to bring them about based on their past track record, particularly in China where privacy and digital democracy have not had great success.
What do you mean here with a “huge lift”? Koen Holtman has been involved with internet privacy movements for decades. Let me ping him in case he wants to share thoughts on what went wrong there in Europe and in China.
I agree that some specific leaders you cite have expressed distaste for model scaling, but it seems not to be a core concern. In a choice between more politically feasible measures that target concerns they believe are real vs concerns they believe are imaginary and bad, I don’t think you get the latter. And I think arguments based on those concerns get traction on measures addressing the concerns, but less so on secondary wishlist items of leaders.
I think that’s the reason privacy advocacy in legislation and the like hasn’t focused on banning computers in the past (and would have failed if they tried). For example:
If privacy and data ownership movements take their own claims seriously (and some do), they would push for banning the training of ML models on human-generated data or any sensor-based surveillance that can be used to track humans’ activities.
AGI work using AI-generated data, or data shared under the terms and conditions of web services, can power the development of highly intelligent, catastrophically dangerous systems, and preventing AI from reading published content doesn’t seem close to the core motives there, especially for public support on privacy. So even the biggest asks they can get based on privacy arguments don’t, I think, block AGI.
People like Divya Siddarth, Glen Weyl, Audrey Tang, Jaron Lanier and Daron Acemoglu have repeatedly expressed their concerns about how current automation of work through AI models threatens the empowerment of humans in their work, creativity, and collective choice-making.
It looks like this kind of concern at scale naturally goes towards things like compensation for creators (one of Lanier’s recs), UBI, voting systems, open-source AI, and such.
Jaron Lanier has written a lot dismissing the idea of AGI or work to address it. I’ve seen a lot of such dismissal from Glen Weyl. Acemoglu I don’t think wants to restrict AI development? I don’t know Siddarth or Tang’s work well.
Note that I have not read any writings from Gebru saying that “AGI risk” is not a thing. Rather, she raises the question of why people are diverting resources to AGI-related research while assuming that the development of general AI is inevitable and beyond our control.
They’re definitely living in a science fiction world where everyone who wants to save humanity has to work on preventing the artificial general intelligence (AGI) apocalypse...Agreed but if that urgency is in direction of “we need to stop evil AGI & LLMs are AGI” then it does the opposite by distracting from types of harms perpetuated & shielding those who profit from these models from accountability. I’m seeing a lot of that atm (not saying from you)...What’s the open ai rationale here? Clearly it’s not the same as mine, creating a race for larger & larger models to output hateful stuff? Is it cause y’all think they have “AGI”?...Is artificial general intelligence (AGI) apocalypse in that list? Cause that’s what him and his cult preach is the most important thing to focus on...The thing is though our AGI superlord is going to make all of these things happen once its built (any day now) & large language models are a way to get to it...Again, this movement has so much of the $$ going into “AI safety.” You shouldn’t worry about climate change as much as “AGI” so its most important to work on that. Also what Elon Musk was saying around 2015 when he was backing of Open AI & was yapping about “AI” all the time.
That reads to me as saying concerns about ‘AGI apocalypse’ are delusional nonsense, but that pursuit of a false dream of AGI incidentally causes harms like hateful AI speech through advancing weaker AI technology, while the delusions should not be an important priority.
What do you mean here with a “huge lift”?
I gave the example of barring model scaling above a certain budget.
I touched on reasons here why interpretability research does not and cannot contribute to long-term AGI safety.
I disagree extremely strongly with that claim. It’s prima facie absurd to think that, e.g., using interpretability tools to discover that AI models were plotting to overthrow humanity would not help to avert that risk. For instance, that’s exactly the kind of thing that would enable a moratorium on scaling and empowering those models, to improve the situation.
As another example, your idea of Von Neumann probes with error correcting codes, referred to by Christiano here, cannot soundly work for AGI code (as self-learning new code for processing inputs into outputs, and as introducing errors through interactions with the environment that cannot be detected and corrected). This is overdetermined. An ex-Pentagon engineer has spelled out the reasons to me. See a one-page summary by me here.
This is overstating what role error-correcting codes play in that argument. They mean the same programs can be available and evaluate things for eons (and can evaluate later changes with various degrees of learning themselves), but don’t cover all changes that could derive from learning (although there are other reasons why those could be stable in preserving good or terrible properties).
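For concreteness, here is a minimal sketch of what comparison across redundant copies buys for statically stored code; the bit-flip rate and five-way majority vote are arbitrary illustrative assumptions, and, per the caveat above, none of this covers changes that derive from learning.

```python
import random

def corrupt(bits, flip_prob):
    """Flip each bit independently with probability flip_prob (storage 'mutation')."""
    return [b ^ 1 if random.random() < flip_prob else b for b in bits]

def majority_vote(copies):
    """Recover each bit by majority vote across redundant copies."""
    return [1 if sum(col) * 2 > len(copies) else 0 for col in zip(*copies)]

random.seed(0)
program = [random.randint(0, 1) for _ in range(10_000)]  # stand-in for stored directives
flip_prob = 0.01                                          # per-bit corruption rate

single = corrupt(program, flip_prob)
voted = majority_vote([corrupt(program, flip_prob) for _ in range(5)])

print("errors, single copy:", sum(a != b for a, b in zip(program, single)))  # around 100
print("errors, 5-copy vote:", sum(a != b for a, b in zip(program, voted)))   # usually 0
```

Redundancy drives the residual error rate down by orders of magnitude for data that is merely stored and compared, which is the sense in which the original programs can stay available; it says nothing by itself about whether a system’s learned behaviour stays aligned.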
Some of your interpretations of writings by Timnit Gebru and Glen Weyl seem fair to me (though I would need to ask them to confirm). I have not looked much into Jaron Lanier’s writings on AGI, so that prompts me to google that.
Perhaps you can clarify the other reasons why the changes in learning would be stable in preserving “good properties”? I’ll respond to your nuances regarding how to interpret your long-term-evaluating error correcting code after that.
re: Leaders of movements being skeptical of the notion of AGI.
Reflecting more, my impression is that Timnit Gebru is skeptical about the sci-fi descriptions of AGI, and even more so about the social motives of people working on developing (safe) AGI. She does not say that AGI is an impossible concept or not actually a risk. She seems to question the overlapping groups of white male geeks who have been diverting efforts away from other societal issues toward both promoting AGI development and warning of AGI x-risks.
Regarding Jaron Lanier, yes, (re)reading this post I agree that he seems to totally dismiss the notion of AGI, seeing it more as the result of a religious kind of thinking under which humans toil away, uncompensated, at offering the training data necessary for statistical learning algorithms to function.
Feel free to still clarify the other reasons why the changes in learning would be stable in preserving “good properties”. Then I will take that starting point to try to explain why the mutually reinforcing dynamics of instrumental convergence and substrate-needs convergence override that stability.
Fundamentally though, we’ll still be discussing the application limits of error correction methods.
Three ways to explain why:
Any workable AI-alignment method involves receiving input signals, comparing input signals against internal references, and outputting corrective signals to maintain alignment of outside states against those references (ie. error correction; a minimal sketch of this loop follows below).
Any workable AI-alignment method involves a control feedback loop – of detecting the actual (or simulating the potential) effects internally and then correcting actual (or preventing the potential) effects externally (ie. error correction).
Eg. mechanistic interpretability is essentially about “detecting the actual (or simulating the potential) effects internally” of AI.
The only way to actually (slightly) counteract AGI convergence on causing “instrumental” and “needed” effects within a more complex environment is to simulate/detect and then prevent/correct those environmental effects (ie. error correction).
~ ~ ~ Which brings us back to why error correction methods, of any kind and in any combination, cannot ensure long-term AGI Safety.
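As a minimal sketch of the generic detect-compare-correct loop named in the three points above (the names, gain, and drift term are my own illustrative assumptions, not anyone’s proposed alignment method):

```python
def control_step(observed, reference, gain=0.5):
    """One iteration of a feedback controller: detect deviation, emit a correction."""
    error = reference - observed   # detect: compare the input signal against the internal reference
    return gain * error            # correct: output a proportional corrective signal

state = 10.0       # stand-in for some externally measured effect
reference = 0.0    # the internal reference the loop tries to hold the world to

for _ in range(20):
    state += control_step(state, reference) + 0.1  # 0.1 models drift the loop never detects

print(round(state, 2))  # settles near 0.2: corrected toward the reference, minus an uncorrected residual
```

The residual from the undetected drift term is the toy analogue of the claim above: whatever a loop cannot detect or model, it cannot correct.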
I reread your original post and Christiano’s comment to understand your reasoning better and to see how I could clarify the limits of applicability of error correction methods.
I also messaged Forrest (the polymath) to ask for his input.
The messages were of a high enough quality that I won’t bother rewriting the text. Let me copy-paste the raw exchange below (with few spelling edits).
Remmelt 15:38 Remmelt: “As another example [of unsound monolithic reasoning], your idea of Von Neuman Probes with error correcting codes, referred to by Christiano here (https://www.lesswrong.com/posts/LpM3EAakwYdS6aRKf/what-multipolar-failure-looks-like-and-robust-agent-agnostic?commentId=Jaf9b9YAARYdrK3jp), cannot soundly work for AGI code (as self-learning new code for processing inputs into outputs, and as introducing errors through interactions with the environment that cannot be detected and corrected). This is overdetermined. An ex-Pentagon engineer has spelled out the reasons to me. See a one-page summary by me here.”
Carl Shulman: ”This is overstating what role error-correcting codes play in that argument. They mean the same programs can be available and evaluate things for eons (and can evaluate later changes with various degrees of learning themselves), but don’t cover all changes that could derive from learning (although there are other reasons why those could be stable in preserving good or terrible properties).”
Remmelt 15:40 Excerpting from the comment by Christiano I link to above: ”The production-web has no interest in ensuring that its members value production above other ends, only in ensuring that they produce (which today happens for instrumental reasons). If consequentialists within the system intrinsically value production it’s either because of single-single alignment failures (i.e. someone who valued production instrumentally delegated to a system that values it intrinsically) or because of new distributed consequentialism distinct from either the production web itself or any of the actors in it, but you don’t describe what those distributed consequentialists are like or how they come about.
And more pragmatically, such competition most obviously causes harm either via a space race and insecure property rights, or war between blocs with higher and lower savings rates (some of them too low to support human life, which even if you don’t buy Carl’s argument is really still quite low, conferring a tiny advantage). If those are the chief mechanisms then it seems important to think/talk about the kinds of agreements and treaties that humans (or aligned machines acting on their behalf!) would be trying to arrange in order to avoid those wars.”
”But the program of an AI, large stores of astronomical observations for navigation, and vast stores of technological information would take up an enormous amount of memory and storage space, perhaps many exabytes or more. Given this large body of information, adding additional directives to ensure that the probes eventually turn to producing welfare need only increase storage needs by a very small proportion, e.g. by 1 in 1 billion. Directives could directly specify the criteria to be eventually optimized, or could simply require compliance with further orders traveling behind the frontier of colonization. ... Mutation is easier to resist for computers than animals Biological life on Earth has evolved through mutation, and the reproductive process introduces significant errors in each generation. However, digital information storage allows for the comparison of redundant copies and the use of error-correcting codes, making substantive mutation many orders of magnitude less likely than in Earthly life.”
Remmelt 15:45
Returning to the new comment by Carl Shulman: ”This is overstating what role error-correcting codes play in that argument. They mean the same programs can be available and evaluate things for eons (and can evaluate later changes with various degrees of learning themselves)”
Remmelt 15:46
Thinking about this overnight, I think Carl’s stated reasoning is still unsound for multiple reasons:
1. Contradiction between a Von Neumann probe being adaptable enough (ie. learning new ways of processing inputs into outputs) to travel across space and seed new civilisations, yet having error correcting code that allows comparison of new code with original redundant copies. Not going to work, for reasons Forrest amply explained and I tried to summarise here: https://docs.google.com/document/d/1-AAhqvgFNx_MlLkcSgw-chvmFoC4EZ4LmTl1IWcsqEA/edit
2. Confuses complicated pre-loaded technological knowledge/systems with complex adaptive systems. The fact that they are saying that adding in directives would only increase storage by 1 part in 1 billion parts is a giveaway, I think.
Remmelt 15:55
3. Inverse take on 1. Algorithms which can flexibly ‘mutate’ and branch out into different versions become better at using resources and multiplying than more rigid or robustly functional designs. This makes Carl Shulman’s case for launching out self-replicating space probes with code error-checking/minimisation routines seem a lot more dicey. If a defecting group launches even one alternate design with a flexible code-mutating ability that confers an advantage that can’t easily be copied by the error-minimising designs without compromising on their ability to act on the directives humans originally coded in to ‘directly specify the criteria to be eventually optimized’ – well, then you might end up instead with swarms of space probes that eat up the galaxy indiscriminately, including any remaining carbon-based lifeforms on planet Earth.
Underlying premise: even if humans construct a long-term aligned AI design – where humans can formally prove a model to causally constrain any possible process of agency emerging from and expanding across each of the physical parts in which this model infers its computational process to be embedded to stay within all fundamental bounds necessary for maintaining alignment with the values that humans broadly share in common – then in practice that design is ‘one step away’ from getting mutated into misalignment by a faction of humans who seek a capability advantage (does it give one though?) for manifesting their more granular personal values.
Remmelt 15:57
@Forrest anything to add to the above? How about I write that into a comment reply (which perhaps might make for a stand-alone post later?)
Forrest 16:25
The starting point is: That any procedure of AGI alignment will (Ie, cannot not) resemble some sort of error correction algorithm
Forrest 16:28
This point cannot be disagreed with and still have the notion of alignment be persistent. If they argue here, they will need to establish a conformable coherent counter example. Simple contradiction is not enough, as the claim basis shifts to them.
Forrest 16:30
Once hard line equivalence is firmly established, then the argument re limits of causation in all error correction processes apply.
Remmelt 16:44
Good one. So start with how any AGI-alignment method must resemble an error correction algorithm in order to correct out misaligned behaviour learned and expressed in interaction with the environment.
Forrest 16:48
The idea that some machine could “have a strategy of optimizing for near-maximum X until condition Z is achieved, and *then* implement directive Q” makes at least three assumptions that are highly unlikely.
1; there is no assurance that in the long duration of the optimization of X that there would be persistence of directive Q.
2; there is no assurance that the condition ‘Z’ is consistently detectable, or that the triggering condition is even probable to being implemented, even if detected (as per a violation of assumption 1).
3; the integration coupling of directive Q is mismatched to the sheer mass of the output of the optimized code—ie, it is unlikely that ‘Q’, no matter how formed, will actually result in the consistent behavior described. Ie, ‘Q’ is significantly underspecified, and if ‘Q’ was appropriately specified, it would not be “lightweight” with respect to the optimization process ‘X’, and would have been optimized away to ineffectiveness. Hence, even if Z condition trigger happens, Q does not happen.
Note insofar as the logic is directly generalized over the logical claim form of “adopt a strategy of near-maximum colonization speed until they reach the most distant accessible galaxies, and only then divert resources to producing welfare”, the proposal itself is not well specified. Ie, the assumption that ‘Q’ and detection of ‘Z’ can be “simple” is a conceptual category error.
Remmelt 17:02
I read through this, and think I understand your points.
Forrest 17:10
Yes. I am trying to think of a good metaphor. It is maybe a bit like some single ordinary person trying to command everyone to do something in some consistent way, when that _everyone_ is 100000 people all yelling at one another in some huge auditorium somewhere. The assumption that one person, even at maximum personal signal volume and intensity, could be heard over all similar other people (in this case, the other objectives of the Von Neumann probe), and command them to implement some new action pattern reliably consistent with ‘Q’, is simply deeply mismatched to the output reality of the optimization process ‘X’. The single directive ‘Q’ is simply not powerful enough to fully conditionalize all of ‘X’, even under triggering conditions ‘Z’.
Also, I notice that the assumptions that the error correction process could be applied linearly to a dynamic self recursive system at arbitrary levels of abstraction is simply another category error. Wrong tool for the wrong job. That is less important, though, than the issue with the Q directive arbitrarily efficient effectivity mismatch.
It’s prima facie absurd to think that, e.g., using interpretability tools to discover that AI models were plotting to overthrow humanity would not help to avert that risk.
I addressed claims of similar forms at least 3 times already on separate occasions (including in the post itself).
“The fact that mechanistic interpretability can possibly be used to detect a few straightforwardly detectable misalignments of the kinds you are able to imagine right now does not mean that the method can be extended to detecting/simulating most or all human-lethal dynamics manifested in/by AGI over the long term.
If AGI behaviour converges on outcomes that result in our deaths through less direct routes, it really does not matter much whether the AI researcher humans did an okay job at detecting “intentional direct lethality” and “explicitly rendered deception”.”
This is like saying there’s no value in learning about and stopping a nuclear attack from killing you (no benefit from not being killed then, or from being tipped off about a threat trying to kill you), because later the opponent might kill you with nanotechnology before you can prevent it.
Removing intentional deception or harm greatly increases the capability of AIs that can be worked with without getting killed, to further improve safety measures. And as I said actually being able to show a threat to skeptics is immensely better for all solutions, including relinquishment, than controversial speculation.
It’s saying that if you can prevent a doomsday device from being lethal in some ways and not in others, then it’s still lethal. Focussing on some ways that you feel confident you might be able to prevent the doomsday device from being lethal is IMO dangerously distracting from the point, which is that people should not build the doomsday device in the first place.
If mechanistic interpretability methods cannot prevent interactions of AGI from converging on total human extinction (beyond theoretical limits of controllability), it means that these (or other “inspect internals”) methods cannot contribute to long-term AGI safety. And this is not idle speculation, nor based on prima facie arguments. It is based on 15 years of research by a polymath working outside this community.
In that sense, it would not really matter that mechanistic interpretability can do an okay job at detecting that a power-seeking AI was explicitly plotting to overthrow humanity.
That is, except for the extremely unlikely case you pointed to, where such intentions are detected in time and humans all coordinate at once to impose an effective moratorium on scaling or computing larger models. But this is actually speculation, whereas the fact that OpenAI promoted Olah’s fascinating Microscope-generated images as progress on understanding and aligning scalable ML models is not speculation.
Overall, my sense is that mechanistic interpretability is used to align-wash capability progress towards AGI, while not contributing to safety where it predominantly matters.
Removing intentional deception or harm greatly increases the capability of AIs that can be worked with without getting killed, to further improve safety measures.
Exactly this kind of thinking is what I am concerned about. It implicitly assumes that you have a (sufficiently) comprehensive and sound understanding of the ways humans would get killed at a given level of capability, and therefore can rely on that understanding to conclude that capabilities of AIs can be greatly increased without humans getting killed.
How do you think capability developers would respond to that statement? Will they just stay on the safe side, saying “Well, those alignment researchers say that mechanistic interpretability helps remove intentional deception or harm, but I’m just going to stay on the safe side and not scale any further”? No, they are going to use your statement to promote the potential safety of their scalable models, and remove whatever safety margin they feel justified in taking for themselves.
Not considering unknown unknowns is going to get us killed. Not considering what safety problems may be unsolvable is going to get us killed.
Age-old saying: “It ain’t what you don’t know that gets you into trouble. It’s what you know for sure that just ain’t so.”
> What is your background? How is it relevant to the work you are planning to do?
Years ago, we started with a strong focus on civilization design and mitigating x-risk. These are topics that need and require more generalist capabilities, in many fields, not just single specialist capabilities, in any one single field of study or application.
Hence, as generalists, we are not specifically persons who are career mathematicians, nor even career physicists, chemists, or career biologists, anthropologists, or even career philosophers. Yet when considering the needs of topics civ-design and/or x-risk, it is very abundantly clear that some real skill and expertise is actually needed in all of these fields.
Understanding anything about x-risk and/or civilization means needing to understand key topics regarding large scale institutional process, ie; things like governments, businesses, university, constitutional law, social contract theory, representative process, legal and trade agreements, etc.
Yet people who study markets, economics, and politics (theory of groups, firms, etc) who do not also have some real grounding in actual sociology and anthropology, are not going to have grounding in understanding why things happen in the real world as they tend to do.
And those people are going to need to understand things like psychology, developmental psych, theory of education, interpersonal relationships, attachment, social communication dynamics, health of family and community, trauma, etc.
And understanding *those* topics means having a real grounding in evolutionary theory, bio-systems, ecology, biology, neurochemistry and neurology, ecosystem design, permaculture, and evolutionary psychology, theory of bias, etc.
It is hard to see that we would be able to assess things like ‘sociological bias’ as impacting possible mitigation strategies of x-risk, if we do not actually also have some real and deep, informed, and realistic accounting of the practical implications, in the world, of *all* of these categories of ideas.
And yet, unfortunately, that is not all, since understanding of *those* topics themselves means even more and deeper grounding in things like organic and inorganic chemistry, cell process, and the underlying *physics* of things like that. Which therefore includes a fairly general understanding of multiple diverse areas of physics (mechanical, thermal, electromagnetic, QM, etc), and thus also of technology—since that is directly connected to business, social systems, world systems infrastructure, internet, electrical grid and energy management, transport (for fuel, materials, etc), and even more politics, advertising and marketing, rhetorical process and argumentation, etc.
Oh, and of course, a deep and applied practical knowledge of ‘computer science’, since nearly everything in the above is in one way or another “done with computers”. Maybe, of course, that would also be relevant when considering the specific category of x-risk which happens to involve computational concepts when thinking about artificial superintelligence.
I *have* been a successful practicing engineer in both large scale US-gov deployed software and also in product design shipped to millions. I have personally written more than 900,000 lines of code (mostly Ansi-C, ASM, Javascript) and have been ‘the principal architect’ in a team. I have developed my own computing environments, languages, procedural methodologies, and system management tactics, over multiple process technologies in multiple applied contexts. I have a reasonably thorough knowledge of CS, including the modeling math, control theory, etc. Ie, I am legitimately “full stack” engineering from the physics of transistors, up through CPU design, firmware and embedded systems, OS level work, application development, networking, user interface design, and the social process implications of systems. I have similarly extensive accomplishments in some of the other listed disciplines also.
As such, as a proven “career” generalist, I am also (though not just) a master craftsman, which includes things like practical knowledge of how to negotiate contracts, write all manner documents, make all manner of things, *and* understand the implications of *all* of this in the real world, etc.
For the broad category of valid and reasonable x-risk assessment, nothing less than at least some true depth in nearly *all* of these topics will do.
From Math Expectations, a depersonalised post Forrest wrote of his impressions of a conversation with a grant investigator where the grant investigator kept looping back on the expectation that a “proof” based on formal reasoning must be written in mathematical notation. We did end up receiving the $170K grant.
I usually do not mention Forrest Landry’s name immediately for two reasons:
If you google his name, he comes across like a spiritual hippie. Geeks who don’t understand his use of language take that as a cue that he must not know anything about computational science, mathematics or physics (wrong – Forrest has deep insights into programming methods and e.g. why Bell’s Theorem is a thing).
Forrest prefers to work on the frontiers of research, rather than repeating himself in long conversations with tech people who cannot let go of their own mental models and quickly jump to motivated counterarguments that he has heard and addressed many times before. So I act as a bridge-builder, trying to translate between Forrest-speak and Alignment Forum-speak.
Both of us prefer to work behind the scenes. I’ve only recently started to touch on the arguments in public.
You can find those arguments elaborated on here. Warning: large inferential distance; do message clarifying questions – I’m game!
It’s saying that if you can prevent a doomsday device from being lethal in some ways and not in others, then it’s still lethal. Focussing on some ways that you feel confident you might be able to prevent the doomsday device from being lethal is IMO dangerously distracting from the point, which is that people should not build the doomsday device in the first place.
As requested by Remmelt I’ll make some comments on the track record of privacy advocates, and their relevance to alignment.
I did some active privacy advocacy in the context of the early Internet in the 1990s, and have been following the field ever since. Overall, my assessment is that the privacy advocacy/digital civil rights community has had both failures and successes. It has not succeeded (yet) in its aim to stop large companies and governments from having all your data. On the other hand, it has been more successful in its policy advocacy towards limiting what large companies and governments are actually allowed to do with all that data.
The digital civil rights community has long promoted the idea that Internet based platforms and other computer systems must be designed and run in a way that is aligned with human values. In the context of AI and ML based computer systems, this has led to demands for AI fairness and transparency/explainability that have also found their way into policy like the GDPR, legislation in California, and the upcoming EU AI Act. AI fairness demands have influenced the course of AI research being done, e.g. there has been research on defining what it even means for an AI model to be fair, and on making models that actually implement this meaning.
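For instance, one of the simplest formalisations studied in that fairness literature is demographic parity; here is a minimal sketch with hypothetical data (and this is just one of several competing definitions):

```python
def demographic_parity_gap(predictions, groups):
    """Absolute difference in positive-prediction rates between two groups."""
    rate = lambda g: sum(p for p, grp in zip(predictions, groups) if grp == g) / groups.count(g)
    return abs(rate("A") - rate("B"))

# Hypothetical binary predictions for individuals from two demographic groups.
preds  = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
groups = ["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"]
print(demographic_parity_gap(preds, groups))  # 0.2 -> the model favours group A
```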
To a first approximation, privacy and digital rights advocates will care much more about what an ML model does, what effect its use has on society, than about the actual size of the ML model. So they are not natural allies for x-risk community initiatives that would seek a simple ban on models beyond a certain size. However, they would be natural allies for any initiative that seeks to design more aligned models, or to promote a growth of research funding in that direction.
To make a comment on the premise of the original post above: digital rights activists will likely tell you that, when it comes to interventions on AI research, speculating about the tractability of ‘slowing down AI research’ is misguided. What you really should be thinking about is changing the direction of AI research.
Also, I stand corrected then on my earlier comment that privacy and digital ownership advocates would/should care about models being trained on their own/person-tracking data so as to restrict the scaling of models. I’m guessing I was not tracking well then what people in at least the civil rights spaces Koen moves around in are thinking and would advocate for.
A movement pursuing antidiscrimination or privacy protections for applications of AI that thinks the risk of AI autonomously destroying humanity is nonsense seems like it will mainly demand things like the EU privacy regulations, not bans on using $10B of GPUs instead of $10M in a model.
This is a very spicy take, but I would (weakly) guess that a hypothetical ban on ML trainings that cost more than $10M would make AGI timelines marginally shorter rather than longer, via shifting attention and energy away from scaling and towards algorithm innovation.
Very interesting! Recently, the US started to regulate the export of computing power to China. Do you expect this to speed up the AGI timeline in China, or do you expect the regulation to be ineffective, or something else?
Reportedly, NVIDIA developed the A800, which is just the A100, to keep the letter but probably not the spirit of the regulation. I am trying to follow closely how the A800 fares, because it seems to be an important data point on the feasibility of regulating computing power.
I strongly agree with Steven about this. Personally, I expect it’ll be non-impactful in either direction. I think the majority of research groups already have sufficient compute available to make dangerous algorithmic progress, and they are not so compute-resource-rich that their scaling efforts are distracting them from more dangerous pursuits. I think the groups who would be more dangerous if they weren’t ‘resource drunk’ are mainly researchers at big companies.
I think the two camps are less orthogonal than your examples of privacy and compute reg portray. There’s room for plenty of excellent policy interventions that both camps could work together to support. For instance, increasing regulatory requirements for transparency on algorithmic decision-making (and crucially, building a capacity both in regulators and in the market supporting them to enforce this) is something that I think both camps would get behind (the x-risk one because it creates demand for interpretability and more, and the other because, e.g., it’s easier to show fairness issues) and could productively work on together. I think there are subculture-clash reasons the two camps don’t always get on, but these can be overcome, particularly given there’s a common enemy (misaligned powerful AI). See also this paper: Beyond Near- and Long-Term: Towards a Clearer Account of Research Priorities in AI Ethics and Society.
I know lots of people who are uncertain about how big the risks are, and care about both problems, and work on both (I am one of these—I care more about AGI risk, but I think the best things I can do to help avert it involve working with the people you think aren’t helpful).
Seems reasonable regarding public policy. But what about (1) private funders of AGI-relevant research and (2) researchers doing AGI-relevant research?
Seems like there’s a lot of potential reframings that make it more feasible to separate safe-ish research from non-safe-ish research. E.g. software 2.0: we’re not trying to make a General Intelligence, we’re trying to replace some functions in our software with nets learned from data. This is what AlphaFold is like, and I assume is what ML for fusion energy is like. If there’s a real category like this, a fair amount of the conflict might be avoidable?
Most AI companies and most employees there seem not to buy the risk much, and assign virtually no resources to addressing those issues. Unilaterally holding back from highly profitable AI when they won’t put even a tiny portion of those profits into safety mitigation again looks like an ask out of line with their weak interest. Even at the few significant companies with higher percentages of safety effort, it still looks to me like the power-weighted average of staff is extremely into racing to the front, at least to near the brink of catastrophe or until governments buy into the risks enough to coordinate a slowdown.
So asks like investing in research that could demonstrate problems with higher confidence, or making models available for safety testing, or similar still seem much easier to get from those companies than stopping (and they have reasonable concerns that their unilateral decision might make the situation worse by reducing their ability to do helpful things, while regulatory industry-wide action requires broad support).
As with government, generating evidence and arguments that are more compelling could be super valuable, but pretending you have more support than you do yields incorrect recommendations about what to try.
looks to me like the power-weighted average of staff is extremely into racing to the front, at least to near the brink of catastrophe or until governments buy into the risks enough to coordinate a slowdown.
Can anyone say confidently why? Is there one reason that predominates, or several? It’s vaguely something about status, money, power, acquisitive mimesis, having a seat at the table… but these hypotheses are all weirdly dismissive of the epistemics of these high-powered people. So either we’re talking about people who are high-powered because of the managerial revolution (or politics or something), or we’re talking about researchers who are high-powered because they’re given power because they’re good at research. If it’s the former, politics, then it makes sense to strongly doubt their epistemics on priors, but we have to ask: why can they meaningfully direct the researchers who are actually good at advancing capabilities? If it’s the latter, good researchers have power, then why are their epistemics suddenly out the window here? I’m not saying their epistemics are actually good; I’m saying we have to understand why they’re bad if we’re going to slow down AI through this central route.
There are a lot of pretty credible arguments for them to try, especially with low risk estimates for AI disempowering humanity, and if their percentile of responsibility looks high within the industry.
One view is that the risk of AI turning against humanity is less than the risk of a nasty eternal CCP dictatorship if democracies relinquish AI unilaterally. You see this sort of argument made publicly by people like Eric Schmidt, and ‘the real risk isn’t AGI revolt, it’s bad humans’ is almost a reflexive take for many in online discussion of AI risk. That view can easily combine with the observation that there has been even less takeup of AI safety in China thus far than in liberal democracies, and mistrust of CCP decision-making and honesty, so it also reduces accident risk.
With respect to competition with other companies in democracies, some labs can correctly say that they have taken action that signals they are more into taking actions towards safety or altruistic values (including based on features like control by non-profit boards or % of staff working on alignment), and will have vastly more AI expertise, money, and other resources to promote those goals in the future by locally advancing AGI, e.g. OpenAI reportedly has a valuation of over $20B now and presumably more influence over the future of AI and ability to do alignment work than otherwise. Whereas some sitting on the sidelines may lack financial and technological/research influence when it is most needed. And, e.g. the OpenAI charter has this clause:
We are concerned about late-stage AGI development becoming a competitive race without time for adequate safety precautions. Therefore, if a value-aligned, safety-conscious project comes close to building AGI before we do, we commit to stop competing with and start assisting this project. We will work out specifics in case-by-case agreements, but a typical triggering condition might be “a better-than-even chance of success in the next two years.”
Technical Leadership
To be effective at addressing AGI’s impact on society, OpenAI must be on the cutting edge of AI capabilities—policy and safety advocacy alone would be insufficient.
We believe that AI will have broad societal impact before AGI, and we’ll strive to lead in those areas that are directly aligned with our mission and expertise.
Then there are altruistic concerns about the speed of AI development. E.g. over 60 million people die every year, almost all of whom could be saved by aligned AI technologies. If you think AI risk is very low, then current people’s lives would be saved by expediting development even if risk goes up some.
And of course there are powerful non-altruistic interests in enormous amounts of money, fame, and personally getting to make a big scientific discovery.
Note that the estimate of AI risk magnitude, and the feasibility of general buy-in on the correct risk level, recurs over and over again, and so credible assessments and demonstrations of large risk are essential to making these decisions better.
Taking an extreme perspective here: do future generations of people not alive and who no one alive now would meet have any value?
One perspective is no they don’t. From that perspective “humanity” continues only as some arbitrary random numbers from our genetics. Even Clippy probably keeps at least one copy of the human genome in a file somewhere so it’s the same case.
That is, there is no difference between the outcomes of:
we delay AI a few generations and future generations of humanity take over the galaxy
we fall to rampant AIs and their superintelligent descendants take over the galaxy
If you could delay AI long enough, you would be condemning the entire current population of the world to death from aging, which on this view is essentially the same outcome as a rampant AI killing the entire world.
There are a lot of pretty credible arguments for them to try, especially with low risk estimates for AI disempowering humanity, and if their percentile of responsibility looks high within the industry.
One view is that the risk of AI turning against humanity is less than the risk of a nasty eternal CCP dictatorship if democracies relinquish AI unilaterally. You see this sort of argument made publicly by people like Eric Schmidt, and ‘the real risk isn’t AGI revolt, it’s bad humans’ is almost a reflexive take for many in online discussion of AI risk. That view can easily combine with the observation that there has been even less takeup of AI safety in China thus far than in liberal democracies, and mistrust of CCP decision-making and honesty, so it also reduces accident risk.
My thought: seems like a convincing demonstration of risk could be usefully persuasive.
I’ll make an even stronger statement: so long as the probability of a technological singularity isn’t too low, they can still rationally keep working on it even if they know the risk is high, because the expected utility is much greater still.
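A minimal sketch of that expected-utility comparison, with every number purely hypothetical: write $p_s$ for the probability of a good singularity with payoff $U_+$, $p_d$ for the probability of doom with loss $U_-$, and normalize the status quo to 0. Then

$$\mathbb{E}[U] = p_s \, U_+ - p_d \, U_-$$

so with, say, $p_s = 0.5$, $p_d = 0.2$, and $U_+ = U_- = 100$ (arbitrary units), $\mathbb{E}[U] = 50 - 20 = 30 > 0$: the gamble looks positive in expectation even at a 20% chance of catastrophe. The sign flips only if $U_-$ is weighted heavily enough relative to $U_+$, which is exactly where the disagreement lives.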
This comment employs an oddly common failure mode of ignoring intermediate successes that align with market incentives, like “~N% of AI companies stop publishing their innovations on Arxiv for free”.
Those are good points. There are some considerations that go in the other direction. Sometimes it’s not obvious what’s a “failure to convince people” vs. “a failure of some people to be convincible.” (I mean convincible by object-level arguments as opposed to convincible through social cascades where a particular new view reaches critical mass.)
I believe both of the following:
Persuasion efforts haven’t been exhausted yet: we can do better at reaching not-yet-safety-concerned AI researchers. (That said, I think it’s at least worth considering that we’re getting close to exhausting low-hanging fruit?)
Even so, “persuasion as the main pillar of a strategy” is somewhat likely to be massively inadequate because it’s difficult to change the minds and culture of humans in general (even if they’re smart), let alone existing organizations.
Another point that’s maybe worth highlighting is that the people who could make large demands don’t have to be the same people who are best-positioned for making smaller asks. (This is Katja’s point about there not being a need for everyone to coordinate into a single “we.”) The welfarism vs. abolitionism debate in animal advocacy and discussion of the radical flank effect seems related. I also agree with a point lc makes in his post on slowing down AI. He points out that there’s arguably a “missing mood” around the way most people in EA and the AI alignment community communicate with safety-unconcerned researchers. The missing sense of urgency probably lowers the chance of successful persuasion efforts?
Lastly, it’s a challenge that there’s little consensus in the EA research community around important questions like “How hard is AI alignment?,” “How hard is alignment conditional on <5 years to TAI?,” and “How long are TAI timelines?” (Though maybe there’s quite some agreement on the second one and the answer is at least, “it’s not easy?”)
I’d imagine there would at least be quite a strong EA expert consensus on the following conditional statement (which has both normative and empirical components):
Let’s call it “We’re in Inconvenient World” if it’s true that, absent strong countermeasures, we’ll have misaligned AI that brings about human extinction in <5 years. If the chance that we’re in Inconvenient World is 10% or higher, we should urgently make large changes to the way AI development progresses as a field/industry.
Based on this, some further questions one could try to estimate are:
How many people (perhaps weighted by their social standing within an organization, opinion leaders, etc.) are convincible of the above conditional statement? Is it likely we could reach a critical mass?
Doing this for any specific org (or relevant branch of government, etc.) that seems to play a central role
What’s the minimum consensus threshold for “We’re in Inconvenient World?” (I.e., what percentage would be indefensibly low to believe in light of peer disagreement unless one considers oneself the world’s foremost authority on the question?)
He points out that there’s arguably a “missing mood” around the way most people in EA and the AI alignment community communicate with safety-unconcerned researchers. The missing sense of urgency probably lowers the chance of successful persuasion efforts?
Sorry for responding very late, but it’s basically because contra the memes, most LWers do not agree with Eliezer’s views on how doomed we are. This is very much a fringe viewpoint on LW, not the mainstream.
So the missing mood is basically because most of LW doesn’t share Eliezer’s views on certain cruxes.
I really only have anecdata to go on (though I don’t think anyone has much better), but my sense from doing informal polls of e.g. Uber drivers, people on Twitter, and perusing a bunch of Subreddits (which, to be clear, is a terrible sample) is that indeed a pretty substantial fraction of the world is now quite afraid of the consequences of AI, both in a “this change is happening far too quickly and we would like it to slow down” sense, and in a “yeah, I am actually worried about killer robots killing everyone” sense. I think both of these positions are quite compatible with pushing for a broad slow down. There is also a very broad and growing “anti-tech” movement that is more broadly interested in giving less resources to the tech sector, whose aims are at least for a long while compatible with slowing down AGI progress.
My current guess is that policies that are primarily aimed at slowing down and/or heavily regulating AI research are actually pretty popular among the electorate, and I also expect them to be reasonably popular among policymakers, though I also expect their preferences to lag behind the electorate for a while. But again, I really think we don’t know, and nobody has run even any basic surveys on the topic yet.
Edit: Inspired by this topic/discussion, I ended up doing some quick google searches for AI opinion polls. I didn’t find anything great, but this Pew report has some stuff that’s pretty congruent with potential widespread support for AI regulation: https://www.pewresearch.org/internet/2022/03/17/how-americans-think-about-artificial-intelligence/
I collected such polls here, if you want to see more. Most people say they want to regulate AI.
I agree there is some weak public sentiment in this direction (with the fear of AI takeover being weaker). Privacy protections and redistribution don’t particularly favor measures to avoid AI apocalypse.
I’d also mention this YouGov survey:
But the sentiment looks weak compared to e.g. climate change and nuclear war, where fossil fuel production and nuclear arsenals continue, although there are significant policy actions taken in hopes of avoiding those problems. The sticking point is policymakers and the scientific community. At the end of the Obama administration the President asked scientific advisors what to make of Bostrom’s Superintelligence, and concluded not to pay attention to it because it was not an immediate threat. If policymakers and their advisors and academia and the media think such public concerns are confused, wrongheaded, and not politically powerful they won’t work to satisfy them against more pressing concerns like economic growth and national security. This is a lot worse than the situation for climate change, which is why it seems better regulation requires that the expert and elite debate play out differently, or the hope that later circumstances such as dramatic AI progress drastically change views (in favor of AI safety, not the central importance of racing to AI).
That seems correct to me, but on the other hand, I think the public sentiment against things like GMOs was also weaker than the one that we currently have against climate change, and GMOs got slowed down regardless. Also I’m not sure how strong the sentiment against nuclear power was relative to the one against climate change, but in any case, nuclear power got hindered quite a bit too.
I think one important aspect where fossil fuels are different from GMOs and nuclear power is that fossil fuel usage is firmly entrenched across the economy and it’s difficult, costly, and slow to replace it. Whereas GMOs were a novel thing and governments could just decide to regulate them and slow them down without incurring major immediate costs. As for nuclear power, it was somewhat entrenched in that there were many existing plants, but society could make the choice to drastically reduce the progress of building new ones—which it did.
Nuclear arsenals don’t quite fit this model—in principle, one could have stopped expanding them, but they did keep growing for quite a bit, despite public opposition. Then again, there was an arms race dynamic there. And eventually, nuclear arsenals got cut down in size too.
I think AI is in a sense comparable to nuclear power and GMOs in that there are existing narrow AI applications that would be hard and costly to get rid of, but more general and powerful AI is clearly not yet entrenched due to not having been developed yet. On the other hand, AI labs have a lot of money and there are lots of companies that have significant investments in AI R&D, so that’s some level of entrenchment.
Whether nuclear weapons are comparable to AI depends on whether you buy the arguments in the OP for them being different… but it also seems relevant that AI arms race arguments are often framed as the US vs. China. That seems reasonable enough, given that the West could probably find consensus on AI as it has found on other matters of regulation, Russia does not seem to be in a shape to compete, and the rest of the world isn’t really on the leading edge of AI development. And now it seems like China might not even particularly care about AI [1, 2].
I’ll shill here and say that Rethink Priorities is pretty good at running polls of the electorate if anyone wants to know what a representative sample of Americans think about a particular issue such as this one. No need to poll Uber drivers or Twitter when you can do the real thing!
I’d very much like to see this done with standard high-quality polling techniques, e.g. while airing counterarguments (like support for expensive programs that looks like a majority but collapses if higher taxes to pay for them are mentioned). In particular, how the public would react given different views coming from computer scientists/government commissions/panels.
I think that could be valuable.
It might be worth testing quite carefully for robustness—to ask multiple different questions probing the same issue, and see whether responses converge. My sense is that people’s stated opinions about risks from artificial intelligence, and existential risks more generally, could vary substantially depending on framing. Most haven’t thought a lot about these issues, which likely contributes. I think a problem with some studies on these issues is that researchers over-generalise from highly framing-dependent survey responses.
That makes a lot of sense. We can definitely test a lot of different framings. I think the difficulty with a lot of these kinds of issues is that they are low-salience, and thus people tend not to have opinions already, and thus they tend to generate an opinion on the spot. We have a lot of experience polling on low-salience issues though, because we’ve done a lot of polling on animal farming policy, which has similar framing effects.
I would definitely vote in favor of a grant to do this on the LTFF, as well as the SFF, and might even be interested in backstopping it with my personal funds or Lightcone funds.
Cool—I’ll follow up when I’m back at work.
I think that’s exactly right.
I found this thread interesting and useful, but I feel a key point has been omitted thus far (from what I’ve read):
Public, elite, and policymaker beliefs and attitudes related to AI risk aren’t just a variable we (members of the EA/longtermist/AI safety communities) have to bear in mind and operate in light of, but instead also a variable we can intervene on.
And so far I’d say we have (often for very good reasons) done significantly less to intervene on that variable than we could’ve or than we could going forward.
So it seems plausible that actually these people are fairly convincible if exposed to better efforts to really explain the arguments in a compelling way.
We’ve definitely done a significant amount of this kind of work, but I think we’ve often (a) deliberately held back on doing so or on conveying key parts of the arguments, due to reasonable downside risk concerns, and (b) not prioritized this. And I think there’s significantly more we could do if we wanted to, especially after a period of actively building capacity for this.
Important caveats / wet blankets:
I think there are indeed strong arguments against trying to shift relevant beliefs and attitudes in a more favorable direction, including not just costs and plausibly low upside but also multiple major plausible downside risks.[1]
So I wouldn’t want anyone to take major steps in this direction without checking in with multiple people working on AI safety/governance first.
And it’s not at all obvious to me we should be doing more of that sort of work. (Though I think whether, how, & when we should is an important question and I’m aware of and excited about a couple small research projects that are happening on that.)
All I really want to convey in this comment is what I said in my first paragraph: we may be able to significantly push beliefs and opinions in favorable directions relative to where they are now or would be in future by default.
Due to time constraints, I’ll just point to this vague overview.
I think I would have totally agreed in 2016. One update since then is that I think progress scales way less with resources than I used to think it did. In many historical cases, a core component of progress was driven by a small number of people (which is reflected in citation counts and in who is actually taught in textbooks), and introducing lots of funding and scaling too fast can disrupt that by increasing the amount of fake work.
$1B in safety well-spent is clearly more impactful than $1B less in semiconductors; it’s just that “well-spent” is doing a lot of work: someone with a lot of money is going to have lots of people trying to manipulate their information environment to take their stuff.
Reducing especially dangerous tech progress seems more promising than reducing tech broadly. However, since these are dual-use techs, creating knowledge about which techs are dangerous can accelerate development in those sectors (especially the more vice signalling / conflict orientation is going on). This suggests that perhaps an effective way to apply this strategy is to recruit especially productive researchers (identified using asymmetric info) to labs where they work on something less dangerous.
In gain of function research and nuclear research, progress requires large expensive laboratories; AI theory progress doesn’t require that, although large scale training does (though, to a lesser extent than GOF or nuclear).
There are plenty of movements out there (ethics & inclusion, digital democracy, privacy, etc.) who are against current directions of AI developments, and they don’t need the AGI risk argument to be convinced that current corporate scale-up of AI models is harmful.
Working with them, redirecting AI developments away from more power-consolidating/general AI may not be that much harder than investing in supposedly “risk-mitigating” safety research.
Do you think there is a large risk of AI systems killing or subjugating humanity autonomously related to scale-up of AI models?
A movement pursuing antidiscrimination or privacy protections for applications of AI that thinks the risk of AI autonomously destroying humanity is nonsense seems like it will mainly demand things like the EU privacy regulations, not bans on using $10B of GPUs instead of $10M in a model. It also seems like it wouldn’t pursue measures targeted at the kind of disaster it denies, and might actively discourage them (this sometimes happens already). With a threat model of privacy violations, restrictions on model size would be a huge lift and the remedy wouldn’t fit the diagnosis in a way that made sense to policymakers. So I wouldn’t expect privacy advocates to bring them about based on their past track record, particularly in China where privacy and digital democracy have not had great success.
If it in fact is true that there is a large risk of almost everyone alive today being killed or subjugated by AI, then establishing that as scientific consensus seems like it would supercharge a response dwarfing current efforts for things like privacy rules, which would aim to avert that problem rather than deny it and might manage such huge asks, including in places like China. On the other hand, if the risk is actually small, then it won’t be possible to scientifically demonstrate high risk, and it would play a lesser role in AI policy.
I don’t see a world where it’s both true the risk is large and knowledge of that is not central to prospects for success with such huge political lifts.
I can imagine there being movements that fit this description, in which case I would not focus on talking with them or talking about them.
But I have not been in touch with any movements matching this description. Perhaps you could share specific examples of actions from specific movements you have in mind?
For the movements I have in mind (and am talking with), the description does not match at all:
AI ethics and inclusion movements go a lot further than stopping people from building AI that eg. make discriminatory classifications/recommendations associated with marginalised communities – they want Western corporations to stop consolidating power through AI development and deployment while pushing their marginalised communities further out of the loop (rendering them voiceless).
Digital democracy groups and human-centric AI movements go a lot further than wanting to regulate AI – they want to relegate AI models to humble models in the background that can assist and interface between humans building consensus and making decisions in the foreground.
Privacy and data ownership movements go a lot further than wanting current EU regulations – they do not want models to be trained on, store and exploit their own data in model parameters without their permission.
Suggest reading writings by people in those movements. Let me also copy over excerpts from people active in the areas of AI ethics & inclusion and digital democracy:
“We also advocate for a re-alignment of research goals: Where much effort has been allocated to making models (and their training data) bigger and to achieving ever higher scores on leaderboards often featuring artificial tasks, we believe there is more to be gained by focusing on understanding how machines are achieving the tasks in question and how they will form part of socio-technical systems.” from paper co-authored by Timnit Gebru.
“Rationalists are like most of the ideological groups I interact with. They are allies in important projects, such as limiting the race for massive investments in “AI” capabilities and engaging in governance experimentation. In other projects, such as limiting the social power/hubris of SV and diversifying it along a variety of dimensions they are more likely adversaries or at least unlikely allies.” from post by Glen Weyl.
Yes, I do. And the movements I am in touch with are against corporate R&D labs scaling up AI models in the careless ways they’ve been doing so far.
Are you taking a stance here of “those outside movements have different explicit goals than us AI Safety researchers, and therefore cannot become goal-aligned with our efforts”?
In that case, I would disagree here.
Theoretically, I disagree with the ontological and meta-ethical assumptions that these claims would be based on. While objective goals expressed here are disjunctive, the underlying values are additive (I do not expect this statement to make sense for you; please skip to next point).
Practically, movements with various explicit goals are already against corporations (that are selected by markets to extract value from local communities) centrally scaling up the training of increasingly autonomous/power-seeking models. Some examples:
re: AI ethics and inclusion:
The Stochastic Parrots paper co-authored by Timnit Gebru (before Google AI managers let her go, so to speak), describes various reasons for slowing down the scaled training of (language) transformer models. These reasons include the environmental costs of compute, neglecting to curate training data carefully, and the failure to co-design with stakeholders affected. All reasons to not scale up AI models fast.
Note that I have not read any writings from Gebru claiming that “AGI risk” is not a thing. Her question is more why people are diverting resources to AGI-related research while assuming that the development of general AI is inevitable and beyond our control.
re: Digital democracy and human-centric AI:
The How AI Fails Us paper (see p5) argues against the validity of and further investment in the “centralization of capital and decision-making capacity under the direction of a small group of engineers of AI systems” where “the machine is independent from human input and oversight” and with “the target of “achieving general intelligence””. How does this not match up with arguing against “subjugating humanity autonomously [with the centralised] scale-up of AI models”?
People like Divya Siddarth, Glen Weyl, Audrey Tang, Jaron Lanier and Daron Acemoglu have repeatedly expressed their concerns about how current automation of work through AI models threatens the empowerment of humans in their work, creativity, and collective choice-making.
Weyl is also skeptical about the monolithic conception of “AGI” surpassing humans across some metrics. I disagree, in that generally-capable self-learning/modifying machinery is physically possible. I agree, in that monolithic oversimplified representations of AGI have allowed AI Safety researchers to make unsound presumptive claims about how they expect that machinery could be “aligned” in “principle”.
As an example, you mentioned how governments could invest tens of billions of dollars in interpretability research. I touched on reasons here why interpretability research does not and cannot contribute to long-term AGI safety. Based on that, government-funded interpretability research would distract smart AI researchers from actually contributing, and lend false confidence to AGI researchers that AGI could be interpreted sufficiently. I.e. this is “align-washing” the harmful activities of AI corporations, analogous to green-washing the harmful activities of fossil-fuel corporations.
As another example, your idea of Von Neumann Probes with error correcting codes, referred to by Christiano here, cannot soundly work for AGI code (as self-learning new code for processing inputs into outputs, and as introducing errors through interactions with the environment that cannot be detected and corrected). This is overdetermined. An ex-Pentagon engineer has spelled out the reasons to me. See a one-page summary by me here.
re: Privacy and data ownership:
If privacy and data ownership movements take their own claims seriously (and some do), they would push for banning the training of ML models on human-generated data or any sensor-based surveillance that can be used to track humans’ activities.
What do you mean here with a “huge lift”?
Koen Holtman has been involved with internet privacy movements for decades. Let me ping him in case he wants to share thoughts on what went wrong there in Europe and in China.
I agree that some specific leaders you cite have expressed distaste for model scaling, but it seems not to be a core concern. In a choice between more politically feasible measures that target concerns they believe are real vs concerns they believe are imaginary and bad, I don’t think you get the latter. And I think arguments based on those concerns get traction on measures addressing the concerns, but less so on secondary wishlist items of leaders.
I think that’s the reason privacy advocacy in legislation and the like hasn’t focused on banning computers in the past (and would have failed if they tried). For example:
AGI work using AI-generated data, or data shared under the terms and conditions of web services, can power the development of highly intelligent, catastrophically dangerous systems, and preventing AI from reading published content doesn’t seem close to the core motives there, especially for public support on privacy. So taking the biggest asks they can get based on privacy arguments doesn’t, I think, block AGI.
It looks like this kind of concern at scale naturally goes towards things like compensation for creators (one of Lanier’s recs), UBI, voting systems, open-source AI, and such.
Jaron Lanier has written a lot dismissing the idea of AGI or work to address it. I’ve seen a lot of such dismissal from Glen Weyl. Acemoglu I don’t think wants to restrict AI development? I don’t know Siddarth or Tang’s work well.
From Twitter:
That reads to me as saying concerns about ‘AGI apocalypse’ are delusional nonsense, but that pursuit of a false dream of AGI incidentally causes harms like hateful AI speech by advancing weaker AI technology, while the delusions themselves should not be an important priority.
I gave the example of barring model scaling above a certain budget.
I disagree extremely strongly with that claim. It’s prima facie absurd to think that, e.g., using interpretability tools to discover that AI models were plotting to overthrow humanity would not help to avert that risk. For instance, that’s exactly the kind of thing that would enable a moratorium on scaling and empowering those models, to improve the situation.
This is overstating what role error-correcting codes play in that argument. They mean the same programs can be available and evaluate things for eons (and can evaluate later changes with various degrees of learning themselves), but don’t cover all changes that could derive from learning (although there are other reasons why those could be stable in preserving good or terrible properties).
I intend to respond to the rest tomorrow.
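To make concrete the narrow thing error-correcting codes buy in that argument (a minimal sketch; the payload, redundancy level, and fault pattern are all hypothetical): with redundant copies and majority voting, random corruption of stored code is detectable and reversible, so the same program can stay available indefinitely, while changes produced by learning fall outside what this protects.

```python
def majority_vote(copies: list[bytes]) -> bytes:
    """Recover a payload from redundant copies by per-byte majority vote."""
    recovered = bytearray()
    for column in zip(*copies):
        recovered.append(max(set(column), key=column.count))
    return bytes(recovered)

# Hypothetical payload standing in for a probe's stored directives.
payload = b"directive Q: after condition Z, divert resources to welfare"
copies = [bytearray(payload) for _ in range(3)]

# Corrupt one (different) byte in each copy, simulating random mutation.
for i, copy in enumerate(copies):
    copy[i * 7] ^= 0xFF

assert majority_vote([bytes(c) for c in copies]) == payload
# The vote restores the original so long as no position is corrupted in
# two copies at once: stored code resists random mutation, but nothing
# here constrains what a learning process later writes.
```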
Some of your interpretations of writings by Timnit Gebru and Glen Weyl seem fair to me (though I would need to ask them to confirm). I have not looked much into Jaron Lanier’s writings on AGI, so that prompts me to google that.
Perhaps you can clarify the other reasons why the changes in learning would be stable in preserving “good properties”? I’ll respond to your nuances regarding how to interpret your long-term-evaluating error correcting code after that.
re: Leaders of movements being skeptical of the notion of AGI.
Reflecting more, my impression is that Timnit Gebru is skeptical about the sci-fiy descriptions of AGI, and even more so about the social motives of people working on developing (safe) AGI. She does not say that AGI is an impossible concept or not actually a risk. She seems to question the overlapping groups of white male geeks who have been diverting efforts away from other societal issues, to both promoting AGI development and warning of AGI x-risks.
Regarding Jaron Lanier, yes, (re)reading this post I agree that he seems to totally dismiss the notion of AGI, seeing it more as the result of a religious kind of thinking under which humans toil away at offering the training data necessary for statistical learning algorithms to function, without being compensated.
Returning on error correction point:
Feel free to still clarify the other reasons why the changes in learning would be stable in preserving “good properties”. Then I will take that starting point to try explain why the mutually reinforcing dynamics of instrumental convergence and substrate-needs convergence override that stability.
Fundamentally though, we’ll still be discussing the application limits of error correction methods.
Three ways to explain why (see the toy sketch after this list):
Any workable AI-alignment method involves receiving input signals, comparing input signals against internal references, and outputting corrective signals to maintain alignment of outside states against those references (ie. error correction).
Any workable AI-alignment method involves a control feedback loop – of detecting the actual (or simulating the potential) effects internally and then correcting actual (or preventing the potential) effects externally (ie. error correction).
Eg. mechanistic interpretability is essentially about “detecting the actual (or simulating the potential) effects internally” of AI.
The only way to actually (slightly) counteract AGI convergence on causing “instrumental” and “needed” effects within a more complex environment is to simulate/detect and then prevent/correct those environmental effects (ie. error correction).
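A toy rendering of that shared shape (purely illustrative; the reference value, tolerance, and numbers are hypothetical stand-ins, not a real alignment method): each of the three framings above reduces to a sense-compare-correct loop.

```python
from dataclasses import dataclass

@dataclass
class SenseCompareCorrect:
    """Toy error-correction loop: sense an effect, compare it against an
    internal reference, emit a corrective signal. Hypothetical stand-in
    for the shared shape described above, not an actual method."""
    reference: float       # internal reference ("intended" state)
    tolerance: float = 0.01

    def step(self, sensed_value: float) -> float:
        """Return a corrective signal; zero if within tolerance."""
        error = sensed_value - self.reference
        if abs(error) <= self.tolerance:
            return 0.0
        return -error  # push the external state back toward the reference

loop = SenseCompareCorrect(reference=1.0)
print(loop.step(1.5))  # -0.5: deviation detected, correction emitted
print(loop.step(1.0))  # 0.0: within tolerance, nothing to correct
```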
~ ~ ~
Which brings us back to why error correction methods, of any kind and in any combination, cannot ensure long-term AGI Safety.
I reread your original post and Christiano’s comment to understand your reasoning better, and to see how I could explain the limits of applicability of error correction methods.
I also messaged Forrest (the polymath) to ask for his input.
The messages were of a high enough quality that I won’t bother rewriting the text. Let me copy-paste the raw exchange below (with a few spelling edits).
Remmelt 15:37
@Forrest, would value your thoughts on the way Carl Shulman is thinking about error correcting code, perhaps to pass on on the LessWrong Forum:
(https://www.lesswrong.com/posts/uFNgRumrDTpBfQGrs/let-s-think-about-slowing-down-ai?commentId=bY87i5v5StH9FWdWy).
Remmelt 15:38
Remmelt:
“As another example [of unsound monolithic reasoning], your idea of Von Neumann Probes with error correcting codes, referred to by Christiano here (https://www.lesswrong.com/posts/LpM3EAakwYdS6aRKf/what-multipolar-failure-looks-like-and-robust-agent-agnostic?commentId=Jaf9b9YAARYdrK3jp), cannot soundly work for AGI code (as self-learning new code for processing inputs into outputs, and as introducing errors through interactions with the environment that cannot be detected and corrected). This is overdetermined. An ex-Pentagon engineer has spelled out the reasons to me. See a one-page summary by me here.”
Carl Shulman:
”This is overstating what role error-correcting codes play in that argument. They mean the same programs can be available and evaluate things for eons (and can evaluate later changes with various degrees of learning themselves), but don’t cover all changes that could derive from learning (although there are other reasons why those could be stable in preserving good or terrible properties).”
Remmelt 15:40
Excerpting from the comment by Christiano I link to above:
”The production-web has no interest in ensuring that its members value production above other ends, only in ensuring that they produce (which today happens for instrumental reasons). If consequentialists within the system intrinsically value production it’s either because of single-single alignment failures (i.e. someone who valued production instrumentally delegated to a system that values it intrinsically) or because of new distributed consequentialism distinct from either the production web itself or any of the actors in it, but you don’t describe what those distributed consequentialists are like or how they come about.
You might say: investment has to converge to 100% since people with lower levels of investment get outcompeted. But this it seems like the actual efficiency loss required to preserve human values seems very small even over cosmological time (e.g. see Carl on exactly this question: http://reflectivedisequilibrium.blogspot.com/2012/09/spreading-happiness-to-stars-seems.html).
And more pragmatically, such competition most obviously causes harm either via a space race and insecure property rights, or war between blocs with higher and lower savings rates (some of them too low to support human life, which even if you don’t buy Carl’s argument is really still quite low, conferring a tiny advantage). If those are the chief mechanisms then it seems important to think/talk about the kinds of agreements and treaties that humans (or aligned machines acting on their behalf!) would be trying to arrange in order to avoid those wars.”
Remmelt 15:41
And Carl Shulman’s original post on long-term error-correcting Von Neumann Probes:
(http://reflectivedisequilibrium.blogspot.com/2012/09/spreading-happiness-to-stars-seems.html):
”But the program of an AI, large stores of astronomical observations for navigation, and vast stores of technological information would take up an enormous amount of memory and storage space, perhaps many exabytes or more. Given this large body of information, adding additional directives to ensure that the probes eventually turn to producing welfare need only increase storage needs by a very small proportion, e.g. by 1 in 1 billion. Directives could directly specify the criteria to be eventually optimized, or could simply require compliance with further orders traveling behind the frontier of colonization.
...
Mutation is easier to resist for computers than animals
Biological life on Earth has evolved through mutation, and the reproductive process introduces significant errors in each generation. However, digital information storage allows for the comparison of redundant copies and the use of error-correcting codes, making substantive mutation many orders of magnitude less likely than in Earthly life.”
Remmelt 15:45
Returning to the new comment by Carl Shulman:
”This is overstating what role error-correcting codes play in that argument. They mean the same programs can be available and evaluate things for eons (and can evaluate later changes with various degrees of learning themselves)”
Remmelt 15:46
Thinking about this overnight, I think Carl’s stated reasoning is still unsound for multiple reasons:
1. Contradiction between a Von Neumann Probe being adaptable enough (ie. learning new ways of processing inputs into outputs) to travel across space and seed new civilisations, yet having error correcting code that allows comparison of new code with original redundant copies. Not going to work, for reasons Forrest amply explained and I tried to summarise here: https://docs.google.com/document/d/1-AAhqvgFNx_MlLkcSgw-chvmFoC4EZ4LmTl1IWcsqEA/edit
Ooh, and in Forrest’s AGI Error Correction post: https://mflb.com/ai_alignment_1/agi_error_correction_psr.html#p1
Think I’ll share that.
Remmelt 15:54
2. Confuses complicated pre-loaded technological knowledge/systems with complex adaptive systems. The fact that they are saying that adding in directives would only increase storage by 1 part in 1 billion parts is a giveaway, I think.
Remmelt 15:55
3. Inverse take on 1.
Algorithms which can flexibly ‘mutate’ and branch out into different versions become better at using resources and multiplying than more rigid or robustly functional designs. This makes Carl Shulman’s case for launching out self-replicating space probes with code error-checking/minimisation routines seem a lot more dicey. If a defecting group launches even one alternate design with a flexible code-mutating ability that confers an advantage that can’t easily be copied by the error-minimising designs without compromising on their ability to act on the directives humans originally coded in to ‘directly specify the criteria to be eventually optimized’ – well, then you might end up instead with swarms of space probes that eat up the galaxy indiscriminately, including any remaining carbon-based lifeforms on planet Earth.
Underlying premise: even if humans construct a long-term aligned AI design – where humans can formally prove a model to causally constrain any possible process of agency emerging from and expanding across each of the physical parts in which this model infers its computational process to be embedded to stay within all fundamental bounds necessary for maintaining alignment with the values that humans broadly share in common – then in practice that design is ‘one step away’ from getting mutated into misalignment by a faction of humans who seek a capability advantage (does it give one though?) for manifesting their more granular personal values.
Remmelt 15:57
@Forrest anything to add to the above? How about I write that into a comment reply (which perhaps might make for a stand-alone post later?)
Forrest 16:25
The starting point is: that any procedure of AGI alignment will (ie, cannot not) resemble some sort of error correction algorithm.
Forrest 16:28
This point cannot be disagreed with and still have the notion of alignment be persistent. If they argue here, they will need to establish a conformable coherent counter example. Simple contradiction is not enough, as the claim basis shifts to them.
Forrest 16:30
Once hard line equivalence is firmly established, then the argument re limits of causation in all error correction processes apply.
Remmelt 16:44
Good one. So start with how any AGI-alignment method must resemble an error correction algorithm in order to correct out misaligned behaviour learned and expressed in interaction with the environment.
Forrest 16:48
The idea that some machine could “have a strategy of optimizing for near-maximum X until condition Z is achieved, and *then* implement directive Q” makes at least three assumptions that are highly unlikely.
1; there is no assurance that in the long duration of the optimization of X that there would be persistence of directive Q.
2; there is no assurance that the condition ‘Z’ is consistently detectable, or that the triggering condition is even probable to being implemented, even if detected (as per a violation of assumption 1).
3; the integration coupling of directive Q is mismatched to the sheer mass of the output of the optimized code—ie, it is unlikely that ‘Q’, no matter how formed, will actually result in the consistent behavior described. Ie, ‘Q’ is significantly underspecified, and if ‘Q’ was appropriately specified, it would not be “lightweight” with respect to the optimization process ‘X’, and would have been optimized away to ineffectiveness. Hence, even if Z condition trigger happens, Q does not happen.
Note insofar as the logic is directly generalized over the logical claim form of “adopt a strategy of near-maximum colonization speed until they reach the most distant accessible galaxies, and only then divert resources to producing welfare”, the proposal itself is not well specified. Ie, the assumption that ‘Q’ and detection of ‘Z’ can be “simple” is a conceptual category error.
Remmelt 17:02
I read through this, and think I understand your points.
Forrest 17:10
Yes. I am trying to think of a good metaphor. It is maybe a bit like some single ordinary person trying to command everyone to do something in some consistent way, when that _everyone_ is 100000 people all yelling at one another in some huge auditorium somewhere. The assumption that one person, even at maximum personal signal volume and intensity, could be heard over all similar other people (in this case, the other objectives of the Von Neumann probe), and command them to implement some new action pattern reliably consistent with ‘Q’, is simply deeply mismatched to the output reality of the optimization process ‘X’. The single directive ‘Q’ is simply not powerful enough to fully conditionalize all of ‘X’, even under triggering conditions ‘Z’.
Also, I notice that the assumptions that the error correction process could be applied linearly to a dynamic self recursive system at arbitrary levels of abstraction is simply another category error. Wrong tool for the wrong job. That is less important, though, than the issue with the Q directive arbitrarily efficient effectivity mismatch.
Forrest 17:37
Also, I added the following document to assist in some of what you are trying to do above: https://mflb.com/ai_alignment_1/tech_align_error_correct_fail_psr.html#p1
This echoes something I think I sent previously, but I could not find it in another doc, so I added it.
I addressed claims of similar forms at least 3 times already on separate occasions (including in the post itself).
Suggest reading this: https://www.lesswrong.com/posts/bkjoHFKjRJhYMebXr/the-limited-upside-of-interpretability?commentId=wbWQaWJfXe7RzSCCE
“The fact that mechanistic interpretability can possibly be used to detect a few straightforwardly detectable misalignment of the kinds you are able to imagine right now does not mean that the method can be extended to detecting/simulating most or all human-lethal dynamics manifested in/by AGI over the long term.
If AGI behaviour converges on outcomes that result in our deaths through less direct routes, it really does not matter much whether the AI researcher humans did an okay job at detecting “intentional direct lethality” and “explicitly rendered deception”.”
This is like saying there’s no value to learning about and stopping a nuclear attack from killing you because you might get absolutely no benefit from not being killed then, and being tipped off about a threat trying to kill you, because later the opponent might kill you with nanotechnology before you can prevent it.
Removing intentional deception or harm greatly increases the capability of AIs that can be worked with without getting killed, to further improve safety measures. And as I said actually being able to show a threat to skeptics is immensely better for all solutions, including relinquishment, than controversial speculation.
No, it’s not like that.
It’s saying that if you can prevent a doomsday device from being lethal in some ways and not in others, then it’s still lethal. Focussing on some ways that you feel confident you might be able to prevent the doomsday device from being lethal is IMO distracting dangerously from the point, which is that people should not build the doomsday device in the first place.
If mechanistic interpretability methods cannot prevent the interactions of AGI from converging on total human extinction (beyond theoretical limits of controllability), it means that these (or other “inspect internals”) methods cannot contribute to long-term AGI safety. And this is not idle speculation, nor based on prima facie arguments. It is based on 15 years of research by a polymath working outside this community.
In that sense, it would not really matter that mechanistic interpretability can do an okay job at detecting that a power-seeking AI was explicitly plotting to overthrow humanity.
That is, except for the extremely unlikely case you pointed to, in which such intentions are detected in time and humans all coordinate at once to impose an effective moratorium on scaling or computing larger models. But this is actually speculation, whereas the fact that OpenAI promoted Olah’s fascinating Microscope-generated images as progress on understanding and aligning scalable ML models is not speculation.
Overall, my sense is that mechanistic interpretability is used to align-wash capability progress towards AGI, while not contributing to safety where it predominantly matters.
Exactly this kind of thinking is what I am concerned about. It implicitly assumes that you have a (sufficiently) comprehensive and sound understanding of the ways humans would get killed at a given level of capability, and therefore can rely on that understanding to conclude that capabilities of AIs can be greatly increased without humans getting killed.
How do you think capability developers would respond to that statement? Will they just stay on the safe side, saying “Well, those alignment researchers say that mechanistic interpretability helps remove intentional deception or harm, but I’m just going to stay on the safe side and not scale any further”? No, they are going to use your statement to promote the potential safety of their scalable models, and remove whatever safety margin they feel they can justify taking for themselves.
Not considering unknown unknowns is going to get us killed. Not considering what safety problems may be unsolvable is going to get us killed.
Age-old saying: “It ain’t what you don’t know that gets you into trouble. It’s what you know for sure that just ain’t so.”
Sorry if I missed it earlier in the thread, but who is this “polymath”?
Forrest Landry.
From Math Expectations, a depersonalised post Forrest wrote of his impressions of a conversation with a grant investigator where the grant investigator kept looping back on the expectation that a “proof” based on formal reasoning must be written in mathematical notation. We did end up receiving the $170K grant.
I usually do not mention Forrest Landry’s name immediately for two reasons:
If you google his name, he comes across like a spiritual hippie. Geeks who don’t understand his use of language take that as a cue that he must not know anything about computational science, mathematics or physics (wrong – Forrest has deep insights into programming methods and eg. why Bell’s Theorem is a thing).
Forrest prefers to work on the frontiers of research, rather than repeating himself in long conversations with tech people who cannot let go of their own mental models and quickly jump to motivated counterarguments that he has heard and addressed many times before. So I act as a bridge-builder, trying to translate between Forrest speak and Alignment Forum speak.
Both of us prefer to work behind the scenes. I’ve only recently started to touch on the arguments in public.
You can find those arguments elaborated on here.
Warning: large inferential distance; do message clarifying questions – I’m game!
As requested by Remmelt I’ll make some comments on the track record of privacy advocates, and their relevance to alignment.
I did some active privacy advocacy in the context of the early Internet in the 1990s, and have been following the field ever since. Overall, my assessment is that the privacy advocacy/digital civil rights community has had both failures and successes. It has not succeeded (yet) in its aim to stop large companies and governments from having all your data. On the other hand, it has been more successful in its policy advocacy towards limiting what large companies and governments are actually allowed to do with all that data.
The digital civil rights community has long promoted the idea that Internet-based platforms and other computer systems must be designed and run in a way that is aligned with human values. In the context of AI and ML-based computer systems, this has led to demands for AI fairness and transparency/explainability that have also found their way into policy like the GDPR, legislation in California, and the upcoming EU AI Act. AI fairness demands have influenced the course of AI research being done, e.g. there has been research on defining what it even means for an AI model to be fair, and on making models that actually implement this meaning.
To a first approximation, privacy and digital rights advocates will care much more about what an ML model does, what effect its use has on society, than about the actual size of the ML model. So they are not natural allies for x-risk community initiatives that would seek a simple ban on models beyond a certain size. However, they would be natural allies for any initiative that seeks to design more aligned models, or to promote a growth of research funding in that direction.
To make a comment on the premise of the original post above: digital rights activists will likely tell you that, when it comes to interventions on AI research, speculating about the tractability of ‘slowing down AI research’ is misguided. What you really should be thinking about is changing the direction of AI research.
This is insightful for me, thank you!
Also, I stand corrected, then, on my earlier comment that privacy and digital ownership advocates would or should care about models being trained on their own or person-tracking data in a way that restricts the scaling of models. I'm guessing I was not tracking well what people in at least the civil rights spaces Koen moves in are thinking and would advocate for.
This is a very spicy take, but I would (weakly) guess that a hypothetical ban on ML trainings that cost more than $10M would make AGI timelines marginally shorter rather than longer, via shifting attention and energy away from scaling and towards algorithm innovation.
Very interesting! Recently, the US started to regulate the export of computing power to China. Do you expect this to speed up the AGI timeline in China, do you expect the regulation to be ineffective, or something else?
Reportedly, NVIDIA developed the A800, which is essentially the A100, to comply with the letter but probably not the spirit of the regulation. I am trying to follow closely how the A800 fares, because it seems to be an important data point on the feasibility of regulating computing power.
I strongly agree with Steven about this. Personally, I expect it’ll be non-impactful in either direction. I think the majority of research groups already have sufficient compute available to make dangerous algorithmic progress, and they are not so compute-resource-rich that their scaling efforts are distracting them from more dangerous pursuits. I think the groups who would be more dangerous if they weren’t ‘resource drunk’ are mainly researchers at big companies.
I think the two camps are less orthogonal than your examples of privacy and compute regulation portray. There's room for plenty of excellent policy interventions that both camps could work together to support. For instance, increasing regulatory requirements for transparency on algorithmic decision-making (and, crucially, building the capacity, both in regulators and in the market supporting them, to enforce this) is something I think both camps would get behind (the x-risk camp because it creates demand for interpretability and more, the other because e.g. it makes fairness issues easier to demonstrate), and something they could productively work on together. I think there are subculture-clash reasons the two camps don't always get on, but these can be overcome, particularly given there's a common enemy (misaligned powerful AI). See also the paper Beyond Near- and Long-Term: Towards a Clearer Account of Research Priorities in AI Ethics and Society. I know lots of people who are uncertain about how big the risks are, care about both problems, and work on both (I am one of these: I care more about AGI risk, but I think the best things I can do to help avert it involve working with the people you think aren't helpful).
Seems reasonable regarding public policy. But what about
1. private funders of AGI-relevant research
2. researchers doing AGI-relevant research?
Seems like there’s a lot of potential reframings that make it more feasible to separate safe-ish research from non-safe-ish research. E.g. software 2.0: we’re not trying to make a General Intelligence, we’re trying to replace some functions in our software with nets learned from data. This is what AlphaFold is like, and I assume is what ML for fusion energy is like. If there’s a real category like this, a fair amount of the conflict might be avoidable?
Most AI companies and most employees there seem not to buy risk much, and to assign virtually no resources to address those issues. Unilaterally holding back from highly profitable AI when they won’t put a tiny portion of those profits into safety mitigation again looks like an ask out of line with their weak interest. Even at the few significant companies with higher percentages of safety effort, it still looks to me like the power-weighted average of staff is extremely into racing to the front, at least to near the brink of catastrophe or until governments buy risks enough to coordinate slowdown.
So asks like investing in research that could demonstrate problems with higher confidence, or making models available for safety testing, or similar still seem much easier to get from those companies than stopping (and they have reasonable concerns that their unilateral decision might make the situation worse by reducing their ability to do helpful things, while regulatory industry-wide action requires broad support).
As with government, generating evidence and arguments that are more compelling could be super valuable, but pretending you have more support than you do yields incorrect recommendations about what to try.
Can anyone say confidently why? Is there one reason that predominates, or several? Like it's vaguely something about status, money, power, acquisitive mimesis, having a seat at the table… but these hypotheses are all weirdly dismissive of the epistemics of these high-powered people. So either we're talking about people who are high-powered because of the managerial revolution (or politics or something), or we're talking about researchers who are high-powered because they're given power for being good at research. If it's the former, politics, then it makes sense to strongly doubt their epistemics on priors, but then we have to ask why they can meaningfully direct the researchers who are actually good at advancing capabilities. If it's the latter, good researchers have power, then why are their epistemics suddenly out the window here? I'm not saying their epistemics are actually good; I'm saying we have to understand why they're bad if we're going to slow down AI through this central route.
There are a lot of pretty credible arguments for them to try, especially with low risk estimates for AI disempowering humanity, and if their percentile of responsibility looks high within the industry.
One view is that the risk of AI turning against humanity is less than the risk of a nasty eternal CCP dictatorship if democracies relinquish AI unilaterally. You see this sort of argument made publicly by people like Eric Schmidt, and ‘the real risk isn’t AGI revolt, it’s bad humans’ is almost a reflexive take for many in online discussion of AI risk. That view can easily combine with the observation that there has been even less takeup of AI safety in China thus far than in liberal democracies, and mistrust of CCP decision-making and honesty, so it also reduces accident risk.
With respect to competition with other companies in democracies, some labs can correctly say that they have taken action that signals they are more into taking actions towards safety or altruistic values (including based on features like control by non-profit boards or % of staff working on alignment), and will have vastly more AI expertise, money, and other resources to promote those goals in the future by locally advancing AGI, e.g. OpenAI reportedly has a valuation of over $20B now and presumably more influence over the future of AI and ability to do alignment work than otherwise. Whereas some sitting on the sidelines may lack financial and technological/research influence when it is most needed. And, e.g. the OpenAI charter has this clause:
Then there are altruistic concerns about the speed of AI development. E.g. over 60 million people die every year, and almost all of those deaths could be prevented by aligned AI technologies. If you think AI risk is very low, then current people's lives would be saved by expediting development, even if risk goes up somewhat.
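To make the arithmetic behind that trade-off explicit (a rough back-of-the-envelope sketch, with illustrative numbers and the stated assumption that aligned AI would in fact prevent most of those deaths): counting only lives of people alive today, delaying transformative AI by $T$ years is worth it only if the delay buys a reduction $\Delta p$ in catastrophe probability satisfying

$$\Delta p \times \underbrace{8\times 10^{9}}_{\text{current lives}} \;>\; \underbrace{6\times 10^{7}}_{\text{deaths per year}} \times T \quad\Longleftrightarrow\quad \Delta p \gtrsim 0.75\%\ \text{per year of delay}.$$

Someone who puts the achievable risk reduction well below that threshold will see expediting as the life-saving option; counting future generations (as the comment further down does) changes the calculus dramatically.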
And of course there are powerful non-altruistic interests in enormous amounts of money, fame, and personally getting to make a big scientific discovery.
Note that the estimated magnitude of AI risk, and the feasibility of general buy-in on the correct risk level, recur over and over again here, and so credible assessments and demonstrations of large risks are essential to making these decisions better.
Thank you, this seems like a high-quality steelman (I couldn’t judge if it passes an ITT).
Taking an extreme perspective here: do future generations of people not alive and who no one alive now would meet have any value?
One perspective is that, no, they don't. From that perspective, "humanity" continues only as some arbitrary random numbers from our genetics. Even Clippy probably keeps at least one copy of the human genome in a file somewhere, so it's the same case.
That is, there is no difference between the outcomes of:
we delay AI a few generations and future generations of humanity take over the galaxy
we fall to rampant AIs and their superintelligent descendants take over the galaxy
If you could delay AI long enough, you would be condemning the entire current population of the world to death from aging, which is essentially the same outcome as a rampant AI killing everyone.
Carl S.
One view is that the risk of AI turning against humanity is less than the risk of a nasty eternal CCP dictatorship if democracies relinquish AI unilaterally. You see this sort of argument made publicly by people like Eric Schmidt, and ‘the real risk isn’t AGI revolt, it’s bad humans’ is almost a reflexive take for many in online discussion of AI risk. That view can easily combine with the observation that there has been even less takeup of AI safety in China thus far than in liberal democracies, and mistrust of CCP decision-making and honesty, so it also reduces accident risk.
My thought: seems like a convincing demonstration of risk could be usefully persuasive.
I'll make an even stronger statement: so long as the probability of a technological singularity isn't too low, they can still rationally keep working on it even if they know the risk is high, because the expected utility is much greater still.
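Spelling out that expected-utility argument (a minimal sketch with hypothetical symbols, not anyone's actual numbers): if a successful singularity has probability $p$ and is valued at $U_{+}$, while catastrophe has probability $q$ and cost $U_{-}$, then continuing looks rational under naive expected-utility reasoning whenever

$$\mathbb{E}[U] = p\,U_{+} - q\,U_{-} > 0 \quad\Longleftrightarrow\quad \frac{p}{q} > \frac{U_{-}}{U_{+}},$$

so if $U_{+}$ is judged astronomically larger than $U_{-}$, even a high risk $q$ can leave the gamble looking worthwhile. Whether the utilities should be compared this way at all (and whether extinction can be traded off like that) is exactly what critics dispute.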
This comment employs an oddly common failure mode of ignoring intermediate successes that align with market incentives, like “~N% of AI companies stop publishing their innovations on Arxiv for free”.
Those are good points. There are some considerations that go in the other direction. Sometimes it’s not obvious what’s a “failure to convince people” vs. “a failure of some people to be convincible.” (I mean convincible by object-level arguments as opposed to convincible through social cascades where a particular new view reaches critical mass.)
I believe both of the following:
Persuasion efforts haven’t been exhausted yet: we can do better at reaching not-yet-safety-concerned AI researchers. (That said, I think it’s at least worth considering that we’re getting close to exhausting low-hanging fruit?)
Even so, “persuasion as the main pillar of a strategy” is somewhat likely to be massively inadequate because it’s difficult to change the minds and culture of humans in general (even if they’re smart), let alone existing organizations.
Another point that’s maybe worth highlighting is that the people who could make large demands don’t have to be the same people who are best-positioned for making smaller asks. (This is Katja’s point about there not being a need for everyone to coordinate into a single “we.”) The welfarism vs. abolitionism debate in animal advocacy and discussion of the radical flank effect seems related. I also agree with a point lc makes in his post on slowing down AI. He points out that there’s arguably a “missing mood” around the way most people in EA and the AI alignment community communicate with safety-unconcerned researchers. The missing sense of urgency probably lowers the chance of successful persuasion efforts?
Lastly, it’s a challenge that there’s little consensus in the EA research community around important questions like “How hard is AI alignment?,” “How hard is alignment conditional on <5 years to TAI?,” and “How long are TAI timelines?” (Though maybe there’s quite some agreement on the second one and the answer is at least, “it’s not easy?”)
I’d imagine there would at least be quite a strong EA expert consensus on the following conditional statement (which has both normative and empirical components):
Based on this, some further questions one could try to estimate are:
How many people (perhaps weighted by their social standing within an organization, opinion leaders, etc.) are convincible of the above conditional statement? Is it likely we could reach a critical mass?
Doing this for any specific org (or relevant branch of government, etc.) that seems to play a central role
What’s the minimum consensus threshold for “We’re in Inconvenient World?” (I.e., what percentage would be indefensibly low to believe in light of peer disagreement unless one considers oneself the world’s foremost authority on the question?)
Sorry for responding very late, but it's basically because, contra the memes, most LWers do not agree with Eliezer's views on how doomed we are. This is very much a fringe viewpoint on LW, not the mainstream.
So the missing mood is basically because most of LW doesn’t share Eliezer’s views on certain cruxes.