CEO of Convergence, an x-risk research and impact organization.
David_Kristoffersson
AI Clarity: An Initial Research Agenda
Announcing Convergence Analysis: An Institute for AI Scenario & Governance Research
Thanks for this post, Ben. I think a lot of what you’re saying here could alternatively be filed under “Taking ideas seriously”: the dedication to follow through with the consequences of ideas, even if their conclusions are unorthodox or uncomfortable.
I would reckon: no single AI safety method “will work” because no single method is enough by itself. The idea expressed in the post would not “solve” AI alignment, but I think it’s a thought-provoking angle on part of the problem.
Weber again: “And, so in light of this historical view, we need to remember that bureaucracy, taken as it is, is just an instrument of precision that can be put to service by purely political, economic, or any other dominating or controlling interest. Therefore the simultaneous development of democratization and bureaucratization should not be exaggerated, no matter how typical the phenomena may be.” Yikes, okay, it seems like Weber understood the notion of the orthogonality thesis.
Isn’t this interesting: Weber’s point is similar to the orthogonality thesis. This makes me realize a wider implication: the orthogonality thesis is actually very similar to the general argument of “technological progress is good” vs. “no, it isn’t necessarily”.
Weber: democratization doesn’t follow from bureaucratization.
Orthogonality thesis: Intelligence and morality are orthogonal.
Technological caution argument: More powerful technology isn’t by default a good thing for us.

I’m especially interested in contrasting orthogonality with technological caution. I’d like to express them in a common form. Intelligence is capability. Technology in general is capability. Morality = what is good. More capability dropped into parts of a society isn’t necessarily a good thing, whether that part of society is an AI, a human, a social system, or a socio-technical system.
This is a generalization of the orthogonality thesis and the technological caution argument, assuming that AI gets embedded in society (which should be assumed).
Thanks Justin! This is an interesting perspective. I’d enjoy seeing a compilation of different perspectives on ensuring AI alignment. (Another recurrent example would be the cybersecurity perspective on AI safety.)
Bureaucratization is the ultimate specific means to turn a mutually agreed upon community action rooted in subjective feeling into action rooted in a rational agreement by mutual consent.
This sounds a lot like the general situation of creating moral or judicial systems for a society. (When it works well.)
The principle of fixed competencies
The principle of hierarchically organized positions
Interestingly, these may run counter to Agile-associated practices and some practices I would consider generally good. It seems to be good to cultivate specialties, but also to cultivate some breadth in competencies. And to nurture bottom-up flows! Hierarchy has its limitations.
I quite like the concept of alignment through coherence between the “coherence factors”!
“Wisdom” has many meanings. I would use the word differently from how the article uses it.
I think the healthy and compassionate response to this article would be to focus on addressing the harms victims have experienced. So I find myself disappointed by much of the voting and comment responses here.
I agree that the Bloomberg article doesn’t acknowledge that most of the harms that they list have been perpetrated by people who have already mostly been kicked out of the community, and uses some unfair framings. But I think the bigger issue is that of harms experienced by women that may not have been addressed: that of unreported cases, and of insufficient measures taken against reported ones. I don’t know if enough has been done, so it seems unwise to minimize the article and people who are upset about the sexual misconduct. And even if enough has been done in terms of responses and policy, I would prefer seeing more compassion.
I think I agree with your technological argument, but I’d take your 6 months and 2.5 years and multiply them by a factor of 2-4x.
Part of it is likely that we are conceiving of the scenarios a bit differently. I might be including some additional practical considerations.
Yes, that’s most of the 2-5%.
Thank you for this post, Max.
My background here:
I’ve watched the Ukraine war very closely since it started.
I’m not at all familiar with nuclear risk estimations.
Summary: I wouldn’t give 70% for WW3/KABOOM from conventional NATO retaliation. I would put it at 2-5% right now (I’ve spent little time thinking about the precise number).
Motivation: I think conventional responses from NATO will cause Russia to generally back down. I think Putin wants to use the threat of nukes, not actually use them.
Even when cornered yet further, I expect Putin to assess that firing off nukes will make his situation even worse. Nuclear conflict would be an immense direct threat against himself and Russia, and the threat of nuclear conflict also increases the risk of people on the inside targeting him (because they don’t want to die). Authoritarians respect force. A NATO response would be a show of force.
Putin has told the Russian public in the past that Russia couldn’t win against NATO directly. Losing against NATO actually gives him a more palatable excuse: NATO is too powerful. Losing against Ukraine though, their little sibling, would be very humiliating. Losing in a contest of strength against someone supposedly weaker is almost unacceptable to authoritarians.
I think the most likely outcome is that Putin is deterred from firing a tactical nuke. And if he does fire one, NATO will respond conventionally (such as taking out the Black Sea Fleet), and this will cause Russia to back down in some manner.
The amount of effort going into AI as a whole ($10s of billions per year) is currently ~2 orders of magnitude larger than the amount of effort going into the kind of empirical alignment I’m proposing here, and at least in the short-term (given excitement about scaling), I expect it to grow faster than investment into the alignment work.
There’s a reasonable argument (shoutout to Justin Shovelain) that the risk is that work such as this, done by AI alignment people, will be closer to AGI than standard commercial or academic research, and will therefore accelerate AGI more than average AI research would. Thus, $10s of billions per year into general AI is not quite the right comparison, because little of that money goes to work “close to AGI”.
That said, on balance, I’m personally in favor of the work this post outlines.
Unfortunately, there is no good ‘where to start’ guide for anti-aging. This is insane, given this is the field looking for solutions to the biggest killer on Earth today.
Low-hanging-fruit intervention: create a public guide to that effect on a website.
That being said, I would bet that one would be able to find other formalisms that are equivalent after kicking down the door...
At least, we’ve now hit one limit in the shape of universal computation: No new formalism will be able to do something that couldn’t be done with computers. (Unless we’re gravely missing something about what’s going on in the universe...)
When it comes to downside risk, there are often more unknown unknowns that produce harm than positive unknown unknowns. People are usually biased to overestimate the positive effects and underestimate the negative effects of the known unknowns.
This seems plausible to me. Would you like to expand on why you think this is the case?
The asymmetry between creation and destruction? (I.e., it’s harder to build than it is to destroy.)
Very good point! The effect of not taking an action depends on what the counterfactual is: what would happen otherwise/anyway. Maybe the article should note this.
Excellent comment, thank you! Don’t let the perfect be the enemy of the good if you’re running from an exponential growth curve.
Looks promising to me. Technological development isn’t by default good.
Though I agree with the other commenters that this could fail in various ways. For one thing, if a policy like this is introduced without guidance on how to analyze the societal implications, people will think of wildly different things. ML researchers aren’t by default going to have the training to analyze societal consequences. (Well, who does? We should develop better tools here.)
I agree with the general shape of your argument, including that Cotra and Carlsmith are likely to overestimate the compute of the human brain, and that frontier algorithms are not as efficient as algorithms could be.
But I disagree that it will happen this quickly. :)