I think the way the issue is framed matters a lot. A "populist" framing ("elites are in it for themselves; they can't be trusted") seems to have resonated with a segment of the right lately. Climate change, by contrast, carries a sanctimonious framing in American politics that conservatives hate.
It looks like the comedian whose clip you linked has a podcast:
https://www.joshjohnsoncomedy.com/podcasts
I don’t see any guests in their podcast history, but maybe someone could invite him on a different podcast? His website lists appearances on other podcasts. I figure it’s worth trying stuff like this for the value of information (VoI).
I think people should put more emphasis on the rate of improvement in this technology. It's analogous to the early days of COVID: it’s not where we are that’s worrisome, it’s where we’re headed.
For humans acting very much not alone, like big AGI research companies, yeah that’s clearly a big problem.
How about a group of superbabies that find and befriend each other? Then they’re no longer acting alone.
I don’t think the problem is about any of the people you listed having too much brainpower.
I don’t think problems caused by superbabies would look distinctively like “having too much brainpower”. They would look more like the ordinary problems humans have with each other. Brainpower would be a force multiplier.
(I feel we’re somewhat talking past each other, but I appreciate the conversation and still want to get where you’re coming from.)
Thanks. I mostly just want people to pay attention to this problem. I don’t feel like I have unique insight. I’ll probably stop commenting soon, since I think I’m hitting the point of diminishing returns.
I think this project should receive more red-teaming before it gets funded.
Naively, it would seem that the “second species argument” applies much more strongly to the creation of a hypothetical Homo supersapiens than it does to AGI.
We’ve observed many warning shots regarding catastrophic human misalignment. The human alignment problem isn’t easy. And “intelligence” seems to be a key part of the human alignment picture. Humans often lack respect or compassion for other animals that they deem intellectually inferior, e.g. arguing that because those animals lack cognitive capabilities we have, they shouldn’t be considered morally relevant. There’s a decent chance that Homo supersapiens would think along similar lines, and repeat our species’ grim history of mistreating those we consider our intellectual inferiors.
It feels like people are deferring to Eliezer a lot here, which seems unjustified given how much strategic influence Eliezer had before AI became a big thing, and how poorly things have gone (by Eliezer’s own lights!) since then. There’s been very little reasoning transparency in Eliezer’s push for genetic enhancement. I just don’t see why we’re deferring to Eliezer so much as a strategist, when I struggle to name a single major strategic success of his.
There’s a good chance their carbon children would have about the same attitude towards AI development as they do. So I suspect you’d end up ruled by their silicon grandchildren.
These are incredibly small peanuts compared to AGI omnicide.
The jailbreakability and other alignment failures of current AI systems are also incredibly small peanuts compared to AGI omnicide. Yet they’re still informative. Small-scale failures give us data about possible large-scale failures.
You’re somehow leaving out all the people who are smarter than those people, and who were great for the people around them and humanity? You’ve got like 99% actually alignment or something
Are you thinking of people such as Sam Altman, Demis Hassabis, Elon Musk, and Dario Amodei? If humans are 99% aligned, how is it that we ended up in a situation where major lab leaders look so unaligned? MIRI and friends had a fair amount of influence to shape this situation and align lab leaders, yet they appear to have failed by their own lights. Why?
When it comes to AI alignment, everyone on this site understands that if a “boxed” AI acts nice, that’s not a strong signal of actual friendliness. The true test of an AI’s alignment is what it does when it has lots of power and little accountability.
Maybe something similar is going on for humans. We’re nice when we’re powerless, because we have to be. But giving humans lots of power with little accountability doesn’t tend to go well.
Looking around you, you mostly see nice humans. That could be because humans are inherently nice. It could also be because most of the people around you haven’t been given lots of power with little accountability.
Dramatic genetic enhancement could give enhanced humans lots of power with little accountability, relative to the rest of us.
[Note also, the humans you see while looking around are strongly selected for, which becomes quite relevant if the enhancement technology is widespread. How do you think you’d feel about humanity if you lived in Ukraine right now?]
Which, yes, we should think about this, and prepare and plan and prevent, but it’s just a totally totally different calculus from AGI.
I want to see actual, detailed calculations of p(doom) from supersmart humans vs supersmart AI, conditional on each technology being developed. Before charging ahead on this, I want a superforecaster-type person to sit down, spend a few hours, generate some probability estimates, publish a post, and request that others red-team their work. I don’t feel like that is a lot to ask.
Humans are very far from fooming.
Tell that to all the other species that went extinct as a result of our activity on this planet?
I think it’s possible that the first superbaby will be aligned, same way it’s possible that the first AGI will be aligned. But it’s far from a sure thing. It’s true that the alignment problem is considerably different in character for humans vs AIs. Yet even in this particular community, it’s far from solved—consider Brent Dill, Ziz, Sam Bankman-Fried, etc.
Not to mention all of history’s great villains, many of whom believed themselves to be superior to the people they afflicted. If we use genetic engineering to create humans who are actually, massively, undeniably superior to everyone else, surely that particular problem is only gonna get worse. If this enhancement technology is going to be widespread, we should be using the history of human activity on this planet as a prior. Especially the history of human behavior towards genetically distinct populations with overwhelming technological inferiority. And it’s not pretty.
So yeah, there are many concrete details which differ between these two situations. But in terms of high-level strategic implications, I think there are important similarities. Given the benefit of hindsight, what should MIRI have done about AI back in 2005? Perhaps that’s what we should be doing about superbabies now.
Altman and Musk are arguably already misaligned relative to humanity’s best interests. Why would you expect smarter versions of them to be more aligned? That only makes sense if we’re in an “alignment by default” world for superbabies, which is far from obvious.
If you look at the grim history of how humans have treated each other on this planet, I don’t think it’s justified to have a prior that this is gonna go well.
I think we have a huge advantage with humans simply because there isn’t the same potential for runaway self-improvement.
Humans didn’t have the potential for runaway self-improvement relative to apes. That was little comfort for the apes.
This is starting to sound a lot like AI actually. There’s a “capabilities problem” which is easy, an “alignment problem” which is hard, and people are charging ahead to work on capabilities while saying “gee, we’d really like to look into alignment at some point”.
Can anyone think of alignment-pilled conservative influencers besides Geoffrey Miller? Seems like we could use more people like that...
Maybe we could get alignment-pilled conservatives to start pitching stories to conservative publications?
Likely true, but I also notice there’s been a surprising amount of drift of political opinions from the left to the right in recent years. The right tends to put their own spin on these beliefs, but I suspect many are highly influenced by the left nonetheless.
Some examples of right-coded beliefs which I suspect are, to some degree, left-inspired:
- “Capitalism undermines social cohesion. Consumerization and commoditization are bad. We’re a nation, not an economy.”
- “Trans women undermine women’s rights and women’s spaces. Motherhood, and women’s dignity, must be defended from neoliberal profit motives.”
- “US foreign policy is controlled by a manipulative deep state that pursues unnecessary foreign interventions to benefit elites.”
- “US federal institutions like the FBI are generally corrupt and need to be dismantled.”
- “We can’t trust elites. They control the media. They’re out for themselves rather than ordinary Americans.”
- “Your race, gender, religion, etc. are some of the most important things about you. There’s an ongoing political power struggle between e.g. different races.”
- “Big tech is corrosive for society.”
- “Immigration liberalization is about neoliberal billionaires undermining wages for workers like me.”
- “Shrinking the size of government is not a priority. We should make sure government benefits everyday people.”
- Anti-semitism, possibly.
One interesting thing has been seeing the left switch to opposing a belief once it’s adopted by the right and takes a right-coded form. E.g. US institutions are built on white supremacy and genocide, are fundamentally institutionally racist, are backed by illegitimate police power, and need to be defunded/decolonized/etc.… but now that they are being targeted by DOGE, it’s a disaster!
(Note that the reverse shift has also happened. E.g. Trump’s approaches to economic nationalism, bilateral relations with China, and contempt for US institutions were all adopted by Biden to some degree.)
So yeah, my personal take is that we shouldn’t worry about publication venue that much. Just avoid insulting anyone, and make your case in a way which will appeal to the right (e.g. “we need to defend our traditional way of being human from AI”). If possible, target center-leaning publications like The Atlantic over explicitly progressive publications like Mother Jones.
I think the National Review is the most prestigious conservative magazine in the US, but there are various others. City Journal articles have also struck me as high-quality in the past. I think Coleman Hughes writes for them, and he did a podcast with Eliezer Yudkowsky at one point.
However, as stated in the previous link, you should likely work your way up and start by pitching lower-profile publications.
The big one probably has to do with being able to corrupt the metrics so totally that whatever you think you made them unlearn actually didn’t happen, or just being able to relearn the knowledge so fast that unlearning doesn’t matter
I favor proactive approaches to unlearning which prevent the target knowledge from being acquired in the first place. E.g. for gradient routing, if you can restrict “self-awareness and knowledge of how to corrupt metrics” to a particular submodule of the network during learning, then if that submodule isn’t active, you can be reasonably confident that the metrics aren’t currently being corrupted. (Even if that submodule sandbags and underrates its own knowledge, that should be fine if the devs know to be wary of it. Just ablate that submodule whenever you’re measuring something that matters, regardless of whether your metrics say it knows stuff!)
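To make that concrete, here is a minimal PyTorch-style sketch of the routing-plus-ablation idea. It's my own illustration of the general approach, not the published gradient-routing implementation; the module names and the per-batch `is_target_batch` flag are assumptions made for the example.

```python
import torch
import torch.nn as nn

class RoutedModel(nn.Module):
    def __init__(self, d=128):
        super().__init__()
        self.shared = nn.Linear(d, d)          # general-purpose capacity
        self.target_expert = nn.Linear(d, d)   # submodule meant to absorb the target knowledge
        self.head = nn.Linear(d, 2)
        self.ablate = False                    # flip to True when measuring anything that matters

    def forward(self, x):
        h = torch.relu(self.shared(x))
        if not self.ablate:
            h = h + torch.relu(self.target_expert(x))  # expert only contributes when not ablated
        return self.head(h)

def train_step(model, opt, x, y, is_target_batch):
    """One step where gradients from 'target' batches only update the target expert."""
    opt.zero_grad()
    loss = nn.functional.cross_entropy(model(x), y)
    loss.backward()
    if is_target_batch:
        # Route this batch's updates into the target expert only.
        for p in list(model.shared.parameters()) + list(model.head.parameters()):
            p.grad = None
    else:
        # Keep non-target data from updating the target expert.
        for p in model.target_expert.parameters():
            p.grad = None
    opt.step()
    return loss.item()
```

At evaluation time, setting `model.ablate = True` before computing any metric you actually care about keeps a submodule that may have learned to corrupt metrics out of the loop entirely.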
Unlearning techniques should probably be battle-tested in low-stakes “model organism” type contexts, where metrics corruption isn’t expected.
while I wouldn’t call it the best
Curious what areas you are most excited about!
Regarding articles which target a popular audience such as How AI Takeover Might Happen in 2 Years, I get the sense that people are preaching to the choir by posting here and on X. Is there any reason people aren’t pitching pieces like this to prestige magazines like The Atlantic or wherever else? I feel like publishing in places like that is a better way to shift the elite discourse, assuming that’s the objective. (Perhaps it’s best to pitch to publications that people in the Trump admin read?)
Here’s an article on pitching that I found by searching the EA Forum. I assume there are lots more tips on pitching available online if you search.
I think unlearning could be a good fit for automated alignment research.
Unlearning could be a very general tool to address a lot of AI threat models. It might be possible to unlearn deception, scheming, manipulation of humans, cybersecurity, etc. I challenge you to come up with an AI safety failure story that can’t, in principle, be countered through targeted unlearning in some way, shape, or form.
Relative to some other kinds of alignment research, unlearning seems easy to automate, since you can optimize metrics for how well things have been unlearned.
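As a rough illustration of why this automates well (all function names below are hypothetical placeholders, not an existing API), the outer loop an automated researcher would run reduces to optimizing a scalar score over candidate unlearning configurations:

```python
def unlearning_score(model, forget_set, retain_set, evaluate):
    """Higher is better: the model should do poorly on the forgotten topic
    while keeping its performance on everything else."""
    forget_acc = evaluate(model, forget_set)   # want this near chance level
    retain_acc = evaluate(model, retain_set)   # want this to stay high
    return retain_acc - forget_acc

def automated_unlearning_search(base_model, forget_set, retain_set,
                                candidate_configs, run_unlearning, evaluate):
    """Try each proposed unlearning config and return the best-scoring one."""
    scored = []
    for cfg in candidate_configs:              # configs could be proposed by an AI research assistant
        model = run_unlearning(base_model, forget_set, cfg)
        scored.append((unlearning_score(model, forget_set, retain_set, evaluate), cfg))
    return max(scored, key=lambda pair: pair[0])
```

The caveat from the earlier comment still applies: the evaluation itself should be run with any suspect submodules ablated, or validated in low-stakes "model organism" settings where metric corruption isn't expected.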
I like this post.
China has alienated virtually all its neighbours
That sounds like an exaggeration? My impression is that China has OK/good relations with countries such as Vietnam, Cambodia, Pakistan, Indonesia, North Korea, and factions in Myanmar. And Russia, of course. If you’re serious about this claim, I think you should look at a map, make a list of countries that qualify as “neighbors” based purely on geographic distance, then look up relations for each one.
What I think is more likely than EA pivoting is a handful of people launch a lifeboat and recreate a high integrity version of EA.
Thoughts on how this might be done:
- Interview a bunch of people who became disillusioned. Try to identify common complaints.
- For each common complaint, research organizational psychology, history of high-performing organizations, etc. and brainstorm institutional solutions to address that complaint. By “institutional solutions”, I mean approaches which claim to e.g. fix an underlying bad incentive structure, so it won’t require continuous heroic effort to address the complaint.
- Combine the most promising solutions into a charter for a new association of some kind. Solicit criticism/red-teaming for the charter.
- Don’t try to replace EA all at once. Start small by aiming at a particular problem present in EA, e.g. bad funding incentives, criticism (it sucks too hard to both give and receive it), or bad feedback loops in the area of AI safety. Initially focus on solving that particular problem, but also build in the capability to scale up and address additional problems if things are going well.
- Don’t market this as a “replacement for EA”. There’s no reason to have an adversarial relationship. When describing the new thing, focus on the specific problem which was selected as the initial focus, plus the distinctive features of the charter and the problems they are supposed to solve.
- Think of this as an experiment, where you’re aiming to test one or more theses about what charter content will cause organizational outperformance.
I think it would be interesting if someone put together a reading list on high-performing organizations, social movement history, etc. I suspect this is undersupplied on the current margin, compared with observing and theorizing about EA as it exists now. Without any understanding of history, you run the risk of being a “general fighting the last war”: addressing the problems EA has now, but inadvertently introducing a new set of problems. It seems like the ideal charter would exist in the intersection of “inside view says this will fix EA’s current issues” and “outside view says this has worked well historically”.
A reading list might be too much work, but there’s really no reason not to do an LLM-enabled literature review of some kind, at the very least.
I also think a reading list for leadership could be valuable. One impression of mine is that “EA leaders” aren’t reading books about how to lead, research on leadership, or what great leaders did.
The possibility for the society-like effect of multiple power centres creating prosocial incentives on the projects
OpenAI behaves in a generally antisocial way, inconsistent with its charter, yet other power centers haven’t reined it in. Even in the EA and rationalist communities, people don’t seem to have asked questions like “Is the charter legally enforceable? Should people besides Elon Musk be suing?”
If an idea is failing in practice, it seems a bit pointless to discuss whether it will work in theory.
I’m not sure about that; does Bernie Sanders’s rhetoric set off that detector?