It’s now been 2.5 years. I think this resolves negatively?
Around 50% within 2 years or over all time?
Thanks for the clarification. My conclusion is that your emoji was meant to signal disagreement with the claim that ‘opaque vector reasoning makes a difference’ rather than with something I actually believe.
I had rogue AIs in mind as well, and I’ll take your word on “for catching already rogue AIs and stopping them, opaque vector reasoning doesn’t make much of a difference”.
Why do you think that?
Doesn’t the mountain of posts on optimization pressure explain why ending with “U3 was up a queen and was a giga-grandmaster and hardly needed the advantage. Humanity was predictably toast” is actually sufficient? In other words, shouldn’t someone who understands all the posts on optimization pressure not need the rest of the story after the “U3 was up a queen” part to understand that the AIs could actually take over?
If you disagree, then what do you think the story offers that makes it a helpful concrete example for people who both are skeptical that AIs can take over and already understand the posts on optimization pressure?
Ryan disagree-reacted to the bold part of this sentence in my comment above and I’m not sure why: “This tweet predicts two objections to this story that align with my first and third bullet point (common objections) above.”
This seems pretty unimportant to gain clarity on, but I’ll explain my original sentence more clearly anyway:
For reference, my third bullet point was the common objection: “How would humanity fail to notice this and/or stop this?”
To my mind, someone objecting that the story is unrealistic because “there’s no reason why OpenAI would ever let the model do its thinking steps in opaque vectors instead of written out in English” (as stated in the tweet) is making an objection of the form “humanity wouldn’t fail to stop AI from sneakily engaging in power-seeking behavior by thinking in opaque vectors.” It’s a “sure, AI could take over if humanity were dumb like that, but there’s no way OpenAI would be dumb like that.”
It seems like Ryan was disagreeing with this with his emoji, but maybe I misunderstood it.
Good point. At the same time, I think the underlying cruxes that lead people to be skeptical of the possibility that AIs could actually take over are commonly:
Why would an AI that well-intentioned human actors create be misaligned and motivated to take over?
How would such an AI go from existing on computer servers to acquiring power in the physical world?
How would humanity fail to notice this and/or stop this?
I mention these points because people who mention these objections typically wouldn’t raise these objections to the idea of an intelligent alien species invading Earth and taking over.
People generally have no problem granting that aliens may not share our values, may have actuators / the ability to physically wage war against humanity, and could plausibly overpower us with their superior intellect and technological know-how.
Providing a detailed story of what a particular alien takeover process might look like, then, isn’t necessarily helpful for addressing the objections people raise about AI takeover.
I’d propose that authors of AI takeover stories should therefore make sure that they aren’t just describing aspects of a plausible AI takeover story that could just as easily be aspects of an alien takeover story, but are instead actually addressing people’s underlying reasons for being skeptical that AI could take over.
This means doing things like focusing on explaining:
what about the future development of AIs leads to the development of powerful agentic AIs with misaligned goals where takeover could be a plausible instrumental subgoal,
how the AIs initially acquire substantial amounts of power in the physical world,
how they do the above either without people noticing or without people stopping them.
(With this comment I don’t intend to make a claim about how well the OP story does these things, though that could be analyzed. I’m just making a meta point about what kind of description of a plausible AI takeover scenario I’d expect to actually engage with the actual reasons for disagreement of the people who say “can the AIs actually take over”.)
Edited to add: This tweet predicts two objections to this story that align with my first and third bullet point (common objections) above:
It was a good read, but the issue most people are going to have with this is how U3 develops that misalignment in its thoughts in the first place.
That, plus there’s no reason why OpenAI would ever let the model do its thinking steps in opaque vectors instead of written out in English, as it is currently
Thanks for the story. I found the beginning the most interesting.
U3 was up a queen and was a giga-grandmaster and hardly needed the advantage. Humanity was predictably toast.
I think ending the story like this is actually fine for many (most?) AI takeover stories. The “point of no return” has already occurred at this point (unless the takeover wasn’t highly likely to be successful), and so humanity’s fate is effectively already sealed even though the takeover hasn’t happened yet.
What happens leading up to the point of no return is the most interesting part because it’s the part where humanity can actually still make a difference to how the future goes.
After the point of no return, I primarily want to know what the (now practically inevitable) AI takeover implies for the future: does it mean near-term human extinction, or a future in which humanity is confined to Earth, or a managed utopia, etc?
Trying to come up with a detailed, concrete, plausible story of what the actual process of takeover looks like doesn’t seem as interesting (at least to me). So I would have preferred more detail and effort put into the beginning of the story, explaining how humanity managed to fail to stop the creation of a powerful agentic AI that would take over, rather than seeing as much detail and effort put into imagining how the takeover actually happens.
Specifically I’m targeting futures that are at my top 20th percentile of rate of progress and safety difficulty.
Does this mean that you think AI takeover within 2 years is at least 20% likely?
Or are there scenarios where progress is even faster and safety is even more difficult than illustrated in your story and yet humanity avoids AI takeover?
The fake names are a useful reminder and clarification that it’s fiction.
I’m going to withdraw from this comment thread since I don’t think my further participation is a good use of time/energy. Thanks for sharing your thoughts and sorry we didn’t come to agreement.
I agree that that would be evidence of OP being more curious. I just don’t think that given what OP actually did it can be said that she wasn’t curious at all.
Thanks for the feedback, Holly. I really don’t want to accuse the OP of making a personal attack if OP’s intent was not to do that, and the reality is that I’m uncertain and can see a clear possibility that OP has no ill will toward Kat personally, so I’m not going to take the risk by making the accusation. Maybe my being on the autism spectrum is making me oblivious or something, in which case sorry I’m not able to see things as you see them, but this is how I’m viewing the situation.
Hey Holly, great points about PETA.
I left one comment replying to a critical comment this post got, saying that the comment wasn’t being charitable (which turned into a series of replies), and now I find myself in a position (a habit?) of defending the OP from potentially-insufficiently-charitable criticisms. Hence, when I read your sentence...
There’s a missing mood here—you’re not interested in learning if Kat’s strategy is effective at AI Safety.
...my thought is: Are you sure? When I read the post I remember reading:
But if it’s for the greater good, maybe I should just stop being grumpy.
But honestly, is this content for the greater good? Are the clickbait titles causing people to earnestly engage? Are peoples’ minds being changed? Are people thinking thoughtfully about the facts and ideas being presented?
This series of questions seems to me like it’s wondering whether Kat’s strategy is effective at AI safety, which is the thing you’re saying it’s not doing.
(I just scrolled up on my phone and saw that OP actually quoted this herself in the comment you’re replying to. (Oops. I had forgotten this as I had read that comment yesterday.))
Sure, the OP is also clearly venting about her personal distaste for Kat’s posts, but it seems to me that she is also asking the question that you say she isn’t interested in: are Kat’s posts actually effective?
(Side note: I kind of regret leaving any comments on this post at all. It doesn’t seem like the post did a good job encouraging a fruitful discussion. Maybe OP and anyone else who wants to discuss the topic should start fresh somewhere else with a different context. Just to put an idea out there: Maybe it’d be a more productive use of everyone’s energy for e.g. OP, Kat, and you Holly to get on a call together and discuss what sort of content is best to create and promote to help the cause of AI safety, and then (if someone was interested in doing so) write up a summary of your key takeaways to share.)
by putting her name in the headline (what I meant by name-calling)
Gotcha, that’s fair.
If it wasn’t meant to tarnish her reputation, why not instead make the post about just her issues with the disagreeable content?
I can think of multiple possible reasons. E.g. If OP sees a pattern of several bad or problematic posts, it can make sense to go above the object-level criticisms of those posts and talk about the meta-level questions.
but the standard you’ve set for determining intent is as naive as
Maybe, but in my view accusing someone of making personal attacks is a serious thing, so I’d rather be cautious, have a high bar of evidence, and take an “innocent until proven guilty” approach. Maybe I’ll be too charitable in some cases and fail to condemn someone for making a personal attack, but that’s worth it to avoid making the opposite mistake: accusing someone of making a personal attack who was doing no such thing.
because it’s fun to do
That stated fun motivation did bother me. Obviously, given that people feel the post is attacking Kat personally, making the post for fun isn’t a good enough reason. However, I do also see the post as raising legitimate questions about whether the sort of content that Kat produces and promotes a lot of is actually helping to raise the quality of discourse on EA and AI safety, etc., so it’s clearly not just a post for fun. The OP seemed to be frustrated and venting when writing the post, resulting in it having an unnecessarily harsh tone. But I don’t think this makes it amount to bullying.
Why don’t you hold yourself to a higher standard?
I try to. I guess we just disagree about which kind of mistake (described above) is worse. In the face of uncertainty, I think it’s better to err on the side of not mistakenly accusing someone of bullying and engaging in a personal attack than on the side of mistakenly being too charitable and failing to call out someone who actually said something mean (especially when there are already a lot of other people in the comments like you doing that).
And yes, it was meant to tarnish her reputation because, well, did you not read the headline of the post?
[...]
But what drove reputation change here much more significantly is browsing name calling Kat in the headline
The headline I see is “Everywhere I Look, I See Kat Woods”. What name is this calling Kat? Am I missing something?
And why do you think that you can infer that the OP’s intent was to tarnish Kat’s personal reputation from that headline? That doesn’t make any sense.
Anyway, I don’t know the OP, but I’m confident in saying that the information here is not sufficient to conclude she was making a personal attack.
If she said that was her intent, I’d change my mind. Or if she said something that was unambiguously a personal attack I’d change my mind, but at the moment I see no reason not to read the post as well-meaning, innocent criticism.
And that’s why I think this is so inappropriate for this forum.
I also don’t think it’s very appropriate for this forum (largely because the complaint is not about Kat’s posting style on LessWrong). I didn’t downvote it because it seemed like it had already received a harsh enough reaction, but I didn’t upvote it either.
Herego, his actions expressly are meant to tarnish her reputation.
OP is a woman not a man.
How much funding did OpenAI provide EpochAI?
Or, how much funding do you expect to receive in total from OpenAI for FrontierMath if you haven’t received all funding yet?
I don’t think you’re being charitable. There is an important difference between a personal attack and criticism of the project someone is engaging in. My reading of the OP is that it’s the latter, while I understand you to be accusing the OP of the former.
He’s a dick politician (but a great husband)?
“Dick” is a term used for personal attacks.
If you said “He’s a bad politician; he’s a good husband and a good man, and I know he’s trying to do good, but his policies are causing harm to the world, so we really shouldn’t support Pro-America Joe” (or whatever; assume Pro-America is a cause we support and we just don’t agree with the way Joe goes about trying to promote America), then I’d say yes, that’s how we criticize Pro-America Joe without attacking him as a person.
expressly attempts to tarnish someone’s reputation
I don’t think that’s accurate. The OP clearly states:
One upfront caveat. I am speaking about “Kat Woods” the public figure, not the person. If you read something here and think, “That’s not a true/nice statement about Kat Woods”, you should know that I would instead like you to think “That’s not a true/nice statement about the public persona Kat Woods, the real human with complex goals who I’m sure is actually really cool if I ever met her, appears to be cultivating.”
I was happy to see the progression in what David Silver is saying re what goals AGIs should have:
David Silver, April 10, 2025 (from 35:33 of DeepMind podcast episode Is Human Data Enough? With David Silver):
David Silver: And so what we need is really a way to build a system which can adapt and which can say, well, which one of these is really the important thing to optimize in this situation. And so another way to say that is, wouldn’t it be great if we could have systems where, you know, a human maybe specifies what they want, but that gets translated into a set of different numbers that the system can then optimize for itself completely autonomously.
Hannah Fry: So, okay, an example then: let’s say I said, okay, I want to be healthier this year. And that’s kind of a bit nebulous, a bit fuzzy. But what you’re saying here is that that can be translated into a series of metrics like resting heart rate or BMI or whatever it might be. And a combination of those metrics could then be used as a reward for reinforcement learning, if I understood that correctly?
Silver: Absolutely correctly.
Fry: Are we talking about one metric, though? Are we talking about a combination here?
Silver: The general idea would be that you’ve got one thing which the human wants, like to optimize my health. And then the system can learn for itself which rewards help you to be healthier. And so that can be like a combination of numbers that adapts over time. So it could be that it starts off saying, okay, well, you know, right now it’s your resting heart rate that really matters. And then later you might get some feedback saying, hang on, you know, I really don’t just care about that, I care about my anxiety level or something. And then that includes that into the mixture. And based on feedback it could actually adapt. So one way to say this is that a very small amount of human data can allow the system to generate goals for itself that enable a vast amount of learning from experience.
Fry: Because this is where the real questions of alignment come in, right? I mean, if you said, for instance, let’s do a reinforcement learning algorithm that just minimizes my resting heart rate. I mean, quite quickly, zero is like a good minimization strategy, which would achieve its objective, just not maybe quite in the way that you wanted it to. I mean, obviously you really want to avoid that kind of scenario. So how do you have confidence that the metrics that you’re choosing aren’t creating additional problems?
Silver: One way you can do this is to leverage the same answer which has been so effective so far elsewhere in AI, which is, at that level, you can make use of some human input. If it’s a human goal that we’re optimizing, then we probably at that level need to measure, you know, and say, well, you know, the human gives feedback to say, actually, you know, I’m starting to feel uncomfortable. And in fact, while I don’t want to claim that we have the answers, and I think there’s an enormous amount of research to get this right and make sure that this kind of thing is safe, it could actually help in certain ways in terms of this kind of safety and adaptation. There’s this famous example of paving over the whole world with paperclips when a system’s been asked to make as many paperclips as possible. If you have a system whose overall goal is really to, you know, support human well-being, and it gets that feedback from humans and it understands their distress signals and their happiness signals and so forth, the moment it starts to, you know, create too many paperclips and starts to cause people distress, it would adapt that combination and it would choose a different combination and start to optimize for something which isn’t going to pave over the world with paperclips. We’re not there yet. Yeah, but I think there are some versions of this which could actually end up not only addressing some of the alignment issues that have been faced by previous approaches to, you know, goal-focused systems, but maybe even, you know, be more adaptive and therefore safer than what we have today.
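To make the “combination of numbers that adapts over time” idea concrete, here is a minimal toy sketch of how I picture it. This is my own illustration, not anything DeepMind has published: the metric names, the AdaptiveReward class, and the feedback-update rule are all hypothetical stand-ins for whatever Silver actually has in mind.

```python
import numpy as np

# Toy sketch (hypothetical, my own illustration): a fuzzy human goal ("be healthier")
# is represented as a weighted combination of measurable proxy metrics, and the
# weights adapt when the human gives approval/distress feedback.

METRICS = ["resting_heart_rate", "bmi", "anxiety_level"]  # hypothetical proxy metrics


class AdaptiveReward:
    def __init__(self, n_metrics: int):
        # Start with a uniform weighting over the candidate proxy metrics.
        self.weights = np.ones(n_metrics) / n_metrics

    def reward(self, metric_changes: np.ndarray) -> float:
        # Reward is the weighted sum of improvements in each metric
        # (positive values mean movement in the "healthier" direction).
        return float(self.weights @ metric_changes)

    def update_from_feedback(self, metric_changes: np.ndarray,
                             human_feedback: float, lr: float = 0.1):
        # Negative feedback (distress) shifts weight away from the metrics being
        # pushed hardest; positive feedback shifts weight toward them.
        self.weights += lr * human_feedback * metric_changes
        self.weights = np.clip(self.weights, 1e-3, None)
        self.weights /= self.weights.sum()  # keep the weights a normalized mixture


# Example: the system over-optimizes resting heart rate while anxiety worsens;
# the human signals distress, and the weighting adapts toward the other metrics.
adaptive = AdaptiveReward(len(METRICS))
changes = np.array([0.9, 0.0, -0.2])  # big heart-rate improvement, rising anxiety
print(adaptive.reward(changes))        # reward before feedback
adaptive.update_from_feedback(changes, human_feedback=-1.0)
print(adaptive.weights)                # weight shifts away from resting heart rate
```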