Thanks for the clarification. My conclusion is that your emoji was meant to signal disagreement with the claim that ‘opaque vector reasoning makes a difference’, rather than with something I actually believe.
I had rogue AIs in mind as well, and I’ll take your word on “for catching already rogue AIs and stopping them, opaque vector reasoning doesn’t make much of a difference”.
WilliamKiely
Why do you think that?
Doesn’t the mountain of posts on optimization pressure explain why ending with “U3 was up a queen and was a giga-grandmaster and hardly needed the advantage. Humanity was predictably toast” is actually sufficient? In other words, doesn’t someone who understands all the posts on optimization pressure not need the rest of the story after the “U3 was up a queen” part to understand that the AIs could actually take over?
If you disagree, then what do you think the story offers that makes it a helpful concrete example for people who both are skeptical that AIs can take over and already understand the posts on optimization pressure?
Ryan disagree-reacted to the bold part of this sentence in my comment above and I’m not sure why: “This tweet predicts two objections to this story that align with my first and third bullet point (common objections) above.”
This seems pretty unimportant to gain clarity on, but I’ll explain my original sentence more clearly anyway:
For reference, my third bullet point was the common objection: “How would humanity fail to notice this and/or stop this?”
To my mind, someone objecting that the story is unrealistic because “there’s no reason why OpenAI would ever let the model do its thinking steps in opaque vectors instead of written out in English” (as stated in the tweet) is an objection of the form “humanity wouldn’t fail to stop AI from sneakily engaging in power-seeking behavior by thinking in opaque vectors.” It’s like saying “sure, AI could take over if humanity were dumb like that, but there’s no way OpenAI would be dumb like that.”
It seems like Ryan was disagreeing with this with his emoji, but maybe I misunderstood it.
Good point. At the same time, I think the underlying cruxes that lead people to be skeptical of the possibility that AIs could actually take over are commonly:
Why would an AI that well-intentioned human actors create be misaligned and motivated to take over?
How would such an AI go from existing on computer servers to acquiring power in the physical world?
How would humanity fail to notice this and/or stop this?
I mention these points because people who mention these objections typically wouldn’t raise these objections to the idea of an intelligent alien species invading Earth and taking over.
People generally have no problem granting that aliens may not share our values, may have actuators / the ability to physically wage war against humanity, and could plausibly overpower us with their superior intellect and technological know-how.
Providing a detailed story of what a particular alien takeover process might look like, then, isn’t necessarily helpful for addressing the objections people raise about AI takeover.
I’d propose that authors of AI takeover stories should therefore make sure that they aren’t just describing aspects of a plausible AI takeover story that could just as easily be aspects of an alien takeover story, but are instead actually addressing people’s underlying reasons for being skeptical that AI could take over.
This means doing things like focusing on explaining:
what about the future development of AIs leads to powerful agentic AIs with misaligned goals for which takeover could be a plausible instrumental subgoal,
how the AIs initially acquire substantial amounts of power in the physical world,
how they do the above either without people noticing or without people stopping them.
(With this comment I don’t intend to make a claim about how well the OP story does these things, though that could be analyzed. I’m just making a meta point about what kind of description of a plausible AI takeover scenario I’d expect to actually engage with the actual reasons for disagreement of the people who say “can the AIs actually take over”.)
Edited to add: This tweet predicts two objections to this story that align with my first and third bullet point (common objections) above:
It was a good read, but the issue most people are going to have with this is how U3 develops that misalignment in its thoughts in the first place.
That, plus there’s no reason why OpenAI would ever let the model do its thinking steps in opaque vectors instead of written out in English, as it is currently
Thanks for the story. I found the beginning the most interesting.
U3 was up a queen and was a giga-grandmaster and hardly needed the advantage. Humanity was predictably toast.
I think ending the story like this is actually fine for many (most?) AI takeover stories. The “point of no return” has already occurred at this point (unless the takeover wasn’t highly likely to be successful), and so humanity’s fate is effectively already sealed even though the takeover hasn’t happened yet.
What happens leading up to the point of no return is the most interesting part because it’s the part where humanity can actually still make a difference to how the future goes.
After the point of no return, I primarily want to know what the (now practically inevitable) AI takeover implies for the future: does it mean near-term human extinction, or a future in which humanity is confined to Earth, or a managed utopia, etc?
A detailed, concrete, plausible story of what the actual process of takeover looks like seems less interesting (at least to me). So I would have preferred more detail and effort put into the beginning of the story, explaining how humanity failed to stop the creation of a powerful agentic AI that would take over, rather than into imagining how the takeover actually happens.
Specifically I’m targeting futures that are at my top 20th percentile of rate of progress and safety difficulty.
Does this mean that you think AI takeover within 2 years is at least 20% likely?
Or are there scenarios where progress is even faster and safety is even more difficult than illustrated in your story and yet humanity avoids AI takeover?
The fake names are a useful reminder and clarification that it’s fiction.
I’m going to withdraw from this comment thread since I don’t think my further participation is a good use of time/energy. Thanks for sharing your thoughts and sorry we didn’t come to agreement.
I agree that that would be evidence of OP being more curious. I just don’t think that given what OP actually did it can be said that she wasn’t curious at all.
Thanks for the feedback, Holly. I really don’t want to accuse the OP of making a personal attack if OP’s intent was not to make one, and the reality is that I’m uncertain and can see a clear possibility that OP has no ill will toward Kat personally, so I’m not going to take the risk by making the accusation. Maybe my being on the autism spectrum is making me oblivious or something, in which case sorry I’m not able to see things as you see them, but this is how I’m viewing the situation.
Hey Holly, great points about PETA.
I left one comment replying to a critical comment this post got saying that it wasn’t being charitable (which turned into a series of replies) and now I find myself in a position (a habit?) of defending the OP from potentially-insufficiently-charitable criticisms. Hence, when I read your sentence...
There’s a missing mood here—you’re not interested in learning if Kat’s strategy is effective at AI Safety.
...my thought is: Are you sure? When I read the post I remember reading:
But if it’s for the greater good, maybe I should just stop being grumpy.
But honestly, is this content for the greater good? Are the clickbait titles causing people to earnestly engage? Are peoples’ minds being changed? Are people thinking thoughtfully about the facts and ideas being presented?
This series of questions seems to me like it’s wondering whether Kat’s strategy is effective at AI safety, which is the thing you’re saying it’s not doing.
(I just scrolled up on my phone and saw that OP actually quoted this herself in the comment you’re replying to. (Oops. I had forgotten this as I had read that comment yesterday.))
Sure, the OP is also clearly venting about her personal distaste for Kat’s posts, but it seems to me that she is also asking the question that you say she isn’t interested in: are Kat’s posts actually effective?
(Side note: I kind of regret leaving any comments on this post at all. It doesn’t seem like the post did a good job encouraging a fruitful discussion. Maybe OP and anyone else who wants to discuss the topic should start fresh somewhere else with a different context. Just to put an idea out there: Maybe it’d be a more productive use of everyone’s energy for e.g. OP, Kat, and you Holly to get on a call together and discuss what sort of content is best to create and promote to help the cause of AI safety, and then (if someone was interested in doing so) write up a summary of your key takeaways to share.)
by putting her name in the headline (what I meant by name-calling)
Gotcha, that’s fair.
If it wasn’t meant to tarnish her reputation, why not instead make the post about just her issues with the disagreeable content?
I can think of multiple possible reasons. E.g. If OP sees a pattern of several bad or problematic posts, it can make sense to go above the object-level criticisms of those posts and talk about the meta-level questions.
but the standard you’ve set for determining intent is as naive as
Maybe, but in my view accusing someone of making personal attacks is a serious thing, so I’d rather be cautious, have a high bar of evidence, and take an “innocent until proven guilty” approach. Maybe I’ll be too charitable in some cases and fail to condemn someone for making a personal attack, but that’s worth it to avoid making the opposite mistake: accusing someone of making a personal attack who was doing no such thing.
because it’s fun to do
That stated fun motivation did bother me. Obviously, given that people feel the post is attacking Kat personally, making the post for fun isn’t a good enough reason. However, I also see the post as raising legitimate questions about whether the sort of content that Kat produces and promotes is actually helping to raise the quality of discourse on EA and AI safety, etc., so it’s clearly not just a post for fun. The OP seemed to be frustrated and venting when writing the post, resulting in it having an unnecessarily harsh tone. But I don’t think this makes it amount to bullying.
Why don’t you hold yourself to a higher standard?
I try to. I guess we just disagree about which kind of mistake (described above) is worse. In the face of uncertainty, I think it’s better to err on the side of not mistakenly accusing someone of bullying and engaging in a personal attack than on the side of mistakenly being too charitable and failing to call out someone who actually said something mean (especially when there are already a lot of other people in the comments, like you, doing that).
And yes, it was meant to tarnish her reputation because, well, did you not read the headline of the post?
[...]
But what drove reputation change here much more significantly is browsing name calling Kat in the headline
The headline I see is “Everywhere I Look, I See Kat Woods”. What name is this calling Kat? Am I missing something?
And why do you think you can infer that the OP’s intent was to tarnish Kat’s personal reputation from that headline? That doesn’t make any sense.
Anyway, I don’t know the OP, but I’m confident in saying that the information here is not sufficient to conclude she was making a personal attack.
If she said that was her intent, I’d change my mind. Or if she said something that was unambiguously a personal attack, I’d change my mind, but at the moment I see no reason not to read the post as well-meaning, innocent criticism.
And that’s why I think this is so inappropriate for this forum.
I also don’t think it’s very appropriate for this forum (largely because the complaint is not about Kat’s posting style on LessWrong). I didn’t downvote it because it seemed like it had already received a harsh enough reaction, but I didn’t upvote it either.
Herego, his actions expressly are meant to tarnish her reputation.
OP is a woman, not a man.
How much funding did OpenAI provide EpochAI?
Or, how much funding do you expect to receive in total from OpenAI for FrontierMath if you haven’t received all funding yet?
I don’t think you’re being charitable. There is an important difference between a personal attack and criticism of the project someone is engaging in. My reading of the OP is that it’s the latter, while I understand you to be accusing the OP of the former.
He’s a dick politician (but a great husband)?
“Dick” is a term used for personal attacks.
If you said “He’s a bad politician. He’s a good husband and a good man, and I know he’s trying to do good, but his policies are causing harm to the world, so we really shouldn’t support Pro-America Joe” (or whatever; assume Pro-America is a cause we support and we just don’t agree with the way Joe goes about trying to promote America), then I’d say yes, that’s how we criticize Pro-America Joe without attacking him as a person.
expressly attempts to tarnish someone’s reputation
I don’t think that’s accurate. The OP clearly states:
One upfront caveat. I am speaking about “Kat Woods” the public figure, not the person. If you read something here and think, “That’s not a true/nice statement about Kat Woods”, you should know that I would instead like you to think “That’s not a true/nice statement about the public persona Kat Woods, the real human with complex goals who I’m sure is actually really cool if I ever met her, appears to be cultivating.”
I have two children, at least one of whom is a boy born on a day that I’ll tell you in 5 minutes.
“[A] boy born on a day that I’ll tell you in 5 minutes” is ambiguous. There are two possible meanings, yielding different answers.
If “a boy born on a day that I’ll tell you in 5 minutes” means “a boy, and I’ll tell you the name of a boy I have in 5 minutes” then the answer is 1⁄3 as Liron says.
However, if “a boy born on a day that I’ll tell you in 5 minutes” means “a boy born on a particular singular day that I just wrote down on this piece of paper and will show you in 5 minutes”, then this is equivalent to saying “a boy born on a Tuesday” and the answer is 13⁄27.
The reason why the second meaning is equivalent to “a boy born on a Tuesday” is because it’s a statement that at least one of the children is a particular kind of boy that only 1/7th of boys are, just like how “a boy born on a Tuesday” is a statement that at least one of the children is a particular kind of boy that only 1/7th of boys are. (Conversely, for the first interpretation: “a boy born on a day that I’ll tell you in 5 minutes” is a statement that at least one of the children is a boy, period.)
Another way to notice the difference if it’s still not clear:
When told “I have two children, at least one of whom is [a boy born on a particular singular day that I just wrote down on this piece of paper and will show you in 5 minutes]”, you assign a 1/7th credence to the paper showing Sunday, 1/7th to Monday, 1/7th to Tuesday, etc.
Then, conditional on the paper showing Tuesday, you know that the parent just told you “I have two children, at least one of whom is [a boy born on [Tuesday and I will show you the paper showing Tuesday in 5 minutes]]”, which is equivalent to the parent saying “I have two children, at least one of whom is a boy born on Tuesday”.
So you then have a 1/7th credence that the paper shows Tuesday, and if it’s Tuesday, your credence that both children are boys is 13⁄27. The same reasoning applies to every other day, so your overall credence, reflecting your uncertainty about which day the paper shows, is 7 × (13/27) × (1/7) = 13/27.
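A quick way to check both answers is to enumerate all 196 equally likely (sex, weekday) combinations for two children and condition exactly. Here’s a minimal sketch in Python using exact fractions (the numbering of weekdays, with Tuesday as 2, is an arbitrary choice for illustration):

```python
from fractions import Fraction
from itertools import product

# Each child has 14 equally likely (sex, weekday) outcomes.
children = list(product(["boy", "girl"], range(7)))
# 196 equally likely two-child families.
families = list(product(children, children))

# P(both boys | at least one boy) — the first interpretation.
have_boy = [f for f in families if any(c[0] == "boy" for c in f)]
both_boys = [f for f in have_boy if all(c[0] == "boy" for c in f)]
p1 = Fraction(len(both_boys), len(have_boy))

# P(both boys | at least one boy born on Tuesday, i.e. day 2) —
# the second interpretation, conditional on the paper showing Tuesday.
tue_boy = [f for f in families if any(c == ("boy", 2) for c in f)]
both_boys_tue = [f for f in tue_boy if all(c[0] == "boy" for c in f)]
p2 = Fraction(len(both_boys_tue), len(tue_boy))

print(p1, p2)  # 1/3 13/27
```

By symmetry, conditioning on any of the seven days gives the same 13/27, which is why the weighted sum over days also comes out to 13/27.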
the listening experience would have been better with another narrator
Perhaps, but I’m liking the narration so far. I find it about as good as your narration of your book, and perhaps even a bit better.
Around 50% within 2 years or over all time?