Thank you for inspiring me to write this!
😅 yup this isn’t a good post, tried to clarify that I very much know that and am asking for help, not saying something I already know
But he’s right?
to be clear I think he’s not implementing good strategy in the face of the technical strategic landscape. some of the things he’s suggesting that sound awful to me may be good strategy if he phrased them in less misleading ways. but he’s doing the thing where you’re only required to be honest with the high detail interpretation of your sentence but allowed to mislead structurally, which is a thing that shows up in public communications from high skill technical people who don’t consider the vibes interpretation to be valid at all.
Being right and being good at convincing people you’re right are not orthogonal, but they’re closer than we’d like to think.
you say that, but this post is terrible for convincing people, and I knew it would be as I wrote it, hopefully that’s quite obvious. I continue to not be sure what part of my brain’s model of the way the world works as a whole is relevant to why I think his approach doesn’t work, I don’t even seem to be able to express a first order approximation. It just seems so obvious—which might mean I’m wrong, it might mean I deeply understand something to the point that I no longer know what the beginner explanation is, it might mean I’m bouncing off thinking about this in detail because the relevant models have an abort() in the paths I need to reason about this. actually, that last one sounds likely… hmm. eg, an approximate model that has validity guards so I don’t crash my social thinking? I guess?
like, yud is going around pissing off the acc folks in unnecessary ways. I think it’s possible to have better concentration of focus on what ways he irritates them—he’s not going to stop irritating most of them while making his points, but. but. idk.
Part of the problem might be twitter. If you’re on twitter, you are subject to the agency of the twitter recommender, which wants to upvote you when you say things that generate conflict. if you as a human do RL on twitter, you will be RL trained by the twitter algo to do … <the bad thing he’s doing>. but he did it long before twitter, too, it’s just particularly important now.
See my post “AI scares and changing public beliefs” for one theory of exactly why what Yudkowsky is doing is a bad idea. I was of course primarily thinking of his approach when writing about polarization.
The other post I’ve been contemplating writing is “An unrecognized goddamn principle of fucking rational discourse: be fucking nice”. Yudkowsky talks down to people. That’s not nice, and it makes them emotionally want to prove him wrong instead of want to find ways to agree with him.
I should clarify that being right and convincing people you’re right are NOT orthogonal here on LessWrong. If you can explain why you’re sure you’re right here, it will convince people you’re right. Writing posts like this one is a way to draw people to a worthy project here.
I think you’re right and I think talking about this here is the right way to make sure that’s true and figure out what to collectively do about this issue.
No, he’s not right at all. That extends to a lot of pessimists on AI, but he is not, in fact, right, even if he uses strong language and is confident in an outcome.
Why is it not okay? Is it because he should be signaling more that he knows that most other people wouldn’t justifiably have enough confidence (yet) to make the same tradeoffs he’s advocating for? I think it makes sense to advocate for making tradeoffs even if others wouldn’t yet agree; convincing them would be much of the point of advocating.
he’s burning respectability that those who are actually making progress on his worries need. he has catastrophically broken models of social communication and is saying sentences that don’t mean the same thing when parsed even a little bit inaccurately. he is blaming others for misinterpreting him when he said something confusing. etc.
https://mobile.twitter.com/jachiam0/status/1641867859751239681
https://mobile.twitter.com/lovetheusers/status/1641989542092713987
in contrast, good safety communication:
https://mobile.twitter.com/soundboy/status/1641789276445630465
https://mobile.twitter.com/liron/status/1641928889072238592
https://mobile.twitter.com/anthrupad/status/1641997798131265536
Hm. You may be right. Maybe picking a few sentences or a paragraph or two from the TIME article or his tweets, and rewriting them, would help clarify.
Oh nice, yudkowsky’s ted talk seems to not have the problem I was trying to figure out. hooray!
like it’s not just from prosaic that I’m worried, I am also worried he’s miscommunicating the pr of agent foundations too
like he has an important point and I feel like there’s something in the way he’s making it that is interfering with its reception
maybe including to me? tammy just pointed out that maybe he’s making one of his old core points, you either align completely or align not at all; which, like, yep, agreed! how to encode it better? what’s going on with how he explains it that is making it break? Why didn’t I know that instantly when writing this post? (not that that’s an easy question for others, or even me now, since the me who wrote this post is in the past)
like, something is going on here and it feels like it’s crashing a bunch of us. can someone who understands how these crashes work help debug
or maybe this post is useless to keep working on, because other posts have been made since that make much more direct and coherent contributions. If so, I’d appreciate a comment saying so explicitly; if I am convinced of that, I will 1. invert my own strong upvote into a strong downvote of OP, and 2. edit saying I no longer think it’s reasonable for this post to have zero karma and people are quite welcome to downvote it.