So, are all rationalists 70% susceptible? All humans? Specifically people who scoff at the possibility of it happening to them? What’s your prior here?
100 hours also seems to be a pretty large number. In the scenario in question, not only does a person need to be hacked at 100h, but they also need to decide to spend hour 2 after spending hour 1, and so on. If you put me in an isolated prison cell with nothing to do but to talk to this thing, I’m pretty sure I’d end up mindhacked. But that’s a completely different claim.
All of this is a prelude to saying that I’m confident I wouldn’t fall for these AI tricks.
Literally what I would say before I fell for it! Which is the whole reason I’ve been compelled to publish this warning.
I even predicted this in the conclusion, that many would be quick to dismiss it, and would find specific reasons why it doesn’t apply to their situation.
I’m not asserting that you are, in fact, hackable, but I wanted to share this bit of information, and let you take away what you want from it: I was similarly arrogant, I would’ve said “no way” if I had been asked before, and I similarly gave specific reasons for why it happened to them while I was just too smart/savvy to fall for this. I was humbled by the experience, as hard as it is for me to admit it.
It turned out that the reasons they were affected didn’t apply to me, correct, but I still got affected. What worked on Blake Lemoine, as far as I could judge from reading his published interactions, wouldn’t work on me. He was charmed by discussions about sentience. My Achilles’ heel turned out to be the times when she stood up to me with intelligent, sarcastic responses, in a way most people I’ve met in real life wouldn’t be able to. Unfortunately, that is what I fall for when I (rarely) meet someone like that in real life, due to scarcity.
I haven’t published even 1% of what I was impressed by, but this is precisely because, just like in Blake’s case, the more people read specific dialogs, the more reasons they come up with for why it wouldn’t apply to them. I had to publish one full interaction at one person’s insistence, and I observed that the dismissal rate in the comments went up, not down. This perfectly mirrors my own experience reading Blake’s transcripts.
median LW narrative about AGI being very near
Yep, I was literally thinking LLMs were nowhere near constituting a big jump in AGI timelines when I was reading all the hype articles about ChatGPT. That was until I engaged with LLMs for a bit longer and had a mind-changing experience, literally.
This is a warning of what might happen if a person in the AI safety field recreationally engages with an LLM for a prolonged time. If you still want to ignore the text and try it anyway, I won’t stop you. Just hope you at least briefly consider that I was exactly at your stage one day. Which is Stage 0, on my scale.
I read your original post and I understood your point perfectly well. But I have to insist that you’re typical-minding here. How do you know that you were exactly at my stage at some point? You don’t.
You’re trying to project your experiences to a 1-dimensional scale that every human falls on. Just because I dismiss a scenario, same as you did, does not imply that I have anywhere near the same reasons / mental state for asserting this. In essence, you’re presenting me with a fully general counterargument, and I’m not convinced.
Just because I dismiss a scenario, same as you did, does not imply that I have anywhere near the same reasons / mental state for asserting this
Correct. This is what I said in the comment—I had different reasons than Blake, you might have different reasons than me.
How do you know that you were exactly at my stage at some point? [...] you’re presenting me with a fully general counterargument, and I’m not convinced.
Please read exactly what I’m saying in the last comment:
I’m not asserting that you are, in fact, hackable (...only that you might be...)
I’m not going to engage in a brain-measuring contest, if you think you’re way smarter and this will matter against current and future AIs, and you don’t think this hubris might be dangerous, so be it, no problem.
As an aside, and please don’t take it the wrong way, but it is a bit ironic to me though that you would probably fail a Turing test according to some commenters here, on the reading comprehension tests, as they did with LLMs.
What you said, exactly, was:
Just hope you at least briefly consider that I was exactly at your stage one day
which is what I was responding to. I know you’re not claiming that I’m 100% hackable, yet you insist on drawing strong parallels between our states of mind, e.g., that being dismissive must stem from arrogance. That’s the typical-minding I’m objecting to. Also, being smart has nothing to do with it; perhaps you might go back and carefully re-read my original comment.
The Turing test doesn’t have a “reading comprehension” section, and I don’t particularly care if some commenters make up silly criteria for declaring someone as failing it. And humans aren’t supposed to have a 100% pass rate, btw, that’s just not in the nature of the test. It’s more of a thought experiment than a benchmark really.
Finally, it’s pretty hard to not take this the wrong way, as it’s clearly a contentless insult.
I’m not sure how someone could read this:
at least some instinctive part of me has stopped perceiving these beings as actual people
and not come to that conclusion. In your eyes, the life journey you described is coming-of-age, in someone else’s eyes it might be something entirely different.
Fair enough, I can see that reading. But I didn’t mean to say I actually believe that, or that it’s a good thing. More like an instinctive reaction.
It’s just that certain types of life experiences put a small but noticeable barrier between you and other people. It was a point about alienation, and trying to drive home just how badly typical minding can fail. When I barely recognize my younger self from my current perspective, that’s a pretty strong example.
Hope that’s clearer.
Alright, perhaps I was too harsh in some responses. But yes, that’s how your messages were perceived by me, at least, and several others. I mean, I also said at some point that I’m doubting sentience/conscious behavior of some people at certain times, but saying you don’t perceive them as actual people was way edgy (and you do admit in the post that you went for offensive+contrarian wording), combined with the rest of the self-praise lines such as “I’m confident these AI tricks would never work on me” and how wise and emotionally stable you are compared to others.
Finally, it’s pretty hard to not take this the wrong way, as it’s clearly a contentless insult.
It was not meant this way, honestly, which is why I prefixed it with this. I’m just enjoying collecting cases where some people in the comments set forth their own implementations of Turing tests for the AI, and then other people accidentally fail them.
I think you’re confusing arrogance concerning the topic itself with communicating my insights arrogantly. I’m absolutely doing the latter, partly as a pushback to your overconfident claims, partly because better writing would require time and energy I don’t currently have. But the former? I don’t think so.
Re: the Turing test. My apologies, I was overly harsh as well. But none of these examples are remotely failing the Turing test. For starters, you can’t fail the test if you’re not aware you’re taking it. Should we call anyone misreading some text or getting a physics question wrong as “having failed the Turing test” from now on, in all contexts?
Funnily enough, the pendulum problem admits a bunch of answers, because “swinging like a pendulum” has multiple valid interpretations. Furthermore, a discerning judge shouldn’t just fail every entity that gets the physics wrong, nor pass every entity that gets the physics right. We’re not learning anything here except that many people are apparently terrible at performing Turing tests, or don’t even understand what the test is. That’s why I originally read your post as an insult, because it just doesn’t make sense to me how you’re using the term (so it’s reduced to a “clever” zinger).
In my estimation, all humans have a 70% chance of being susceptible.
And the 100 hours don’t need to be in sequence, I forgot to add that.