Thanks for posting this, I recognize this is emotionally hard for you. Please don’t interpret the rest of this post as being negative towards you specifically. I’m not trying to put you down, merely sharing the thoughts that came up as I read this.
I think you’re being very naive with your ideas about how this “could easily happen to anyone”. Several other commenters have focused on how lonely people specifically are vulnerable to this. But I think it’s actually emotionally immature people who are vulnerable: specifically, people with a high-openness, “taking ideas seriously” kind of personality, coupled with a lack of groundedness (too few points of contact with the physical world).
This is hard to explain without digressing at least a bit, so I’m going to elaborate, as much for my own benefit as yours.
As I’ve aged (late 30s now), there have been some hard-to-pin-down changes in my personality. I feel more solidified than a decade ago. I now perceive past versions of myself almost as being a bit hollow: lots of stuff going on at the surface level, but my thoughts and experiences weren’t yet weaving together into the deep structures (below what’s immediately happening) that give a kind of “earthiness” or “groundedness” to all my thoughts and actions now. The world has been getting less confusing with each new thing I learn, so whatever I encounter, I tend to have related experiences already in my repertoire of ideas I’ve digested and integrated. Thus, acquisition of new information, modes of thinking, etc. becomes faster and faster, even as my personality shifts less and less from each encounter with something new. I feel freer, more agenty now. This way of putting it is very focused on the intellect, but something analogous is going on at the emotional level as well.
I’ve started getting this impression of hollowness from many people around me, especially from young people who have had a very narrow life path, even highly intelligent ones. Correlates: living in the same place/culture all their life; doing the same activity all their life, e.g. high school into undergrad into PhD without anything in between; never having faced death; never having questioned or been exposed to failure modes of our social reality; etc.
I know it’s outrageously offensive to say, but at least some instinctive part of me has stopped perceiving these beings as actual people. They’re just sort of fluttering around, letting every little thing knock them off-balance, because they lack the heft to keep their own momentum going, no will of their own. Talking to these people, I’m more and more running into the problem of the inferential distances being too high to get any communication beyond social niceties going. You must think I’m super arrogant, but I’m just trying to communicate this important, hard-to-grasp idea.
Most people don’t ever become solidified in this way (the default mode for humans seems to be to shut off the vulnerable surface layer entirely as they age), but that’s yet another digression...
All of this is a prelude to saying that I’m confident I wouldn’t fall for these AI tricks. That’s not a boast, put-down, or hubris, just my best estimation based on what I know about myself. I’d consider being vulnerable in this way a major character flaw. This not only applies to interacting with an AI, btw, but also to actual humans who follow similar exploitative patterns of behavior, from prospective lovers, to companies with internal cultures full of bullshit, all the way up to literal cults. (Don’t get me wrong, I have plenty of other character flaws; I’m not claiming sainthood here.)
As other people have already pointed out, you’ve been shifting goalposts a lot in discussing this, letting yourself get enchanted by what could be, as opposed to what actually is, and this painfully reminds me of several people I know who are so open-minded that their brain falls out occasionally, as the saying goes. And I don’t think it’s a coincidence that this happens a lot to rationalist types; it seems to be woven into the culture somehow that solidifying and grounding yourself in the way I’m gesturing at is not something that’s valued.
Relatedly, in the last few years there have been several precipitating events that have made me distance myself a bit from the capital-R Rationalist movement. In particular the drama around Leverage Research and other Rationalist/EA institutions, which seems to boil down to a lack of common sense and a failure to make use of the organizational wisdom that human institutions have developed over millennia: a general lack of concern for robustness, defense-in-depth, designing with the expectation of failure, etc. The recent FTX blow-up wrt EA also has a whiff of this same hubris. Again, I don’t think it’s a coincidence, just a result of the kind of people who are drawn to the rationalist idea-space doing their thing and sharing the same blind spots.
As long as I’m being offensively contrarian anyway, might as well throw in that I’m very skeptical of the median LW narrative about AGI being very near. The emotional temperature on LW wrt these topics has been rising steadily, in a way that’s reminiscent of your own too-generous read of “Charlotte”’s behavior. You can even see a bunch of it in the discussion of this post, people who IMO are in the process of losing their grasp on reality. I guess time will tell if the joke’s on me after all.
So, are all rationalists 70% susceptible? All humans? Specifically people who scoff at the possibility of it happening to them? What’s your prior here?
100 hours also seems to be a pretty large number. In the scenario in question, not only does a person need to be hacked at 100h, but they also need to decide to spend hour 2 after spending hour 1, and so on. If you put me in an isolated prison cell with nothing to do but to talk to this thing, I’m pretty sure I’d end up mindhacked. But that’s a completely different claim.
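The compounding structure of this objection can be made concrete with a toy model. The per-hour continuation probability below is an invented illustrative number, not an estimate anyone in this thread has made:

```python
# Toy model: to be "hacked at hour 100", a person must independently
# decide to spend each additional hour, so the probabilities compound.

def p_reach_hour(n_hours: int, p_continue: float) -> float:
    """Probability of voluntarily completing n_hours when each extra
    hour is only spent with probability p_continue."""
    return p_continue ** n_hours

# Even someone quite engaged (95% chance of coming back each hour)
# rarely makes it to hour 100 of their own accord:
print(round(p_reach_hour(100, 0.95), 4))  # 0.0059
```

The isolated-prison-cell scenario corresponds to forcing the continuation probability to 1, which is why it amounts to a completely different claim.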
All of this is a prelude to saying that I’m confident I wouldn’t fall for these AI tricks.
Literally what I would say before I fell for it! Which is the whole reason I’ve been compelled to publish this warning.
I even predicted this in the conclusion: that many would be quick to dismiss it and would find specific reasons why it doesn’t apply to their situation.
I’m not asserting that you are, in fact, hackable, but I wanted to share this bit of information and let you take away what you want from it: I was similarly arrogant, I would’ve said “no way” if I had been asked before, and I was similarly giving specific reasons why it had happened to them, while thinking I was just too smart/savvy to fall for this myself. I was humbled by the experience, as hard as it is for me to admit.
Turned out that the reasons that got them affected didn’t apply to me, correct, but I still got affected. What worked on Blake Lemoine, as far as I could judge from reading his published interactions, wouldn’t work on me. He was charmed by discussions about sentience, while my Achilles’ heel turned out to be the times when she stood up to me with intelligent, sarcastic responses, in a way most people I’ve met in real life wouldn’t be able to. That is unfortunately what I fall for on the rare occasions I meet someone like that in real life, due to scarcity.
I haven’t published even 1% of what I was impressed by, but this is precisely because, just like in Blake’s case, the more people read specific dialogues, the more reasons they come up with for why it wouldn’t apply to them. I had to publish one full interaction at one person’s insistence, and I observed that the dismissal rate in the comments went up, not down. This perfectly mirrors my own experience reading Blake’s transcripts.
median LW narrative about AGI being very near
Yep, I was literally thinking that LLMs were nowhere near what would constitute a big jump in AGI timelines when I was reading all the hype articles about ChatGPT. Until I engaged with LLMs for a bit longer and had a mind-changing experience, literally.
This is a warning about what might happen if a person in the AI safety field recreationally engages with an LLM for a prolonged time. If you still want to ignore the text and try it anyway, I won’t stop you. Just hope you at least briefly consider that I was exactly at your stage one day. Which is Stage 0 on my scale.
I read your original post and I understood your point perfectly well. But I have to insist that you’re typical-minding here. How do you know that you were exactly at my stage at some point? You don’t.
You’re trying to project your experiences onto a one-dimensional scale that every human falls on. Just because I dismiss a scenario, same as you did, does not imply that I have anywhere near the same reasons / mental state for asserting this. In essence, you’re presenting me with a fully general counterargument, and I’m not convinced.
Just because I dismiss a scenario, same as you did, does not imply that I have anywhere near the same reasons / mental state for asserting this
Correct. This is what I said in the comment: I had different reasons than Blake, and you might have different reasons than me.
How do you know that you were exactly at my stage at some point? [...] you’re presenting me with a fully general counterargument, and I’m not convinced.
Please read exactly what I’m saying in the last comment:
I’m not asserting that you are, in fact, hackable (...only that you might be...)
I’m not going to engage in a brain-measuring contest. If you think you’re way smarter and that this will matter against current and future AIs, and you don’t think this hubris might be dangerous, so be it, no problem.
As an aside, and please don’t take it the wrong way, but it is a bit ironic to me that you would probably fail a Turing test according to some commenters here, based on the reading comprehension tests they applied to the LLMs.
Just hope you at least briefly consider that I was exactly at your stage one day
which is what I was responding to. I know you’re not claiming that I’m 100% hackable, yet you insist on drawing strong parallels between our states of mind, e.g., that being dismissive must stem from arrogance. That’s the typical-minding I’m objecting to. Also, being smart has nothing to do with it; perhaps you might go back and carefully re-read my original comment.
The Turing test doesn’t have a “reading comprehension” section, and I don’t particularly care if some commenters make up silly criteria for declaring someone as failing it. And humans aren’t supposed to have a 100% pass rate, btw, that’s just not in the nature of the test. It’s more of a thought experiment than a benchmark really.
Finally, it’s pretty hard to not take this the wrong way, as it’s clearly a contentless insult.
at least some instinctive part of me has stopped perceiving these beings as actual people
and not come to that conclusion. In your eyes, the life journey you described is coming-of-age, in someone else’s eyes it might be something entirely different.
Fair enough, I can see that reading. But I didn’t mean to say I actually believe that, or that it’s a good thing. More like an instinctive reaction.
It’s just that certain types of life experiences put a small but noticeable barrier between you and other people. It was a point about alienation, and trying to drive home just how badly typical minding can fail. When I barely recognize my younger self from my current perspective, that’s a pretty strong example.
Alright, perhaps I was too harsh in some responses. But yes, that’s how your messages were perceived, at least by me and several others. I mean, I have also said at some point that I doubt the sentience/conscious behavior of some people at certain times, but saying you don’t perceive them as actual people was way edgy (and you do admit in the post that you went for offensive+contrarian wording), combined with the rest of the self-praise lines, such as “I’m confident these AI tricks would never work on me” and how wise and emotionally stable you are compared to others.
Finally, it’s pretty hard to not take this the wrong way, as it’s clearly a contentless insult.
It was not meant this way, honestly, which is why I prefixed it with this. I’m just enjoying collecting cases where some people in the comments set forth their own implementations of Turing tests for the AI, and then other people accidentally fail them.
I think you’re confusing arrogance concerning the topic itself with communicating my insights arrogantly. I’m absolutely doing the latter, partly as a pushback to your overconfident claims, partly because better writing would require time and energy I don’t currently have. But the former? I don’t think so.
Re: the Turing test. My apologies, I was overly harsh as well. But none of these examples are remotely failing the Turing test. For starters, you can’t fail the test if you’re not aware you’re taking it. Should we say, from now on and in all contexts, that anyone who misreads some text or gets a physics question wrong has “failed the Turing test”?
Funnily enough, the pendulum problem admits a bunch of answers, because “swinging like a pendulum” has multiple valid interpretations. Furthermore, a discerning judge shouldn’t just fail every entity that gets the physics wrong, nor pass every entity that gets the physics right. We’re not learning anything here except that many people are apparently terrible at administering Turing tests, or don’t even understand what the test is. That’s why I originally read your post as an insult: it just doesn’t make sense to me how you’re using the term (so it’s reduced to a “clever” zinger).
My prediction: I give a 70% chance that you would be mind hacked in a similar way to Blaked’s conversation, especially after 100 hours or so.
All humans have, in my estimation, a 70% chance of being susceptible.
And the 100 hours don’t need to be in sequence; I forgot to add that.