If the average Joe on the street would not succumb to having his mind hacked by Eliezer Yudkowsky, or hell, by a late-2022 chatbot, while you potentially would (by virtue of belonging to the reference class of LessWrong users or whatever), then you have failed, and it is not obvious you can make an expected positive contribution to the field of AI risk reduction at all without becoming far more, for lack of a better word, normal. I don't understand how people can think that spending their time on increasingly elaborate pseudophilosophical work they then call "AI alignment" is productive if they are also the type of person highly vulnerable to getting mindhacked by ChatGPT; perhaps this is a bucket error, or I'm attacking a strawman. I don't think Eliezer or Nate would fall into this failure mode, but the more philosophical parts of alignment worry me (specifically I mean the MIRI/CFAR sphere, though again I may be attacking a strawman), because of the potential downside of having the people closest to alignment solutions be unusually vulnerable to being hacked by an AI.
I don't agree with this take: a common problem here is that people falsely believe they wouldn't fall for ChatGPT or something similar. In general, people way overrate how well they would do in this situation.