This is important and relevant for most people here, in practice.
The world is changing, and there is a risk that people in high-stakes areas like AI safety will be weaponized against each other as part of elite conflict. Understanding generative AI doesn’t really help with this, and in practice it often misleads people into overestimating the difficulty of human manipulation; I wrote a 4-minute overview of the current state of the attack surface here (e.g. massive amounts of A/B testing to steer people in specific directions), and I described the fundamental problem previously:
If there were intelligent aliens, made of bundles of tentacles or crystals or plants that think incredibly slowly, their minds would also have zero days that could be exploited, because any mind that evolved naturally would probably be, like the human brain, a kludge of spaghetti code operating outside of its intended environment. They also would not even begin to scratch the surface of finding and labeling those zero days until, like human civilization today, they began surrounding thousands or millions of their kind with sensors that could record behavior for several hours a day and find webs of correlations.
I’ve explained why people in AI safety should not consider themselves at anywhere near the same risk level as average citizens:
This problem connects to the AI safety community in the following way:
State survival and war power ==> already depend on information warfare capabilities.
Information warfare capabilities ==> already depend on SOTA psychological research systems.
SOTA psychological research systems ==> already improve and scale mainly from AI capabilities research, with diminishing returns on everything else.[1]
AI capabilities research ==> already under siege from the AI safety community.
Therefore, the reason why this might be such a big concern is:
State survival and war power ==> their toes potentially already being stepped on by the AI safety community?
There isn’t much point in having a utility function in the first place if hackers can change it at any time. There might be parts that are resistant to change, but it’s easy to overestimate your own resistance here. For example, if you value the long-term future and think that no false argument could persuade you otherwise, but a social media news feed plants paranoia or distrust of Will MacAskill, then you are one increment closer to not caring about the long-term future; and if that doesn’t work, the multi-armed bandit algorithm behind the feed will keep trying variations until it finds something that does.
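To make the “keeps trying until it finds something that works” dynamic concrete, here is a minimal sketch of the kind of multi-armed bandit loop a feed-optimization system could run. The framing names, response rates, and epsilon value below are all invented for illustration; the point is that the algorithm needs no model of why a framing shifts you, it just reallocates exposure toward whatever measurably does.

```python
import random

# Hypothetical content "arms": different framings a feed could test.
ARMS = ["neutral_framing", "outrage_framing", "distrust_framing", "fear_framing"]

counts = {arm: 0 for arm in ARMS}        # times each framing was shown
estimates = {arm: 0.0 for arm in ARMS}   # running estimate of the measured effect

def choose_arm(epsilon: float = 0.1) -> str:
    """Epsilon-greedy: usually exploit the best-looking framing, sometimes explore."""
    if random.random() < epsilon:
        return random.choice(ARMS)
    return max(ARMS, key=lambda a: estimates[a])

def update(arm: str, reward: float) -> None:
    """Incrementally update the estimated payoff of a framing from observed behavior."""
    counts[arm] += 1
    estimates[arm] += (reward - estimates[arm]) / counts[arm]

def simulated_response(arm: str) -> float:
    """Toy stand-in for measured behavior: one framing happens to work slightly better."""
    base = {"neutral_framing": 0.02, "outrage_framing": 0.05,
            "distrust_framing": 0.08, "fear_framing": 0.04}
    return 1.0 if random.random() < base[arm] else 0.0

for _ in range(10_000):
    arm = choose_arm()
    update(arm, simulated_response(arm))

print(max(ARMS, key=lambda a: estimates[a]))  # converges on whichever framing "works"
```

Real recommender systems are far more sophisticated than epsilon-greedy, but the structure is the same: the loop converges on whatever moves the metric without anyone needing to understand, or even inspect, the exploit it found.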
The human brain is a kludge of spaghetti code, so there is probably an exploit somewhere. The human brain has zero days, and the capability and cost of social media platforms using massive amounts of human behavior data to find complex social engineering techniques is a profoundly technical matter; you can’t get a handle on it with intuition or pre-2010s historical precedent. Thus, you should assume that your utility function and values are at risk of being hacked at an unknown time, and they should therefore be assigned a discount rate to account for that risk over the course of several years.
Slow takeoff over the next 10 years alone guarantees that, in reality, this discount rate is too high for people in the AI safety community to go on believing it is something like zero. I think that approaching zero is a reasonable target, but not in the current state of affairs, where people don’t even bother to cover up their webcams, hold important and sensitive conversations about the fate of the earth in rooms with smartphones, and use social media for nearly an hour a day (scrolling past nearly a thousand posts). The discount rate in this environment cannot be considered “reasonably” close to zero when the attack surface is this massive and the world is changing this quickly.
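As a toy illustration of the arithmetic (the annual compromise probabilities below are invented, purely to show how the risk compounds): if you assign an annual probability p that your values get successfully hacked, the probability they are still intact after t years is (1 − p)^t, and that is roughly the discount you should put on payoffs that only arrive a decade out.

```python
# Toy arithmetic only: the annual compromise probabilities are made up,
# to show how quickly even a modest yearly hack risk compounds over a decade.
def survival(p_annual: float, years: int) -> float:
    """Probability your values are still un-hacked after `years` years."""
    return (1.0 - p_annual) ** years

for p in (0.01, 0.05, 0.15):
    print(f"p = {p:>4}: "
          f"10-year survival = {survival(p, 10):.2f}, "
          f"implied discount on decade-out payoffs = {1 - survival(p, 10):.0%}")
# p = 0.01: 10-year survival = 0.90, implied discount = 10%
# p = 0.05: 10-year survival = 0.60, implied discount = 40%
# p = 0.15: 10-year survival = 0.20, implied discount = 80%
```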
If people have anything they value at all, and the AI safety community probably does, then the current AI safety paradigm of zero effort on this front is wildly inappropriate; it is basically total submission to invisible hackers.
In fact, I’ve made a very solid case that pouring your thoughts directly into a modern computer, even through a keyboard with no other sensor exposure, is a deranged and depraved thing for an agent to do, and it is right to recoil in horror when you see an influential person encouraging many other influential people to do it.
I strongly disagree with the commentary you provide being important or relevant for most people in practice.
This is important and relevant for most people here, in practice.
The world is changing, and there is a risk that people in high-stakes areas like AI safety will be weaponized against each other as part of elite conflict. Understanding generative AI doesn’t really help with this, and in practice it often misleads people into overestimating the difficulty of human manipulation; I wrote a 4-minute overview of the current state of the attack surface here (e.g. massive amounts of A/B testing to steer people in specific directions), and I described the fundamental problem previously:
I’ve explained why people in AI safety should not consider themselves at anywhere near the same risk level as average citizens:
I’ve also described why agents in general should be taking basic precautions as part of instrumental convergence:
In fact, I’ve made a very solid case that pouring your thoughts directly into a modern computer, even through a keyboard with no other sensor exposure, is a deranged and depraved thing for an agent to do, and it is right to recoil in horror when you see an influential person encouraging many other influential people to do it.