Yeah, I could imagine an AI being superhuman in some narrow but important domain like persuasion, cybersecurity, or bioweapons despite this. Intuitively that feels like it wouldn’t be enough to take over the world, but it could still fail in a way that takes humanity down with it.
I’m not sure I share that intuition; I think that’s because my background model of humans treats them as much less general than I imagine yours does.
To make it a bit more explicit:
Bioweapons: it seems pretty obvious why superintelligence in this domain wouldn’t let you take over the world. Sure, maybe you can get all the humans killed, but unless automation also advances very substantially, that leaves nobody to maintain the infrastructure you need to run.
Cybersecurity: if you just crash all the digital infrastructure, the same problem applies. If you instead try to run some scheme where you extort humans to get what you want, expect humans to fight back; then you are quickly in a very novel situation, the kind of “world war” nobody has ever seen before.
Persuasion: depends on what we take the limits of persuasion to be. If it’s possible to completely take over anyone’s mind by speaking ten words to them, then sure, you win. But if we look at humans, great persuaders often aren’t persuasive to everyone; rather, they appeal very strongly to a segment of the population that happens to respond to a particular message while turning others off. (Trump, Eliezer, most politicians.) This strategy will win you part of the population while polarizing the rest against you, and then you need more than persuasion ability to figure out how to get your faction to triumph.
If you want to run some galaxy-brained scheme where you give different people inconsistent messages in order to appeal to all of them, you risk getting caught, and you need more than persuasion ability to make it work.
You can also be persuasive by being generally truthful, providing people with a lot of value, and doing beneficial things. One could try to fake this by doing things that merely look beneficial but aren’t, but then you need more than persuasion ability to figure out what those would be.
Probably the best strategy would be to keep being genuinely helpful until people trust you enough to put you in a position of power, and then betray that trust. I could imagine this working. But it would be a slow strategy, as it would take time to build up that level of trust; in the meantime, many people would want to inspect your source code etc. to verify that you are trustworthy, and you’d need to ensure that inspection doesn’t reveal anything suspicious.