This AI wouldn’t be trying to convince a human to help it, just to convince the human that it’s going to succeed.
So instead of convincing humans that a hell-world is good, it would convince the humans that it was going to create a hell-world (and they would all disapprove, so it would score low).
I think what this ends up doing is having everyone agree with a world that sounds superficially good but is actually terrible in a way that’s difficult for unaided humans to realize. For example: the AI convinces everyone that it will create an idyllic natural world where people live forager lifestyles in harmony, etc.; everyone approves, because they like nature and harmony and stuff; it proceeds to create exactly such an idyllic natural world; and wild animal suffering outweighs human enjoyment forevermore.