I think this line of discussion would be well served by marking a natural boundary in the cluster “crazy.” Instead of saying “Vassar can drive people crazy” I’d rather taboo “crazy” and say:
Many people are using their verbal idea-tracking ability to implement a coalitional strategy instead of efficiently compressing external reality. Some such people will experience their strategy as invalidated by conversations with Vassar, since he’ll point out ways their stories don’t add up. A common response to invalidation is to submit to the invalidator by adopting the invalidator’s story. Since Vassar’s words aren’t selected to be a valid coalitional strategy instruction set, attempting to submit to him will often result in attempting obviously maladaptive coalitional strategies.
People using their verbal idea-tracking ability to implement a coalitional strategy cannot give informed consent to conversations with Vassar, because in a deep sense they cannot be informed of things through verbal descriptions, and the risk is one that cannot be described without the recursive capacity of descriptive language.
Personally I care much more, maybe lexically more, about the upside of minds learning about their situation, than the downside of mimics going into maladaptive death spirals, though it would definitely be better all round if we can manage to cause fewer cases of the latter without compromising the former, much like it’s desirable to avoid torturing animals, and it would be desirable for city lights not to interfere with sea turtles’ reproductive cycle by resembling the moon too much.
EDIT: Ben is correct to say we should taboo “crazy.”
This is a very uncharitable interpretation (entirely wrong). The highly scrupulous people here can undergo genuine psychological collapse if they learn their actions aren’t as positive utility as they thought. (entirely wrong)
I also don’t think people interpret Vassar’s words as a strategy and implement incoherence. Personally, I interpreted Vassar’s words as factual claims and then tried to implement a strategy based on them. When I was surprised by reality a bunch, I updated away. I think the other people just no longer have a coalitional strategy installed and don’t know how to function without one. This is what happened to me and why I repeatedly lashed out at others when I perceived them as betraying me, since I no longer automatically perceived them as on my side. I rebuilt my rapport with those people and now have more honest relationships with them. (still endorsed)
The highly scrupulous people here can undergo genuine psychological collapse if they learn their actions aren’t as positive utility as they thought.
“That which can be destroyed by the truth should be”—I seem to recall reading that somewhere.
And: “If my actions aren’t as positive utility as I think, then I desire to believe that my actions aren’t as positive utility as I think”.
If one has such a mental makeup that finding out that one’s actions have worse effects than one imagined causes genuine psychological collapse, then perhaps the first order of business is to do everything in one’s power to fix that (really quite severe and glaring) bug in one’s psyche—and only then to attempt any substantive projects in the service of world-saving, people-helping, or otherwise doing anything really consequential.
Personally, I interpreted Vassar’s words as factual claims and then tried to implement a strategy based on them. When I was surprised by reality a bunch, I updated away.
What specific claims turned out to be false? What counterevidence did you encounter?
Specific claim: the only nontrivial obstacle in front of us is not being evil
This is false. Object-level stuff is actually very hard.
Specific claim: nearly everyone in the aristocracy is agentically evil. (EDIT: THIS WAS NOT SAID. WE BASICALLY AGREE ON THIS SUBJECT.)
This is a wrong abstraction. Frame of Puppets seems naively correct to me, and has become increasingly reified by personal experience of more distant-to-my-group groups of people, to use a certain person’s language. Ideas and institutions have the agency; they wear people like skin.
Specific claim: this is how to take over New York.
Didn’t work.
I think this needs to be broken up into 2 claims:
1. If we execute strategy X, we’ll take over New York.
2. We can use straightforward persuasion (e.g. appeals to reason, profit motive) to get an adequate set of people to implement strategy X.
Claim 2 has been falsified decisively. The plan to recruit candidates by appealing to people’s explicit incentives failed, there wasn’t a good alternative, and as a result there wasn’t a chance to test the other parts of the plan (claim 1).
That’s important info and worth learning from in a principled way. Definitely I won’t try that sort of thing again in the same way, and it seems like I should increase my credence both that plans requiring people to respond to economic incentives by taking initiative to play against type will fail, and that I personally might be able to profit a lot by taking initiative to play against type, or by investing in people who seem like they’re already doing this, as long as I don’t have to count on other unknown people acting similarly in the future.
But I find the tendency to respond to novel multi-step plans that would require someone to take initiative by sitting back and waiting for the plan to fail, and then saying, “see? novel multi-step plans don’t work!” extremely annoying. I’ve been on both sides of that kind of transaction, but if we want anything to work out well we have to distinguish cases of “we / someone else decided not to try” as a different kind of failure from “we tried and it didn’t work out.”
Specific claim: the only nontrivial obstacle in front of us is not being evil
This is false. Object-level stuff is actually very hard.
This seems to be conflating the question of “is it possible to construct a difficult problem?” with the question of “what’s the rate-limiting problem?”. If you have a specific model for how to make things much better for many people by solving a hard technical problem before making substantial progress on human alignment, I’d very much like to hear the details. If I’m persuaded I’ll be interested in figuring out how to help.
So far this seems like evidence to the contrary, though, as it doesn’t look like you thought you could get help making things better for many people by explaining the opportunity.
I think this line of discussion would be well served by marking a natural boundary in the cluster “crazy.” Instead of saying “Vassar can drive people crazy” I’d rather taboo “crazy” and say:
Personally I care much more, maybe lexically more, about the upside of minds learning about their situation, than the downside of mimics going into maladaptive death spirals, though it would definitely be better all round if we can manage to cause fewer cases of the latter without compromising the former, much like it’s desirable to avoid torturing animals, and it would be desirable for city lights not to interfere with sea turtles’ reproductive cycle by resembling the moon too much.
My problem with this comment is that it takes people who:
- can’t verbally reason without talking things through (and are currently stuck in a passive role in a conversation),
and who:
- respond to a failure of their verbal reasoning
  - under circumstances of importance (in this case moral importance)
  - and conditions of stress, induced by
    - trying to concentrate while in a passive role, and
    - failing to concentrate under conditions of high moral importance,
by simply doing as they are told, and it assumes they are incapable of reasoning under any circumstances.
It also then denies people who are incapable of independent reasoning the right to be protected from harm.
EDIT: Ben is correct to say we should taboo “crazy.”
This is a very uncharitable interpretation (entirely wrong). The highly scrupulous people here can undergo genuine psychological collapse if they learn their actions aren’t as positive utility as they thought. (entirely wrong)
I also don’t think people interpret Vassar’s words as a strategy and implement incoherence. Personally, I interpreted Vassar’s words as factual claims and then tried to implement a strategy based on them. When I was surprised by reality a bunch, I updated away. I think the other people just no longer have a coalitional strategy installed and don’t know how to function without one. This is what happened to me and why I repeatedly lashed out at others when I perceived them as betraying me, since I no longer automatically perceived them as on my side. I rebuilt my rapport with those people and now have more honest relationships with them. (still endorsed)
Beyond this, I think your model is accurate.
“That which can be destroyed by the truth should be”—I seem to recall reading that somewhere.
And: “If my actions aren’t as positive utility as I think, then I desire to believe that my actions aren’t as positive utility as I think”.
If one has such a mental makeup that finding out that one’s actions have worse effects than one imagined causes genuine psychological collapse, then perhaps the first order of business is to do everything in one’s power to fix that (really quite severe and glaring) bug in one’s psyche—and only then to attempt any substantive projects in the service of world-saving, people-helping, or otherwise doing anything really consequential.
Thank you for echoing common sense!
What is psychological collapse?
For those who can afford it, taking it easy for a while is a rational response to noticing deep confusion; continuing to take actions based on a discredited model would be less appealing, and people often become depressed when they keep confusedly trying to do things that they don’t want to do.
Are you trying to point to something else?
What specific claims turned out to be false? What counterevidence did you encounter?
Specific claim: the only nontrivial obstacle in front of us is not being evil
This is false. Object-level stuff is actually very hard.
Specific claim: nearly everyone in the aristocracy is agentically evil. (EDIT: THIS WAS NOT SAID. WE BASICALLY AGREE ON THIS SUBJECT.)
This is a wrong abstraction. Frame of Puppets seems naively correct to me, and has become increasingly reified by personal experience of more distant-to-my-group groups of people, to use a certain person’s language. Ideas and institutions have the agency; they wear people like skin.
Specific claim: this is how to take over New York.
Didn’t work.
I think this needs to be broken up into 2 claims:
1. If we execute strategy X, we’ll take over New York.
2. We can use straightforward persuasion (e.g. appeals to reason, profit motive) to get an adequate set of people to implement strategy X.
Claim 2 has been falsified decisively. The plan to recruit candidates by appealing to people’s explicit incentives failed, there wasn’t a good alternative, and as a result there wasn’t a chance to test the other parts of the plan (claim 1).
That’s important info and worth learning from in a principled way. Definitely I won’t try that sort of thing again in the same way, and it seems like I should increase my credence both that plans requiring people to respond to economic incentives by taking initiative to play against type will fail, and that I personally might be able to profit a lot by taking initiative to play against type, or by investing in people who seem like they’re already doing this, as long as I don’t have to count on other unknown people acting similarly in the future.
But I find the tendency to respond to novel multi-step plans that would require someone to take initiative by sitting back and waiting for the plan to fail, and then saying, “see? novel multi-step plans don’t work!” extremely annoying. I’ve been on both sides of that kind of transaction, but if we want anything to work out well we have to distinguish cases of “we / someone else decided not to try” as a different kind of failure from “we tried and it didn’t work out.”
This is actually completely fair. So is the other comment.