Specific claim: the only nontrivial obstacle in front of us is not being evil
This is false. Object-level stuff is actually very hard.
Specific claim: nearly everyone in the aristocracy is agentically evil. (EDIT: THIS WAS NOT SAID. WE BASICALLY AGREE ON THIS SUBJECT.)
This is a wrong abstraction. Frame of Puppets seems naively correct to me, and has become increasingly reified by personal experience of groups of people more distant from my own, to use a certain person’s language. Ideas and institutions have the agency; they wear people like skin.
Specific claim: this is how to take over New York.
Didn’t work.
I think this needs to be broken up into 2 claims:
1. If we execute strategy X, we’ll take over New York.
2. We can use straightforward persuasion (e.g. appeals to reason, profit motive) to get an adequate set of people to implement strategy X.
Claim 2 has been falsified decisively. The plan to recruit candidates by appealing to people’s explicit incentives failed, there wasn’t a good alternative, and as a result there was no chance to test the other part of the plan (claim 1).
That’s important info and worth learning from in a principled way. I definitely won’t try that sort of thing again in the same way. It seems like I should increase my credence both that plans requiring people to respond to economic incentives by taking initiative to play against type will fail, and that I personally might be able to profit a lot by taking initiative to play against type, or by investing in people who already seem to be doing this, as long as I don’t have to count on other, unknown people acting similarly in the future.
But I find the tendency to respond to novel multi-step plans that would require someone to take initiative by sitting back, waiting for the plan to fail, and then saying, “see? novel multi-step plans don’t work!” extremely annoying. I’ve been on both sides of that kind of transaction, but if we want anything to work out well, we have to distinguish “we / someone else decided not to try” as a different kind of failure from “we tried and it didn’t work out.”
Specific claim: the only nontrivial obstacle in front of us is not being evil
This is false. Object-level stuff is actually very hard.
This seems to be conflating the question of “is it possible to construct a difficult problem?” with the question of “what’s the rate-limiting problem?”. If you have a specific model for how to make things much better for many people by solving a hard technical problem before making substantial progress on human alignment, I’d very much like to hear the details. If I’m persuaded I’ll be interested in figuring out how to help.
So far this seems like evidence to the contrary, though, as it doesn’t look like you thought you could get help making things better for many people by explaining the opportunity.
This is actually completely fair. So is the other comment.