I think organizing a group around political action is much more fraught than organizing a group around a set of shared epistemic virtues and confusions, and I expect a community that spent most of its time on something much closer to political advocacy would very quickly go insane. I think young people especially should really try not to spend their first few years after getting interested in the future of humanity going around convincing others. That’s a terrible epistemic environment in which to engage honestly with the ideas.
I think the downside risk of most of these interventions is pretty huge, mostly because of effects on epistemics and morals (I don’t care that much about e.g. annoying capabilities researchers). I think a lot of these interventions have tempting paths where you exaggerate or lie or generally do immoral things in order to acquire political power, and I think this will both predictably cause a lot of high-integrity people to feel alienated and cause the definitions and ontologies around AI alignment to get muddled, both within our own minds and in the minds of the public.
I agree truth matters, but I have a question here: why can’t we sacrifice a small amount of integrity and conventional morality in order to win political power, when the world is at stake? After all, we can return to it later, once the problem is solved.
As my favorite quote from Michael Vassar says: “First they came for our epistemology. And then they... well, we don’t really know what happened next.”
But some more detailed arguments:
In particular, if you believe in slow-takeoff worlds, a lot of the future of the world rests on our ability to stay sane when the world turns crazy. I think in most slow-takeoff worlds we are going to see a lot more things that make the world at least as crazy and tumultuous as COVID did, and I think our rationality was indeed not strong enough to handle COVID gracefully (we successfully noticed it was happening before the rest of the world did, but then fucked up by staying locked down for far too long and too strictly, after it was no longer worth it).
At the end of the day, you also just have to solve the AI Alignment problem, and I think that is the kind of problem where you should look very skeptically at trading off marginal sanity. I sure see a lot of people sliding off the problem and instead rationalizing doing capabilities research, which sure seems like a crazy error to me, but it shows that even quite smart people with a bit less sanity are tempted to do pretty crazy things in this domain (or alternatively, if they are right, that a lot of the best things to do are really counterintuitive and require some galaxy-brain level thinking).
I think especially when you are getting involved in political domains and are sitting on billions of dollars of redirectable money and thousands of extremely bright redirectable young people, you will get a lot of people trying to redirect your resources towards their aims (or you yourself will feel the temptation to act a bit more adversarially in order to get a lot more resources for yourself). I think resistance against adversaries (at least for the kind of thing we are trying to do) gets a lot worse if you lose marginal sanity: people can see each other’s reasoning being faulty, which reduces trust, which increases adversarialness, which further reduces trust, and so on.
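To see why that loop is scary, here is a deliberately crude toy model (my own sketch, not anything from this thread; every coefficient is made up): trust has a small natural repair rate, visibly faulty reasoning erodes it, and adversarial behavior, which only appears once trust gets low enough, erodes it much faster.

```python
# Toy model of the trust spiral (my own sketch; all numbers are made
# up for illustration). Goodwill slowly repairs trust, faulty
# reasoning steadily erodes it, and once trust dips below a
# threshold, adversarial behavior kicks in and accelerates the fall.

def simulate(sanity, trust=0.9, steps=15):
    for t in range(steps):
        faulty = 1.0 - sanity                # less sanity -> more visibly faulty reasoning
        adversarial = max(0.0, 0.6 - trust)  # adversarial behavior appears below trust = 0.6
        trust += 0.05 - 0.3 * faulty - 0.5 * adversarial
        trust = min(1.0, max(0.0, trust))
        print(f"t={t:2d}: trust={trust:.2f}")

simulate(sanity=0.9)  # repair outpaces erosion: trust holds near 1.0
simulate(sanity=0.7)  # erosion wins, crosses the threshold, spirals to 0
```

The numbers are arbitrary; the structural point is that a system like this has a tipping point, so marginal sanity buys a disproportionate amount of stability.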
And then there is also just the classic “Gandhi murder pill” argument, where it sure seems that slightly less sane people care less about their sanity. I think the arguments for staying sane and honest and clear-viewed yourself are actually pretty subtle, and less sanity I think can easily send us off on a slope of trading away more sanity for more power. I think this is a pretty standard pathway that you can read about in lots of books and that has happened in lots of institutions.
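That slope has a self-undermining structure, and here is an equally made-up sketch of it (again my own illustration, not from the thread): if how much you value your remaining sanity scales with how much sanity you have left, an agent at full sanity refuses the first trade, while one who has already lost a little keeps trading all the way down.

```python
# Toy model of the murder-pill slope (my own illustration; the
# numbers are made up). An agent can repeatedly trade a slice of
# sanity for a slice of power; the perceived cost of the trade
# scales with how much sanity the agent still has.

def willing_to_trade(sanity, power_gain, sanity_cost=0.05):
    perceived_cost = sanity * sanity_cost  # less sane -> cares less about losing more sanity
    return power_gain > perceived_cost

def run(sanity, power_gain=0.046, slice_size=0.05):
    power = 0.0
    while sanity >= slice_size and willing_to_trade(sanity, power_gain):
        sanity -= slice_size
        power += power_gain
    return round(sanity, 2), round(power, 2)

print(run(1.0))  # full sanity: refuses the first trade -> (1.0, 0.0)
print(run(0.9))  # slightly degraded: trades all the way down -> (0.0, 0.83)
```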
A few quick thoughts:

I think one of the foremost questions I and many people ask when deciding who to allocate political power to is “will this person abuse their position” (or relatedly, “will this person follow through on their stated principles when I cannot see their behavior”), and only secondarily “is this person the most competent or most intelligent candidate for it”. Insofar as this is typical, in an iterated game you should act as someone who can be given political power without concern about whether you will abuse it, if you would like to be given it at all.
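The iterated-game logic is easy to make concrete. Below is a minimal simulation (my own sketch; the payoff numbers are arbitrary): a principal grants a valuable position each round, but only to agents who have never been caught abusing it, so a one-off gain from abuse forfeits the whole stream of future allocations.

```python
# Minimal iterated power-allocation game (my own sketch; payoffs are
# arbitrary). The principal re-grants a position each round, but only
# to agents never caught abusing it; abuse pays a one-off bonus and
# permanently ends the relationship.

import random

def play(abuse_prob=0.5, abuse_bonus=3.0, rounds=100, seed=0):
    rng = random.Random(seed)
    payoff = {"honest": 0.0, "abuser": 0.0}
    trusted = {"honest": True, "abuser": True}
    for _ in range(rounds):
        for name in payoff:
            if not trusted[name]:
                continue                     # the principal stops allocating power
            payoff[name] += 1.0              # per-round value of holding the position
            if name == "abuser" and rng.random() < abuse_prob:
                payoff[name] += abuse_bonus  # one-off gain from the abuse
                trusted[name] = False        # caught: never trusted again
    return payoff

print(play())  # the honest agent's cumulative payoff dwarfs the abuser's
```

As long as the per-round value of being trusted, times the remaining horizon, exceeds the one-off gain from abuse, the reputation equilibrium favors actually being trustworthy. (The toy model assumes abuse is always detected, which is exactly the assumption the reply below pushes on.)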
I tend to believe that, if I’m ever in a situation where I feel that I might want to trade away ethics/integrity to get what I want, then instead, by being smarter or working harder for a few more months, I will be able to get it without making any such sacrifices, and that this is better, because ethical people with integrity will continue to trust and work with me.
A related way I think about this is that ethical people with integrity work together, but don’t want to work with people who lack ethics or integrity. For example, I know someone who once deceived their manager in order to get their job done (cf. Moral Mazes). Until I see this person repent or produce credible costly signals to the contrary, I will not give them many resources or work with them.
That said, ‘conventional’ ethics, i.e. the current conventions, includes things like recycling and not asking people out on dates within the same company, and I already don’t think these are actually part of ethical behavior, so I have no problem with dismissing those.
“Whether you will be caught abusing it”, not “whether you will abuse it”. For certain kinds of actions it is possible to reliably evade detection, and to factor that fact into your decision.
I don’t know who you have in your life, but in my life there is a marked difference between the people who clearly care about integrity, who clearly care about following the wishes of others when they have power over resources they in some way owe to others (e.g. spending an hour thinking through the question “Hm, now that John let me stay at his house while he’s away, how would he want me to treat it?”), and those who do not. For such people, it would be quite costly to run their cognition differently at just the specific moment when adversarial action would pay off, and they do pay a real cost by acting with integrity in all the cases I end up being able to check. The people whose word means something are clear to me, and my agreements with them are simpler and more forthcoming than with others.
If an organisation reliably accepts certain forms of corruption, the current leadership may want people who engage in those forms of corruption to be given power and brought into leadership.
It is my experience that those with integrity are indeed not the sorts of people that people in corrupt organizations want to employ; they do not perform as the employers wish, and they get caught up in internal conflicts.
And it is possible that, if you create a landscape with different incentives, people won’t fall back on their old behaviours but will show new ones instead.
I do draw a distinction in my mind between “people who I trust to be honest and follow through on commitments in the current, specific incentive landscape” and “people who I trust to be honest and follow through on commitments in a wide variety of incentive landscapes”.