I enjoyed the story and I agree with most of the points you’re making, but I’m not sure calling the situation “morally depraved” is a good strategy. (I’d probably call it “suboptimal” or “dire”, depending on how strongly I wanted to express my feelings.) I think I’m more skeptical than you are that it’s possible to do much better (i.e., build functional information-processing institutions) before the world changes a lot for other reasons (e.g., superintelligent AIs are invented), so I’d optimize more for not making enemies or alienating people than for making people realize how bad the situation is or getting them to join your cause.
On a practical level, do you think there are any charities existing today that are doing substantially better than others with regard to these problems?
Agreed that calling things morally depraved is a limited strategy and has downsides. Part of the dialogue was meant to explore the limitations of moral outrage. Perhaps a better reframing is to say that much of the economy is trapped by one or more basilisks (in the sense of Roko’s basilisk); the blame goes on the basilisk rather than on the people trapped in it.
I think we do disagree on how important it is to make people realize how bad the situation is. In particular, I am not sure how you expect x-risk research/strategy to accomplish its objectives without having at least one functional information-processing institution! If solving AI safety (or somehow preventing AI risk without solving AI safety directly) requires solving hard philosophical problems, comparable in difficulty to or harder than those that have been solved in the past, then that requires people to optimize for something other than optics. Moreover, it almost certainly requires a network of communicating people, such that the network is processing information about hard philosophical problems.
I do think there are institutions that process information well enough to do pretty-hard technical things (such as SpaceX), and some relevant questions are (a) what are they doing right, and (b) how can they be improved on.
It could be that AGI will not be developed for a very long time, due to how dysfunctional current institutions are. That actually seems somewhat likely [EDIT: to be more precise, dysfunctional institutions creating AGI is more of an out-of-model tail risk than a mainline event, I understated this for some reason]. But, that would imply that there’s a lot of time to build functional institutions.
Regarding charities: I think people should often be willing to fund things that they can actually check on and see the results of. For example, if you can evaluate open source software, then that allows you to selectively donate to projects that produce good software. You can also use recommendations by people you trust, who themselves evaluate things. (Unfortunately, if you’re one of the only people actually checking, then there might not be strong UDT reasons to donate at all)
However, if something is hard to evaluate, then going based only on what you see results in optimization for optics. The “generalized MIRI donor strategy” would be to, when you notice a problem that you are ill-equipped to solve yourself, and other people have produced some output you can evaluate (so you believe they are competent), and it seems like they are well-motivated to solve the problem (or otherwise do useful things), increase these people’s slack so they can optimize for something other than optics. I think this is one of the better strategies for donors. It could suggest giving either to a charity itself or to the individuals in it, depending on which increases slack the most. (On the object level, I think MIRI has produced useful output, and that increasing their employees’ slack is a good thing, though obviously I haven’t evaluated other charities nearly as much as I have evaluated MIRI, so combine your information state with mine, and also only trust me as much as makes sense given what you know about me.)
In particular, I am not sure how you expect x-risk research/strategy to accomplish its objectives without having at least one functional information-processing institution! If solving AI safety (or somehow preventing AI risk without solving AI safety directly) requires solving hard philosophical problems, comparable in difficulty to or harder than those that have been solved in the past, then that requires people to optimize for something other than optics.
I guess it’s a disjunction of various low-probability scenarios like:
Metaphilosophy turns out to be easy.
Another AI alignment approach turns out to be easy (i.e., does not require solving hard philosophical problems)
Existing institutions turn out to be more effective than they appear.
Some kind of historical fluke or close call with x-risk causes world leaders to get together and cooperate to stop AI development.
In my mind “building much better information-processing institutions turns out to be easier than I expect” is comparable in likelihood to these scenarios so I don’t feel like pushing really hard on that at a high cost to myself personally or to my contributions to these other scenarios. But if others want to do it I guess I’m willing to help if I can do so at relatively low cost.
I do think there are institutions that process information well enough to do pretty-hard technical things (such as SpaceX), and some relevant questions are (a) what are they doing right, and (b) how can they be improved on.
It’s not clear to me that whatever institutional innovations SpaceX came up with (if any, i.e., if their success isn’t due to other factors) in order to be more successful than their competitors at solving technical problems would transfer to the kinds of problems you described for the charity sector. Until I see some analysis showing that, it seems a priori unlikely to me.
It could be that AGI will not be developed for a very long time, due to how dysfunctional current institutions are.
That actually seems somewhat likely. But, that would imply that there’s a lot of time to build functional institutions.
Don’t you think it takes institutions that are more functional to build aligned AGI than to build unaligned AGI? I have a lot of uncertainty about all this but I expect that we’re currently near or over the threshold for unaligned AGI but well under the threshold for aligned AGI. (It seems totally plausible that by pushing for more functional institutions you end up pushing them over the threshold for unaligned AGI but still under the threshold for aligned AGI, but that might be a risk worth taking.)
Regarding charities: I think people should often be willing to fund things that they can actually check on and see the results of.
Almost nobody can do this for AI alignment, and it’s really costly even for the people that can. For example I personally don’t understand or can’t motivate myself to learn a lot of things that get posted to AF. Seriously, what percentage of current donors for AI alignment can do this?
You can also use recommendations by people you trust, who themselves evaluate things.
How do you suggest people find trustworthy advisors to help them evaluate things, if they themselves are not experts in the field? Wouldn’t they have to rely on the “optics” of the advisors? Is this likely to be substantially better than relying on the optics of charities?
(Unfortunately, if you’re one of the only people actually checking, then there might not be strong UDT reasons to donate at all)
(I’m not convinced that UDT applies to humans, so if I was donating, it would be to satisfy some faction of my moral parliament.)
The “generalized MIRI donor strategy” would be to
Why is this called “generalized MIRI donor strategy”?
other people have produced some output you can evaluate (so you believe they are competent), and it seems like they are well-motivated to solve the problem (or otherwise do useful things)
This seems really hard for almost anyone to evaluate, both the technical side (see above), and the motivational side. And wouldn’t this cause a lot of people to optimize for producing output that typical donors could evaluate (rather than what’s actually important), and for producing potentially wasteful signals of motivation?
increase these people’s slack so they can optimize for something other than optics
Can you be more explicit about what you’re proposing? What exactly should one do to give these people more slack? Give unconditional cash gifts? I imagine that cash gifts wouldn’t be all that helpful unless it’s over the threshold where they can just quit their job and work on whatever they want, otherwise they’d still have to try to keep their jobs which means working on whatever their employers tell them to, which means optimizing for optics (since that’s what their employers need them to do to keep the donations coming in).
On the object level, I think MIRI has produced useful output, and that increasing their employees’ slack is a good thing, though obviously I haven’t evaluated other charities nearly as much as I have evaluated MIRI, so combine your information state with mine, and also only trust me as much as makes sense given what you know about me
If I followed this strategy, how would I know that I was funding (or doing the best that I can to fund) the most cost-effective work (i.e., that there isn’t more cost-effective work that I’m not funding because I can’t evaluate that work myself and I don’t know anyone I trust who can evaluate that work)? How would I know that I wasn’t creating incentives that are even worse than what’s typical today (namely to produce what people like me can evaluate and to produce signals of well-motivation)? How do I figure out who I can trust to help me evaluate individuals and charities? Do you think there’s a level of evaluative ability in oneself and one’s trusted advisors below which it would be better to outsource the evaluation to something like BERI, FLI, or OPP instead? If so, how do I figure out whether I’m above or below that threshold?
Instead of replying to me point by point, feel free to write up your thoughts more systematically in another post.
In my mind “building much better information-processing institutions turns out to be easier than I expect” is comparable in likelihood to these scenarios
My sense is that building functional institutions is something that has been done in history multiple times, and that the other things you list are extreme tail scenarios. For some context, I recommend reading Samo’s blog; I think my views on this are pretty similar to his.
It’s not clear to me that whatever institutional innovations SpaceX came up with (if any, i.e., if their success isn’t due to other factors) in order to be more successful than their competitors at solving technical problems would transfer to the kinds of problems you described for the charity sector.
Perhaps the answer is that some problems (including AI alignment) are not best solved by charities. It seems like something that works at a well-funded company (that has long time horizons, like SpaceX) would probably also work for a well-funded nonprofit (since at this point the difference is mostly nominal), though probably it’s easier to get a company this well-funded than a nonprofit.
Don’t you think it takes institutions that are more functional to build aligned AGI than to build unaligned AGI?
Most likely, yes. The question of whether we’re above or below the threshold for unaligned AGI seems like a judgment call. Based on my models (such as this one), the chance of AGI “by default” in the next 50 years is less than 15%, since the current rate of progress is not higher than the average rate since 1945, and if anything is lower (the insights model linked has a bias towards listing recent insights).
Almost nobody can do this for AI alignment, and it’s really costly even for the people that can.
Agreed, maybe people who can’t do this should just give discretion over their AI donations to their more-technical friends, or just not donate in the AI space in the first place.
Why is this called “generalized MIRI donor strategy”?
I’m calling it this because it seems like the right generalization of the best reason to donate to MIRI.
Can you be more explicit about what you’re proposing? What exactly should one do to give these people more slack? Give unconditional cash gifts? I imagine that cash gifts wouldn’t be all that helpful unless it’s over the threshold where they can just quit their job and work on whatever they want, otherwise they’d still have to try to keep their jobs which means working on whatever their employers tell them to, which means optimizing for optics
I am thinking of either unconditional cash gifts or donations to their employer. Some charities will spend less on short-term signalling when they have more savings. You could talk to the people involved to get a sense of what their situation is, and what would be most helpful to them. If there’s a non-concavity here (where 10x the money is over 10x as good as 1x the money), I jokingly suggest gambling, but I am actually suspicious of this particular non-concavity, since having incrementally more savings means you can last incrementally longer after quitting your job.
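To spell out the gamble point, here is a toy numerical sketch (the value functions and numbers are made up purely for illustration, not anyone’s actual model): with a convex value-of-money curve, a fair 10%-chance-of-10x gamble beats the sure amount, while with the usual concave, diminishing-returns curve it does not.

```python
# Toy illustration only: hypothetical value-of-money curves.
def convex_value(x):
    return x ** 1.5   # 10x the money is over 10x as good

def concave_value(x):
    return x ** 0.5   # diminishing returns: 10x the money is less than 10x as good

donation = 10_000
p_win = 0.1           # fair gamble: 10% chance of 10x the money, else nothing

for name, value in [("convex", convex_value), ("concave", concave_value)]:
    sure = value(donation)
    gamble = p_win * value(10 * donation) + (1 - p_win) * value(0)
    print(f"{name}: sure={sure:,.0f}  gamble EV={gamble:,.0f}  gamble preferred: {gamble > sure}")
```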
If I followed this strategy, how would I know that I was funding (or doing the best that I can to fund) the most cost-effective work (i.e., that there isn’t more cost-effective work that I’m not funding because I can’t evaluate that work myself and I don’t know anyone I trust who can evaluate that work)?
You don’t, so you might have to lower your expectations. Given the situation, all the options are pretty suboptimal.
How would I know that I wasn’t creating incentives that are even worse than what’s typical today (namely to produce what people like me can evaluate and to produce signals of well-motivation)?
You don’t, but you can look at what your models about incentives say constitutes better/worse incentives. This is an educated guess, and being risk averse about it doesn’t make any sense in the x-risk space (since the idea of being risk averse about risk reduction is incoherent).
How do I figure out who I can trust to help me evaluate individuals and charities?
(This also answers some of the other points about figuring out who to trust)
Talk to people, see who tends to leave you with better rather than worse ideas. See who has a coherent worldview that makes correct predictions. See who seems curious. Make your own models and use them as consistency checks. Work on projects with people and see how they go. Have friend networks that you can ask about things. This is a pretty big topic.
Some of these things would fall under optics, insofar as they are an incomplete evaluation that can be gamed. So you have to use multiple sources of information in addition to priors, and even then you will make mistakes. This is not avoidable, as far as I can tell. There’s something of a Red Queen race, where signals of good thinking eventually get imitated, so you can’t just use a fixed classifier.
Do you think there’s a level of evaluative ability in oneself and one’s trusted advisors below which it would be better to outsource the evaluation to something like BERI, FLI, or OPP instead?
Perhaps random people on the street would be better off deferring to one of these organizations, though it’s not clear why they would defer to one of these in particular (as opposed to, say, the Gates Foundation or their university). I am not actually sure how you end up with the list BERI/FLI/OPP; the main commonality is that they give out grants to other charities, but that doesn’t necessarily put them in a better epistemic position than, say, MIRI or FHI, or individual researchers (consider that, when you give grants, there is pressure to deceive you in particular). In general the question of who to outsource the evaluation to is not substantially easier than the question of which people you should give money to, and could be considered a special case (where the evaluator itself is considered as a charity).
Based on my models (such as this one), the chance of AGI “by default” in the next 50 years is less than 15%, since the current rate of progress is not higher than the average rate since 1945, and if anything is lower (the insights model linked has a bias towards listing recent insights).
Both this comment and my other comment are way understating our beliefs about AGI. After talking to Jessica about it offline to clarify our real beliefs rather than just playing games with plausible deniability, my actual probability is between 0.5 and 1% in the next 50 years. Jessica can confirm that hers is pretty similar, but probably weighted towards 1%.
My sense is that building functional institutions is something that has been done in history multiple times, and that the other things you list are extreme tail scenarios.
People are trying to build more functional institutions, or at least unintentionally experimenting with different institutional designs, all the time, so the fact that building functional institutions is something that has been done in history multiple times doesn’t imply that any particular attempt to build a particular kind of functional institution has a high or moderate chance of success.
For some context, I recommend reading Samo’s blog
Can you recommend some specific articles?
It seems like something that works at a well-funded company (that has long time horizons, like SpaceX) would probably also work for a well-funded nonprofit
Are there any articles about what is different about SpaceX as far as institutional design? I think I tried to find some earlier but couldn’t. In my current state of knowledge I don’t see much reason to think that SpaceX’s success is best explained by it having made a lot of advances in institutional design. If it actually did create a lot of advances in institutional design, and they can be applied to a nonprofit working on AI alignment, wouldn’t it be high priority to write down what those advances are and how they can be applied to the nonprofit, so other people can critique those ideas?
Let’s reorganize the rest of our discussion by splitting your charity suggestion into two parts which we can consider separately:
Use your own or your friends’ evaluations of technical work instead of outsourcing evaluations to bigger / more distant / more formal organizations.
Give awards/tenure instead of grants.
(Is this a fair summary/decomposition of your suggestion?) I think aside from my previous objections, which I’m still not sure about, perhaps my true rejection of 1 combined with 2 is that I don’t want to risk feeling duped and/or embarrassed if I help fund someone’s tenure and it turns out that I or my friends misjudged their technical abilities or motivation (i.e., they end up producing low-quality work or suffering a large decrease in productivity). Yeah, from an altruistic perspective I should be risk neutral, but I don’t think I can override the social/face-saving part of my brain on this. To avoid this I could fund an organization instead of an individual, but in that case it’s likely that most of the money would go towards expansion rather than increasing slack. I guess another problem is that if I start thinking about what AI alignment work to fund I get depressed thinking about the low chance of success of any particular approach, and it seems a lot easier to just donate to an evaluator charity and make it their problem.
It seems to me that 2 by itself is perhaps more promising and worth a try. In other words we could maybe convince a charity to reallocate some money from grants to awards/tenure and check if that improves outcomes, or create a new charity for this purpose. Is that something you’d support?
People are trying to build more functional institutions, or at least unintentionally experimenting with different institutional designs, all the time, so the fact that building functional institutions is something that has been done in history multiple times doesn’t imply that any particular attempt to build a particular kind of functional institution has a high or moderate chance of success.
Agree, but this is a general reason why doing things is hard. Lots of people are working on philosophy too. I think my chance of success is way higher than the base rate, to the point where anchoring on the base rate does not make sense, but I can understand people not believing me on this. (Arguments of this form might fall under modest epistemology.)
In my current state of knowledge I don’t see much reason to think that SpaceX’s success is best explained by it having made a lot of advances in institutional design.
If one organization is much more effective than comparable ones, and it wasn’t totally by accident, then there are causal reasons for the difference in effectiveness. Even if it isn’t the formal structure of the organization, it could be properties of the people who seeded the organization, properties of the organization’s mission, etc. I am taking a somewhat broad view of “institutional design” that would include these things too. I am not actually saying anything less trivial than “some orgs work much better than other orgs and understanding why is important so this can be replicated and built upon”.
If it actually did create a lot of advances in institutional design, and they can be applied to a nonprofit working on AI alignment, wouldn’t it be high priority to write down what those advances
Yes, the general project of examining and documenting working institutions is high-priority. There will probably be multiple difficulties in directly applying SpaceX’s model (e.g. the fact that SpaceX is more engineering than research), so documenting multiple institutions would help. I am otherwise occupied right now, though.
(Is this a fair summary/decomposition of your suggestion?)
Yes, this would be my suggestion for donors.
I appreciate the introspection you’ve done on this, this is useful information. I think there’s a general issue where the AI alignment problem is really hard, so most people try to push the responsibility somewhere else, and those that do take responsibility usually end up feeling like they are highly constrained into doing the “responsible” thing (e.g. using defensible bureaucratic systems rather than their intuition), which is often at odds with the straightforward just-solve-it mental motion (which is used in, for example, playing video games), or curiosity in general (e.g. mathematical or scientific). I’ve experienced this personally. I recommend Ben’s post on this dynamic. This is part of why I exited the charity sector and don’t identify as an EA anymore. I don’t know how to fix this other than by taking the whole idea of non-local responsibility (i.e. beyond things like “I am responsible for driving safely and paying my bills on time”) less seriously, so I kind of do that.
In other words we could maybe convince a charity to reallocate some money from grants to awards/tenure and check if that improves outcomes, or create a new charity for this purpose. Is that something you’d support?
Yes, both awards and tenure seem like improvements, and in any case well worth experimenting with.
SpaceX might be more competitive than its competitors not because it’s particularly functional (i.e., compared to typical firms in other fields) but because its competitors are particularly non-functional.
If it is particularly effective but not due to the formal structure of the organization, then those informal things are likely to be hard to copy to another organization.
SpaceX can be very successful by having just marginally better institutional design than its competitors because they’re all trying to do the same things. A successful AI alignment organization however would have to be much more effective than organizations that are trying to build unaligned AI (or nominally trying to build aligned AI but cutting lots of corners to win the race, or following some AI alignment approach that seems easy but is actually fatally flawed).
those that do take responsibility usually end up feeling like they are highly constrained into doing the “responsible” thing (e.g. using defensible bureaucratic systems rather than their intuition), which is often at odds with the straightforward just-solve-it mental motion (which is used in, for example, playing video games), or curiosity in general (e.g. mathematical or scientific)
I don’t think that playing video games, math, and science are good models here because those all involve relatively fast feedback cycles which make it easy to build up good intuitions. It seems reasonable to not trust one’s intuitions in AI alignment, and the desire to appear defensible also seems understandable and hard to eliminate, but perhaps we can come up with better “defensible bureaucratic systems” than what exists today, i.e., systems that can still appear defensible but make better decisions than they currently do. I wonder if this problem has been addressed by anyone.
Yes, both awards and tenure seem like improvements, and in any case well worth experimenting with.
Ok, I made the suggestion to BERI since it seems like they might be open to this kind of thing.
ETA: Another consideration against individuals directly funding other individuals is that it wouldn’t be tax deductible. This could reduce the funding by up to 40%. If the funding is done through a tax-exempt non-profit, then the IRS probably has some requirements about having formal procedures for deciding who/what to fund.
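To make the arithmetic behind the “up to 40%” figure concrete, here is a toy calculation (assuming a 40% combined marginal tax rate purely for illustration; the actual rate depends on the donor):

```python
# Toy sketch, not tax advice: compares how much funding a fixed after-tax
# donor budget delivers with and without deductibility.
def funding_delivered(after_tax_budget, marginal_rate, deductible):
    if deductible:
        # A deductible gift of D costs the donor D * (1 - rate) after the
        # deduction, so a fixed after-tax budget funds D = budget / (1 - rate).
        return after_tax_budget / (1 - marginal_rate)
    return after_tax_budget  # direct gifts to individuals: no deduction

budget, rate = 10_000, 0.40
via_charity = funding_delivered(budget, rate, deductible=True)    # ~16,667
direct_gift = funding_delivered(budget, rate, deductible=False)   # 10,000
print(f"funding lost by giving directly: {1 - direct_gift / via_charity:.0%}")  # 40%
```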
(Just got around to reading this. As a point of reference, it seems that at least Open Phil has decided that tax-deductibility is not more important than being able to give to things freely, which is why the Open Philanthropy Project is an LLC. I think this is at least slight evidence towards that tradeoff being worth it.)
There’s an enormous difference between having millions of dollars of operating expenditures in an LLC (so that an org is legally allowed to do things like investigate non-deductible activities like investment or politics), and giving up the ability to make billions of dollars of tax-deductible donations. Open Philanthropy being an LLC (so that its own expenses aren’t tax-deductible, but it has LLC freedom) doesn’t stop Good Ventures from making all relevant donations tax-deductible, and indeed the overwhelming majority of grants on its grants page are deductible.
Yep, sorry. I didn’t mean to imply that all of Open Phil’s funding is non-deductible, just that they decided that it was likely enough that they would find non-deductible opportunities that they went through the effort of restructuring their org to do so (and also gave up a bunch of other benefits like the ability to sponsor visas efficiently). My comment wasn’t very clear on that.
Here’s Open Phil’s blog post on why they decided to operate as an LLC. After reading it, I think their reasons are not very relevant to funding AI alignment research. (Mainly they want the freedom to recommend donations to non-501(c)(3) organizations like political groups.)
I am also pretty interested in 2 (ex-post giving). In 2015, there was impactpurchase.org. I got in contact with them about it, and the major updates Paul reported were a) being willing to buy partial contributions (not just from people who were claiming full responsibility for things) and b) being more focused on what’s being funded (for example, only asking people to submit claims on blog posts and articles).
I realise that things like impactpurchase are possibly framed in terms of a slightly divergent reason for 2 (it seems more focused on changing the incentive landscape, whereas the posts above include thinking about whether giving slack to people with track records will lead those people to be counterfactually more effective in future).
but I’m not sure calling the situation “morally depraved” is a good strategy
“Total depravity” was a central tenet of a pretty successful coordination network known for maintaining a higher than usual level of personal integrity. This is not just a weird coincidence. One has to be able to describe what’s going on in order to coordinate to do better. Any description of this thing is going to register as an act as well as a factual description, but that’s not a strong reason to avoid a clear noneuphemistic handle for this sort of thing.
I think I’m more skeptical than you are that it’s possible to do much better (i.e., build functional information-processing institutions) before the world changes a lot for other reasons (e.g., superintelligent AIs are invented)
Where do you think the superintelligent AIs will come from? AFAICT it doesn’t make sense to put more than 20% on AGI before massive international institutional collapse, even being fairly charitable to both AGI projects and prospective longevity of current institutions.
Huh, I notice I’ve not explicitly estimated my timeline distribution for massive international institutional collapse, and that I want to do that. Do you have any links to places where others/you have thought about it?
Can you recommend some specific articles?
On the Loss and Preservation of Knowledge
Live versus Dead Players
Functional Institutions are the Exception
Great Founder Theory
Why isn’t this a fully general argument for never rocking the boat?
You quoted the conclusion, not the argument. The argument is based on skepticism that rocking the boat will do much good.