It seems like there are some intrinsic connections between the clusters of concepts known as “EA”, “LW-style rationality”, and “HRAD research”; is this a worrying sign?
Specifically, it seems like the core premise of EA relies largely on a good understanding of the world, in a systematic and explicit manner (because existing heuristics aren’t selected for “maximizing altruism”[1]), linking closely to LW, which tries to answer the same question. At the same time, my understanding of HRAD research is that it aims to elucidate a framework for how consequentialist agents “ought to reason” in theory, so that the consequentialist reasoning of the first highly capable AI systems is legible to humans. Understanding how an idealized agent “ought to reason” or “ought to make decisions” seems highly relevant to the project of improving human rationality (which is then relevant to the EA project).
Now, imagine a world where HRAD is not a great use of resources (e.g. because AI risk is not a legitimate concern, because underlying philosophical assumptions are wrong, because the marginal tractability of alternate safety approaches is much higher, etc.). Would the basic connections between the ideas in the last paragraph still hold? I’m worried that they would, leading any community with goals similar to EA’s to be biased towards HRAD research for reasons unrelated to the underlying state of the world.
Is this a legitimate concern? What else has been written on this issue?
[1] To expand on this a bit: LW-style rationality often underperforms accumulated heuristics, experience, and domain knowledge in established fields, and probably does best in new fields with high uncertainty, where quantification is valuable, societal incentives to get a correct answer are low, the environment is dissimilar to the ancestral one, and cognitive biases and emotional responses are especially likely to mislead. I think almost all of these descriptors are true of the EA movement.
The intrinsic connection is primarily that they arose out of the same broad community, and there is heavy overlap between personnel as a consequence.
I say this is not a worrying sign, because the comparison isn’t between their shared methods and some better memeplex; it is between their shared methods and the status quo. That is to say, there’s no reason to believe something else would be happening in place of this; more likely everyone would have scattered throughout what was already happening.
It’s very important to distinguish those etceteras you listed, because those are three different worlds. In the world where AI risk is in fact low, HRAD can still be very successful in mitigating it further, and also fruitful for thinking about similar risks. In the world where the underlying philosophical assumptions are wrong, demonstrating that wrongness is valuable in and of itself to the greater safety project. In the world where alternate safety approaches have higher tractability, how would we even tell without comparison to the challenges encountered in HRAD?
HRAD is also the product of specific investigations into tractability and philosophical soundness. I expect they will iterate on these very questions again in the future. If it winds up a dead end, I expect the associated communities to notice and then to shift focus elsewhere.
To sum up, we have noticed the skulls. Hail, humanity! We who are about to die salute you.
The intrinsic connection is primarily that they arose out of the same broad community, and there is heavy overlap between personnel as a consequence.
I disagree with this, though! I think anyone who wants to think along EA lines is inevitably going to want to investigate how to improve epistemic rationality, which naturally leads to thinking about decision making for idealized agents. Having community overlap is one thing, but the ideas seem so closely related that EA can’t develop in any possible world without being biased towards HRAD research.
It’s very important to distinguish those etceteras you listed
I mean, surely there would be some worlds in which HRAD research was not the most valuable use of (some portion of*) EA money; it doesn’t really matter whether the specific examples I gave work, just that EA would be unable to distinguish worlds where HRAD is an optimal use of resources from worlds where it is not.
I expect the associated communities to notice and then to shift focus elsewhere.
But why? Is it not at all concerning that aliens with no knowledge of Earth or humanity could plausibly guess that a movement dedicated to a maximizing, impartial, welfarist conception of the good would also be intrinsically attracted to learning about idealized reasoning procedures? The link between them is completely unconnected to the object-level question “is HRAD research the best use of [some] EA money?”, or even to the specifics of how the LW/EA communities formed around specific personalities in this world.
I don’t understand the source of your concern.
Is it not at all concerning that aliens with no knowledge of Earth or humanity could plausibly guess that a movement dedicated to a maximizing, impartial, welfarist conception of the good would also be intrinsically attracted to learning about idealized reasoning procedures?
This is not at all concerning. If we are concerned about this, then we should also be concerned that aliens could plausibly guess that a movement dedicated to space exploration would be intrinsically attracted to learning about idealized dynamical procedures. It seems to me this is just a prior that groups with a goal investigate instrumentally useful things.
My model of your model so far is this: because the EA community is interested in LessWrong, and because LessWrong facilitated the group that works on HRAD research, the EA community will move their practices closer to the implications of this research even in the case where it is wrong. Is that accurate?
My expectation is that EAs will give low weight to the details of HRAD research, even in the case where it is a successful program. The biggest factor is timelines: HRAD research is in service of the long-term goal of reasoning correctly about AGI; EA is about doing as much good as possible, as soon as possible. The iconic feature of the EA movement is the giving pledge, which is largely predicated on the idea that money given now is more impactful than money given later. There is a lot of discussion about alternatives and different practices, for example the donor’s dilemma and mission hedging, but these are operational concerns rather than theoretical/idealized ones.
Even if I assume HRAD is a productive line of research, I strongly expect that the path to changing EA practice runs through some surprising result, evaluated all the way up to the level of employment and investment decisions. That is, the result would need to be surprising, then it would need to withstand scrutiny, and then it would need to lead to conclusions big enough to shift activity like donations, employment, and investments, with the cost of change included. I would be deeply shocked if this happened, and then further shocked if it had a broad enough impact to change the course of EA as a group.