I find myself somewhat confused by s-risks as defined here; it’s easy to generate clear central cases that very few would want, but hard to figure out where the boundaries are, and thus hard to figure out how much I should expect this motivation to shape the research.
That is, consider the “1950s sci-fi prediction,” where a slightly-more-competent version of humanity manages to colonize lots of different planets in ways that make them sort of duplicates of Earth. This seems like it would count as an s-risk if each planet has comparable levels of suffering to modern Earth and there are vastly more such planets. While this feels to me like “much worse than is possible,” I’m not yet sold it’s below the “ok” bar in the maxipok sense, but also it wouldn’t seem too outlandish to think it’s below that bar (depending on how bad you think life on Earth is now).
Do you think focusing on s-risks leads to meaningfully different technical goals than focusing on other considerations? I don’t get that sense from the six headings, but I can imagine how it might add different constraints or a different focus for some of them. For example, on the point of AI strategy and governance, it seems easiest to encourage cooperation when there are no external forces potentially removing participants from a coalition, but adding in particular ethical views may exclude people who otherwise could have been included. You might imagine, say, a carnivorous TAI developer who wants factory farming to make it to the stars.
This isn’t necessarily a point against this view, in my opinion; focusing on alignment at all implies having some sort of ethical view or goal you want to implement, and being upfront about those goals may simplify or direct the technical work, as opposed to saying “we’ll let future-us figure out what the moral goals are; first let’s figure out how to implement any goals at all.” But it does make me interested in how much disagreement you think there is, weighted by likelihood or something, about the desirability of future outcomes between people primarily motivated by the continued existence of human civilization and people primarily motivated by avoiding filling the universe with suffering (or whatever other categories you think are worth considering).
Do you think focusing on s-risks leads to meaningfully different technical goals than focusing on other considerations?
I think it definitely leads to a difference in prioritization among the things one could study under the broad heading of AI safety. Hopefully this will be clear in the body of the agenda. And, some considerations around possible downsides of certain alignment work might be more salient to those focused on s-risk; the possibility that attempts at alignment with human values could lead to very bad “near misses” is an example. (I think some other EAF researchers have more developed views on this than myself.) But, in this document and my own current research I’ve tried to choose directions that are especially important from the s-risk perspective but which are also valuable by the lights of non-s-risk-focused folks working in the area.
[Just speaking for myself here]
I find myself somewhat confused by s-risks as defined here
For what it’s worth, EAF is currently deliberating about this definition and it might change soon.
Thanks, that helps!
Cool; if your deliberations include examples, it might be useful to include them if you end up writing an explanation somewhere.
We are now using a new definition of s-risks. I’ve edited this post to reflect the change.
New definition:
S-risks are risks of events that bring about suffering in cosmically significant amounts. By “significant”, we mean significant relative to expected future suffering.
Note that it may turn out that the amount of suffering that we can influence is dwarfed by suffering that we can’t influence. By “expected future suffering” we mean “expected action-relevant future suffering”.