Very useful post and discussion! Let's set aside the issue that someone in capabilities research might be underestimating the risk and assume they have assessed it accurately. Let's also simplify to two outcomes: bliss expanding throughout our lightcone, or extinction (no value). Finally, let's assume that very low levels of risk are achievable, but only if we wait a long time. It would be very interesting to me to hear what threshold different people (maybe via a poll) would want the probability of extinction to be below before activating the AGI. Below are my super rough guesses:
1x10^-10: strong longtermist
1x10^-5: weak longtermist
1x10^-2 = 1%: average person (values the next few centuries?)
1x10^-1 = 10%: person-affecting view (currently alive people get to live indefinitely if successful)
30%: selfish researcher
90%: fame/power-loving older selfish researcher
I was surprised that my estimate for a selfish person was not more different. With climate change, if an altruistic person-affecting individual thinks the carbon tax should be $100 per ton of carbon, a selfish person should act as if the carbon tax were about 10 billion times lower: ten orders of magnitude of difference, versus ~one order for AGI. So AGI appears to be a different case in that the risk is more internalized to the actors. Most of the variance for AGI appears to come from how longtermist one is, rather than from whether one is selfish or altruistic.
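To make the comparison concrete, here is a minimal sketch of the orders-of-magnitude gap in each case. It just uses the rough threshold guesses from the list above, and the choice of which two viewpoints to pair (person-affecting vs. selfish researcher) is my assumption:

```python
import math

def orders_of_magnitude(a, b):
    """Base-10 orders of magnitude separating two positive quantities."""
    return abs(math.log10(a / b))

# Carbon tax case: altruistic person-affecting ($100/ton) vs. selfish
# (acting as if the tax were ~10 billion times lower).
print(orders_of_magnitude(100, 100 / 1e10))   # -> 10.0 orders

# AGI case: acceptable extinction risk, person-affecting (10%) vs.
# selfish researcher (30%), taken from the guesses above.
print(orders_of_magnitude(0.30, 0.10))        # -> ~0.48, i.e. within one order
```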