When it is all over, we will either have succeeded or failed. (The pay-off set is strongly bimodal.)
The magnitude of the pay-off is irrelevant to the optimal strategy. Suppose research program X has a 1% chance of producing FAI in 10 years, a 1% chance of producing UFAI in 10 years, and a 98% chance of producing nothing. Is it a good option? That depends on our P(FAI | no AI in 10 years). If FAI would probably arrive by 15 years anyway, X is bad; if UFAI would probably arrive by 15 years, X may be good. Endorse only those research programs for which you think P(FAI | that research program makes the AI) > P(FAI | the AI is instead made by whoever would otherwise make it first). Admittedly, this assumes unlimited research talent.
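To make the arithmetic explicit, here is a minimal sketch (my own illustration, not from the original comment) comparing P(the eventual first AI is friendly) with and without endorsing program X, using the hypothetical 1%/1%/98% numbers above and treating P(FAI | no AI in 10 years) as the counterfactual:

```python
# Minimal sketch: does endorsing program X raise P(the first AI is friendly)?
# Numbers are the hypothetical ones from the paragraph above.

def p_good_outcome(p_fai_x, p_ufai_x, p_fai_default):
    """P(first AI is friendly) if we endorse program X.

    p_fai_x, p_ufai_x: chance X itself produces FAI / UFAI within 10 years.
    p_fai_default: P(FAI | no AI in 10 years), i.e. the chance the first AI
    is friendly if X produces nothing and someone else builds it later.
    """
    p_nothing = 1.0 - p_fai_x - p_ufai_x
    return p_fai_x + p_nothing * p_fai_default

# X from the example: 1% FAI, 1% UFAI, 98% nothing.
for p_default in (0.3, 0.5, 0.7):
    with_x = p_good_outcome(0.01, 0.01, p_default)
    print(f"P(FAI|default)={p_default:.1f}: with X {with_x:.3f}, without X {p_default:.3f}")

# X helps exactly when P(FAI | no AI in 10 years) < 0.5, since
# 0.01 + 0.98*p > p  <=>  p < 0.5.
```

This is why the sign of the comparison, not the size of the pay-off, determines whether X is worth endorsing.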
Avoiding avenues with small chances of UFAI corresponds to optimism about the default outcome.
I think that it should be fairly clear whether an AI project would produce UFAI before it is run: on serious assessment by competent researchers, P(friendly) is usually either <5% or >80%. So say that a future MIRI can tell within a few months whether any given design is friendly. If some serious safety people study the design and become confident that it's FAI, they run it. If they think it's UFAI, they won't run it. If someone with limited understanding, and no knowledge of their own ignorance, manages to build an AI, it's UFAI, they don't know that, and so they run it. I am assuming three kinds of people, in order of increasing expertise: those who don't realise the problem is hard, those who know they can't solve it, and those who can solve it. Most of the people reading this will be in the middle category (not counting 10^30 posthuman historians ;-). People in the middle category won't build any ASI. Those in the first category will usually produce nothing, but might produce a UFAI; those in the third might produce a FAI.
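A minimal sketch of that screening assumption (my own framing; the threshold and function names are hypothetical): assessed designs cluster near the two modes, and a design is run only if assessed clearly friendly, while the unaware builder skips the assessment entirely.

```python
# Minimal sketch of the screening rule described above.
RUN_THRESHOLD = 0.8   # hypothetical cutoff for "confident it's FAI"

def should_run(assessed_p_friendly, was_assessed):
    """Decide whether a design gets run.

    was_assessed=False models the builder who doesn't know their own
    ignorance: they never check P(friendly) and run the design anyway.
    """
    if not was_assessed:
        return True  # the dangerous case: run without any assessment
    return assessed_p_friendly > RUN_THRESHOLD

print(should_run(0.85, True))   # competent team, likely FAI -> run
print(should_run(0.03, True))   # competent team, likely UFAI -> don't run
print(should_run(0.03, False))  # unaware builder -> runs a likely UFAI
```

The bimodality is what makes the rule workable: if assessments routinely came back near 50%, the threshold would be doing much more work.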
I'm unclear whether you are disagreeing with something, but to me your comment reads largely as saying that a lot of probability mass can be assigned before we reach the frontier, and that this is what you think matters most for reasoning about the risks associated with attempts to build human-aligned AI.
I agree that we can learn a lot before we reach the frontier, but I also think that most of the time we should reason as if we are already on the frontier, and not expect sudden resolutions to questions that would let us get more of everything. For example, to return to one of my examples, we shouldn't expect to suddenly learn information that would let us make Pareto improvements to our assumptions about moral facts, given how long that question has been studied; instead we should mostly be concerned with marginal trade-offs among the assumptions we make under uncertainty.