Though the world this points at is pretty scary (a powerful AI system ready to go, only held back by the implementors buying safety concerns), the intervention does seem cheap and good.
I wonder whether 1 will be easy. I think it relies on the first AI systems being made by one of a small selection of easily identifiable orgs.
By scary, do you mean (or mean to imply) unlikely?
I think that if AI happens soon (<10 years) it’ll likely happen at an org we already know about, so 1 is feasible. If AI doesn’t happen soon, all bets are off and 1 will be very difficult.
No. Sorry, I suspect starting with “Though” was confusing. I think I meant ‘this seems like one of the harder worlds to get a win in, but given that world, this seems like a good intervention’.
I think I have an intuition that (a) we may only win if we stop things from getting as bad as this situation, and (b) extra expected utility is most cheaply purchased by plans that condition on worlds that are not this bad.
I dunno whether that’s true though. I haven’t thought about it a bunch.
Interesting. I’d love to hear more about the sorts of worlds conditioned on in your (b). For my part, the worlds I described in the original post seem both the most likely and also not completely hopeless—maybe with a month of extra effort we can actually come up with a solution, or else a convincing argument that we need another month, etc. Or maybe we already have a mostly-working solution by the time The Talk happens and with another month we can iron out the bugs.
I just wanted to say that this is a good question, but I’m not sure I know the answer yet.
Worlds that appear most often in my musings (but I'm not sure they're likely enough to count) are:
- an aligned group getting a decisive strategic advantage
- safety concerns being clearly demonstrated and part of mainstream AI research
  - perhaps general reasoning about agents and intelligence improves, and we can apply these techniques to AI designs
  - perhaps things contiguous with alignment concerns cause failures in capable AI systems early on
- a more alignable paradigm overtaking ML
  - this seems like a fantasy
  - could be because ML gets bottlenecked or a different approach makes rapid progress
Thanks, that was an illuminating answer. I feel like those three worlds are decently likely, but that, if those worlds occur, purchasing additional expected utility in them will be hard, precisely because things will be so much easier. For example, if safety concerns are part of mainstream AI research, then safety research won't be neglected anymore.
You can purchase additional EU by pumping up their probability as well. EDIT: I know I originally said to condition on these worlds, but I guess that's not what I actually do. Instead, I think I condition on not-doomed worlds.
Ah, that sounds much better to me. Yeah, maybe the cheapest EU lies in trying to make these worlds more likely. I doubt we have much control over which paradigms overtake ML, and I think that the intervention I'm proposing might help make the first and second kinds of world more likely (because maybe, with a month of extra time to analyze their system, the relevant people will become convinced that the problem is real).
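(A rough way to frame the two levers in this exchange, with framing and notation that are mine rather than the commenters': expected utility decomposes over worlds, so a plan can buy extra EU either through the per-world utility terms or through the probability terms.)

\[
\mathbb{E}[U \mid \text{plan}] = \sum_{w} P(w \mid \text{plan}) \, U(\text{plan}, w)
\]

Conditioning on a set of worlds and doing better there targets the \(U(\text{plan}, w)\) factors, while "pumping up their probability" targets the \(P(w \mid \text{plan})\) factors; on this reading, the proposed intervention mostly buys utility within the hard world, and, per the last comment, may also shift some probability toward the first two worlds.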