Can we assume that, since I’ve been working on AI safety all this time, I’m not an idiot? When I present a scenario (“assume the AI is contained and truthful”), I’m investigating whether we have safety within the terms of that scenario. Here we don’t, so we can reject attempts aimed at that scenario without looking further. If and when we find a safe way to do that within the scenario, then we can investigate whether the scenario is achievable in the first place.
Ah. Then here’s the difference in assumptions: I don’t believe a contained, truthful UFAI is safe in the first place. I just have an incredibly low prior on that. So low, in fact, that I didn’t think anyone would take it seriously enough to imagine scenarios which prove it’s unsafe. It’s just so bloody obvious that you do not build a UFAI for any reason, because it will go wrong in some way you didn’t plan for.
See the point on Paul Christiano’s design. The problem I discussed applies not only to UFAIs but also to other designs that seek to get round it yet still use potentially unrestricted search.