I agree. I’m generally okay with the order (oracles do seem marginally safer than agents, for example, and more restrictions should generally be safer than fewer), but I also think the marginal amount of additional safety doesn’t matter much when you consider the total absolute risk. Just to make up some numbers, I think of it like choosing between options that are 99.6%, 99.7%, 99.8%, and 99.9% likely to result in disaster. Of course I’ll pick the one with a 0.4% chance of success, but I’d much rather do something radically different that is orders of magnitude safer.
Yeah, so I guess opinions on this would differ depending on how likely people think existential risk from AGI is. Personally, it’s clear to me that agentic misaligned superintelligences are bad news, but I’m much less persuaded by descriptions of how long-term maximising behaviour arises in something like an oracle. The prospect of an AGI that’s much more intelligent than humans and much less agentic seems quite plausible, perhaps even in an RL agent.