It is critical for guaranteed safe AI and many non-prosaic alignment agendas. I agree it has risks, since all AI capabilities and advances pose control risks, but it seems better than most types of general capabilities investments.
Do you have a more specific model of why it might be negative?
Well, does this improve automated ML research and kick off an intelligence explosion sooner?
Plausibly, yes. But so does programming capability, which is actually a bigger deal. (And it’s unclear that a traditionally envisioned intelligence explosion is possible with systems built on LLMs, though I’m certainly not convinced by that argument.)
I think the “guaranteed safe AI” framework is just super speculative. Enough to basically not matter as an argument given any other salient points.
This leaves us with the baseline, which is that this kind of prize potentially redirects a lot of brainpower from math-adjacent people toward thinking about AI capabilities. Even worse, I expect it's mostly going to attract the unreflective "full-steam-ahead" type of people.
Mostly, I'm not sure it matters at all, except maybe slightly accelerating some inevitable development before e.g. DeepMind takes another shot at it to finish things off.
It is speculative only in the sense that any new technology under development is speculative. Closely related approaches are already used for assurance in practice, so provable safety isn't actually just speculative; there are concrete benefits in the near term. And I would challenge you to name a different, less speculative framework that actually deals with any issues of ASI risk and isn't pure hopium.
Uncharitably, but I think not entirely inaccurately, these include: "maybe AI can't be that much smarter than humans anyway," "let's get everyone to stop forever," "we'll use AI to figure it out, even though we have no real ideas," "we'll just trust that no one makes it agentic," "the agents will be supervised by other AI, which will magically be easier to align," "maybe multiple AIs will compete in ways that aren't a disaster," "maybe we can just rely on prosaic approaches forever and nothing bad happens," "maybe it will be better than humans at having massive amounts of unchecked power by default." These all seem to rely on far more speculative claims, with far less concrete ideas about how to validate or ensure them.
I'm not saying it's not worth pursuing as an agenda, but I'm also not convinced it is promising enough to justify pursuing math-related AI capabilities, compared to e.g. creating safety guarantees into which you can plug in AI capabilities once they arise anyway.
But “creating safety guarantees into which you can plug in AI capabilities once they arise anyway” is the point, and it requires at least some non-trivial advances in AI capabilities.
You should probably read the current programme thesis.