You can think of it as “dangerous capabilities in everyone’s hands”, but I prefer to think of it as “everyone in the world can work on alignment in a hands-on way, and millions of people are exposed to the problem in a much more intuitive and real way than we ever foresaw”.
Ordinary people without PhDs are learning what capabilities and limitations LLMs have. They are learning what capabilities you can and cannot trust an LLM with. They are coming up with creative jailbreaks we never thought of. And they’re doing so with toy models that don’t have superhuman powers of reasoning, and don’t pose X-risks.
It was always hubris to think only a small sect of people in the SF Bay Area could be trusted with the reins to AI. I’ve never been one to bet against human ingenuity, and I’m not about to bet against it now that I’ve seen the open-source community use LLaMA to blaze past every tech company.
“Everyone in the world can work on alignment in a hands-on way” only helps us if they actually do choose to work on alignment, instead of capabilities and further accelerating the problem. Right now, people seem to be heavily focusing on the latter and not the former, and I expect it to continue that way.
I would like to turn that around and suggest that what is hubristic is a small, technical segment of the population imposing these risks on the rest of the population without them ever having had any say in it.
Except “aligned” AI (or at least corrigibility) benefits folks who are doing even shady things (say, trying to scam people).
So any gains in those areas that are easily implemented will spread widely and quickly.
And altruistic individuals already use their own compute and GPUs for things like SETI@home (if you’re old enough to remember) and protein-folding projects for medical research. Those same people will become aware of AI safety and do the same, and maybe more.
The cat’s out of the bag; you can’t “regulate” AI use at home; I can run models on a smartphone.
What we can do is try to steer things toward a beneficial Nash equilibrium.
I guess where we differ is I don’t believe we get an equilibrium.
I believe that the offense-defense balance will strongly favour the attacker, such that one malicious actor could cause vast damage. One illustrative example: it doesn’t matter how many security holes you patch, an attacker only needs one to compromise your system. A second: it is much easier to cause a pandemic than to defend against one.