Just thinking through simple stuff for myself, very rough, posting in the spirit of quick takes
At present, we are making progress on the Technical Alignment Problem[2] and could probably solve it within 50 years.
Humanity is on track to build ~lethal superpowerful AI in more like 5-15 years.
Working on technical alignment (direct or meta) only matters if we can speed up overall progress by 10x (or some lesser factor, if AI capabilities progress is delayed from its current trajectory). Improvements of 2x are not likely to get us to an adequate technical solution in time.
Working on slowing things down is only helpful if it results in delays of decades.
Shorter delays are good in so far as they give you time to buy further delays.
There is technical research that is useful for persuading people to slow down (and maybe also solving alignment, maybe not). This includes anything that demonstrates scary capabilities or harmful proclivities, e.g. a bunch of mech interp stuff, all the evals stuff.
AI is in fact super powerful, and people who perceive there being value to be had aren’t entirely wrong[3]. This results in a very strong motivation to pursue AI and to resist efforts to be stopped.
These motivations apply to both businesses and governments.
People are also developing stances on AI along ideological, political, and tribal lines, e.g. being anti-regulation. This generates strong motivations for AI topics even separate from immediate power/value to be gained.
Efforts to agentically slow down the development of AI capabilities are going to be matched by agentic efforts to resist those efforts and push in the opposite direction.
Efforts to convince people that we ought to slow down will be matched by people arguing that we must speed up.
Efforts to regulate will be matched by efforts to block regulation. There will be efforts to repeal or circumvent any passed regulation.
If there are chip controls or whatever, there will be efforts to get around that. If there are international agreements, there will be efforts to clandestinely hide.
If there are successful limitations on compute, people will compensate and focus on algorithmic progress.
Many people are going to be extremely resistant to being swayed on topics of AI, no matter what evidence is coming in. Much rationalization will be furnished to justify proceeding no matter the warning signs.
By and large, our civilization has a pretty low standard of reasoning.
People who want to speed up AI will use falsehoods and bad logic to muddy the waters, and many people won’t be able to see through it[4]. No matter the evals or other warning signs, there will be people arguing it can be fixed without too much trouble and we must proceed.
In other words, there’s going to be an epistemic war and the other side is going to fight dirty[5]. I think even a lot of clear evidence will have a hard time winning out against people’s motivations/incentives and bad arguments.
When there are two strongly motivated sides, it seems likely we end up in a compromise state, e.g. regulation passes, but it’s a watered-down version of what was originally designed, and even the original design was only maybe enough.
It’s unclear to me whether “compromise regulation” will be adequate, or whether any regulation strong enough to cost people billions in anticipated profit will actually end with them giving up.
Further Thoughts
People aren’t thinking or talking enough about nationalization.
I think it’s interesting because I expect that a lot of regulation about what you can and can’t do stops being enforceable once the development is happening in the context of the government performing it.
What I Feel Motivated To Work On
Thinking through the above, I feel less motivated to work on things that feel like they’ll only speed up technical alignment research by amounts < 5x. In contrast, maybe there’s more promise in:
Cyborgism or AI-assisted research that gets us 5x speedups but applies differentially to technical alignment research
Things that convince people that we need to radically slow down
good writing
getting in front of people
technical demonstrations
research that shows the danger
why the whole paradigm isn’t safe
evidence of deception, etc.
Development of good (enforceable) “if-then” policy that will actually result in people stopping in response to various triggers, rather than just producing rationalizations for why it’s actually okay to continue (ignoring the signs) or a band-aid solution
Figuring out how to overcome people’s rationalization
Developing robust policy stuff that’s set up to withstand lots of optimization pressure to overcome it
Things that cut through the bad arguments of people who wish to say there’s no risk and discredit the concerns
Stuff that prevents national arms races / gets into national agreements
Thinking about how to get 30 year slowdowns
By “slowing down”, I mean all activities and goals which are about preventing people from building lethal superpowerful AI, be it via getting them to stop, getting them to go slower because they’re being more cautious, limiting what resources they can use, setting up conditions for stopping, etc.
How to build a superpowerful AI that does what we want.
They’re wrong about their ability to safely harness the power, but not about the fact that if you could harness it, you’d have a lot of very valuable stuff.
My understanding is a lot of falsehoods were used to argue against SB1047 by e.g. a16z
Also some people arguing for AI slowdown will fight dirty too, eroding trust in AI slowdown people, because some people think that when the stakes are high you just have to do anything to win, and are bad at consequentialist reasoning.
“Cyborgism or AI-assisted research that gets us 5x speedups but applies differentially to technical alignment research”
How do you make meaningful progress and ensure it does not speed up capabilities?
It seems unlikely that a technique exists that is exclusively useful for alignment research and can’t be tweaked to help OpenMind develop better optimization algorithms etc.
I basically agree with this:
“People who want to speed up AI will use falsehoods and bad logic to muddy the waters, and many people won’t be able to see through it.
In other words, there’s going to be an epistemic war and the other side is going to fight dirty. I think even a lot of clear evidence will have a hard time winning out against people’s motivations/incentives and bad arguments.”
But I’d be more pessimistic than that, in that I honestly think pretty much every side will fight quite dirty in order to gain power over AI, and we already have seen examples of straight up lies and bad faith.
From the anti-regulation side, I remember Martin Casado straight up lying that mechanistic interpretability has rendered AI models completely understood and white-box, and I’m very sure that mechanistic interpretability cannot do what Martin Casado claimed.
I also remember a16z lying a lot about SB1047.
From the pro-regulation side, I remember Zvi incorrectly claiming that Sakana AI exhibited instrumental convergence/recursive self-improvement; as it turned out, the reality was far more mundane than that:
https://www.lesswrong.com/posts/ppafWk6YCeXYr4XpH/danger-ai-scientist-danger#AtXXgsws5DuP6Jxzx
Zvi then misrepresented what Apollo actually did, claiming that o1 was actually deceptively aligned/lying when Apollo had run a capability eval to see whether it was capable of lying/deceptive alignment, and straight up lied in claiming that this was proof that Yudkowsky’s proposed AI alignment problems are here and inevitable. This is taken apart in two comments:
https://www.lesswrong.com/posts/zuaaqjsN6BucbGhf5/gpt-o1#YRF9mcTFN2Zhne8Le
https://www.lesswrong.com/posts/zuaaqjsN6BucbGhf5/gpt-o1#AWXuFxjTkH2hASXPx
Overall, this has made me update in pretty negative directions concerning the epistemics of every side.
There’s a core of people who have reasonable epistemics IMO on every side, but they are outnumbered and lack the force of those that don’t have good epistemics.
The reason I can remain optimistic despite this is that I believe we are progressing faster than that:
“At present, we are making progress on the Technical Alignment Problem and could probably solve it within 50 years.”
Thankfully, I think we could probably solve it in 5-10 years, primarily because I believe no remaining insights are necessary to align AI; the work that needs to be done is in making large datasets about human values, because AIs are deeply shaped by their data sources, and thus whoever controls the dataset controls the values of the AI.
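To make that data-centric picture concrete (this is my own illustration, not the commenter’s actual workflow), here is a minimal sketch of what curating a “human values” fine-tuning set could look like; the filenames, rating schema, and thresholds are all hypothetical:

```python
# Hypothetical sketch: assembling a curated "human values" fine-tuning dataset.
# Every filename, field name, and threshold below is made up for illustration.
import json
from pathlib import Path

MIN_RATERS = 3       # require several independent human raters per example
MIN_AGREEMENT = 0.8  # keep only examples the raters largely endorse

def load_candidates(path: Path):
    """Yield candidate examples from a JSONL file of human-rated dialogues."""
    with path.open() as f:
        for line in f:
            yield json.loads(line)

def keep(example: dict) -> bool:
    """Keep an example only if enough raters agree it reflects the intended values."""
    ratings = example.get("ratings", [])  # list of 0/1 endorsements from raters
    if len(ratings) < MIN_RATERS:
        return False
    return sum(ratings) / len(ratings) >= MIN_AGREEMENT

def to_sft_record(example: dict) -> dict:
    """Turn a kept example into a (prompt, response) record for supervised fine-tuning."""
    return {"prompt": example["prompt"], "response": example["endorsed_response"]}

def build_dataset(src: Path, dst: Path) -> int:
    """Filter the raw ratings file into a fine-tuning dataset; return how many were kept."""
    kept = [to_sft_record(ex) for ex in load_candidates(src) if keep(ex)]
    with dst.open("w") as f:
        for rec in kept:
            f.write(json.dumps(rec) + "\n")
    return len(kept)

if __name__ == "__main__":
    n = build_dataset(Path("rated_dialogues.jsonl"), Path("values_sft.jsonl"))
    print(f"kept {n} examples")
```

On this view, the alignment-relevant decisions live in curation choices like keep() above, i.e. who rates, what counts as agreement, and which responses get endorsed, rather than in any new algorithmic insight.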
Though I am working on technical alignment (and perhaps because I know it is hard), I think the most promising route may be to increase human and institutional rationality and coordination ability. This may be more tractable than “expected” with modern theory and tools.
Also, I don’t think we are on track to solve technical alignment in 50 years without intelligence augmentation in some form, at least not to the point where we could get it right on a “first critical try” if such a thing occurs. I am not even sure there is a simple and rigorous technical solution that looks like something I actually want, though there is probably a decent engineering solution out there somewhere.
I think this can be true, but I don’t think it needs to be true:
“I expect that a lot of regulation about what you can and can’t do stops being enforceable once the development is happening in the context of the government performing it.”
I suspect that if the government is running the at-all-costs-top-national-priority Project, you will see some regulations stop being enforceable. However, we also live in a world where you can easily find many instances of government officials complaining in their memoirs that laws and regulations prevented them from being able to go as fast or as freely as they’d want on top-priority national security issues. (For example, DoD officials even after 9-11 famously complained that “the lawyers” restricted them too much on top-priority counterterrorism stuff.)
Yes, this is a good point. We need a more granular model than a binary ‘all the same laws will apply to high priority national defense projects as apply to tech companies’ versus ‘no laws at all will apply’.
I have a few questions.
Could you save the world in time, without a slowdown in AI development, if you had a billion dollars?
Could you do it with a trillion dollars?
If so, why aren’t you trying to ask the US Congress for a trillion dollars?
If it’s about a lack of talent, do you think Terence Tao could make significant progress on AI alignment if he actually tried?
Do you think he would be willing to work on AI alignment if you offered him a trillion dollars?
Interestingly, Terence Tao has recently started thinking about AI, and his (publicly stated) opinions on it are … very conservative? I find he mostly focuses on the capabilities that are already here and doesn’t really extrapolate from them in any significant way.
Really? He seems pretty bullish. He thinks it will co-author math papers pretty soon. I think he just doesn’t think about, or at least doesn’t state his thoughts on, implications outside of math.
He’s clearly not completely discounting that there’s progress, but overall it doesn’t feel like he’s “updating all the way”:
This is a recent post about the DeepMind math olympiad results: https://mathstodon.xyz/@tao/112850716240504978
“1. This is great work, shifting once again our expectations of which benchmark challenges are within reach of either #AI-assisted or fully autonomous methods”
Money helps. I could probably buy a lot of dignity points for a billion dollars. With a trillion, variance definitely goes up, because you could try crazy stuff that could backfire. (That’s true for a billion too.) But the EV of such a world is better.
I don’t think there’s anything that’s as simple as writing a check though.
US Congress gives money to specific things. I do not have a specific plan for a trillion dollars.
I’d bet against Terence Tao being some kind of amazing breakthrough researcher who changes the playing field.
My answer (and I think Ruby’s) to most of these questions is “no”, for What Money Cannot Buy reasons, as well as “geniuses don’t often actually generalize and are hard to motivate with money.”
I really like the observation in your Further Thoughts point. I do think that is a problem people need to look at, as I would guess many will view government involvement through an acting-in-the-public-interest lens rather than through either a self-interest lens (as problematic as that might be when the players keep changing) or a special-interest/public-choice perspective.
There’s probably some great historical analysis already written about past events that might serve as indicators of the pros and cons here. Any historians in the group here?
Not an original observation but yeah, separate from whether it’s desirable, I think we need to be planning for it.