I agree much of the community (including me) was wrong or directionally wrong in the past about the level of AI regulation and how quickly it would come.
Regarding the post’s recommendations for how to proceed, given that there will be some regulation, I feel confused in a few ways.
Can you provide examples of interventions that meet your bar for not being done by default? It’s hard to understand the takeaways from your post because the negative examples are made much more concrete than the proposed positive ones.
You argue that we perhaps shouldn’t invest as much in preventing deceptive alignment because “regulators will likely adapt, adjusting policy as the difficulty of the problem becomes clearer”.
If we are assuming that regulators will adapt and adjust regarding deception, can you provide examples of problems that policymakers will not be able to solve themselves, and explain why they will be less likely to notice and deal with those than with deception?
You say “we should question how plausible it is that society will fail to adequately address such an integral part of the problem”. What things aren’t integral parts of the problem but should still be worked on?
I feel we would need much better evidence of things being handled competently to invest significantly less into integral parts of the problem.
You say: ‘Of course, it may still be true that AI deception is an extremely hard problem that reliably resists almost all attempted solutions in any “normal” regulatory regime, even as concrete evidence continues to accumulate about its difficulty—although I consider that claim unproven, to say the least’
If we expect some problems in AI risk to be solved by default, mostly by people outside the community, it feels to me like one takeaway would be that we should shift resources to the portions of the problem that we expect to be the hardest.
To me, intuitively, deceptive alignment might be one of the hardest parts of the problem as we scale to very superhuman systems, even if we condition on having time to build model organisms of misalignment and experiment with them for a few years. So I feel confused about why you treat a high level of difficulty being “unproven” as a dismissal; of course it’s unproven, but to dismiss it you would need to argue that, in worlds where the AI risk problem is fairly hard, there’s not much of a chance of it being very hard.
As someone who is relatively optimistic about concrete evidence of deceptive alignment increasing substantially before a potential takeover, I think I still put significantly lower probability on it than you do due to the possibility of fairly fast takeoff.
I feel like this post is to some extent counting our chickens before they hatch (tbc I agree with the directional update as I said above). I’m not an expert on what’s going on here, but I can imagine any of the following happening (non-exhaustive list), each of which would make the current path to potentially sensible regulation in the US and internationally harder:
The EO doesn’t lead to as many resources dedicated to AI-x-risk-reducing things as we might hope. I haven’t read it myself, just the fact sheet and Zvi’s summary, but Zvi says “If you were hoping for or worried about potential direct or more substantive action, then the opposite applies – there is very little here in the way of concrete action, only the foundation for potential future action.”
A Republican President comes to power in the US and reverses a lot of the effects of the EO.
Rishi Sunak gets voted out in the UK (my sense is that this is likely) and the new Prime Minister is much less gung-ho on AI risk.
I don’t have strong views on the value of AI advocacy, but this post seems overconfident in calling it out as being basically not useful based on recent shifts.
It seems likely that much stronger regulations will be important, e.g. the model reporting threshold in the EO was set relatively high and many in the AI risk community have voiced support for an international pause if it were politically feasible, which the EO is far from.
The public still doesn’t consider AI risk to be very important. <1% of the American public considers it the most important problem to deal with. So to the extent that raising that number was good before, it still seems pretty good now, even if slightly worse.
Can you provide examples of interventions that meet your bar for not being done by default? It’s hard to understand the takeaways from your post because the negative examples are made much more concrete than the proposed positive ones.
I have three things to say here:
Several months ago I proposed general, long-term value drift as a problem that I think will be hard to solve by default. I currently think that value drift is a “hard bit” of the problem that we do not appear to be close to seriously addressing, perhaps because people expect easier problems won’t be solved either without heroic effort. I’m also sympathetic to Dan Hendrycks’ arguments about AI evolution. I will add these points to the post.
I mostly think people should think harder about what the hard parts of AI risk are in the first place. It would not be surprising if the “hard bits” turn out to be things that we’ve barely thought about, or that are hard to perceive as major problems, since their relative hiddenness would be a strong reason to believe that they will not be solved by default.
The problem of “make sure policies are well-targeted, informed by the best evidence, and mindful of social/political difficulties” seems like a hard problem that societies have frequently failed to get right historically, and the relative value of solving this problem seems to get higher as you become more optimistic about the technical problems being solved.
I feel like this post is to some extent counting our chickens before they hatch (tbc I agree with the directional update as I said above). [...] I don’t have strong views on the value of AI advocacy, but this post seems overconfident in calling it out as being basically not useful based on recent shifts.
I want to emphasize that the current policies were crafted in an environment in which AI still has a tiny impact on the world. My expectation is that policies will get much stricter as AI becomes a larger part of our life. I am not making the claim that current policies are sufficient; instead I am making a claim about the trajectory, i.e. how well we should expect society to respond at a time, given the evidence and level of AI capabilities at that time. I believe that current evidence supports my interpretation of our general trajectory, but I’m happy to hear someone explain why they disagree and highlight concrete predictions that could serve to operationalize this disagreement.
Thanks for clarifying.
Several months ago I proposed general, long-term value drift as a problem that I think will be hard to solve by default. I currently think that value drift is a “hard bit” of the problem that we do not appear to be close to seriously addressing, perhaps because people expect easier problems won’t be solved either without heroic effort. I’m also sympathetic to Dan Hendrycks’ arguments about AI evolution. I will add these points to the post.
Don’t have a strong opinion here, but intuitively feels like it would be hard to find tractable angles for work on this now.
I mostly think people should think harder about what the hard parts of AI risk are in the first place. It would not be surprising if the “hard bits” turn out to be things that we’ve barely thought about, or that are hard to perceive as major problems, since their relative hiddenness would be a strong reason to believe that they will not be solved by default.
Maybe. In general, I’m excited about people who have the talent for it to think about previously neglected angles.
The problem of “make sure policies are well-targeted, informed by the best evidence, and mindful of social/political difficulties” seems like a hard problem that societies have frequently failed to get right historically, and the relative value of solving this problem seems to get higher as you become more optimistic about the technical problems being solved.
I agree this is important, and it was in your post, but it seems like a decent description of what the majority of AI x-risk governance people are already working on, or at least not obviously a bad one. This is the phrase that I was hoping would get made more concrete.
I want to emphasize that the current policies were crafted in an environment in which AI still has a tiny impact on the world. My expectation is that policies will get much stricter as AI becomes a larger part of our life. I am not making the claim that current policies are sufficient; instead I am making a claim about the trajectory, i.e. how well we should expect society to respond at a time, given the evidence and level of AI capabilities at that time.
I understand this (sorry if I wasn’t clear), but I think it’s less obvious than you do that this trend will continue without intervention from AI x-risk people. I agree with other commenters that AI x-risk people should get a lot of the credit for the recent push. I also provided example reasons in my point (3) that the trend might not continue smoothly, or might even reverse.
There might also be disagreements around:
Not sharing your high confidence in slow, continuous takeoff.
The strictness of regulation needed to make a dent in AI risk, e.g. if substantial international coordination is required it seems optimistic to me to assume that the trajectory will by default lead to this.
The value in things getting done faster than they would have been done otherwise, even if they would have been done either way. This indirectly provides more time to iterate and get to better, more nuanced policy.
I believe that current evidence supports my interpretation of our general trajectory, but I’m happy to hear someone explain why they disagree and highlight concrete predictions that could serve to operationalize this disagreement.
Operationalizing disagreements well is hard and time-consuming, especially when we’re betting on “how things would go without intervention from a community that is intervening a lot”, but here are a few very rough forecasts, all conditional on no TAI before the resolve date:
75%: In Jan 2028, less than 10% of Americans will consider AI the most important problem.
60%: In Jan 2030, Evan Hubinger will believe that if x-risk-motivated people had not worked on deceptive alignment at all, risk from deceptive alignment would be at least 50% higher, compared to a baseline of no work at all (i.e. if risk is 5% and it would have been 9% with no work from anyone, it needs to have been >7% with no work from x-risk people for this to resolve yes; the resolution arithmetic is sketched after this list).
35%: In Jan 2028, conditional on a Republican President being elected in 2024, regulations on AI in the US will be generally less stringent than they were when the previous president left office. Edit: Crossed out because it isn’t operationalized well; I more want to get at the vibe of how strict the President and legislature are being on AI, and my understanding is that a lot of the stuff from the EO might not come into actual force for a while.
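To spell out the resolution arithmetic for the second forecast, here’s a minimal sketch; the variable names are mine, and the 5%/9% figures are just the illustrative placeholders from the parenthetical, not actual risk estimates:

```python
# Minimal sketch of the resolution arithmetic for the deceptive-alignment forecast.
# All numbers are illustrative placeholders, not estimates of actual risk levels.

risk_with_all_work = 0.05  # risk given everyone's work, x-risk community included
risk_with_no_work = 0.09   # baseline: no one works on deceptive alignment at all

# "At least 50% higher" is measured against the gap between these two numbers:
# the counterfactual risk without x-risk-motivated work must sit more than
# halfway between the all-work risk and the no-work baseline.
threshold = risk_with_all_work + 0.5 * (risk_with_no_work - risk_with_all_work)  # 0.07

def resolves_yes(risk_without_xrisk_work: float) -> bool:
    """True if x-risk-motivated work is credited with more than half of the reduction."""
    return risk_without_xrisk_work > threshold

print(resolves_yes(0.075))  # True: x-risk work averted more than half of the gap
print(resolves_yes(0.06))   # False: most of the reduction would have happened anyway
```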
I agree this is important, and it was in your post, but it seems like a decent description of what the majority of AI x-risk governance people are already working on, or at least not obviously a bad one.
I agree. I’m not criticizing the people who are trying to make sure that policies are well-targeted and grounded in high-quality evidence. I’m arguing in favor of their work. I’m mainly arguing against public AI safety advocacy work, which was recently upvoted highly on the EA Forum. [ETA, rewording: To the extent I was arguing against a single line of work, I was primarily arguing against public AI safety advocacy work, which was recently upvoted highly on the EA Forum. However, as I wrote in the post, I also think that we should re-evaluate which problems will be solved by default, which means I’m not merely letting other AI governance people off the hook.]
Operationalizing disagreements well is hard and time-consuming, especially when we’re betting on “how things would go without intervention from a community that is intervening a lot”, but here are a few very rough forecasts, all conditional on no TAI before the resolve date:
I appreciate these predictions, but I am not as interested in predicting personal or public opinions. I’m more interested in predicting regulatory stringency, quality, and scope.
Even if fewer than 10% of Americans consider AI to be the most important issue in 2028, I don’t think that necessarily indicates that regulations will have low stringency, low quality, or poor scope. Likewise, I’m not sure whether I want to predict on Evan Hubinger’s opinion, since I’d probably need to understand more about how he thinks to get it right, and I’d prefer to focus the operationalization instead on predictions about large, real world outcomes. I’m not really sure what disagreement the third prediction is meant to operationalize, although I find it to be an interesting question nonetheless.
I had the impression that it was more than just that, given the line: “In light of recent news, it is worth comprehensively re-evaluating which sub-problems of AI risk are likely to be solved without further intervention from the AI risk community (e.g. perhaps deceptive alignment), and which ones will require more attention.” and the further attention devoted to deceptive alignment.
I appreciate these predictions, but I am not as interested in predicting personal or public opinions. I’m more interested in predicting regulatory stringency, quality, and scope.
If you have any you think faithfully represent a possible disagreement between us go ahead. I personally feel it will be very hard to operationalize objective stuff about policies in a satisfying way. For example, a big issue with the market you’ve made is that it is about what will happen in the world, not what will happen without intervention from AI x-risk people. Furthermore it has all the usual issues with forecasting on complex things 12 years in advance, regarding the extent to which it operationalizes any disagreement well (I’ve bet yes on it, but think it’s likely that evaluating and fixing deceptive alignment will remain mostly unsolved in 2035 conditional on no superintelligence, especially if there were no intervention from x-risk people).
I had the impression that it was more than just that
Yes, the post was about more than that. To the extent I was arguing against a single line of work, it was mainly intended as a critique of public advocacy. Separately, I asked people to re-evaluate which problems will be solved by default, to refocus our efforts on the most neglected, important problems, and went into detail about what I currently expect will be solved by default.
If you have any you think faithfully represent a possible disagreement between us go ahead.
I offered a concrete prediction in the post. If people don’t think my prediction operationalizes any disagreement, then I think (1) either they don’t disagree with me, in which case maybe the post isn’t really aimed at them, or (2) they disagree with me in some other way that I can’t predict, and I’d prefer they explain where they disagree exactly.
a big issue with the market you’ve made is that it is about what will happen in the world, not what will happen without intervention from AI x-risk people.
It seems relatively valueless to predict on what will happen without intervention, since AI x-risk people will almost certainly intervene.
Furthermore it has all the usual issues with forecasting on complex things 12 years in advance, regarding the extent to which it operationalizes any disagreement well (I’ve bet yes on it, but think it’s likely that evaluating and fixing deceptive alignment will remain mostly unsolved in 2035, especially if there were no intervention from x-risk people).
I mostly agree. But I think it’s still better to offer a precise prediction than to only offer vague predictions, which I perceive as the more common and more serious failure mode in discussions like this one.