I think that using a broader definition (or the de re reading) would also be defensible, but I like it less because it includes many subproblems that I think (a) are much less urgent, and (b) are likely to involve totally different techniques than the urgent part of alignment.
I think it would be helpful for understanding your position and what you mean by “AI alignment” to have a list or summary of those other subproblems and why you think they’re much less urgent. Can you link to or give one here?
Also, do you have a preferred term for the broader definition, or the de re reading? What should we call those things if not “AI alignment”?
I think it would be helpful for understanding your position and what you mean by “AI alignment” to have a list or summary of those other subproblems and why you think they’re much less urgent. Can you link to or give one here?
Other problems related to alignment, which would be included by the broadest definition of “everything related to making the future good.”
We face a bunch of problems other than AI alignment (e.g. other destructive technologies, risk of value drift), and depending on the competencies of our AI systems they may be better or worse than humans at helping handle those problems (relative to accelerating the kinds of progress that force us to confront those problems). So we’d like AI to be better at (helping us with) {diplomacy, reflection, institution design, philosophy...} relative to {physical technology, social manipulation, logistics...}.
Beyond alignment, AI may provide new advantages to actors who are able to make their values more explicit, or who have explicit norms for bargaining/aggregation, and so we may want to figure out how to make more things more explicit.
AI could facilitate social control, manipulation, or lock-in, which may make it more important for us to have more robust or rapid forms of deliberation (that are robust to control/manipulation, or that can run their course fast enough to prevent someone from making a mistake). This also may increase the incentives for ordinary conflict amongst actors with differing long-term values.
AI will tend to empower groups with few people (but lots of resources), making it easier for someone to destroy the world and so requiring stronger enforcement/stabilization.
AI may be an unusually good opportunity for world stabilization, e.g. because it’s associated with a disruptive transition, in which case someone may want to take that opportunity. (Though I’m concerned about this because, in light of disagreement/conflict about stabilization itself, someone attempting to do this or being expected to attempt to do this could undermine our ability to solve alignment.)
That’s a very partial list. This is for the broadest definition of “everything about AI that is relevant to making the future good,” which I don’t think is particularly defensible. I’d say the first three could be included in defensible definitions of alignment, and there are plenty of others.
My basic position on most of these problems is: “they are fine problems and you might want to work on them, but if someone is going to claim they are important they need to give a separate argument; it’s not at all implied by the normal argument for the importance of alignment.” I can explain in particular cases why I think other problems are less important, and I feel like we’ve had a lot of back and forth on some of these, but the only general argument is that I think there are strong reasons to care about alignment in particular that don’t extend to these other problems (namely, a failure to solve alignment has predictably really bad consequences in the short term, and currently it looks very tractable in expectation).
Also, do you have a preferred term for the broader definition, or the de re reading? What should we call those things if not “AI alignment”?
Which broader definition? There are tons of possibilities. I think the one given in this post is the closest to a coherent definition that matches existing usage.
The other common definition seems to be more along the lines of “everything related to making AI go well,” which I don’t think really deserves a word; just call that “AI trajectory change” if you want to distinguish it from “AI speedup”, or “pro-social AI” if you want to distinguish from “AI as an intellectual curiosity,” or just “AI” if you don’t care about those distinctions.
For the de re reading, I don’t see much motive to lump the competence and alignment parts of the problem into a single heading; I would just call them “alignment” and “value learning” separately. But I can see how this might seem like a value judgment, since someone who thought that these two problems were the very most important problems might want to put them under a single heading even if they didn’t think there would be particular technical overlap.
(ETA: I’d also be OK with saying “de dicto alignment” or “de re alignment,” since they really are just importantly different concepts, both of which are used relatively frequently: there is a big difference between an employee who de dicto wants the same things their boss wants and an employee who de re wants to help their boss get what they want; those feel like two species of alignment.)