The actual discussion on that Arbital page strongly suggests that alignment is about pointing an AI in a direction.
But the page includes:
“AI alignment theory” is meant as an overarching term to cover the whole research field associated with this problem, including, e.g., the much-debated attempt to estimate how rapidly an AI might gain in capability once it goes over various particular thresholds.
which seems to be outside of just “pointing an AI in a direction”.
Is this intended (or do you understand it) to include things like “make your AI better at predicting the world,” since we expect that agents who can make better predictions will achieve better outcomes?
I think so, at least for certain kinds of predictions that seem especially important (i.e., those that may lead to x-risk if done badly); see this Arbital page, which is listed under AI Alignment:
Vingean reflection is reasoning about cognitive systems, especially cognitive systems very similar to yourself (including your actual self), under the constraint that you can’t predict the exact future outputs. We need to make predictions about the consequence of operating an agent in an environment via reasoning on some more abstract level, somehow.
If this definition doesn’t distinguish alignment from capabilities, then it seems like a non-starter to me: it is neither useful nor does it capture the typical usage.
It seems to me that Rohin’s proposal of distinguishing between “motivation” and “capabilities” is a good one, and then we can keep using “alignment” for the set of broader problems that are in line with the MIRI/Arbital definition and examples.
In general, the alternative broader usage of “AI alignment” is broad enough to capture lots of problems that would exist whether or not we built AI. That’s not so different from using the term to capture (say) physics problems that would exist whether or not we built AI; both feel bad to me.
It seems fine to me to include 1) problems that are greatly exacerbated by AI and 2) problems that aren’t caused by AI but may be best solved/ameliorated by some element of AI design, since these are problems that AI researchers bear some responsibility for and/or can potentially contribute to. If a problem isn’t exacerbated by AI and does not seem likely to have a solution within AI design, then I would not include it.
Independently of this issue, it seems like “the kinds of problems you are talking about in this thread” need better descriptions whether or not they are part of alignment (since even if they are part of alignment, they will certainly involve totally different techniques/skills/impact evaluations/outcomes/etc.).
Sure, agreed.