This is the culmination of a series of posts on “formal alignment”, where I start out by asking what it would mean to formally state what it would mean to build aligned AI, and then from there try to figure out what we’d have to figure out in order to achieve that.
Over the last year I’ve gotten pulled in other directions and haven’t pushed this line of research forward much, and I reached a point where it was clear that making additional progress would require specialization different from mine. But I still think it presents a different approach from what others are doing in AI alignment work, and you might find it interesting to review (along with the preceding posts in the series) for that reason.
Thanks for the suggestion! We want to go through the different research agendas (and I already knew about yours), as they give different views/paradigms on AI Alignment. Yet I’m not sure how relevant a review of such posts is. In a sense, the “reviewable” part is the actual research that underlies the agenda, right?
I don’t see a good reason to exclude agenda-style posts, but I do think it’d be important to treat them differently from more here-is-a-specific-technical-result posts.
Broadly, we’d want to be improving the top-level collective AI alignment research ‘algorithm’. With that in mind, I don’t see an area where more feedback/clarification/critique of some kind wouldn’t be helpful.
The questions seem to be:
What form should feedback/review… take in a given context?
Where is it most efficient to focus our efforts?
Productive feedback/clarification on high-level agendas seems potentially quite efficient. My worry would be about creating excessive selection pressure towards paths that are clear and simply justified. However, where an agenda does use specific assumptions and arguments to motivate its direction, early ‘review’ seems useful.
Another post of mine I’ll recommend to you:
https://www.lesswrong.com/posts/k8F8TBzuZtLheJt47/deconfusing-human-values-research-agenda-v1