I think it’s more of a spectrum than two distinct groups, and I tried to pick the two extremes. On one end are the empirical alignment people, like Anthropic and Redwood; on the other, pure conceptual researchers and the LLM whisperers like Janus, with shades in between, like MIRI and Paul Christiano. I’m not even sure it fits neatly on one axis, but the biggest divide is probably empirical vs. conceptual. There are other splits too, like rigor vs. exploration, or legibility vs. ‘lore,’ and preferences along these axes seem correlated.
Whenever I try to “learn what’s going on with AI alignment,” I wind up on some article about whether dogs know enough words to have thoughts, or something like that. I don’t want to kill off theoretical work (it can peer a bit further into the future and operate more independently of current technology), but it seems like a poor way to answer questions like: what’s going on right now, or if every AI company let me write its six-month goals, what would I put on the list?
I’m curious what people disagree with in this comment. Also, since people upvoted and agreed with the first one, I guess they do have two groups in mind, just not quite the same ones I was thinking of (which is interesting and mildly funny!). So: what was your slicing-up of the alignment research x LW scene that’s consistent with my first comment but different from my description in the second?
What are the two groups in question here?