People often work on alignment proposals without having a clear idea of what they actually want an aligned system to do. Eliezer thinks this is bad.
...
Is it never useful to have a better understanding of the mechanics, even if we don’t have a clear target in mind?
Not sure how relevant this is in Eliezer’s model, but I’d note that there’s a difference between alignment proposals vs understanding the mechanics. When working to understand things in general, I usually want to have multiple different use-cases in mind, to drive the acquisition of generalizable knowledge. For an actual proposal, it makes more sense to have a particular use-case.
That said, even under this argument one should have multiple concrete use-cases in mind, as opposed to zero concrete use-cases.
I started out nodding along with Eliezer in this post, then read Mark’s take and was like “yeah, that seems fair.”
But I do quite like the notion of “have a couple concrete examples of how the whole thing will get used end-to-end.”
This feels related to how I feel about a lot of voting theory, or economic mechanism design. I’m often frustrated by people working hard on fleshing out the edge cases of some voting process, when the actual bottleneck is a much simpler voting system that anyone is actually likely to use, plus a use-case to bootstrap it to more popularity. Or, for mechanism design, the actual bottleneck is someone who is good at UX and product design.
In the case of AI we have a different set of problems. We probably do actually need the complicated theory-heavy solutions. But those solutions need to be relevant to stuff that has a chance of actually helping, which requires thinking through things concretely.
...
On the other hand, this all feels a bit distinct from what (I think?) Eliezer’s point is, which is less about how to design things generally, and more about “The AGI is actually gonna kill you tho, and the road to hell is paved with people bouncing off the hard problem to work on more tractable things.” I think there’s some kind of deeply different macrostrategy that Paul and Eliezer and Critch are pointing at. (I’m a bit confused about Critch/Paul; I had been bucketing them as essentially the same strategic camp, but then last year they had some significant disagreement I was confused by.) Where Eliezer is like “you definitely just need a pivotal act,” Critch is like “you’re not gonna get a safe pivotal act, and it also harms the coordination commons,” and Paul is like “you’re not gonna get a safe pivotal act; you need to figure out how to make alignment competitive.”
I think this disagreement is nontrivial to resolve.
A reply by me: https://www.lesswrong.com/posts/d4YGxMpzmvxknHfbe/conversation-with-eliezer-what-do-you-want-the-system-to-do?commentId=mc9A77bvmNpNyoqYf#comments