This comment made me notice a kind of duality:
- Paul wants to focus on finding concrete problems, and claims that Nate/Eliezer aren’t being very concrete with their proposed problems.
- Nate/Eliezer want to focus on finding concrete solutions, and claim that Paul/other alignment researchers aren’t being very concrete with their proposed solutions.
It seems like “how well do we understand the problem” is a crux here. I disagree with John’s comment because it feels like he’s assuming too much about our understanding of the problem. If you follow his strategy, then you can spend arbitrarily long trying to find a faithful concrete operationalization of a part of the problem that doesn’t exist.
I don’t feel like this is right (though this duality feels like a real thing that is sometimes important and interesting to think about, so I appreciated the comment).
ARC is spending its time right now (i) trying to write down concrete algorithms that solve ELK using heuristic arguments, and then trying to produce concrete examples in which they do the wrong thing, and (ii) trying to write down concrete formalizations of heuristic arguments that have the desiderata needed for those algorithms to work, and then trying to identify cases in which our algorithms don’t yet meet those desiderata or in which the desiderata may be unachievable. The output is just actual code which purports to solve major difficulties in alignment.
And on the flip side, I spend a significant amount of my time looking at the algorithms we are proposing (and the bigger plans into which they would fit if successful) and trying to find the best arguments I can that these plans will fail.
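To make the propose-then-break loop in the last two paragraphs concrete, here is a minimal toy sketch of that workflow. To be clear, this is purely illustrative and not ARC’s actual code: the candidate “reporters”, the toy ground truth, and the brute-force counterexample search are all hypothetical stand-ins.

```python
# A toy sketch of the propose-then-break loop: propose a concrete
# candidate, then search hard for a concrete input where it does the
# wrong thing. All of the specifics below are hypothetical stand-ins.

from typing import Callable, Optional

# Toy "candidate algorithms": each maps an observation to a report.
# The (toy) ground truth we want them to report is the parity of x.
CANDIDATES: list[Callable[[int], int]] = [
    lambda x: 0,      # always reports 0 -- should be easy to break
    lambda x: x % 2,  # reports the parity directly
]

def find_counterexample(candidate: Callable[[int], int]) -> Optional[int]:
    """Brute-force search for an input where the candidate's report
    disagrees with the ground truth."""
    for x in range(100):
        if candidate(x) != x % 2:
            return x
    return None

for i, candidate in enumerate(CANDIDATES):
    cx = find_counterexample(candidate)
    if cx is None:
        print(f"candidate {i}: survived the search; subject it to harder scrutiny")
    else:
        print(f"candidate {i}: fails on input {cx}; revise and propose again")
```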
I think that the disagreement is more about what kind of concreteness is possible or desirable in this domain.
Put differently: I’m not saying that Nate and Eliezer are vague about problems but concrete about solutions, I’m saying they are vague about everything. And I don’t think they are saying that I’m concrete about problems but vague about solutions, they would say that I’m concrete about parts of the solution/problem that don’t matter while systematically pushing all the difficulty into the parts I’m still vague about.
I do think “how well do we understand the problem” seems like a pretty big crux; that leads Nate and Eliezer to think that I’m avoiding the predictably-important difficulty, and it leads me to think that Nate and Eliezer need to get more concrete in order to have an accurate picture of what’s going on.
Yeah, my comment was sloppily phrased; I agree with “I think that the disagreement is more about what kind of concreteness is possible or desirable in this domain.”
> If you follow his strategy, then you can spend arbitrarily long trying to find a faithful concrete operationalization of a part of the problem that doesn’t exist.
I don’t think that’s how this works? The strategy I’m recommending explicitly contains two parts where we gain evidence about whether a part of the problem actually exists:
- noticing an intuitive pattern in the failure-modes of some strategies
- attempting to formalize (which presumably includes backpropagating our mathematics into our intuitions)
… so if a part of the problem doesn’t exist, then (a) we probably don’t notice a pattern in the first place, but even if our notoriously unreliable human pattern-matchers over-match, then (b) while we’re attempting to formalize we have plenty of opportunity to notice that maybe the pattern doesn’t actually exist the way we thought it did.
It feels like you’re looking for a duality which does not exist. I mean, the duality between “look for concrete solutions” and “look for concrete problems” I buy (and that would indeed cause one side to be over-optimistic and the other over-pessimistic in exactly the pattern we actually see between Paul and Nate/Eliezer). But it feels like you’re also looking for a duality between how-Paul’s-recommended-search-order-just-fails and how-mine-just-fails. And the reason that duality does not exist is that my recommended search order uses strictly more evidence; Paul is basically advocating ignoring a whole class of very useful evidence, and that makes his strategy straightforwardly suboptimal. If we were both picking different points on a Pareto frontier, then yeah, there’d be a trade-off. But Paul just isn’t on the Pareto frontier.
I feel confused about the difference between your “attempt to formalize” step and Paul’s “attempt to concretize” step. It feels like you can view either as a step towards the other: if you successfully formalize, then presumably you’ll be able to concretize; but also one valuable step towards formalizing is finding concrete examples and then generalizing from them. I think everyone agrees that it’d be great to end up with a formalism for the problem, but disagrees about how much that process should involve “finding concrete examples of the problem”. My own view is that since it’s so incredibly easy for people to get lost in abstractions, people should try to concretize much more when talking about highly abstract domains. (Even when people are confident that they’re not lost in abstractions, like Eliezer and Nate are, concretizing is still really useful for conveying ideas to other people.)