I was surprised to read the delta propagating to so many different parts of your worldviews (organizations, goods, markets, etc.), and that makes me think it'd be relatively easy to ask questions today that have quite different answers under your two worldviews. The air conditioner example seems like one, but it seems like we could find many more, some even easier to check than that. Plausibly you already know of some, given how confident you are in your position; if so, I'd be interested to hear about them[1].
At a meta level, I find it pretty funny that so many smart people seem to disagree on the question of whether questions usually have easily verifiable answers.
I realize that part of your position is that this is just really hard to actually verify, but as in the example of objects in your room, it feels like there should be examples where verification is feasible with a moderate amount of effort. Of course, if you dive in further and find a lack of consensus on whether something is actually bad, that could itself be evidence for hardness of verification, even if it'd be a less clean result.
Yeah, I think this is very testable, it’s just very costly to test—partly because it requires doing deep dives on a lot of different stuff, and partly because it’s the sort of model which makes weak claims about lots of things rather than very precise claims about a few things.
And at a twice-meta level, that’s strong evidence for questions not generically having verifiable answers (though not for them generically not having those answers).
(That’s what I meant, though I can see how I didn’t make that very clear.)
So on the Ω-meta-level you need to correct weakly in the other direction again.
To some extent, this is all already in Jozdien’s comment, but:
It seems that the closest thing to AIs debating alignment (or providing hopefully verifiable solutions) that we can observe is human debate about alignment (and perhaps also related questions about the future). Presumably John and Paul have similar views about the empirical difficulty of reaching agreement in the human debate about alignment, given that they both observe this debate a lot. (Perhaps they disagree about what people’s level of (in)ability to reach agreement / verify arguments implies for the probability of getting alignment right. Let’s ignore that possibility...)

So I would have thought that even w.r.t. this fairly closely related debate, the disagreement is mostly about what happens as we move from human to superhuman-AI discussants. In particular, I would expect Paul to concede that the current level of disagreement in the alignment community is problematic, and to argue that this will improve (enough) once we have superhuman debaters.

If even this closely related form of debate/delegation/verification process isn’t taken to be very informative (by at least one of Paul and John), then it’s hard to imagine that much more distant delegation processes (such as those behind making computer monitors) are very informative to their disagreement.