Under my models, if I pick one of these objects at random and do a deep dive researching that object, it will usually turn out to be bad in ways which were either nonobvious or nonsalient to me, but unambiguously make my life worse and would unambiguously have been worth-to-me the cost to make better.
Crucially, this is true only because you’re relatively smart for a human: smarter than many of the engineers who designed those objects, and smarter than most or all of the committee-of-engineers that designed those objects. You can come up with better solutions than they did, if you have a similar level of context.
But that’s not true of most humans. Most humans, if they did a deep dive into those objects, wouldn’t notice the many places where there is substantial room for improvement. Just as most humans don’t spontaneously recognize blatant-to-me incentive problems in government design (and virtually every other human institution), and just as I often wouldn’t be able to tell that a software solution was horrendously badly architected, at least not without learning a bunch of software engineering in addition to doing a deep dive into this particular program.
Ehh, yes and no. I maybe buy that a median human doing a deep dive into a random object wouldn’t notice the many places where there is substantial room for improvement; hanging around with rationalists does make it easy to forget just how low the median-human bar is.
But I would guess that a median engineer is plenty smart enough to see the places where there is substantial room for improvement, at least within their specialty. Indeed, I would guess that the engineers designing these products often knew perfectly well that they were making tradeoffs which a fully-informed customer wouldn’t make. The problem, I expect, is mostly organizational dysfunction (e.g. the committee of engineers is dumber than one engineer, and if there are any nontechnical managers involved then the collective intelligence nosedives real fast), and economic selection pressure.
For instance, I know plenty of software engineers who work at the big tech companies. The large majority of them (in my experience) know perfectly well that their software is a trash fire, and will tell you as much, and will happily expound in great detail the organizational structure and incentives which lead to the ongoing trash fire.
Note that to the extent this is true (that most humans wouldn’t notice the room for improvement even after a deep dive), it suggests verification is even harder than John thinks.
Hmm, not exactly. Our verification ability only needs to be sufficiently good relative to the AIs.