I disagree with this curation because I don’t think this post will stand the test of time. While Wentworth’s delta to Yudkowsky has a legible takeaway—ease of ontology translation—that is tied to his research on natural latents, it is less clear what John means here and what to take away. Simplicity is not a virtue when the issue is complex and you fail to actually simplify it.
Verification vs. generation admits an extremely wide space of possible interpretations, and as stated here the claim is incredibly vague. The argument for why difficulty of verification implies difficulty of delegation is not laid out, and the examples don't go into much depth. John says that convincing people is not the point of this post, but that means we also don't really get the gears behind the claims.
The comments didn't really help—most of them express confusion, ask for more specificity, or disagree without John engaging. Also, Paul didn't reply. I don't feel any more enlightened after reading them, except to disagree with some extremely strong version of this post...
Vanilla HCH is an 8-year-old model of delegation to AIs which Yudkowsky convinced me was not aligned in like 2018. Why not engage with the limiting constructions in 11 Proposals, the work in the ELK report, recent work by ARC, recent empirical work on AI debate?
I agree that this pointer to a worldview-difference is pretty high-level / general, and the post would be more valuable with a clearer list of specific research disagreements or empirical disagreements. Perhaps I made a mistake in curating a relatively loose pointer. I assign at least 35% to "if we're all still alive in 5 years and there's a much stronger public understanding of Christiano's perspective on the world, this post will in retrospect be a pretty good high-level pointer to where he differs from many others (slash a mistake he was making)", but I still appreciate the datapoint that you (and Mark and Rohin) did not find it helpful or agree with it, and it makes me think it more probable that I made a mistake.