Thanks. I guess I’d just prefer it if more people were saying, “Hey, even though it seems difficult, we need to go hard after conscience guard rails (or ‘value alignment’) for AI now and not wait until we have AIs that could help us figure this out. Otherwise, some of us might not make it until we have AIs that could help us figure this out.” But I also realize that I’m just generally much more optimistic about the tractability of this problem than most people appear to be, although Shane Legg seemed to say it wasn’t “too hard,” haha.[1]
Legg was talking about something different than I am, though—he was talking about “fairly normal” human values and ethics, or what most people value, while I’m basically talking about what most people would value if they were wiser.
Oh hey—I just stumbled back on this comment and realized: it’s the primary reason I wrote
Intent alignment as a stepping-stone to value alignment
On not giving up on value alignment, while acknowledging that instruction-following is a much safer first alignment target.