I’d say you’re addressing the question of goalcrafting or selecting alignment targets.
I think you’ve got the right answer for technical alignment goals, but the question remains of which humans would control that AGI. See my “if we solve alignment, do we all die anyway” for the problems with that scenario.
Spoiler alert: we do all die anyway if really selfish people get control of AGIs. And selfish people tend to work harder at getting power.
But I do think your goal definition is a good alignment target for the technical work. I don’t think there’s a better one. I do prefer instruction following or corrigibility, by the definitions in the posts I linked above, because they’re less rigid, but they’re both very similar to your definition.
I pretty much agree. I prefer rigid definitions because they’re less ambiguous to test and more robust to deception. And this field has a lot of deception.