Kabir Kumar comments on Kabir Kumar’s Shortform

Kabir Kumar 4 Nov 2024 14:49 UTC
3 points
Yup, those are hard. Was just thinking of a definition for the alignment problem, since I’ve not really seen any good ones.
- Seth Herd 4 Nov 2024 16:13 UTC
  5 points
  I’d say you’re addressing the question of goalcrafting or selecting alignment targets.
  
  I think you’ve got the right answer for technical alignment goals, but the question remains of which humans would control that AGI. See my post “if we solve alignment, do we all die anyway” for the problems with that scenario.
  
  Spoiler alert: we do all die anyway if really selfish people get control of AGIs. And selfish people tend to work harder at getting power.
  
  But I do think your goal definition is a good alignment target for the technical work. I don’t think there’s a better one. I do prefer instruction following or corrigibility, by the definitions in the posts I linked above, because they’re less rigid, but both are very similar to your definition.
  - Kabir Kumar 5 Nov 2024 1:27 UTC
    1 point
    I pretty much agree. I prefer rigid definitions because they’re less ambiguous to test against and more robust to deception. And this field has a lot of deception.