I think the issues you point out with the “alignment” name are real issues. That said, the word “intent” comes with its own issues.
Intention doesn’t have to be conscious or communicable. It is just a preference for some futures over others, inferred as an explanation for behavior that chooses some futures over others. Like, even single-celled organisms have basic intentions if they move towards nutrients or away from bad temperatures.
I don’t think “intention” is necessarily the best word for this unless you go full POSIWID. A goose does not “intend” to drag all vaguely egg-shaped objects to her nest and sit on them, in the sense that I don’t think geese prefer sitting on a clutch of eggs over a clutch of eggs, a lightbulb, and a wooden block. And yet that is the expressed behavior anyway, because lightbulbs were rare and eggs that rolled out of the nest were common in the ancestral environment.
I think “artificial system fitness-for-purpose” might come closer to capturing what “AI alignment” is pointing at (including being explicit about the bit that it’s a 2-place term), but at the cost of being extremely not catchy.
I agree that intention comes with its own baggage, but I think that baggage is mostly appropriate. Intention usually refers to explicit goals, and those are the ones we’re mostly worried about. I think it’s unhelpful to mix concerns about goal-directed AI with concerns about implicit biases and accidental side effects. So I’d call this another step in the right direction.
I am going to try adopting this terminology, at least in some cases.
So the idea is to use “Artificial Intention” to specifically speak of the subset of concerns about what outcomes an artificial system will try to steer for, rather than the concerns about the world-states that will result in practice from the interaction of that artificial system’s steering plus the steering of everything else in the world?
Makes sense. I expect it’s valuable to also have a term for the bit where you can end up in a situation that nobody was steering for due to the interaction of multiple systems, but explicitly separating those concerns is probably a good idea.