Continuing the thread from here: https://deathisbad.substack.com/p/ea-has-a-pr-problem-in-that-it-cares/comments
I agree with you that an AI programmed exactly like the one you describe is doomed to fail. What I don’t understand is why you think any AI MUST be made that way.
Some confusions of mine:
-There is not a real distinction between instrumental and terminal goals in humans.
This seems not true to me? I seem to have terminal goals/desires, like hunger, and instrumental goals, like going to the store to buy food. Telling me that terminal goals don’t exist seems to prove too much. Are you saying that complex goals like “Don’t let humanity die” are, in human brains, in practice instrumental goals made up of simpler desires?
-Because humans don’t ‘really’ have terminal goals, it’s impossible to program them into AIs.
?
-AIs can’t be made to have ‘irrational’ goals, like caring about humans more than themselves.
This also seems to prove that humans don’t exist? Can’t humans care about their children more than themselves? Couldn’t AIs be made to value humans as much as humans value their children? Or more?
To choose an inflammatory example: a gay man could think it’s irrational for him to want to date men, because that doesn’t lead to him having children. But that won’t make him want to date women. I have lots of irrational desires that I nevertheless treasure.
Less importantly, I also feel like there’s an assumption that we would want to create an AI that is only as good as we are, not better than we are. But if we can’t even define our current values, then deciding what the superior values would be sounds like an even more impossible challenge. Having a superhuman AI that treated us the way we treat chickens would be pretty bad.