Also, taken too far, you run into problems with eudaimonia: we probably wouldn't want AI to remove all challenge.
I usually don’t consider this a problem, since I have different atomic building blocks for my value set.
However, if I were going to criticize it, I'd point out that inner-alignment issues incentivize it to deceive us.
It’s still an advance. If the core claims are correct, then it solves the entire outer alignment problem in one go, including Goodhart problems.
Now, I get the skepticism toward this solution, because from the outside view, someone solving a major problem with their pet theory almost never happens, and a lot of past efforts have turned out not to work.
If you are talking about external empowerment, I wasn't the first to write up that concept; that credit goes to Franzmeyer et al.[1] Admittedly my conception is a little different and my writeup focuses more on the longer-term consequences, but they have the core idea there.
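To make "external empowerment" concrete, here is a minimal sketch, purely my own illustration and not code from that paper: an assistant picks whichever of its actions leaves the *other* agent (the "leader") with the most distinct states reachable within a short horizon. All function and variable names here are hypothetical placeholders.

```python
def reachable_states(state, actions, step_fn, horizon):
    """Distinct states the leader could reach within `horizon` steps."""
    frontier = {state}
    reached = {state}
    for _ in range(horizon):
        frontier = {step_fn(s, a) for s in frontier for a in actions}
        reached |= frontier
    return reached


def altruistic_action(state, assistant_actions, leader_actions,
                      assistant_step, leader_step, horizon=3):
    """Pick the assistant action that leaves the leader with the most options."""
    def leader_choice(s):
        return len(reachable_states(s, leader_actions, leader_step, horizon))
    return max(assistant_actions,
               key=lambda a: leader_choice(assistant_step(state, a)))


if __name__ == "__main__":
    # Tiny made-up example: states are (position, door_open) on a corridor of
    # cells 0..5 with a door between cells 2 and 3. The assistant's only
    # meaningful action is toggling the door.
    def leader_step(state, a):
        pos, door = state
        new = pos + a
        if not (0 <= new <= 5) or (not door and {pos, new} == {2, 3}):
            new = pos  # blocked by a wall or the closed door
        return (new, door)

    def assistant_step(state, a):
        pos, door = state
        return (pos, door if a == "noop" else not door)

    best = altruistic_action((1, False), ["noop", "toggle_door"],
                             [-1, 0, 1], assistant_step, leader_step)
    print(best)  # "toggle_door": opening the door gives the leader more reachable states
```

Counting reachable states is of course a crude stand-in for the channel-capacity notion of empowerment, but it captures the "maximize the other agent's choices" core of the idea.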
If you are talking about how empowerment arises naturally from just using correct decision making under uncertainty, in situations where future value of information improves subsequent value estimates, that idea may be more novel, and I'll probably write it up if it isn't so novel that it has non-epsilon AI capability value. (Some quick Google searches reveal some related 'soft' decision RL approaches that seem similar.)
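A toy sketch of that claim, again my own construction rather than anything formalized in this thread: a two-stage decision problem in which plain expected-value maximization, once it accounts for the value of information from a future observation, prefers the option-preserving ("empowered") action with no explicit empowerment term in the objective.

```python
GOALS = ("A", "B")
PRIOR = {"A": 0.5, "B": 0.5}   # uncertainty over which goal will actually pay off
PAYOFF = 1.0                   # reward for committing to the goal that pays off


def value_commit_now(goal):
    """Commit to one goal before observing anything: expected value under the prior."""
    return PRIOR[goal] * PAYOFF


def value_keep_options_open():
    """Wait, observe which goal pays off, then commit to it.
    The observation improves the later value estimate from 0.5 to 1.0 for the
    revealed goal; that gap is the value of information."""
    return sum(PRIOR[g] * PAYOFF for g in GOALS)


if __name__ == "__main__":
    print("commit to A now:   ", value_commit_now("A"))        # 0.5
    print("commit to B now:   ", value_commit_now("B"))        # 0.5
    print("keep options open: ", value_keep_options_open())    # 1.0
```

Nothing in the objective mentions empowerment; the option-preserving action wins simply because keeping options open lets the later, better-informed value estimate be acted on.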
Franzmeyer, Tim, Mateusz Malinowski, and João F. Henriques. “Learning Altruistic Behaviours in Reinforcement Learning without External Rewards.” arXiv preprint arXiv:2107.09598 (2021).