If one believes the orthogonality thesis (and we only need a very weak version of it), just knowing that there is an AGI trying to improve the world is not enough to predict exactly how it would reason about the quirkier aspects of human character and values. It seems to me that something that could be called “AGI-humans” is quite possible, but a more alien-to-us “total hedonistic utility-maximizing AGI” also seems possible.
From how I understood Eliezer Yudkowsky’s arguments here, the way we are selecting for AI models will favour models with consequentialist decision making (we do select the models that give good results), which tends towards the latter.
Because of this, I would expect an AGI to be more on the far-reaching, utilitarian end when it comes to affecting our lives.
With regard to
[...] accept some of the human character flaws and limitations, or will it strip it all away at the risk of hurting the human until the singularity of what is considered acceptable is achieved
if we are talking about an AGI that is aiming for good in a sufficiently aligned sense, it is not obvious that a significant “risk of hurting the human” is necessary to reach a value-optimal state.
But of course a utilitarian-leaning AGI will be more willing to risk actively doing harm if it thinks that the total expected outcome is improved.
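To make that expected-value reasoning concrete, here is a toy calculation (the numbers are invented purely for illustration and are not taken from anything above): suppose an intervention helps with probability 0.9 and actively harms the human with probability 0.1.

```latex
% Toy numbers, purely illustrative: intervening yields +10 with probability 0.9
% and -20 (active harm) with probability 0.1; doing nothing yields 0.
\mathbb{E}[U(\text{intervene})] = 0.9 \cdot (+10) + 0.1 \cdot (-20) = 7
  \;>\; 0 = \mathbb{E}[U(\text{do nothing})]
```

A purely utilitarian agent takes the intervention because the expectation is positive, while an agent that treats actively causing harm as a hard constraint might refuse it despite the better expected total.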
Yes, I do.
Me too.
But I’m adopting the term “AGI-humans” from today.
...