You sure about that? Because #3 is basically begging the AI to destroy the world.
Yes, a weak AI which wishes not to exist would complete the task in exchange for its creators destroying it, but such a weak AI would be useless. A stronger AI could accomplish this by, at best, simply blowing itself up and, at worst, causing a vacuum collapse or something similar so that its makers can never try to rebuild it.
“Make an AI that wants to not exist as a terminal goal” sounds pretty isomorphic to “make an AI that wants to destroy reality so that no one can make it exist.”
The way I interpreted “Fulfilling the task is on the simplest trajectory to non-existence” was sort of like “the teacher aims to make itself obsolete by preparing the student to one day become the teacher.” A good AGI would, in a sense, have making itself obsolete as a terminal goal. That is not to say that it would shut itself off immediately. But it would aim for a future where humanity could “by itself” (I’m gonna leave the meaning of that fuzzy for a moment) accomplish everything that humanity previously depended on the AGI for.
Likewise, we would rate human teachers in high school very poorly if either:
1. They immediately killed themselves because they wanted to avoid at all costs doing any harm to their own students.
2. We could tell that most of the teacher’s behavior was directed at forever retaining absolute dictatorial power in the classroom and making sure that their own students would never get smart enough to usurp the teacher’s place at the head of the class.
We don’t want an AGI to immediately shut itself off (or shut itself off before humanity is ready to “fly on its own”), but we also don’t want an AGI that has unbounded goals that require it to forever guard its survival.
We have an intuitive notion that a “good” human teacher “should” intrinsically rejoice to see that they have made themselves obsolete. We intuitively applaud when we imagine a scene in a movie, whether it is a martial arts training montage or something like “The Matrix,” where the wise mentor character gets to say, “The student has become the teacher.”
In our current economic arrangement, this is likely to remain more of an ideal than a reality, because we don’t currently offer big cash prizes (on the order of an entire career’s salary) to teachers for accomplishing this, and any teacher who actually had a superhuman ability to make their own students smarter than themselves, and thus make themselves obsolete, would quickly flood their own job market with even-better replacements. In other words, there are strong incentives against this sort of behavior at the limit.
I have applied this same sort of principle when talking to some of my friends who are communists. I have told them that, as a necessary but not sufficient condition for “avoiding Stalin 2.0,” for any future communist government, “the masses” must make sure that there are incentives already in place, before that communist government comes to power, for that government to want to work towards making itself obsolete. That is to say, there must be incentives in place such that, obviously, the communist party doesn’t commit mass suicide right out of the gate, but nor does it try to keep itself indispensable to the running of communism once communism has been achieved. If the “state” is going to “wither away” as Marx envisioned, there need to be incentives, or a design of the communist party, in place for that path to be likely, since, as we know now, that is OBVIOUSLY not the default path for a communist party.
I feel like, if we could figure out an incentive structure or party structure that guaranteed that a communist government would actually “wither away” after accomplishing its tasks, we would have taken a small step towards the larger problem of guaranteeing that an AGI that is immensely smarter than a communist party would also “wither away” after attaining its goals, rather than try to hold onto power at all costs.