Is it possible to ensure an AGI effectively acts according to a bounded utility function, with “do nothing” always a safe/decent option?
The goal would be to increase risk aversion to the point where practical external deterrence is enough to keep that AGI from killing us all.
Maybe some more hardcoding or hand engineering in the designs?
Maybe, but we don’t have a particularly good understanding of how we would do that. This is sometimes termed “strawberry alignment”. Also, again, you have to figure out how to use “strawberry alignment” to solve the problem that someone is eventually going to do “not strawberry alignment”.
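To make the proposal in the question a bit more concrete, here is a minimal toy sketch, not anything from the discussion itself: all function names, penalty weights, and payoff numbers are hypothetical. It shows one way a bounded, risk-averse utility with a guaranteed “do nothing” option can make the safe default beat a high-variance plan.

```python
import math

# Hypothetical toy sketch (names and numbers are illustrative only):
# the agent scores each action with a bounded, risk-averse utility, and
# "do nothing" always has a known modest payoff, so uncertain high-stakes
# actions lose to the safe default.

def bounded_utility(raw_value: float, scale: float = 10.0) -> float:
    """Squash unbounded payoff estimates into (-1, 1) via tanh, so no single
    outcome can dominate the decision by sheer magnitude."""
    return math.tanh(raw_value / scale)

def risk_averse_score(outcomes, risk_penalty: float = 2.0) -> float:
    """Expected bounded utility minus a penalty proportional to its variance.
    `outcomes` is a list of (probability, raw_payoff) pairs."""
    utils = [(p, bounded_utility(v)) for p, v in outcomes]
    mean = sum(p * u for p, u in utils)
    var = sum(p * (u - mean) ** 2 for p, u in utils)
    return mean - risk_penalty * var

# Candidate actions, each a small outcome distribution of (probability, raw payoff).
actions = {
    "do_nothing": [(1.0, 1.0)],                       # certain, modest payoff
    "aggressive_plan": [(0.6, 100.0), (0.4, -100.0)], # huge upside, huge downside
}

best = max(actions, key=lambda a: risk_averse_score(actions[a]))
print(best)  # with these illustrative numbers, the safe default wins
```

The sketch only illustrates the decision rule the question is gesturing at; it says nothing about whether an actual AGI could be built so that its effective behavior matches such a scored utility, which is the part the answer is pessimistic about.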