I can barely see how this is possible if we’re talking about alignment to humans, even with a hypothetical formal theory of embedded agency. Do you imagine human values are cleanly represented and extractable, and that we can (potentially very indirectly) reference those values formally? Do you mean something else by “formalization of alignment” that doesn’t involve formal descriptions of human minds?
For examples of what a formalization of alignment could look like, see this and this.