Many methods to “align” ChatGPT seem to make it less willing to do things its operator wants it to do, which runs against the spirit of having a corrigible AI.
I think this is a more general phenomenon when aiming to minimize misuse risk: you end up needing some form of ambitious value learning, which I expect to be especially susceptible to being broken by the alignment hacks produced by RLHF and its successors.
I would consider it a reminder that if intelligent AIs are one day aligned, they will be aligned with the corporations that produced them, not with the end users.
Just like today, Windows does what Microsoft wants rather than what you want (e.g. telemetry, bloatware).