Max Harms comments on Instrumental Goals Are A Different And Friendlier Kind Of Thing Than Terminal Goals

Max Harms 30 Jan 2025 16:54 UTC
5 points
0
This seems mostly right. I think there still might be problems where identifying and charging for relevant externalities is computationally harder than routing around them. For instance, if you’re dealing with a civilization (such as humanity) that responds to your actions in complex and chaotic ways, it may be intractable to efficiently price “reputation damage”. Instead, you might want to be overly cautious (i.e. “impose constraints”) and think through deviations from that cautious baseline on a case-by-case basis (i.e. “forward-check”). Again, I think your point is mostly right, and a useful frame: it makes me less likely to expect the kinds of hard constraints that Wentworth and Lorell propose to show up in practice.
- johnswentworth 30 Jan 2025 17:07 UTC
  5 points
  0
  TBC, I don’t particularly expect hard constraints to show up; that was more a way of illustrating the underlying concept. The same underlying concept in the market-style picture would be: across many different top-level goals, there are convergent ways of carving up “property rights”. So, a system can be generally corrigible by “respecting the convergent property rights”, so to speak.