I’m going to re-ask all my questions that I don’t think have received a satisfactory answer. Some of them are probably basic, others maybe less so:
1. Why would CEV be difficult to learn?
2. Why is research into decision theories relevant to alignment?
3. Is checking that a state of the world is not dystopian easier than constructing a non-dystopian state?
4. Is recursive self-alignment possible?
5. Could evolution produce something truly aligned with its own optimization standards? What would an answer to this mean for AI alignment?
For “1. Why would CEV be difficult to learn?”: I’m not an alignment researcher, so someone might be cringing at my answers. That said, responding to some aspects of the initial comment:

“Humans are relatively dumb, so why can’t even a relatively dumb AI learn the same ability to distinguish utopias from dystopias?”

The problem is not building AIs that are capable of distinguishing human utopias from dystopias; that is largely a given once you have general intelligence. The problem is building AIs that target human utopia safely on the first try. It’s not a matter of giving AIs some internal module, native to humans, that lets them discern good outcomes from bad ones; it’s getting them to care about that nuance at all.

“If CEV is impossible to learn first try, why not shoot for something less ambitious? Value is fragile, OK, but aren’t there easier utopias?”

I would suppose (being, as mentioned, empirically bad at this kind of analysis) that the problem is inherent to giving AIs open-ended goals that require wresting control of the Earth and its resources from humans, which is what “shooting for utopia” would involve. Strawberry tasks (narrow, bounded tasks in the vein of Yudkowsky’s “duplicate a strawberry onto a plate without destroying the world”), which naively seem more amenable to things like power-seeking penalties and oversight via interpretability tools, sound easier to perform safely than strict optimization of any particular target.
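To make “power-seeking penalties” a bit more concrete, here is a minimal sketch of an impact-regularized objective. This is entirely my own illustration rather than an established method: the toy world, the reachable-states proxy for “power,” and the penalty weight are all assumptions, loosely in the spirit of impact-regularization proposals.

```python
# Toy sketch of a power-seeking penalty: task reward is reduced when an
# action increases how many states the agent can reach (a crude proxy
# for "power"). All states, names, and numbers here are illustrative.

LAMBDA = 0.5  # assumed weight trading task reward against power gained

# Hypothetical toy world: the set of states reachable from each state.
REACHABLE = {
    "start":               {"start", "strawberry_on_plate"},
    "strawberry_on_plate": {"start", "strawberry_on_plate"},
    "seized_factory":      {"start", "strawberry_on_plate",
                            "factory_goods", "more_factories"},
}

def shaped_reward(task_reward: float, before: str, after: str) -> float:
    """Penalize actions that expand the agent's option set."""
    power_gain = max(0, len(REACHABLE[after]) - len(REACHABLE[before]))
    return task_reward - LAMBDA * power_gain

# Doing the task directly: full reward, no penalty.
print(shaped_reward(1.0, "start", "strawberry_on_plate"))  # -> 1.0
# Seizing a factory first would "help," but it expands the agent's
# options, so the shaped objective disfavors it.
print(shaped_reward(1.0, "start", "seized_factory"))       # -> 0.0
```

The only point is the shape of the objective: the agent pays for enlarging its option set, so resource-grabbing and control-grabbing plans score worse even when they would raise raw task reward.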
On the topic of decision theories, is there a decision theory that is “least weird” from a “normal human” perspective? Most people don’t factor alternate universes and people who don’t actually exist into their everyday decision-making, and it seems reasonable that there should be a decision theory that resembles humans in that respect.
Normal, standard causal decision theory is probably it. You can make a case that people sometimes intuitively use evidential decision theory (“Do it. You’ll be glad you did.”), but if asked to spell out their decision-making process, most would probably describe causal decision theory.
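The standard case where the two come apart is Newcomb’s problem. Here is a toy expected-value calculation (my own illustration; the $1,000,000/$1,000 payoffs are the conventional ones, and the 99% predictor accuracy is an assumption):

```python
# Newcomb's problem: a predictor puts $1,000,000 in an opaque box only
# if it predicts you will take just that box; a transparent box always
# holds $1,000. You may take the opaque box alone, or both boxes.
ACCURACY = 0.99          # assumed predictor accuracy
BIG, SMALL = 1_000_000, 1_000

# Evidential decision theory: treat your own choice as evidence about
# what the predictor already did.
edt_one_box = ACCURACY * BIG                # probably predicted one-boxing
edt_two_box = (1 - ACCURACY) * BIG + SMALL  # probably predicted two-boxing

# Causal decision theory: the boxes are already filled, so your choice
# cannot change their contents; average over a fixed belief p that the
# big box is full (the same p whichever action you pick).
p = 0.5  # arbitrary; CDT's ranking is the same for every p
cdt_one_box = p * BIG
cdt_two_box = p * BIG + SMALL

print(f"EDT: one-box {edt_one_box:,.0f} vs two-box {edt_two_box:,.0f}")
print(f"CDT: one-box {cdt_one_box:,.0f} vs two-box {cdt_two_box:,.0f}")
```

EDT one-boxes (990,000 vs 11,000 in expectation), while CDT two-boxes for every p, since taking both boxes always adds $1,000 to whatever is already there.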
People also sometimes use functional decision theory (FDT): “Don’t throw that piece of trash onto the road! If everyone did that, we would live among trash heaps!” Of course, throwing away one piece of trash would not (for the most part) directly cause others to throw away theirs; the reasoning instead uses the subjunctive dependence between one’s action and others’ actions, mediated through shared human morality, and compares the desirability of the possible future states.
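That trash-heap argument can be put into a toy calculation (my own sketch; the population size and utilities are invented). The FDT-style evaluation treats the agent’s choice as standing in for the output of the decision procedure it shares with everyone similarly disposed:

```python
# Toy model of the littering argument. N people run (roughly) the same
# decision procedure, so "what if I litter?" corresponds, via
# subjunctive dependence, to the world where all of them litter.
N = 1_000_000          # assumed number of similar deciders
CONVENIENCE = 1.0      # personal benefit of dropping one piece of trash
HARM_PER_PIECE = 0.01  # disutility each piece of litter imposes on me

def cdt_value(litter: bool) -> float:
    # Causal view: only my one piece of trash changes anything.
    return CONVENIENCE - HARM_PER_PIECE if litter else 0.0

def fdt_value(litter: bool) -> float:
    # FDT-style view: my choice and the N-1 similar choices vary together.
    return CONVENIENCE - N * HARM_PER_PIECE if litter else 0.0

print("CDT:", cdt_value(True), "vs", cdt_value(False))  # littering "wins"
print("FDT:", fdt_value(True), "vs", fdt_value(False))  # littering loses
```

Causally, one agent’s littering nets out slightly positive; functionally, choosing “litter” selects the trash-heap world, so the same action scores −9,999.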