Humans can, to some extent, be pointed to complicated external things. This suggests that using natural selection on biology can get you mesa-optimizers that can be pointed to particular externally specifiable complicated things. Doesn’t prove it (or, doesn’t prove you can do it again), but you only asked for a suggestion.
Humans can be pointed at complicated external things by other humans on their own cognitive level, not by their lower maker, natural selection.
I don’t think I understand what, exactly, is being discussed. Are “dogs” or “flowers” or “people you meet face-to-face” examples of “complicated external things”?
Right, but the goal is to make AGI you can point at things, not to make AGI you can point at things using some particular technique.
(Tangentially, I also think the jury is still out on whether humans are bad fitness maximizers, and if we’re ultimately particularly good at it—e.g. let’s say, barring AGI disaster, we’d eventually colonise the galaxy—that probably means AGI alignment is harder, not easier)
To my eye, this seems like it mostly establishes ‘it’s not impossible in principle for an optimizer to have a goal that relates to the physical world’. But we had no reason to doubt this in the first place, and it doesn’t give us a way to reliably pick in advance which physical things the optimizer cares about. “It’s not impossible” is a given for basically everything in AI, in principle, if you have arbitrary amounts of time and arbitrarily deep understanding.
As I said (a few times!) in the discussion about orthogonality, indifference about the measure of “agents” that have particular properties seems crazy to me. Having an example of “agents” that behave in a particular way is enormously different from having an unproven claim that such agents might be mathematically possible.