I basically agree with you. I think you go too far in saying Lethality 19 is solved, though. Using the 3 feats from your linked comment, which I’ll summarise as “produce a mind that...”:
1. cares about something
2. cares about something external (not a shallow function of local sensory data)
3. cares about something specific and external
(Clearly each one is strictly harder than the previous.) I recognise that Lethality 19 concerns feat 3, though it is worded as if it were about both feat 2 and feat 3.
I think I need to distinguish two versions of feat 3:
3a. there is a reliable (and maybe predictable) mapping between the specific targets of caring and the mind-producing process
3b. there is a principal who gets to choose what the specific targets of caring are (and they succeed)
Humans show that feat 2 at least has been accomplished, but also 3a, as I take you to be pointing out. I maintain that 3b is not demonstrated by humans and is probably something we need.
Hm. I feel confused about the importance of 3b as opposed to 3a. Here’s my first guess: Because we need to target the AI’s motivation in particular ways in order to align it with particular desired goals, it’s important for there not just to be a predictable mapping, but a flexibly steerable one, such that we can choose to steer towards “dog” or “rock” or “cheese wheels” or “cooperating with humans.”
Is this close?
Yes, that sounds right to me.