My interpretation of this interaction (which is fascinating to read, btw, because both of you are eloquently defending a cogent and interesting theory as far as I can tell) is that you’ve indirectly proposed Robot-1 as the initial model of an agent (which is clearly not a full model of a person and fails to capture many features of humans) in the first of a series of articles. I think Richard is objecting to the connections he presumes that you will eventually draw between Robot-1 and actual humans, and you’re getting confused because you’re just trying to talk about the thing you actually said, not the eventual conclusions he expects you to draw from your example.
If he’s expecting you to verbally zig when you’re actually planning to zag and you don’t notice that he’s trying to head you off at a pass you’re not even heading towards, it’s entirely reasonable for you to be confused by what he’s saying. (And if some of the audience also thinks you’re going to zig, they’ll also see the theory he’s arguing against, see that his arguments against “your predicted eventual conclusions” are valid, and upvote his criticism of something you haven’t yet said. And both of you are quite thoughtful and polite and educated, so it’s good reading even if there is some confusion mixed into the back and forth.)
The place I think you were ambiguous enough to be misinterpreted was roughly here:
Suppose the robot had human level intelligence in some side module, but no access to its own source code; that it could learn about itself only through observing its own actions. The robot might come to the same conclusions we did: that it is a blue-minimizer, set upon a holy quest to rid the world of the scourge of blue objects.
You use the phrase “human level intelligence” and talk about the robot making the same fuzzy inferential leap that outside human observers might make. Also, this is remarkably close to how some humans with very poor impulse control actually seem to function, modulo some different reflexes and a moderately unreasonable belief in their own deliberative agency (a la Blindsight with the “Jubyr fcrpvrf vf ntabfvp ol qrsnhyg” line and so on).
If you had said up front that you’re using this as a toy model which has (for example) too few layers and no feedback from the “meta-observer” module to be an honestly plausible model of “properly functioning cohesively agentive mammals”, I think Richard would not have made the mistake that I think he’s making about what you’re about to say. He keeps talking about a robust and vastly more complex model than Robot-1 (that being a multi-layer purposive control system) and about how not just hypothetical PCT algorithms but actual humans function, and you haven’t directly answered these concerns by saying clearly “I am not talking about humans yet, I’m just building conceptual vocabulary by showing how something clearly simpler might function to illustrate mechanistic thinking about mental processes”.
It might have helped if you were clear about the possibility that Robot-1 would emit words more like what we might expect someone to emit several years after a serious brain lesion that severed some vital connections in their brain, after their verbal reasoning systems had updated on the lack of a functional connection between their conscious/verbal brain parts and their deeper body control systems. Robot-1 seems likely to me to end up saying something like “Watch out, I’m not just having a mental breakdown but I’ve never had any control over my body+brainstem’s actions in the first place! I have no volitional control over my behavior! If you’re wearing blue then take off the shirt or run away before I happen to turn around and see you and my reflex kicks in and my body tries to kill you. Dear god this sucks! Oh how I wish my mental architecture wasn’t so broken...”
For what it’s worth, I think the Robot-1 example is conceptually useful and I’m really looking forward to seeing how the whole sequence plays out :-)