Well, it takes two things: (1) self-knowledge (“I wrote ‘0’ into register X”, “I am thinking about turtles”, etc. being in the world-model) and (2) knowledge of the causal consequences of that (the programmers see the 0 in register X and then change their behavior). With both of those, the system can learn causal links between its own decisions and the rest of the world, and can therefore effect real-world consequences.
Of these two paths, I think you’re proposing to cut off path (2), which I agree is very challenging. I am proposing to cut off path (1) instead, and not worry about path (2). That makes it a cybersecurity-style hardware/software design challenge, not a data-sanitation challenge.
Given how I now understand what you mean by self-unawareness (namely, that it includes a lack of optimization and learning), (I) seems uninteresting here, since it amounts to proposing that we build “oracles” that are not AI at all but ordinary software.