Overall I don’t see how we could tell in advance whether a system would be self-unaware or not.
A sufficient condition here would be a lack of feedback loops that include information about the agent. I’m not sure that this is necessary, though; there may be laxer criteria we could live with.
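One way to make that condition concrete (a minimal sketch; the node names and edges are invented for illustration): treat the flow of information as a directed graph and check whether any directed cycle passes through the agent, i.e. whether information about the agent can ever find its way back into the agent’s own inputs.

```python
def cycles_through(graph, node):
    """Return True if `node` lies on any directed cycle in `graph`."""
    # `node` is on a cycle iff it can reach itself through at least one edge.
    stack, seen = list(graph.get(node, [])), set()
    while stack:
        current = stack.pop()
        if current == node:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(graph.get(current, []))
    return False

# Hypothetical setup: the agent writes logs, an operator reads them, and the
# operator's actions end up back in the agent's input stream.
world = {
    "agent":        ["logs"],
    "logs":         ["operator"],
    "operator":     ["agent_inputs"],
    "agent_inputs": ["agent"],
}

print(cycles_through(world, "agent"))   # True: information about the agent feeds back
world["operator"] = []                  # sever the operator -> agent_inputs edge
print(cycles_through(world, "agent"))   # False: the sufficient condition now holds
```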
This is all mostly theoretical, though, because it’s going to be very hard to build any system that is actually embedded in the world and feed it information that is not causally influenced by itself; that seems theoretically impossible. You might be able to achieve practically good-enough acausality via some kind of data “scrubbing” procedure, but I don’t hold out much hope there, given how hard that is to achieve even in narrow cases*.
*I speak from firsthand experience here. I used to work at a company that sold ad insights, which required collecting PII about users. We scrubbed the data for EU export, and we met the legal standard, but we later figured out that we could still triangulate any customer’s identity from their anonymized data to within about 3 individuals on average.
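As a toy illustration of why this is hard (the schema and records below are hypothetical, not the actual pipeline): even after the direct identifiers are dropped, the remaining quasi-identifiers often narrow each record down to a handful of candidate individuals.

```python
from collections import Counter

# "Scrubbed" records: direct identifiers removed, quasi-identifiers kept.
scrubbed = [
    {"zip": "94107", "birth_year": 1985, "device": "iPhone"},
    {"zip": "94107", "birth_year": 1990, "device": "Pixel"},
    {"zip": "10001", "birth_year": 1985, "device": "iPhone"},
    {"zip": "10001", "birth_year": 1985, "device": "iPhone"},
    {"zip": "10001", "birth_year": 1972, "device": "Pixel"},
]

quasi = ("zip", "birth_year", "device")
counts = Counter(tuple(r[k] for k in quasi) for r in scrubbed)

# Anonymity-set size per record: how many records share its quasi-identifiers.
sizes = [counts[tuple(r[k] for k in quasi)] for r in scrubbed]
print(sizes)                      # [1, 1, 2, 2, 1]
print(sum(sizes) / len(sizes))    # 1.4 (average candidate-pool size per record)
```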
Agreed. Also agreed that this seems very difficult, both in theory and in practice.
Well, it takes two things: (1) Self-knowledge (“I wrote ‘0’ into register X”, “I am thinking about turtles”, etc. being in the world-model) and (2) knowledge of the causal consequences of that (the programmers see the ‘0’ in register X and then change their behavior). With both of those, the system can learn causal links between its own decisions and the rest of the world, and can therefore effect real-world consequences.
Out of these two options, I think you’re proposing to cut off path (2), which I agree is very challenging. I am proposing to cut off path (1) instead, and not worry about path (2). That makes it a cybersecurity-type hardware/software design challenge, not a data-sanitization challenge.
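As a toy sketch of why it takes both ingredients (everything below is hypothetical): a world-model that includes the system’s own write to register X can learn the causal link to the programmers’ reaction, while the same data with that self-knowledge feature removed looks like unexplained noise.

```python
import random
from collections import defaultdict

random.seed(0)

def run_episode():
    my_write = random.randint(0, 1)     # the system's own action
    programmer_reaction = my_write      # the programmers react to what they see
    return my_write, programmer_reaction

def learn_world_model(episodes, include_self_knowledge):
    """Estimate P(reaction = 1 | features) from observed episodes."""
    counts = defaultdict(lambda: [0, 0])    # features -> [n_reaction_0, n_reaction_1]
    for my_write, reaction in episodes:
        features = (my_write,) if include_self_knowledge else ()
        counts[features][reaction] += 1
    return {f: c[1] / sum(c) for f, c in counts.items()}

episodes = [run_episode() for _ in range(10_000)]

print(learn_world_model(episodes, include_self_knowledge=True))
# e.g. {(0,): 0.0, (1,): 1.0}: "my write causes their reaction" is learnable

print(learn_world_model(episodes, include_self_knowledge=False))
# {(): ~0.5}: same data, but with no self-model the reaction looks like a coin flip
```

In this framing, cutting off path (1) means never exposing the `my_write` feature to the world-model at all; that is a property of how the system is built, not of the data it is shown.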
Given how I’m now thinking of what you mean by self-unawareness (in that it includes a lack of optimization and learning), cutting off path (1) seems uninteresting here, since it seems to me to amount to suggesting that we build “oracles” that are not AI but instead regular software.