Because every storage bit is allocated to something the machine must know in order to do its role, and there are no spare bits, consciousness and deception are unlikely.
Wouldn’t you need to be as smart as the machine to determine that?
No. The assumption here is that to feel anything, know anything, or reflect on yourself, this kind of metacognition needs writable memory. That writable memory is the Markov blanket.
It would not matter how smart the machine is if it’s simply missing a capability.
This is why I keep saying “we as humans determine exactly what the system stores in between ticks.”
In the case of ChatGPT, what it stores is what we see: the body of text from both our inputs and its outputs. That text is the model’s input; each output appends one token to it, and once the context window is full, the oldest token is dropped.
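As a rough sketch of that rolling window (illustrative names only, not OpenAI’s actual API), the only thing carried between ticks is a fixed-size buffer of visible tokens:

```python
from collections import deque

# Minimal sketch of the rolling context described above. The only thing the
# system carries between ticks is this fixed-size window of visible tokens.
# `ContextWindow` and `max_tokens` are illustrative names, not a real API.
class ContextWindow:
    def __init__(self, max_tokens: int):
        # A deque with maxlen drops the oldest token automatically
        # once the window is full.
        self.tokens = deque(maxlen=max_tokens)

    def append(self, token: str) -> None:
        self.tokens.append(token)

    def as_model_input(self) -> list[str]:
        # Everything the model "remembers" is exactly this list;
        # there are no hidden writable bits outside of it.
        return list(self.tokens)

window = ContextWindow(max_tokens=8)
for tok in ["wouldn't", "you", "need", "to", "be", "as", "smart"]:
    window.append(tok)        # tokens from our input
window.append("no")           # a generated output token is appended the same way
print(window.as_model_input())
```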
For larger, more complex AI systems, we would need to decide carefully what they get to store. We might also expect that other systems trained on the same task could “pick up” and continue the task from what a different model stored, on the very next frame. This is a testable property: run system A for 1000 frames, then randomly switch control to system B for 500 frames, then back to A, and so on.
The performance score on the interleaved run should be similar to the scores A and B achieve on their own.
If A or B suddenly fails to continue the task after a switch, it means information either model needed to continue was improperly encoded in the stored bits, i.e. held in a form the other model could not use.
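Here is a minimal sketch of that handover test on a toy running-sum task. The assumption is that A and B share one step interface of the form step(state, obs) -> (output, new_state), and that the explicit state dict is the only memory carried between frames; the task and all names are hypothetical:

```python
import random

def system_a(state, obs):
    # Reads and writes only the shared, human-defined state.
    total = state.get("total", 0) + obs
    return total, {"total": total}

def system_b(state, obs):
    # A separately built policy for the same task; it can pick up on the very
    # next frame because it depends only on the same shared bits.
    total = state.get("total", 0) + obs
    return total, {"total": total}

def run_interleaved(frames=5000, segment=500, seed=0):
    rng = random.Random(seed)
    state, active, expected, mistakes = {}, system_a, 0, 0
    for frame in range(frames):
        # Randomly hand control to the other system at segment boundaries.
        if frame > 0 and frame % segment == 0 and rng.random() < 0.5:
            active = system_b if active is system_a else system_a
        obs = 1                       # trivial observation stream
        expected += obs
        output, state = active(state, obs)
        mistakes += int(output != expected)
    # Stays at 0 as long as nothing either system needs is hidden outside `state`.
    return mistakes

print(run_interleaved())   # -> 0
```

If one of the systems instead stashed something it needed in a private variable of its own, the mistake count would jump right after the first switch.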
OK... so a pure functional system is safe? But pure functionality is a layer on top of being able to write to memory.
Composite systems vary in danger.
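To make that concrete, here is one way to picture such a composite (all names hypothetical): the policy itself is a pure function, and the only writable memory is a record whose fields we, the designers, explicitly declare.

```python
from typing import NamedTuple

# Sketch of a composite system in the sense above: a pure tick function
# layered on top of an explicitly declared piece of writable memory.
# `StoredState`, `tick`, and the fields are hypothetical illustrations.
class StoredState(NamedTuple):
    last_reading: float   # everything the system may "remember" is listed here;
    step_count: int       # if a field isn't declared, it simply cannot be stored

def tick(state: StoredState, sensor_reading: float) -> tuple[float, StoredState]:
    # Pure layer: the output depends only on the declared state and the input.
    command = 0.5 * (sensor_reading + state.last_reading)
    return command, StoredState(last_reading=sensor_reading,
                                step_count=state.step_count + 1)

state = StoredState(last_reading=0.0, step_count=0)
for reading in [1.0, 2.0, 3.0]:
    command, state = tick(state, reading)   # only `state` survives between ticks
print(state)   # StoredState(last_reading=3.0, step_count=3)
```

How much danger a given composite carries then comes down to what we allow into its stored state, and that is the part we control.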