Consciousness therefore only happens if it improves performance at the task we have assigned. And for some tasks, like interacting directly with humans, it might improve performance.
I don’t think this is necessarily true. Consciousness could be a side effect of other processes that do improve performance.
The way I’ve heard this put: a polar bear has thick hair so that it doesn’t get too cold, and this is good for its evolutionary fitness. The fact that the hair is extremely heavy is simply a side effect of this. Consciousness could possibly be similar.
I checked, and what I am proposing is called a “Markov blanket”.
It makes consciousness and all the other failure modes in the same category unlikely. Not impossible, but in practice it may make them unlikely enough that they never happen.
It’s simple: we as humans determine exactly what the system stores in between ticks. As all the storage bits will be for things the machine must know in order to do its role, and there are no extra bits, consciousness and deception are unlikely.
Consciousness is a subjective experience, meaning you must have memory to actually reflect on the whole “I think, therefore I am.” If you have no internal memory, you can’t have an internal narrative. As an example, all your bits are dedicated to tracking which shelf in the warehouse you are trying to reach and the parameters of the item you are carrying.
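To make the “no extra bits” idea concrete, here is a minimal sketch in Python of what a human-specified state schema could look like. The field names and the tick function are my own hypothetical illustration, not anything from a real system.

```python
# Minimal sketch (illustration only): the ONLY bits that survive between
# ticks are the fields declared here. Names like target_shelf are hypothetical.
from dataclasses import dataclass

@dataclass(frozen=True)
class TickState:
    target_shelf: int        # which shelf the robot is heading to
    item_weight_kg: float    # parameters of the item being carried
    item_size: str           # e.g. "small", "medium", "large"

def tick(observation: dict, state: TickState) -> tuple[str, TickState]:
    """One control step: the policy sees the observation plus the declared
    state, and returns an action and the next state. Nothing else persists."""
    if observation.get("at_shelf") == state.target_shelf:
        return "place_item", state
    return "move_toward_target", state
```

The point of the frozen schema is that the humans who wrote it, not the policy, decide what can be remembered from one tick to the next.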
It also makes deception difficult, maybe impossible. At a minimum, having a plan to deceive means that on input “A”, when it’s not yet time to act on your evil plans, you do approved action X. You have a bit “TrueColors” that gets set when some conditions are met (“everything is in place”), and when that bit is set, on input “A” you instead do evil action Y.
Deception of all types comes down to this: you do something different on the same input.
Obviously, if there is no space in memory to record when to do the bad thing, then when you get “A” you have no choice but to do the good thing. Even stochastically deciding to do the bad thing will probably get caught in training.
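As a toy illustration of this argument (mine, not part of the original claim), compare a policy that has a writable “TrueColors” bit with one that has no writable state at all:

```python
# Toy model of the "TrueColors" argument. The deceptive policy needs at least
# one persistent, writable bit to remember that "everything is in place."
def deceptive_policy(inp: str, state: dict) -> str:
    if inp == "conditions_met":
        state["true_colors"] = True              # the extra bit deception requires
    if inp == "A":
        return "evil_Y" if state.get("true_colors") else "approved_X"
    return "noop"

def memoryless_policy(inp: str) -> str:
    # With no writable bit to flip, input "A" can only ever map to one action,
    # so "behave now, defect later" has nowhere to live.
    return "approved_X" if inp == "A" else "noop"
```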
As all the storage bits will be for things the machine must know in order to do its role, and there are no extra bits, consciousness and deception are unlikely.
Wouldn’t you need to be as smart as the machine to determine that?
No. The assumption here is that to feel anything, know anything, or reflect on yourself, this kind of metacognition needs writable memory. That’s the Markov blanket.
It would not matter how smart the machine is if it’s simply missing a capability.
OK... so a purely functional system is safe? But pure functionality is a layer on top of being able to write to memory.
Composite systems vary in how dangerous they are.
This is why I keep saying “we as humans determine exactly what the system stores in between ticks.”
In the case of ChatGPT, what it stores is what we see: the body of text from both our inputs and its outputs. That is the model’s input, and each output appends one token to it and, once the context window is full, removes the oldest token.
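A minimal sketch of that rolling window, assuming a fixed context size (the 8 below is arbitrary):

```python
# The entire "memory" between ticks is this visible window of tokens.
# collections.deque with maxlen drops the oldest item once the window is full.
from collections import deque

context = deque(maxlen=8)          # everything the model gets to keep between steps

def step(new_token: str) -> list[str]:
    context.append(new_token)      # newest token in, oldest falls off when full
    return list(context)           # this list is the model's whole input

for tok in "the robot moved to shelf 4 and placed the item there".split():
    print(step(tok))
```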
For larger, more complex AI systems, we would need to carefully decide what they get to store. We might also expect that other systems trained on the same task would be able to “pick up” and continue the task, on the very next frame, from what a different model stored. This is a testable property: you can run system A for 1000 frames, then randomly switch control to system B for 500 frames, then back to A, and so on.
The performance scores on the task should be similar to the scores A and B achieve on their own.
If A or B suddenly fails to continue the task, it means information that either model needed in order to continue was improperly encoded in the stored bits.
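Here is a sketch of that handoff test, under my own assumptions: each system exposes a tick(state) -> (action, state) method, and score_fn is whatever task metric you already track. None of these names come from the original.

```python
# Alternate control between two independently trained systems over the same
# externally stored state. If scores dip only around the switch points, some
# information a system needed was hidden outside the shared, human-chosen bits.
import random

def run_handoff_test(system_a, system_b, initial_state, score_fn, total_frames=3000):
    state, scores = initial_state, []
    active, frames_left = system_a, 1000                  # start with A for 1000 frames
    for _ in range(total_frames):
        if frames_left <= 0:                              # hand control to the other system
            active = system_b if active is system_a else system_a
            frames_left = random.choice([500, 1000])      # run it for a random stretch
        action, state = active.tick(state)                # only the shared state crosses frames
        scores.append(score_fn(state, action))            # should track A-only / B-only scores
        frames_left -= 1
    return scores
```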