I think there’s a pretty strong argument for being warier of uploading. It’s been stated a few times on LW, originally by Wei Dai if I remember right, but maybe it’s worth restating here.
Imagine the uploading goes according to plan: the map of your neurons and connections has been copied into a computer, and simulating it yields a person who talks, walks around a simulated world, and answers questions about their consciousness. But imagine also that the upload is being run on a computer that can apply optimizations on the fly. For example, it could watch the input-output behavior of some NN fragment, learn a smaller and faster NN fragment with the same input-output behavior, and substitute it for the original. Or it could skip executing branches that don’t make a difference to behavior at a given time.
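To make that concrete, here’s a toy numpy sketch of the kind of on-the-fly substitution I have in mind (the fragment, the weights, and the tolerance are all invented for illustration; real uploading hardware would of course look nothing like this):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for one expensive sub-network "fragment" of the upload
# (sizes and weights are invented for the illustration).
W1 = 0.1 * rng.normal(size=(16, 4))
W2 = rng.normal(size=(4, 16))

def fragment(x):
    return np.tanh(x @ W1) @ W2

def maybe_substitute(f, dim=16, n_samples=5000, tol=1e-3):
    """Record f's input-output behavior on sampled inputs, fit a cheaper
    linear surrogate, and swap it in only if the observed mismatch is tiny."""
    X = rng.normal(size=(n_samples, dim))
    Y = f(X)                                    # recorded I/O behavior
    A, *_ = np.linalg.lstsq(X, Y, rcond=None)   # smaller, faster surrogate
    mismatch = float(np.max(np.abs(X @ A - Y)))
    return ((lambda x: x @ A) if mismatch < tol else f), mismatch

optimized, mismatch = maybe_substitute(fragment)
print("observed mismatch:", mismatch, "| substituted:", optimized is not fragment)
```

The point is only the shape of the operation: record I/O, fit something cheaper, and swap it in whenever the observable behavior matches closely enough.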
Where do we draw the line on which optimizations to allow? It seems we cannot allow all behavior-preserving optimizations, because that might lead to a kind of LLM that dutifully says “I’m conscious” without actually being so. (The p-zombie argument doesn’t apply here, because there is indeed a causal chain from human consciousness to an LLM saying “I’m conscious”: it runs through the LLM’s training data.) But we must allow some optimizations, because today’s computers already apply many optimizations, and compilers even more so. For example, skipping unused branches is pretty standard. The company doing your uploading might not even tell you which optimizations they use, given that the result will behave just like you anyway and the 10x speedup is profitable. The result could be a kind of apocalypse by optimization, with nobody noticing. A bit unsettling, no?
The key point of this argument isn’t just that some optimizations are dangerous, but that we have no principled way of telling which ones are. We thought we had philosophical clarity with “just upload all my neurons and connections and then run them on a computer”, but that doesn’t seem to be enough to answer questions like this. I think answering them will take new ideas.
Yeah, at some point we’ll need a proper theory of consciousness regardless, since many humans will want to radically self-improve and it’s important to know which cognitive enhancements preserve consciousness.
Yeah. My point was, we can’t even be sure which behavior-preserving optimizations (of the kind done by optimizing compilers, say) will preserve consciousness. It’s worrying because these optimizations can happen innocuously, e.g. when your upload gets migrated to a newer CPU with fancier heuristics. And yeah, when self-modification comes into the picture, it gets even worse.
[Epistemic status: napkin]
My current favourite frame on “qualia” is that it refers to the class of objects we can think about (eg, they’re part of what generates what I’m saying right now) for which behaviour is invariant across structure-preserving transformations.
(There’s probably some cool way to say that with category theory or transformations, and it may or may not give clarity, but idk.)
Eg, my “yellow” could map to blue, and “blue” to yellow, and we could still talk together without noticing anything amiss even if your “yellow” mapped to yellow for you.
Both blue and yellow are representational objects, the things we use to represent/refer to other things with, like memory addresses in a machine. For externally observable behaviour, all that matters is what they dereference to, regardless of where in memory you put them. If you swap two representational objects, while ensuring you don’t change anything about how your neurons link up to causal nodes outside the system, your behaviour stays the same.
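To make the memory-address picture concrete, a toy sketch (the stimuli, tokens, and words are all made up):

```python
# Toy sketch of the swap (everything here is made up for illustration).
# The internal tokens are like memory addresses: external behaviour only
# depends on what the chain stimulus -> token -> word dereferences to.
perceive = {"470nm_light": "TOKEN_A", "580nm_light": "TOKEN_B"}  # stimulus -> token
report = {"TOKEN_A": "blue", "TOKEN_B": "yellow"}                # token -> word

def answer(stimulus, perceive, report):
    return report[perceive[stimulus]]

# Swap the two internal tokens everywhere, keeping the links to the outside
# world (stimuli coming in, words going out) hooked up consistently.
swap = {"TOKEN_A": "TOKEN_B", "TOKEN_B": "TOKEN_A"}
perceive_swapped = {stim: swap[tok] for stim, tok in perceive.items()}
report_swapped = {swap[tok]: word for tok, word in report.items()}

for stim in perceive:
    assert answer(stim, perceive, report) == answer(stim, perceive_swapped, report_swapped)
# Externally, nothing changed, even though which token plays "yellow" did.
```

Swapping what the tokens dereference to, rather than the tokens themselves, is the hand⇄tomato case in the next paragraph.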
Note that this isn’t the case for most objects. I can’t swap hand⇄tomato without obvious glitches, like me saying “what a tasty-looking tomato!” and trying to eat my hand. Hands and tomatoes do not commute.
This swappability is what allows us to (try to) talk about “tomato” as opposed to just tomato, and it explains why we get so confused when we try to ground out (in terms of agreed-upon observables) what we’re talking about when we talk about “tomato”.
But how/why do we have representations for our representational objects in the first place? It’s like declaring a var (address₁↦value), and then declaring a var for that var (address₂↦address₁) while being confused about why the second dereferences to something ‘arbitrary’.
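As a quick sketch of that picture, with a dict standing in for memory (addresses made up):

```python
# Toy sketch: a dict stands in for memory, addresses are made up.
memory = {}
memory[0xA1] = "the-experienced-yellow"  # address1 -> value
memory[0xB2] = 0xA1                      # address2 -> address1: a var for the var

def deref(addr):
    return memory[addr]

print(deref(0xB2))         # -> 161 (i.e. 0xA1): just a location, looks 'arbitrary'
print(deref(deref(0xB2)))  # -> the value it was standing in for all along
```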
Maybe it starts when somebody asks you “what do you mean by ‘X’?”, and now you have to map the internal generators of [you saying “X”] in order to satisfy their question. Or not. Probably not. Napkin out.
It seems we cannot allow all behavior-preserving optimizations
We can take the same thought experiments that Chalmers uses to establish that a fine-grained, functionally isomorphic copy has the same qualia, modify them, and show that anything that acts like us has our qualia.
The LLM character (rather than the LLM itself) will be conscious to the extent that its behavior is I/O-identical to the person’s.
Edit: Oh, sorry, this is an old comment. I got this recommended… somehow…
Edit2: Oh, it was curated yesterday.
Well, a thing that acts like us in one particular situation (say, a thing that types “I’m conscious” in chat) clearly doesn’t always have our qualia. Maybe you could say that a thing that acts like us in all possible situations must have our qualia? This is philosophically interesting! It makes a factual question (does the thing have qualia right now?) logically depend on a huge bundle of counterfactuals, most of which might never be realized. What if, during uploading, we insert a bug that changes our behavior in one of these counterfactuals—but then the upload never actually runs into that situation in the course of its life—does the upload still have the same qualia as the original person, in situations that do get realized? What if we insert quite a few such bugs?
Moreover, what if we change the situations themselves? We can put the upload in circumstances that lead to more generic and less informative behavior: for example, give the upload a life where they’re never asked to remember a particular childhood experience. Or just a short life, where they’re never asked about anything much. Let’s say the machine doing the uploading is aware of that, and allowed to optimize out parts that the person won’t get to use. If there’s a thought that you sometimes think, but it doesn’t influence your I/O behavior, it can get optimized away; or if it has only a small influence on your behavior, a few bits’ worth, let’s say, then it can be replaced with another thought that would cause the same few-bits effect. There’s a whole spectrum of questionable things that people tend to ignore when they say “copy the neurons”, “copy the I/O behavior”, and stuff like that.
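Here’s a toy version of the few-bits case (the questions and the inner process are made up for illustration):

```python
# Toy version of the "few bits of influence" worry; everything is made up.
def reflect(question):
    # stands in for some rich inner process the person actually runs
    return sum(ord(c) for c in question)

def person(question):
    thought = reflect(question)
    return "yes" if thought % 2 == 0 else "no"  # only one bit ever reaches the output

# The uploading machine knows which situations will actually come up, keeps
# the one bit that matters for each, and drops the inner process entirely.
questions_that_come_up = ["Are you conscious?", "Do you like ice cream?"]
cached_bit = {q: reflect(q) % 2 for q in questions_that_come_up}

def optimized_person(question):
    return "yes" if cached_bit[question] == 0 else "no"

assert all(person(q) == optimized_person(q) for q in questions_that_come_up)
```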
Well, a thing that acts like us in one particular situation (say, a thing that types “I’m conscious” in chat) clearly doesn’t always have our qualia. Maybe you could say that a thing that acts like us in all possible situations must have our qualia?
Right, that’s what I meant.
This is philosophically interesting!
Thank you!
It makes a factual question (does the thing have qualia right now?) logically depend on a huge bundle of counterfactuals, most of which might never be realized.
The I/O behavior being the same is a sufficient condition for it to be our mind upload. A sufficient condition for it to have some qualia, as opposed to having our mind and our qualia, will be weaker.
What if, during uploading, we insert a bug that changes our behavior in one of these counterfactuals
Then it’s, to a very slight extent, another person (with a gradual continuum between me and another person).
but then the upload never actually runs into that situation in the course of its life—does the upload still have the same qualia as the original person, in situations that do get realized?
Then the qualia would be very slightly different, unless I’m missing something. (To bootstrap the intuition: I would expect my self that chooses vanilla ice cream over chocolate ice cream in one specific situation to have very slightly different feelings and preferences in general, resulting in very slightly different qualia, even if he never encounters that situation.) With many such bugs, the same would hold, but to a greater extent.
If there’s a thought that you sometimes think, but it doesn’t influence your I/O behavior, it can get optimized away
I don’t think such thoughts exist (I can always be asked to say out loud what I’m thinking). Generally, I would say that a thought that never, even in principle, influences my output isn’t possible. (The same principle should apply to trying to replace a thought with just a few bits.)
Such optimizations are one reason I believe we are not in a simulation: optimizations are essential for a large sim, and I expect them not to be consciousness-preserving.
Well, even if we reliably know that certain optimizations make copies not conscious, some people may want to run optimized versions of themselves that are not conscious. People are already making LLMs of themselves based on their writings and stuff. I think Age of Em doesn’t discuss this specific case, but collectives of variously modified Ems may perform better (if only for being cheaper) if they are not conscious. Humans Who Are Not Concentrating Are Not General Intelligences and often not conscious. I’m not conscious when I’m deeply immersed in some subject and only hours later realize how much time has passed—and how much I got done. It’s a kind of automation. Why not run it intentionally?
It seems we cannot allow all behavior-preserving optimizations, because that might lead to a kind of LLM that dutifully says “I’m conscious” without actually being so.
Surely ‘you’ are the algorithm, not the implementation. If I get refactored into a giant lookup table, I don’t think that makes the algorithm any less ‘me’.
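As a toy sketch of what I mean (a made-up function over a made-up finite input domain):

```python
# "Refactoring" an algorithm into a lookup table over a finite input domain:
# I/O behavior is preserved exactly while the computation itself disappears.
def algorithm(x):
    return (x * x + 1) % 7  # some internal computation

DOMAIN = range(1000)        # pretend the possible inputs are finite
TABLE = {x: algorithm(x) for x in DOMAIN}

def giant_lookup_table(x):
    return TABLE[x]         # same outputs, nothing computed inside any more

assert all(algorithm(x) == giant_lookup_table(x) for x in DOMAIN)
```

Same outputs everywhere on the domain; only the implementation changed.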