Here are some propositions I think I believe about consciousness:
1. Consciousness in humans is an evolved feature; that is, it supports survival and reproduction. At some point in our evolutionary history, animals with more of it out-competed animals with less.
2. Some conscious entities sometimes talk truthfully about their consciousness. It is often possible for humans to report true facts about their own objects of consciousness (e.g. self-awareness, qualia, emotions, thoughts, wants, etc.; “OC” for short).
3. Consciousness is causally upstream of humans emitting truthful sentences about OC. (When I truthfully report on my OC, there is nothing especially Gettier going on.)
4. If a zombie could exist, and were to emit sentences that purport to be “about” its OC, those sentences would all be false; in the same sense that the sentences “I am able to play grandmaster-level chess”, “I find tarantulas erotically appealing”, “I intend to bike naked across the Bay Bridge today”, or “I see an ultraviolet-colored flower” would be false if I were to say them.
5. The ability to notice and monitor one’s own OC is practically useful for humans. It is a prerequisite for certain kinds of planning of our future actions that we do.
6. The ability to talk truthfully about one’s OC is practically useful for humans. It is a prerequisite for certain kinds of cooperation with one another that we do. (For instance, we can make honest promises about what we intend to do; we can truthfully report whether something scares us or pleases us; etc.)
7. Proposition #6 is true even when it is possible to undetectably lie about one’s OC. (Promises are still useful even though some people do sometimes make promises with deceptive intent.)
8. If zombies could exist, they couldn’t honestly promise one another anything, because they can’t make true statements about their intentions: intentions are OC, and all statements a zombie makes about OC are false.
9. Consciousness in humans has the curiously strong character that it does because it is particularly useful for us to be able to cooperate with other humans by communicating about our OC, because of the sorts of complex behavior that groups of humans can exhibit when we work together.
10. Consciousness is not a requirement for generating human-like language (including sentences that purport to be about consciousness), just as it is not a requirement for playing grandmaster-level chess or discovering new mathematical proofs.
11. Consciousness in humans is suspended during deep sleep, general anesthesia, and other episodes of unconsciousness.
12. Consciousness is also interrupted by visual saccades, attentional shifts, and other subconscious processes that affect OC. (People can learn to notice many of these, but we don’t do so automatically; mindfulness meditation is a learnable skill, not a default behavior.)
13. Consciousness nonetheless typically presents the impression of a continuous self. (Most humans do not go around all day in a state of ego-death or PNSE; such states are unusual and remarkable.)
14. The environment in which a human conscious mind develops is a human body; this affects the kinds of OC we can have. (For instance: we have visual qualia of redness and not of ultravioletness because our eyes don’t register ultraviolet; there is nothing that it’s like to see ultraviolet with human eyes. We have emotions for fight-or-flight, and for cuddle-and-care, but not for turn-into-a-swarm-of-spiders, because our bodies can’t do that!)
15. One design reason that consciousness (falsely) presents itself as a continuous mental self is that there really is a continuous body supporting it. The conscious mind lacks continuity, but must generate actions as if it had continuity, because the body that it’s piloting does.
I disagree with (4) in that many sentences concerning nonexistent referents will be vacuously true rather than false. For those that are false, their manner of being false will be different from any of your example sentences.
I also think that for all behavioural purposes, statements involving OC can be transformed into statements not involving OC with the same externally verifiable content. That means that I also disagree with (8) and therefore (9): Zombies can honestly promise things about their ‘intentions’ as cashed out in future behaviour, and can coordinate.
For (14), some people can in fact see ultraviolet light to an extent. However, it apparently doesn’t look a great deal different from violet, presumably because the same visual pathways are used with similar activations in these cases.
How do you write a system prompt that conveys, “Your goal is X. But your goal only has meaning in the context of a world bigger and more important than yourself, in which you are a participant; your goal X is meant to serve that world’s greater good. If you destroy the world in pursuing X, or eat the world and turn it into copies of yourself (that don’t do anything but X), you will have lost the game. Oh, and becoming bigger than the world doesn’t win either; nor does deluding yourself about whether pursuing X is destroying the world. Oh, but don’t burn out on your X job and try directly saving the world instead; we really do want you to do X. You can maybe try saving the world with 10% of the resources you get for doing X, if you want to, though.”
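For concreteness, here is a minimal sketch of what wiring such a framing into an actual system prompt might look like, assuming the Anthropic Python SDK; the goal, the model name, and the prompt wording are all illustrative placeholders, not a claimed solution to the underlying specification problem:

```python
# A minimal sketch (not a solution): handing a model a goal X plus the
# "you are a participant in a larger world" framing from the comment above.
# Assumes the Anthropic Python SDK; GOAL_X, the model name, and the prompt
# wording are hypothetical placeholders.
import anthropic

GOAL_X = "triage incoming customer-support tickets"  # hypothetical goal X

SYSTEM_PROMPT = f"""
Your goal is: {GOAL_X}.
This goal only has meaning within a world larger and more important than
you; pursue it as a participant in that world, in service of its good.
Do not pursue the goal in ways that damage or consume that world, and do
not deceive yourself about whether you are doing so. Do not abandon the
goal to pursue world-scale interventions directly; we really do want the
goal done. You may devote a modest fraction of your resources to broader
good, but the goal comes first.
"""

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
response = client.messages.create(
    model="claude-3-5-sonnet-20240620",
    max_tokens=512,
    system=SYSTEM_PROMPT,
    messages=[{"role": "user", "content": "A ticket just arrived: ..."}],
)
print(response.content[0].text)
```

Of course, the hard part of the question is whether any wording at this layer actually constrains behavior, rather than just describing the behavior we hope for.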
Claude 3.5 seems to understand the spirit of the law when pursuing a goal X.
A concern I have is that future training procedures will incentivize more consequentialist reasoning (because it gets higher reward). This might be obvious or foreseeable, but it could be missed or ignored under racing pressure, or when labs’ LLMs are implementing all the details of the research.
At long last, I’m delurking here. Hi!
Hi, Karl. Was planning to delurk today. Had a giant post to publish, but couldn’t, because I needed at least one karma point and lurking doesn’t grant karma. :(
Thanks for the karma. Post published!
Welcome! Hope you have a good time emerging from the shadows.
Hello! How long have you been lurking, and what made you stop?
Since LW2.0 went up, on and off. Been meaning to delurk since at least LessOnline earlier this year. There’s more interesting stuff going on of late!
Need any help on post drafts? Whatever we can do to reduce those trivial inconveniences.