Part of my research impact model has been something like: LLM knowledge will increasingly be built via dialectic with other LLMs. In dialectics, if you can say One True Thing in a domain, it can function as a diamond-perfect kernel of knowledge that can be used to win arguments against other AIs and to shape LLM dialectic on this topic (analogy to soft sweeps in genetics).
Alignment research and consciousness research are not the same thing. But they’re not orthogonal, and I think I’ve seen some ways to push consciousness research forward, so I’ve been focused on trying to (1) speedrun what I see as the most viable consciousness research path, while (2) holding a preference for One True Thing-type knowledge that LLMs will likely be bad at creating but good at using (e.g., STV, or these threads).
(I don’t care about influencing future LLM dialectics other than giving them true things; or rather I care but I suspect it’s better to be strictly friendly / non-manipulative)
One thing I got wrong was storing important results in PDFs; I only realized today that the major training corpora don’t yet pull from PDFs.
I’m really enjoying this series of posts. Perhaps this will be addressed in 4-6, but I’m wondering about prescriptions which might follow from c-risks.
I have a sense that humanity has built a tower of knowledge and capital and capacities, and just above the tower is a rope upwards labeled “AGI”. We don’t know exactly where it leads, aside from upwards.
But the tower we’ve built is also dissolving along multiple dimensions. Civilizations require many sorts of capital and investment, and in some ways we’ve been “eating the seed corn.” Decline is a real possibility, and if we leap for the AGI rope and miss, we might fall fairly hard and have to rebuild for a while.
There might be silver linings to a fall, as long as it’s not too hard and we get a second chance at things. Maybe the second attempt at the AGI rope could be more ‘sane’ in certain ways. My models aren’t good enough here to know what scenario to root for.
At any rate, today it seems like we’re in something like Robin Hanson’s “Dreamtime”, an era of ridiculous surplus, inefficiency, and delusion. Dreamtimes are finite; they end. I think either AGI or civilizational collapse will end our Dreamtime.
What’s worth doing in this Dreamtime, before both surpluses and illusions vanish? My sense is:
If we might reach AGI during this cycle, I think it would be good to make a serious attempt at understanding consciousness. (An open note here that I stepped down from the board of QRI and ended all affiliation with the institution. If you want to collaborate please reach out directly.)
If we’re headed for collapse instead of AGI, it seems wise to use Dreamtime resources to invest in forms of capital that will persist after a collapse and be useful for rebuilding a benevolent civilization.
Investing in solving the dysfunctions of our time and preventing a hard collapse also seems hugely worthwhile, if it’s tractable!
Looking forward to your conclusions.
Dennett talks about Darwin’s theory of evolution being a “universal acid” that flowed everywhere, dissolved many incorrect things, and left everything we ‘thought’ we knew forever changed. Wittgenstein’s Philosophical Investigations, with its description of language-games and the strong thesis that this is actually the only thing language is, was that for philosophy. Before PI it was reasonable to think that words have intrinsic meanings; after, it wasn’t.
“By their fruits you shall know them.”
A frame I trust in these discussions is trying to elucidate the end goal. What does knowledge about consciousness look like under Eliezer’s model? Under Jemist’s? Under QRI’s?
Let’s say you want the answer to this question enough that you go into cryosleep with the instruction “wake me up when they solve consciousness.” Now it’s 500, or 5,000, or 5 million years in the future and they’ve done it. You wake up. You go to the local bookstore analogue, pull out the Qualia 101 textbook, and sit down to read. What do you find in its pages? Do you find essays on how we realized consciousness was merely a linguistic confusion, or equations for how it all works?
As I understand Eliezer’s position, consciousness is both (1) a linguistic confusion (a leaky reification) and (2) the seat of all value. There seems to be a tension here that would be good to resolve, since the goal of consciousness research seems unclear in this case. I notice I’m putting words in people’s mouths and would be glad if the principals could offer their own takes on “what future knowledge about qualia looks like.”
My own view is if we opened that hypothetical textbook up we would find crisp equations of consciousness, with deep parallels to the equations of physics; in fact the equations may be the same, just projected differently.
My view on the brand of physicalism I believe in, dual aspect monism, and how it constrains knowledge about qualia: https://opentheory.net/2019/06/taking-monism-seriously/
My arguments against analytic functionalism (which I believe Eliezer’s views fall into): https://opentheory.net/2017/07/why-i-think-the-foundational-research-institute-should-rethink-its-approach/
Goal factoring is another that comes to mind, but people who worked at CFAR or Leverage would know the ins and outs of the list better than I.
Speaking personally, based on various friendships with people within Leverage, attending a Leverage-hosted neuroscience reading group for a few months, and attending a Paradigm Academy weekend workshop:
I think Leverage 1.0 was a genuine good-faith attempt at solving various difficult coordination problems. I can’t say they succeeded or failed; Leverage didn’t obviously hit it out of the park, but I feel they were at least wrong in interesting, generative ways that were uncorrelated with the standard and more ‘boring’ ways most institutions are wrong. Lots of stories I heard sounded weird to me, but most interesting organizations are weird and have fairly strict IP protocols so I mostly withhold judgment.
The stories my friends shared did show a large focus on methodological experimentation, which has benefits and drawbacks. Echoing some of the points above, I do think that when experiments are done on people and they fail, there can be a real human cost. I suspect some people did have substantially negative experiences from this. There’s probably also a very large set of experiments where the result was something like, “I don’t know if it was good, or if it was bad, but something feels different.”
There’s quite a lot about Leverage that I don’t know and can’t speak to, for example the internal social dynamics.
One item that my Leverage friends were proud to share is that Leverage organized the [edit: precursor to the] first EA Global conference. I was overall favorably impressed by the content in the weekend workshop I did, and I had the sense that to some degree Leverage 1.0 gets a bad rap simply because they didn’t figure out how to hang onto credit for the good things they did do for the community (organizing EAG, inventing and spreading various rationality techniques, making key introductions). That said I didn’t like the lack of public output.
I’ve been glad to see Leverage 2.0 pivot to progress studies, as it seems to align more closely with Leverage 1.0’s core strength of methodological experimentation, while avoiding the pitfalls of radical self-experimentation.
Would the world have been better if Leverage 1.0 hadn’t existed? My personal answer is a strong no. I’m glad it existed and was unapologetically weird and ambitious in the way it was and I give its leadership serious points for trying to build something new.
Hi Steven,
This is a great comment and I hope I can do it justice (took an overnight bus and am somewhat sleep-deprived).
First I’d say that neither we nor anyone else has a full theory of consciousness; i.e., we’re not at the point where we can look at a brain and derive an exact mathematical representation of what it’s feeling. I would suggest thinking of STV as a piece of this future full theory of consciousness, one I’ve tried to optimize for compatibility by remaining agnostic about certain details.
One such detail is the state space: if we knew the mathematical space consciousness ‘lives in’, we could zero in on symmetry metrics optimized for this space. Tononi’s IIT, for instance, suggests it’s a vector space — but I think it would be a mistake to assume IIT is right about this. Graphs assume less structure than vector spaces, so it’s a little safer to speak about symmetry metrics on graphs.
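To make “symmetry metric on a graph” a bit more concrete, here’s a toy sketch (my own illustration only; neither of these proxies is a metric STV is committed to) comparing two crude ways of scoring a graph’s symmetry: counting its automorphisms, and measuring degeneracy in its Laplacian spectrum. An intact ring scores high on both; deleting one edge lowers both.

```python
# Toy sketch only: two crude "symmetry" proxies for a graph, neither of which
# is STV's actual metric -- just an illustration of the kind of object involved.
import numpy as np
import networkx as nx
from networkx.algorithms.isomorphism import GraphMatcher

def automorphism_count(G):
    """Number of ways the graph maps onto itself (exact, but slow for big graphs)."""
    return sum(1 for _ in GraphMatcher(G, G).isomorphisms_iter())

def spectral_degeneracy(G):
    """Fraction of repeated Laplacian eigenvalues -- a cheap spectral symmetry proxy."""
    eigs = np.linalg.eigvalsh(nx.laplacian_matrix(G).toarray().astype(float))
    return 1.0 - len(np.unique(np.round(eigs, 6))) / len(eigs)

ring = nx.cycle_graph(8)          # highly symmetric
broken = ring.copy()
broken.remove_edge(0, 1)          # symmetry partially broken

for name, G in [("ring", ring), ("broken ring", broken)]:
    print(name, automorphism_count(G), round(spectral_degeneracy(G), 3))
```

The point is just that graphs admit natural symmetry measures without assuming the extra structure that a vector space brings along.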
Another ‘move’ motivated by compatibility is STV’s focus on the mathematical representation of phenomenology, rather than on patterns in the brain. STV is not a neuroscience theory but a metaphysical one: assuming that in the future we can construct a full formalism for consciousness, and thus represent a given experience mathematically, the claim is that the symmetry in this representation will hold an identity relationship with pleasure.
Appreciate the remarks about Smolensky! I think what you said is reasonable and I’ll have to think about how it fits with e.g. CSHW. His emphasis is of course on language and neural representation, which are very different domains.
>(Also, not to gripe, but if you don’t yet have a precise definition of “symmetry”, then I might suggest that you not describe STV as a “crisp formalism”. I normally think “formalism” ≈ “formal” ≈ “the things you’re talking about have precise unambiguous definitions”. Just my opinion.)
I definitely understand this. On the other hand, STV should have essentially zero degrees of freedom once we do have a full formal theory of consciousness. I.e., once we know the state space, have example mathematical representations of phenomenology, have defined the parallels between qualia space and physics, etc., it should be obvious what symmetry metric to use. (My intuition is that we’ll import it directly from physics.) In this sense it is a crisp formalism. However, I get your objection; more precisely, it’s a dependent formalism, dependent upon something that doesn’t yet exist.
>(FWIW, I think that “pleasure”, like “suffering” etc., is a learned concept with contextual and social associations, and therefore won’t necessarily exactly correspond to a natural category of processes in the brain.)
I think one of the most interesting questions in the universe is whether you’re right, or whether I’m right! :) Definitely hope to figure out good ways of ‘making beliefs pay rent’ here. In general I find the question of “what are the universe’s natural kinds?” to be fascinating.
Hi Steven, amazing comment, thank you. I’ll try to address your points in order.
0. I get your Mario example, and totally agree within that context; however, this conclusion may or may not transfer to brains, depending on how e.g. they implement utility functions. If the brain is a ‘harmonic computer’ then it may be doing e.g. gradient descent in such a way that the state of its utility function can be inferred from its large-scale structure.
1. On this question I’ll gracefully punt to lsusr’s comment :) I endorse both his comment and framing. I’d also offer that dissonance is in an important sense ‘directional’: if you have a symmetrical network and something breaks its symmetry, the new network pattern is not symmetrical, and this break in symmetry lets you infer where the ‘damage’ is. An analogy might be a spider’s web, which starts out highly symmetrical but whose vibrations become asymmetrical when a fly bumbles along and gets stuck. The spider can infer where the fly is on the web from the particular ‘flavor’ of the new vibrations. (See the toy sketch after this list.)
2. Complex question. First I’d say that STV as technically stated is a metaphysical claim, not a claim about brain dynamics. But I don’t want to hide behind this; I think your question deserves an answer. This perhaps touches on lsusr’s comment, but I’d add that if the brain does tend to follow a symmetry gradient (following e.g. Smolensky’s work on computational harmony), it likely does so in a fractal way. It will have tiny regions which follow a local symmetry gradient, it will have bigger regions which span many circuits where a larger symmetry gradient will form, and it will have brain-wide dynamics which follow a global symmetry gradient. How exactly these different scales of gradients interact is a very non-trivial thing, but I think it gives at least a hint as to how information might travel from large scales to small, and from small to large.
3. I think my answer to (2) also addresses this.
4. I think, essentially, that we can both be correct here. STV is intended to be an implementational account of valence; as we abstract away details of implementation, other frames may become relatively more useful. However, I do think that e.g. talk of “pleasure centers” involves potential infinite regress: what ‘makes’ something a pleasure center? A strength of STV is it fundamentally defines an identity relationship.
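To make the ‘directional dissonance’ point from (1) concrete, here’s a toy numerical sketch (my own illustration, nothing brain-specific; the ring, weights, and node labels are arbitrary): take a uniform ring network, make one connection very different, and the highest-frequency Laplacian mode localizes right at the defect, so you can read off where the symmetry was broken.

```python
# Toy "spiderweb" sketch: break the symmetry of a uniform ring by changing one
# edge weight, and the highest-frequency Laplacian mode localizes at the defect.
import numpy as np
import networkx as nx

n = 12
G = nx.cycle_graph(n)
nx.set_edge_attributes(G, 1.0, "weight")
G[3][4]["weight"] = 10.0                     # the "fly": one connection now differs

L = nx.laplacian_matrix(G, weight="weight").toarray().astype(float)
eigvals, eigvecs = np.linalg.eigh(L)         # eigenvalues in ascending order
defect_mode = np.abs(eigvecs[:, -1])         # highest-frequency mode shape

print("per-node amplitude:", np.round(defect_mode, 2))
print("inferred defect at nodes:", np.sort(np.argsort(defect_mode)[-2:]))
```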
I hope that helps! Definitely would recommend lsusr’s comments, and just want to thank you again for your careful comment.
Neural Annealing is probably the most current actionable output of this line of research. The actionable point is that the brain sometimes enters high-energy states which are characterized by extreme malleability; basically old patterns ‘melt’ and new ones reform, and the majority of emotional updating happens during these states. Music, meditation, and psychedelics are fairly reliable artificial triggers for entering these states. When in such a malleable state, I suggest the following:
>Off the top of my head, I’d suggest that one of the worst things you could do after entering a high-energy brain state would be to fill your environment with distractions (e.g., watching TV, inane smalltalk, or other ‘low-quality patterns’). Likewise, it seems crucial to avoid socially toxic or otherwise highly stressful conditions. Most likely, going to sleep as soon as possible without breaking flow would be a good strategy to get the most out of a high-energy state; the more slowly you can ‘cool off’ the better, and there’s some evidence annealing can continue during sleep. Avoiding strong negative emotions during such states seems important, as does managing your associations (psychedelics are another way to reach these high-energy states, and people have noticed there’s an ‘imprinting’ process where the things you think about and feel while high can leave durable imprints on how you feel after the trip). It seems plausible that taking certain nootropics could help strengthen (or weaken) the magnitude of this annealing process.
Here’s @lsusr describing the rationale for using harmonics in computation — my research is focused on the brain, but I believe he has a series of LW posts describing how he’s using this frame for implementing an AI system: https://www.lesswrong.com/posts/zcYJBTGYtcftxefz9/neural-annealing-toward-a-neural-theory-of-everything?commentId=oaSQapNfBueNnt5pS
Symmetry is a (if not ‘the central’) Schelling point if one is in fact using harmonics for computation. I.e., I believe if one actually went and implemented a robot built around the computational principles the brain uses, that gathered apples and avoided tigers, it would tacitly follow a symmetry gradient.
Thank you!
I think this is very real. Important to also note that non-specific joy exists and can be reliably triggered by certain chemicals.
My inference from this is that preferences are a useful but leaky reification, and if we want to get to ‘ground truth’ about comfort and discomfort, we need a frame that emerges cleanly from the brain’s implementation level.
This is the founding insight behind QRI — see here for a brief summary https://opentheory.net/2021/07/a-primer-on-the-symmetry-theory-of-valence/
Hi Charlie,
To compress a lot of thoughts into a small remark, I think both possibilities (consciousness is like electromagnetism, in that it has some deep structure to be formalized, vs. consciousness is like élan vital, in that it lacks any such deep structure) are live. What’s most interesting to me is doing the work that will give us evidence about which of these worlds we live in. There are a lot of threads mentioned in my first comment that I think can generate value/clarity here; in general I’d recommend brainstorming “what would I expect to see if I lived in a world where consciousness does, vs. does not, have a crisp substructure?”
Some sorts of knowledge about consciousness will necessarily be as messy as the brain is messy, but the core question is whether there’s any ‘clean substructure’ to be discovered about phenomenology itself. Here’s what I suggest in Principia Qualia:
--------
>Brains vs conscious systems:
>There are fundamentally two kinds of knowledge about valence: things that are true specifically in brains like ours, and general principles common to all conscious entities. Almost all of what we know about pain and pleasure is of the first type – essentially, affective neuroscience has been synonymous with making maps of the mammalian brain’s evolved, adaptive affective modules and contingent architectural quirks (“spandrels”).
>This paper attempts to chart a viable course for this second type of research: it’s an attempt toward a general theory of valence, a.k.a. universal, substrate-independent principles that apply equally to and are precisely true in all conscious entities, be they humans, non-human animals, aliens, or conscious artificial intelligence (AI).
>In order to generalize valence research in this way, we need to understand valence research as a subset of qualia research, and qualia research as a problem in information theory and/or physics, rather than neuroscience. Such a generalized approach avoids focusing on contingent facts and instead seeks general principles for how the causal organization of a physical system generates or corresponds to its phenomenology, or how it feels to subjectively be that system. David Chalmers has hypothesized about this in terms of “psychophysical laws” (Chalmers 1995), or translational principles which we could use to derive a system’s qualia, much like we can derive the electromagnetic field generated by some electronic gadget purely from knowledge of the gadget’s internal composition and circuitry.
Hi Charlie, I’m glad to point to our announced collaborations with JHU, Harvard, ICL, and some of the other more established centers for neuroscience, as well as our psychophysics toolkit, which you can check out here. I find that many times people operate from cached impressions of what we’re doing, and in such cases I try to get people to update their cache, as our work now encompasses what most people might call “normal” neuroscience of consciousness and the associated markers of legitimacy (as inspired and guided by our theoretical work).
I highly appreciate Wittgenstein’s notion of language games as a nigh-universal tool for dissolving confusion. However, I would also suggest an alternate framing: “have you tried solving the problem?” Has anyone tried to formalize emotional valence before, in a way that could yield results if there is a solution? What could a ‘solution’ here even mean? What would this process look like from the inside? What outputs should we expect to see from the outside? Is there a “fire alarm” for solving this problem? In short, I think “dissolving confusion” is important for consciousness research, but I don’t think it’s necessarily the only goal. Rather, we should also look for ‘deep structure’ to be formalized, much like electromagnetism and chemistry had ‘deep structure’ to be formalized. I feel skeptics (analytic functionalists) risk premature optimization here: skepticism isn’t a strong position to hold before we’ve ‘actually tried’ to find this structure. (I say more about the problems I see with analytic functionalism / eliminativism as positive theories here.)
QRI is predicated on the assumption that, before we give up on systematizing consciousness, we should apply to consciousness the same core formalism aesthetics that have led to progress in other fields; i.e., we should ‘actually try’. Judging from both the inside-view process and the outside-view neuroscience outputs, I’m confident what we’re doing is strongly worthwhile.
Fair enough; the TL;DR pull-quote for this piece would be:
Annealing involves heating a metal above its recrystallization temperature, keeping it there long enough for the microstructure of the metal to reach equilibrium, then slowly cooling it down, letting new patterns crystallize. This releases the internal stresses of the material, and is often used to restore ductility (plasticity and toughness) in metals that have been ‘cold-worked’ and have become very hard and brittle. In a sense, annealing is a ‘reset switch’ which allows metals to go back to a more pristine, natural state after being bent or stressed. I suspect this is a useful metaphor for brains, in that they can become hard and brittle over time with a build-up of internal stresses, and these stresses can be released by periodically entering high-energy states where a more natural neural microstructure can reemerge.
Furthermore: meditation, music, and psychedelics (and sex and perhaps sleep) ‘heat’ the brain up in this metaphorical sense. Lots of things follow from this—most usefully, if you feel stressed and depressed, make sure you’re “annealing” enough, both in terms of frequency and ‘annealing temperature’ (really intense emotional experiences are crucial for the emotional updating process).
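For anyone who prefers the algorithmic version of the metaphor, here’s a bare-bones simulated annealing loop (the standard textbook optimization algorithm, not a model of the brain): high ‘temperature’ lets the state jump out of local minima, and slow cooling lets it settle into a deeper one, whereas cooling too fast ‘quenches’ it in place.

```python
# Bare-bones simulated annealing on a toy 1-D energy landscape. This is the
# standard optimization algorithm that shares the metallurgical metaphor above,
# included only to make the heat-then-cool-slowly structure explicit.
import math, random

def energy(x):
    return 0.1 * x * x + math.sin(3 * x)      # bumpy landscape, many local minima

x, T = 8.0, 5.0                               # start far from the global minimum, hot
best = x
while T > 1e-3:
    candidate = x + random.gauss(0, 1)        # propose a nearby state
    dE = energy(candidate) - energy(x)
    if dE < 0 or random.random() < math.exp(-dE / T):   # Metropolis acceptance
        x = candidate
        if energy(x) < energy(best):
            best = x
    T *= 0.999                                # slow cooling; fast cooling "quenches"

print("settled near x =", round(best, 2), "with energy", round(energy(best), 2))
```

Mapping back to the metaphor, ‘temperature’ corresponds roughly to how energized and malleable the brain state is, and the “cool off slowly, don’t quench” advice above corresponds to the gentle cooling schedule.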
Possibly the most LW-relevant part of this is the comment left by lsusr, which I’ve appended to the bottom of the version on opentheory.net, i.e. the comment that starts out “This makes sense to me because I work full-time on the bleeding edge of applied AI, meditate, and have degree in physics where I taught the acoustical and energy-based models this theory is based upon. Without a solid foundation in all three of these fields this theory might seem less self-evident.”—he said some things clearly that were only tacit in my writeup.
Currently working on a follow-up piece.
This will be a terribly late and very incomplete reply, but regarding your question,
>Is there some mechanism that would allow for evolution to somewhat define the ‘landscape’ of harmonics? Is reframing the harmonics as goals compatible with the model? Something like this seems to be pointed at in the quote
>>Panksepp’s seven core drives (play, panic/grief, fear, rage, seeking, lust, care) might be a decent first-pass approximation for the attractors in this system.
A metaphor that I like to use here is that I see any given brain as a terribly complicated lock. Various stimuli can be thought of as keys. The right key will create harmony in the brain’s harmonics. E.g., if you’re hungry, a nice high-calorie food will create a blast of consonance which will ripple through many different brain systems, updating your tacit drive away from food seeking. If you aren’t hungry—it won’t create this blast of consonance. It’s the wrong key to unlock harmony in your brain.
Under this model, the shape of the connectome is the thing that evolution has built to define the landscape of harmonics and drive adaptive behavior. The success condition is harmony. I.e., the lock is very complex, the ‘key’ that fits a given lock can be either simple or complex, and the success condition (harmony in the brain) is relatively simple.
I really appreciate this newsletter. I wish we had something similar from China.
In defense of David’s point, consciousness research is currently pre-scientific, loosely akin to 1400s alchemy. Fields become scientific as they settle on a core ontology and a methodology for generating predictions from that ontology; consciousness research presently has neither.
Most current arguments about consciousness and uploading are thus ultimately arguments by intuition. Certainly an intuitive story can be told for why uploading a brain and running it as a computer program would transfer the consciousness along with it, but we can also tell stories where intuition pulls in the opposite direction; e.g., see Scott Aaronson’s piece here https://scottaaronson.blog/?p=1951 ; my former colleague Andres also has a relevant paper arguing against computationalist approaches here https://www.degruyter.com/document/doi/10.1515/opphil-2022-0225/html
Of the attempts to formalize the concept of information flow and its relevance to consciousness, the most notable is probably Tononi’s IIT (currently on version 4.0). However, Tononi himself believes computers could be only minimally conscious, and only in a highly fragmented way, for technical reasons relating to his theory. Excerpted from Principia Qualia:
>Tononi has argued that “in sharp contrast to widespread functionalist beliefs, IIT implies that digital computers, even if their behaviour were to be functionally equivalent to ours, and even if they were to run faithful simulations of the human brain, would experience next to nothing” (Tononi and Koch 2015). However, he hasn’t actually published much on why he thinks this. When pressed on this, he justified the assertion by reference to IIT’s axiom of exclusion – this axiom effectively prevents ‘double counting’ a physical element as part of multiple virtual elements – and when he ran a simple neural simulation on a simple microprocessor and looked at what the hardware was actually doing, a lot of the “virtual neurons” were being run on the same logic gates (in particular, all virtual neurons extensively share the logic gates which run the processor clock). Thus, the virtual neurons don’t exist in the same causal clump (“cause-effect repertoire”) as they do in a real brain. His conclusion was that there might be small fragments of consciousness scattered around a digital computer, but he’s confident that ‘virtual neurons’ emulated on a Von Neumann system wouldn’t produce their original qualia.
At any rate, there are many approaches to formalizing consciousness across the literature, each pointing to a slightly different set of implications for uploads, and no clear winner yet. I assign more probability mass than David or Tononi do to computers generating nontrivial amounts of consciousness (see here https://opentheory.net/2022/12/ais-arent-conscious-but-computers-are/), but I find David’s thesis entirely reasonable.