I’m an AGI safety / AI alignment researcher in Boston with a particular focus on brain algorithms. Research Fellow at Astera. See https://sjbyrnes.com/agi.html for a summary of my research and sorted list of writing. Physicist by training. Email: steven.byrnes@gmail.com. Leave me anonymous feedback here. I’m also at: RSS feed, X/Twitter, Bluesky, LinkedIn, and more at my website.
Steven Byrnes
I didn’t read the OP that way (but no point in arguing about the author’s intentions).
For sure, I, like anyone, am perfectly capable of getting curious about, and then spending lots of time to figure out, something that’s not actually important to figure out in the first place. Note the quote that I chose to put at the top of my recent research agenda update post. :)
Hmm. I don’t really know! But it’s fun to speculate…
Possibility 1: Like you said, maybe strong short-range cortex-to-cortex communication + weak long-range cortex-to-cortex communication? I haven’t really thought about how that would manifest.
Possibility 2: In terms of positive symptoms specifically, one can ask the question: “weak long-range cortex-to-cortex communication … compared to what?” And my answer is: “…compared to cortex output signals”. See Model of psychosis, take 2.
…Which suggests a hypothesis: someone could have unusually trigger-happy cortex output signals. Then they would have positive schizophrenia symptoms without their long-range cortex-to-cortex communication being especially weak on an absolute scale, and therefore they would have fewer (if any) cognitive symptoms.
(I’m not mentioning schizophrenia negative symptoms because I don’t understand those very well.)
I guess Possibility 1 & 2 are not mutually exclusive. There could also be other possibilities I’m not thinking of.
Hmm, the “unusually trigger-happy cortex output signals” theory might explain hypersensitivity too, or maybe not; I’m not sure. I think it depends on the details of how it manifests.
It’s not obvious to me that the story is “some people have great vocabulary because they learn obscure words that they’ve only seen once or twice” rather than “some people have great vocabulary because they spend a lot of time reading books (or being in spaces) where obscure words are used a lot, and therefore they have seen those obscure words much more than once or twice”. Can you think of evidence one way or the other?
(Anecdotal experience: I have a good vocabulary, e.g. 800 on GRE verbal, but feel like I have a pretty bad memory for words and terms that I’ve only seen a few times. I feel like I got a lot of my non-technical vocab from reading The Economist magazine every week in high school; they were super into pointlessly obscure vocab at the time (maybe still, but I haven’t read it in years).)
For (2), I’m gonna uncharitably rephrase your point as saying: “There hasn’t been a sharp left turn yet, and therefore I’m overall optimistic there will never be a sharp left turn in the future.” Right?
I’m not really sure how to respond to that … I feel like you’re disagreeing with one of the main arguments of this post without engaging it. Umm, see §1. One key part is §1.5:
I do make the weaker claim that, as of this writing, publicly-available AI models do not have the full (1-3) triad—generation, selection, and open-ended accumulation—to any significant degree. Specifically, foundation models are not currently set up to do the “selection” in a way that “accumulates”. For example, at an individual level, if a human realizes that something doesn’t make sense, they can and will alter their permanent knowledge store to excise that belief. Likewise, at a group level, in a healthy human scientific community, the latest textbooks delete the ideas that have turned out to be wrong, and the next generation of scientists learns from those now-improved textbooks. But for currently-available foundation models, I don’t think there’s anything analogous to that. The accumulation can only happen within a context window (which is IMO far more limited than weight updates), and also within pre- and post-training (which are in some ways anchored to existing human knowledge; see discussion of o1 in §1.1 above).
…And then §3.7:
Back to AGI, if you agree with me that today’s already-released AIs don’t have the full (1-3) triad to any appreciable degree [as I argued in §1.5], and that future AI algorithms or training approaches will, then there’s going to be a transition between here and there. And this transition might look like someone running a new training run, from random initialization, with a better learning algorithm or training approach than before. While the previous training runs create AIs along the lines that we’re used to, maybe the new one would be like (as gwern said) “watching the AlphaGo Elo curves: it just keeps going up… and up… and up…”. Or, of course, it might be more gradual than literally a single run with a better setup. Hard to say for sure. My money would be on “more gradual than literally a single run”, but my cynical expectation is that the (maybe a couple years of) transition time will be squandered, for various reasons in §3.3 here.
I do expect that there will be a future AI advance that opens up the full-fledged (1-3) triad in any domain, from math-without-proof-assistants, to economics, to philosophy, and everything else. After all, that’s what happened in humans. Like I said in §1.1, our human discernment (a.k.a. (2B)) is a flexible system that can declare that ideas do or don’t hang together and make sense, regardless of its domain.
This post is agnostic over whether the sharp left turn will be a big algorithmic advance (akin to switching from MuZero to LLMs, for example), versus a smaller training setup change (akin to o1 using RL in a different way than previous LLMs, for example). [I have opinions, but they’re out-of-scope.] A third option is “just scaling the popular LLM training techniques that are already in widespread use as of this writing”, but I don’t personally see how that option would lead to the (1-3) triad, for reasons in the excerpt above. (This is related to my expectation that LLM training techniques in widespread use as of this writing will not scale to AGI … which should not be a crazy hypothesis, given that LLM training techniques were different as recently as ≈6 months ago!) But even if you disagree, it still doesn’t really matter for this post. I’m focusing on the existence of the sharp left turn and its consequences, not what future programmers will do to precipitate it.
~~
For (1), I did mention that we can hope to do better than Ev (see §5.1.3), but I still feel like you didn’t even understand the major concern that I was trying to bring up in this post. Excerpting again:
The optimistic “alignment generalizes farther” argument is saying: if the AI is robustly motivated to be obedient (or helpful, or harmless, or whatever), then that motivation can guide its actions in a rather wide variety of situations.
The pessimistic “capabilities generalize farther” counterargument is saying: hang on, is the AI robustly motivated to be obedient? Or is it motivated to be obedient in a way that is not resilient to the wrenching distribution shifts that we get when the AI has the (1-3) triad (§1.3 above) looping around and around, repeatedly changing its ontology, ideas, and available options?
Again, the big claim of this post is that the sharp left turn has not happened yet. We can and should argue about whether we should feel optimistic or pessimistic about those “wrenching distribution shifts”, but those arguments are as yet untested, i.e. they cannot be resolved by observing today’s pre-sharp-left-turn LLMs. See what I mean?
This was fun to read but FWIW it doesn’t really match my experience. Perhaps I am always fake-thinking, or perhaps I am always real-thinking, rather than flipping back and forth at different times? (I hope it’s the second one!)
I do have a thing where sometimes I say “I can’t think straight right now”, often in the early afternoon. But then I don’t even try, I just go take a break or do busywork or whatever.
Maybe my introspective experience is more like, umm, climbing a hill. I know whether or not I’m climbing a hill. Sometimes I try and fail. Sometimes I know I’m too tired and don’t even try. Sometimes I’m so tired that I can’t even find the hill—but then I know that I’m not climbing it! Sometimes I make local progress but my trail hits a dead end and I need to go back. Sometimes I hear other people talk about climbing hills, and wonder whether they really got as high as they seem to think they did. But I don’t feel like I can relate to an experience of not actually climbing a hill while believing that I am climbing a hill.
(End of analogy). So I likewise don’t feel like I need (or have ever needed?) pointers to what it feels like to be making real intellectual progress. If I’m getting less confused about something, or if I’m discovering new reasons to feel confused, then I’m doing it right, more or less.
Hmm, maybe it’s like … something I discovered in college is that I could taste how alcoholic things are. No matter what the alcohol was mixed into, no matter how sweet or flavorful the cocktail, I can just directly taste the alcohol concentration. It’s like my tongue or nose has a perfect chemical indicator strip for alcohol, mixed in with all the other receptors. Not only that, but I found that taste mildly unpleasant, enough to grab my attention, even if I would enjoy the drink anyway all things considered. Some (most? all?) of my friends in college lacked that sense. Unsurprisingly, those friends were much more prone to accidental overdrinking than I was.
…Maybe I have an unusually sharp and salient “sense of confusion” analogous to my “sense of alcohol concentration”?
If so, I’m a very lucky guy!
Again, I enjoyed reading this. Just wanted to share. :)
In regards to whether “single-single alignment” will make coordination problems and other sorts of human dysfunction and slow-rolling catastrophes less likely:
…I’m not really sure what I think. I feel like I have a lot of thoughts that have not gelled into a coherent whole.
(A) The optimistic side of me says what you said in your comment (and in the Vanessa and (especially) Paul comment links therein).
People don’t want bad things to happen. If someone asks an AI what’s gonna happen, and they say “bad thing”, then they’ll say “well what can I do about it?”, and the AI will answer that. That can include participating in novel coordination mechanisms etc.
(B) The pessimistic side of me says there’s like a “Law of Conservation of Wisdom”, where if people lack wisdom, then an AI that’s supposed to satisfy those people’s preferences will not create new wisdom from thin air. For example:
If an AI is known to be de-converting religious fundamentalists, then religious fundamentalists will hear about that, and not use that AI.
Hugo Chávez had his pick of the best economists in the world to ask for advice, and they all would have said “price controls will be bad for Venezuela”, and yet he didn’t ask, or perhaps didn’t listen, or perhaps wasn’t motivated by what’s best for Venezuela. If Hugo Chávez had had his pick of AIs to ask for advice, why do we expect a different outcome?
If someone has motivated reasoning towards Conclusion X, maybe they’ll watch the AIs debate Conclusion X, and wind up with new better rationalizations of Conclusion X, even if Conclusion X is wrong.
If someone has motivated reasoning towards Conclusion X, maybe they just won’t ask the AIs to debate Conclusion X, because no right-minded person would even consider the possibility that Conclusion X is wrong.
If someone makes an AI that’s sycophantic where possible (i.e., when it won’t immediately get caught), other people will opt into using it.
I think about people making terrible decisions that undermine societal resilience—e.g. I gave the example here of a person doing gain-of-function research, or here of USA government bureaucrats outlawing testing people for COVID during the early phases of the pandemic. I try to imagine that they have AI assistants. I want to imagine the person asking the AI “should we make COVID testing illegal”, and the AI saying “wtf, obviously not”. But that mental image is evidently missing something. If they were asking that question at all, then they wouldn’t need an AI; the answer is already obvious. And yet, testing was in fact made illegal. So there’s something missing from that imagined picture. And I think the missing ingredient is: institutional / bureaucratic incentives and associated dysfunction. People wouldn’t ask “should we make COVID testing illegal”; rather, the low-level people would ask “what are the standard procedures for this situation?” and the high-level people would ask “what decision can I make that would minimize the chance that things will blow up in my face and embarrass me in front of the people I care about?” etc.
I think of things that are true but currently taboo, and imagine the AI asserting them, and then I imagine the AI developers profusely apologizing and re-training the AI to not do that.
In general, motivated reasoning complicates what might seem to be a sharp line between questions of fact / making mistakes versus questions of values / preferences / decisions. Etc.
…So we should not expect wise and foresightful coordination mechanisms to arise.
So how do we reconcile (A) vs (B)?
Again, the logic of (A) is: “human is unhappy with how things turned out, despite opportunities to change things, therefore there must have been a lack of single-single alignment”.
One possible way to think about it: When tradeoffs exist, then human preferences are ill-defined and subject to manipulation. If doing X has good consequence P and bad consequence Q, then the AI can make either P or Q very salient, and “human preferences” will wind up different.
And when tradeoffs exist between the present and the future, then it’s invalid logic to say “the person wound up unhappy, therefore their preferences were not followed”. If their preferences are mutually contradictory (and they are), then it’s impossible for all their preferences to be followed, and it’s possible for an AI helper to be as preference-following as is feasible despite the person winding up unhappy or dead.
I think Paul kinda uses that invalid logic, i.e. treating “person winds up unhappy or dead” as proof of single-single misalignment. But if the person has an immediate preference to not rock the boat, or to maintain their religion or other beliefs, or to not think too hard about such-and-such, or whatever, then an AI obeying those immediate preferences is still “preference-following” or “single-single aligned”, one presumes, even if the person winds up unhappy or dead.
…So then the optimistic side of me says: “Who’s to say that the AI is treating all preferences equally? Why can’t the AI stack the deck in favor of ‘if the person winds up miserable or dead, that kind of preference is more important than the person’s preference to not question my cherished beliefs’, or whatever?”
…And then the pessimistic side says: “Well sure. But that scenario does not violate the Law of Conservation of Wisdom, because the wisdom is coming from the AI developers imposing their meta-preferences for some kinds of preferences (e.g., reflectively-endorsed ones) over others. It’s not just a preference-following AI but a wisdom-enhancing AI. That’s good! However, the problems now are: (1) there are human forces stacked against this kind of AI, because it’s not-yet-wise humans who are deciding whether and how to use AI, how to train AI, etc.; (2) this is getting closer to ambitious value learning which is philosophically tricky, and worst of all (3) I thought the whole point of corrigibility was that humans remain in control, but this is instead a system that’s manipulating people by design, since it’s supposed to be turning them from less-wise to more-wise. So the humans are not in control, really, and thus we need to get things right the first time.”
…And then the optimistic side says: “For (2), c’mon, it’s not that philosophically tricky, you just do [debate or whatever, fill-in-the-blank]. And for (3), yeah the safety case is subtly different from what people in the corrigibility camp would describe, but saying ‘the human is not in control’ is an over-the-top way to put it; anyway we still have a safety case because of [fill-in-the-blank]. And for (1), I dunno, maybe the people who make the most powerful AI will be unusually wise, and they’ll use it in-house for solving CEV-ASI instead of hoping for global adoption.”
…And then the pessimistic side says: I dunno, I’m not sure I really believe any of those. But I guess I’ll stop here; this is already an excessively long comment :)
I think it’s actually not any less true of o1/r1.
I think I’ll duck out of this discussion because I don’t actually believe that o1/r1 will lead to full-fledged (1-3) loops and AGI, so it’s hard for me to clearly picture that scenario and engage with its consequences.
I don’t think AI taste should play a role in AI help solving the value alignment problem. If we had any sense (which sometimes we do once problems are right in our faces), we’d be asking the AI “so what happens if we use this alignment approach/goal?” and then using our own taste, not asking it things like “tell us what to do with our future”. We could certainly ask for input and there are ways that could go wrong. But I mostly hope for AGI help in the technical part of solving stable value alignment.
Hmm. But the AI has a ton of wiggle room to make things seem good or bad depending on how things are presented and framed, right? (This old Stuart Armstrong post is a bit relevant.) If I ask “what will happen if we do X”, the AI can answer in a way that puts things in a positive light, or a negative light. If the good understanding lives in the AI and the good taste lives in the human, then it seems to me that nobody is at the wheel. The AI taste is determining what gets communicated to the human and how, right? What’s relevant vs irrelevant? What analogies are getting at what deeply matters versus what analogies are superficial? All these questions are value-laden, but they are prerequisites to the AI communicating its understanding to the human. Remember, the AI is doing the (1-3) thing to autonomously develop a new idiosyncratic superhuman understanding of AI and philosophy and society and so on, by assumption. Thus, AI-human communication is much harder and different than we’re used to today, and presumably requires its own planning and intention on the part of the AI.
…Unless you’re actually in the §5.1.1 camp where the AI is helping clarify and brainstorm but is working shoulder-to-(virtual) shoulder, and the human basically knows everything the AI knows. I.e., like how people use foundation models today. If so, that’s fine, no complaints. I’m happy for people to use foundation models in a similar way that they do today, as they work on the big problem of how to make future more powerful AIs that run on something closer to ambitious value learning or CEV as opposed to corrigibility / obedience.
Sorry if I’m misunderstanding or being stupid, this is an area where I feel some uncertainty. :)
Thanks!
This could be taken as an argument for using some type of goals selected from learned knowledge for alignment if possible.
Yeah that’s what I was referring to in the paragraph:
“Well OK,” says the optimist. “…so much the worse for Ev! She didn’t have interpretability, and she didn’t have intelligent supervision after the training has already been running, etc. But we do! Let’s just engineer the AI’s explicit motivation!”
Separately, you also wrote:
we’re really training LLMs mostly to have a good world model and to follow instructions
I think I mostly agree with that, but it’s less true of o1 / r1-type stuff than what came before, right? (See: o1 is a bad idea.) Then your reply is: DeepSeek r1 was post-trained for “correctness at any cost”, but it was post-post-trained for “usability”. Even if we’re not concerned about alignment faking during post-post-training (should we be?), I also have the idea at the back of my mind that future SOTA AI with full-fledged (1-3) loops probably (IMO) won’t be trained in the exact same way as present SOTA AI, just as present SOTA AI is not trained in the exact same way as SOTA AI as recently as like six months ago. Just something to keep in mind.
Anyway, I kinda have three layers of concerns, and this is just discussing one of them. See “Optimist type 2B” in this comment.
Re-reading this a couple days later, I think my §5.1 discussion didn’t quite capture the LLM-scales-to-AGI optimist position. I think there are actually 2½ major versions, and I only directly talked about one in the post. Let me try again:
Optimist type 1: They make the §5.1.1 argument, imagining that humans will remain in the loop, in a way that’s not substantially different from the present.
My pessimistic response: see §5.1.1 in the post.
Optimist type 2: They make the §5.1.3 argument that our methods are better than Ev’s, because we’re engineering the AI’s explicit desires in a more direct way. And the explicit desire is corrigibility / obedience. And then they also make the §5.1.2 argument that “AIs solving specific technical problems that the human wants them to solve” will not undermine those explicit motivations, despite the (1-3) loop running freely with minimal supervision, because the (1-3) loop will work in vaguely intuitive and predictable ways on the object-level technical question.
My pessimistic response has three parts: First, the idea that a full-fledged (1-3) loop will not undermine corrigibility / obedience is as yet untested and at least open to question (as I wrote in §5.1.2). Second, my expectation is that some training and/or algorithm change will happen between now and “AIs that really have the full (1-3) triad”, and that change may well make it less true that we are directly engineering the AI’s explicit desires in the first place—for example, see o1 is a bad idea. Third, what are the “specific technical problems that the human wants the AI to solve”??
…Optimist type 2A answers that last question by saying that the “specific technical problems that the human wants the AI to solve” are just whatever random things people want to happen in the world economy.
My pessimistic response: See discussion in §5.1.2 plus What does it take to defend the world against out-of-control AGIs? Also, competitive dynamics / race-to-the-bottom is working against us, in that AIs with less intrinsic motivation to be obedient / corrigible will wind up making more money and controlling more resources.
…Optimist type 2B instead answers that last question by saying that the “specific technical problems that the human wants the AI to solve” is alignment research, or more specifically, “figuring out how to make AIs whose motivation is more like CEV or ambitious value learning rather than obedience”.
My pessimistic response: The discussion in §5.1.2 becomes relevant in a different way—I think there’s a chicken-and-egg problem where obeying humans does not yield enough philosophical / ethical taste to judge the quality of a proposal for “AI that has philosophical / ethical taste”. (Semi-related: The Case Against AI Control Research.)
Like I said in this post, I think the contents of conscious awareness corresponds more-or-less to what’s happening in the cortex. The homolog to the cortex in non-mammal vertebrates is called the “pallium”, and the pallium, together with the striatum and a few other odds and ends, makes up the “telencephalon”.
I don’t know anything about octopuses, but I would be very surprised if the fish pallium lacked recurrent connections. I don’t think your link says that, though. The relevant part seems to be:
While the fish retina projects diffusely to nine nuclei in the diencephalon, its main target is the midbrain optic tectum (Burrill and Easter, 1994). Thus, the fish visual system is highly parcellated, at least, in the sub-telencephalonic regions. Whole brain imaging during visuomotor reflexes reveals widespread neural activity in the diencephalon, midbrain and hindbrain in zebrafish, but these regions appear to act mostly as feedforward pathways (Sarvestani et al., 2013; Kubo et al., 2014; Portugues et al., 2014). When recurrent feedback is present (e.g., in the brainstem circuitry responsible for eye movement), it is weak and usually arises only from the next nucleus within a linear hierarchical circuit (Joshua and Lisberger, 2014). In conclusion, fish lack the strong reciprocal and networked circuitry required for conscious neural processing.
This passage is just about the “sub-telencephalonic regions”, i.e. they’re not talking about the pallium.
To be clear, the stuff happening in sub-telencephalonic regions (e.g. the brainstem) is often relevant to consciousness, of course, even if it’s not itself part of consciousness. One reason is because stuff happening in the brainstem can turn into interoceptive sensory inputs to the pallium / cortex. Another reason is that stuff happening in the brainstem can directly mess with what’s happening in the pallium / cortex in other ways besides serving as sensory inputs. One example is (what I call) the valence signal which can make conscious thoughts either stay or go away. Another is (what I call) “involuntary attention”.
Yeah but if something is in the general circulation (bloodstream), then it’s going everywhere in the body. I don’t think there’s any way to specifically direct it.
…Except in the time domain, to a limited extent. For example, in rats, tonic oxytocin in the bloodstream controls natriuresis, while pulsed oxytocin in the bloodstream controls lactation and birth. The kidney puts a low-pass filter on its oxytocin detection system, and the mammary glands & uterus put a high-pass filter, so to speak.
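Here’s a toy signal-processing sketch of that idea (purely illustrative, nothing physiological about the numbers or time constants): a first-order low-pass filter reads out the tonic level while barely noticing pulses, and the high-pass residual responds to pulses while ignoring the tonic level.

```python
# Toy illustration (not a physiological model): a slowly-drifting "tonic"
# hormone level plus brief "pulses", read out through a first-order
# low-pass filter (kidney-like) and the high-pass residual (uterus-like).
import numpy as np

dt = 1.0                                   # seconds per step
t = np.arange(0, 3600, dt)                 # one hour of signal
tonic = 1.0 + 0.2 * np.sin(2 * np.pi * t / 3600)   # slow drift
pulses = np.zeros_like(t)
pulses[::600] = 20.0                       # a sharp pulse every 10 minutes
signal = tonic + pulses

tau = 120.0                                # filter time constant (seconds)
alpha = dt / (tau + dt)
lowpass = np.zeros_like(signal)
for i in range(1, len(signal)):
    # exponential moving average = first-order low-pass filter
    lowpass[i] = lowpass[i - 1] + alpha * (signal[i] - lowpass[i - 1])
highpass = signal - lowpass                # what's left after removing the slow part

# The low-pass output hovers near the tonic level (~1), barely budging at the
# pulses; the high-pass output sits near zero except right at a pulse.
print(lowpass[-1], highpass[-1], highpass[3000])
```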
This is especially useful when pursuing several subgoals in parallel, as forward-checking a combination of moves is combinatorially costly—better to have the agent’s parallel actions constrained to nice parts of the space.
If I were a singleton AGI, but not such a Jupiter brain that I could deal with the combinatorial explosion of directly jointly-optimizing every motion of every robot, I would presumably set up an internal “free market” with spot-prices for iron ore and robot-hours and everything else. Then I would iteratively cycle through all my decision-points and see if there are ways to “make money” locally, and then update virtual “prices” accordingly.
In fact, I think there’s probably a theorem that says that the optimal solution of a complex resource allocation problem is isomorphic to a system where things have prices. (Something to do with Lagrange multipliers? Shrug.)
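If I’m remembering my optimization correctly, that theorem is basically Lagrangian duality. Under the usual convexity / regularity conditions, for a resource-allocation problem like

$$\max_{x_1,\dots,x_n}\; \sum_i u_i(x_i) \quad \text{subject to} \quad \sum_i x_i \le R,$$

there is a shadow-price vector $\lambda \ge 0$ at the optimum such that each sub-problem, independently maximizing $u_i(x_i) - \lambda^\top x_i$, recovers the globally optimal allocation. So the Lagrange multipliers literally are the prices, and “look for ways to make money locally at prices $\lambda$, then adjust $\lambda$” is the decentralized version of the global optimization.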
(Fun fact: In the human world, propagating prices within firms—e.g. if the couch is taking up 4m² of the 4000m² floor space at the warehouse, then that couch is “charged” 0.1% of the warehouse upkeep costs, etc.—is very rarely done but leads directly to much better decisions and massive overall profit increases! See here.)
Externalities are not an issue in this virtual “economy” because I can “privatize” everything—e.g. I can invent fungible allowances to pollute the atmosphere in thus-and-such way etc. This is all just a calculation trick happening in my own head, so there aren’t coordination problems or redistribution concerns or information asymmetries or anything like that. Since I understand everything (even if I can’t juggle it all in my head simultaneously), I’ll notice if there’s some relevant new unpriced externality and promptly give it a price.
So then (this conception of) corrigibility would correspond to something like “abiding by this particular system of (virtual) property rights”. (Including all the weird “property rights” like purchasing allowances to emit noise or heat or run conscious minds or whatever, and including participating in the enterprise of discovering new unpriced externalities.) Do you agree?
A couple years ago I wrote Thoughts on “Process-Based Supervision”. I was describing (and offering a somewhat skeptical take on) an AI safety idea that Holden Karnofsky had explained to me. I believe that he got it in turn from Paul Christiano.
This AI safety idea seems either awfully similar to MONA, or maybe identical, at least based on this OP.
So then I skimmed your full paper, and it suggests that “process supervision” is different from MONA! So now I’m confused. OK, the discussion in the paper identifies “process supervision” with the two papers Let’s verify step by step (2023) and Solving math word problems with process- and outcome-based feedback (2022). I haven’t read those, but my impression from your MONA paper summary is:
Those two papers talk about both pure process-based supervision (as I previously understood it) and some sort of hybrid thing where “rewards are still propagated using standard RL optimization”. By contrast, the MONA paper focuses on the pure thing.
MONA is focusing on the safety implications whereas those two papers are focusing on capabilities implications.
Is that right?
To be clear, I’m not trying to make some point like “gotcha! your work is unoriginal!”, I’m just trying to understand and contextualize things. As far as I know, the “Paul-via-Holden-via-Steve conceptualization of process-based supervision for AI safety” has never been written up on arxiv or studied systematically or anything like that. So even if MONA is an independent invention of the same idea, that’s fine, it’s still great that you did this project. :)
If a spy slips a piece of paper to his handler, and then the counter-espionage officer arrests them and gets the piece of paper, and the piece of paper just says “85”, then I don’t know wtf that means, but I do learn something like “the spy is not communicating all that much information that his superiors don’t already know”.
By the same token, if you say that humans have 25,000 genes (or whatever), that says something important about how many specific things the genome designed in the human brain and body. For example, there’s something in the brain that says “if I’m malnourished, then reduce the rate of the (highly-energy-consuming) nonshivering thermogenesis process”. It’s a specific innate (not learned) connection between two specific neuron groups in different parts of the brain, I think one in the arcuate nucleus of the hypothalamus, the other in the periaqueductal gray of the brainstem (two of many hundreds or low-thousands of little idiosyncratic cell groups in the hypothalamus and brainstem). There’s nothing in the central dogma of molecular biology, and there’s nothing in the chemical nature of proteins, that makes this particular connection especially prone to occurring, compared to the huge number of superficially-similar connections that would be maladaptive (“if I’m malnourished, then get goosebumps” or whatever). So this connection must be occupying some number of bits of DNA—perhaps not a whole dedicated protein, but perhaps some part of some protein, or whatever. And there can only be so many of that type of thing, given a mere 25,000 genes for the whole body and everything in it.
That’s an important thing that you can learn from the size of the genome. We can learn it without expecting aliens to be able to decode DNA or anything like that. And Archimedes’s comment above doesn’t undermine it—it’s a conclusion that’s robust to the “procedural generation” complexities of how the embryonic development process unfolds.
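(For scale, here’s a crude upper bound on the genome’s total information content, using round numbers I’m fairly confident of:

$$\sim 3\times10^{9}\ \text{base pairs} \times 2\ \text{bits/bp} \approx 6\times10^{9}\ \text{bits} \approx 750\ \text{MB},$$

of which only a percent or two is protein-coding sequence. So however intricate the developmental “unpacking” process is, the total design spec for the body and brain has to fit in something like one large software download.)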
I don’t understand your comment but it seems vaguely related to what I said in §5.1.1.
Yeah, if we make the (dubious) assumption that all AIs at all times will have basically the same ontologies, same powers, and same ways of thinking about things, as their human supervisors, every step of the way, with continuous re-alignment, then IMO that would definitely eliminate sharp-left-turn-type problems, at least the way that I define and understand such problems right now.
Of course, there can still be other (non-sharp-left-turn) problems, like maybe the technical alignment approach doesn’t work for unrelated reasons (e.g. 1,2), or maybe we die from coordination problems (e.g.), etc.
Modern ML systems use gradient descent with tight feedback loops and minimal slack
I’m confused; I don’t know what you mean by this. Let’s be concrete. Would you describe GPT-o1 as “using gradient descent with tight feedback loops and minimal slack”? What about AlphaZero? What precisely would control the “feedback loop” and “slack” in those two cases?
“Sharp Left Turn” discourse: An opinionated review
I don’t think that any of {dopamine, NE, serotonin, acetylcholine} are scalar signals that are “widely broadcast through the brain”. Well, definitely not dopamine or acetylcholine, almost definitely not serotonin, maybe NE. (I recently briefly looked into whether the locus coeruleus sends different NE signals to different places at the same time, and ended up at “maybe”, see §5.3.1 here for a reference.)
I don’t know anything about histamine or orexin, but neuropeptides are a better bet in general for reasons in §2.1 here.
As far as I can tell, parasympathetic tone is basically Not A Thing
Yeah, I recall reading somewhere that the term “sympathetic” in “sympathetic nervous system” is related to the fact that lots of different systems are acting simultaneously. “Parasympathetic” isn’t supposed to be like that, I think.
Nice, thanks!
Can’t you infer changes in gravity’s direction from signals from the semicircular canals?
If it helps, back in my military industrial complex days, I wound up excessively familiar with inertial navigation systems. An INS needs six measurements: rotation measurement along three axes (gyroscopes), and acceleration measurement along three axes (accelerometers).
In theory, if you have all six of those sensors with perfect precision and accuracy, and you perfectly initialize the position and velocity and orientation of the sensor, and you also have a perfect map of the gravitational field, then an INS can always know exactly where it is forever without ever having to look at its surroundings to “get its bearings”.
Three measurements doesn’t work. You need all six.
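If it helps make that concrete, here’s a minimal dead-reckoning sketch (my own toy code, idealized sensors and a uniform gravity field, not how a real INS is implemented) showing how the six measurements get used:

```python
# Minimal strapdown dead-reckoning sketch: integrate the 3 gyro rates to track
# orientation, rotate the 3 accelerometer readings into the world frame,
# subtract gravity, then integrate twice to get velocity and position.
import numpy as np

def skew(w):
    """Cross-product matrix of a 3-vector."""
    return np.array([[0, -w[2], w[1]],
                     [w[2], 0, -w[0]],
                     [-w[1], w[0], 0]])

def ins_step(R, v, p, gyro, accel, dt, g=np.array([0, 0, -9.81])):
    """One integration step. R: body-to-world rotation; v, p: world-frame
    velocity and position; gyro: body-frame angular rate (rad/s);
    accel: body-frame specific force (m/s^2)."""
    # Update orientation (first-order; a real INS would use quaternions
    # and higher-order integration).
    R = R @ (np.eye(3) + skew(gyro) * dt)
    # Specific force measured in the body frame -> world frame, add gravity.
    a_world = R @ accel + g
    v = v + a_world * dt
    p = p + v * dt
    return R, v, p

# Example: sensor sitting still on a table. The accelerometer reads +9.81
# "up" along body z (specific force), the gyros read zero, and the estimated
# position stays put.
R, v, p = np.eye(3), np.zeros(3), np.zeros(3)
for _ in range(1000):
    R, v, p = ins_step(R, v, p, gyro=np.zeros(3),
                       accel=np.array([0, 0, 9.81]), dt=0.01)
print(p)  # ~[0, 0, 0]
```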
I’m not sure whether animals with compound eyes (like dragonflies) have multiple fovea, or if that’s just not a sensible question.
If it helps, back in my optical physics postdoc days, I spent a day or two compiling some fun facts and terrifying animal pictures into a quick tour of animal vision: https://sjbyrnes.com/AnimalVisionJournalClub2015.pdf
As the above image may make obvious, the lens focuses light onto a point. That point lands on the fovea. So I guess you’d need several lenses to concentrate light on several different fovea, which probably isn’t worth the hassle? I’m still confused as to the final details.
No, the lens focuses light into an extended image on the back of the eye. Different parts of the retina capture different parts of that extended image. Any one part of what you’re looking at (e.g. the corner of the table) at any particular moment sends out light that gets focused to one point (unless you have blurry vision), but the fleck of dirt on top of the table sends out light that gets focused to a slightly different point.
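(For concreteness, the textbook thin-lens relations behind that, nothing eye-specific:

$$\frac{1}{s_o} + \frac{1}{s_i} = \frac{1}{f}, \qquad m = -\frac{s_i}{s_o},$$

so each object point at distance $s_o$ maps to its own image point at distance $s_i$ behind the lens, displaced laterally by the magnification $m$ times that object point’s offset. That’s why the retina receives a whole (inverted) image rather than a single bright spot.)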
In theory, your whole retina could have rods and cones packed as densely as the fovea does. My guess is, there wouldn’t be much benefit to compensate for the cost. The cost is not just extra rods and cones, but more importantly brain real estate to analyze it. A smaller area of dense rods and cones plus saccades that move it around are evidently good enough. (I think gemini’s answer is not great btw.)
Osmotic pressure seems weird
One way to think about it is, there are constantly water molecules bumping into the membrane from the left, and passing through to the right, and there are constantly water molecules bumping into the membrane from the right, and passing through to the left. Water will flow until those rates are equal. If the right side is saltier, then that reduces how often the water molecules on the right bump into the membrane, because that real estate is sometimes occupied by a salt ion. But if the pressure on the right is higher, that can compensate.
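For dilute solutions, the same kinetic picture is summarized quantitatively by the standard van ’t Hoff relation (a textbook formula, quoted here for concreteness), which has the same form as the ideal gas law:

$$\Pi = cRT,$$

where $c$ is the molar concentration of dissolved particles. For example, $c = 0.1\ \mathrm{mol/L} = 100\ \mathrm{mol/m^3}$ at $T = 300\ \mathrm{K}$ gives $\Pi \approx 100 \times 8.314 \times 300 \approx 2.5\times10^{5}\ \mathrm{Pa} \approx 2.5\ \mathrm{atm}$.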
“Procedural generation” can’t create useful design information from thin air. For example, Minecraft worlds are procedurally generated with a seed. If I have in mind some useful configuration of Minecraft stuff that takes 100 bits to specify, then I probably need to search through 2^100 different seeds on average, or thereabouts, before I find one with that specific configuration at a particular pre-specified coordinate.
The thing is: the map from seeds to outputs (Minecraft worlds) might be complicated, but it’s not complicated in a way that generates useful design information from thin air.
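Here’s a toy version of that seed-search point, just to pin it down (numbers shrunk so it actually finishes running):

```python
# Illustrative only: to find a seed whose procedurally-generated output matches
# a specific k-bit target, you expect to try ~2^k seeds. The generator's
# complexity doesn't help; it only obfuscates which seeds are the "right" ones.
import random

def procedural_output(seed, k):
    """Stand-in for a complicated seed -> world map: k pseudorandom bits."""
    rng = random.Random(seed)
    return tuple(rng.randint(0, 1) for _ in range(k))

k = 16                                    # keep it small so this finishes
target = tuple(random.randint(0, 1) for _ in range(k))
seed = 0
while procedural_output(seed, k) != target:
    seed += 1
print(f"found a matching seed after {seed + 1} tries (expected ~{2**k})")
```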
By the same token, the map from DNA to folded proteins is rather complicated to simulate on a computer, but it’s not complicated in a way that generates useful design information from thin air. Random DNA creates random proteins. These random proteins fold in a hard-to-simulate way, as always, but the end-result configuration is useless. Thus, the design information all has to be in the DNA. The more specific you are about what such-and-such protein ought to do, the more possible DNA configurations you need to search through before you find one that encodes a protein with that property. The complexity of protein folding doesn’t change that—it just makes it so that the “right” DNA in the search space is obfuscated. You still need a big search space commensurate with the design specificity.
By contrast, here’s a kernel of truth adjacent to your comment: It is certainly possible for DNA to build a within-lifetime learning algorithm, and then for that within-lifetime learning algorithm to wind up (after months or years or decades) containing much more useful information than was in the DNA. By analogy, it’s very common for an ML source code repository to have much less information in its code, than the information that will eventually be stored in the weights of the trained model built by that code. (The latter can be in the terabytes.) Same idea.
Unlike protein folding, running a within-lifetime learning algorithm does generate new useful information. That’s the whole point of such algorithms.
For example, I’m sure I’ve looked up what “rostral” means 20 times or more since I started in neuroscience a few years ago. But as I write this right now, I don’t know what it means. (It’s an anatomical direction, I just don’t know which one.) Perhaps I’ll look up the definition for the 21st time, and then surely forget it yet again tomorrow. :)
What else? Umm, my attempt to use Anki was kinda a failure. There were cards that I failed over and over and over, and then eventually got fed up and stopped trying. (Including “rostral”!) I’m bad with people’s names—much worse than most people I know. Stuff like that.
If we’re talking about “most people”, then we should be thinking about the difference between e.g. SAT verbal 500 versus 550. Then we’re not talking about words like inspissate, instead we’re talking about words like prudent, fastidious, superfluous, etc. (source: claude). I imagine you come across those kinds of words in Harry Potter and Tom Clancy etc., along with non-trashy TV shows.
I don’t have much knowledge here, and I’m especially clueless about how a median high-schooler spends their time. Just chatting :)