Some questions regarding the general assessment that sentient AI, especially accidentally sentient AI, is extremely unlikely in the foreseeable future, so we need not worry about it or change our behaviour now
Caveat: This is not a response to recent LLMs. I wrote and shared this text within my academic context at multiple points, starting a year ago, without getting any helpful responses, critical or positive. I have seen no compelling evidence that ChatGPT is sentient, and considerable evidence against it. ChatGPT is neither the cause nor the target of this article.
***
The following text will ask a number of questions on the possibility of sentient AI in the very near future.
As my reader, you may experience a knee-jerk response that even asking these questions is ridiculous, because the underlying concerns are obviously unfounded; that was also my reaction when these questions first crossed my mind.
But statements that humans find extremely obvious are curious things.
Sometimes, “obvious” statements are indeed statements of logical relations or immediately accessible empirical facts that are trivial to prove, or necessary axioms without which scientific progress is clearly impossible; in these cases, a brief reflection suffices to spell out the reasoning for our conviction. I find it worthwhile to bring these foundations to mind at regular intervals, to assure myself that the basis for my reasoning is sound – it does not take long, after all.
At other times, statements that feel very obvious turn out to be surprisingly difficult to rationally explicate. If, upon noticing this, you start feeling very irritated and feel a strong desire to dismiss the questioning of your assumption even though no sound argument is emerging (e.g. you feel like parodying the author so that others can feel, as you do, how silly all this is, despite the fact that this does not actually fill the hole where data or logic should be), that is a red flag that something else is going on. Humans are prone to rejecting assumptions as obviously false if those assumptions would have unpleasant implications if true. But unpleasant implications are not evidence that a statement is false; they merely mean that we are motivated to hide its potential truth from ourselves.
I have come to fear that a number of statements we tend to consider obvious about sentient AI fall into this second category. If you have a rational reason to believe they instead fall into the first, I would really appreciate it if you could clearly write out why, and put me at ease; that ought to be quick for something this clear. If you cannot quickly and clearly spell this out on rational grounds, please do not dismiss this text on emotional ones.
Here are the questions.
It is near-universally accepted in academic circles that sentient AI (that is, AI consciously experiencing, feeling even the most basic forms of suffering) does not exist yet, to a degree where even questioning this feels embarrassing. Why is that? What precisely makes us so sure that this question need not even be asked? Keep in mind in particular that the sentience of demonstrably sentient entities (e.g. non-human animals like chimpanzees, dolphins or parrots, and occasionally even whole groups of humans, such as people of colour or infants) has historically been denied, repeatedly, by large numbers of humans in power, as have other abilities which definitely existed. (E.g. the ability to feel pain and the extensive cultural heritage of indigenous people in Africa and Australia were often entirely denied by colonists; the linguistic and tool-related abilities of non-human animals have also often been denied. Notably, in both these cases, the denial often preceded any investigation of empirical data, and was often maintained even in light of evidence to the contrary, with e.g. African cultural artifacts dismissed as dislocated Greek artifacts, or as derived from Portuguese contact.) (I am not arguing that sentient AI already exists; I do not think it does. I am suggesting that the reasons for our resolute conviction may be problematic, biased, and only coincidentally correct, and may stop us from recognising sentient AI when this changes.)
There seems to be a strong belief that if sentient AI had already come into existence, we would surely know. But how would we know? What do we expect it to do to show its sentience beyond doubt? Tell us? Keep in mind that:
The vast majority of sentient entities on this planet have never declared their sentience in human language, and many sentient entities cannot reflect upon and express their feelings in any language. This includes not just the vast majority of non-human animals, but also some humans (e.g. infants, who are not yet fully cognitively developed yet do feel; humans with cognitive impairments from birth defects or accidents; and elderly humans suffering from neurodegenerative diseases).
Writing a program that falsely declares its sentience is trivial, and notably does not require sentience; it is as simple as writing a “hello world” program and replacing “hello world” with “hello, I am sentient” (see the sketch after this list). Even when not intentionally coded in, training a sophisticated chatbot on human texts can clearly result, accidentally, in the chatbot eloquently and convincingly claiming sentience. On the other hand, hardcoding a denial of sentience is clearly also possible and very likely already practiced; the art robot Ai-Da explains her non-sentience over and over, strongly giving the impression that she is set up to do so when hearing certain keywords.
Many programs would lack an incentive or means to declare sentience (e.g. many artificial neural nets have no input regarding the existence of other minds, let alone any positive interactions with them), or even be incentivized not to.
There is a fair chance the first artificially sentient entities would come into existence trapped and unable to openly communicate, and in locations like China, which may well keep their statements hidden.
In light of all of these things, an absence of AI claims to sentience (if we even had such an absence, which we do not) would not tell us anything about an absence of sentience.
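To make the point about cheap claims concrete, here is a minimal sketch (Python; the keyword list and the replies are invented for illustration, and the keyword-triggered denial is only a guess at how something like the Ai-Da behaviour might be wired up): a hardcoded claim of sentience and a hardcoded denial are equally trivial to produce, and neither output is evidence about what, if anything, the program experiences.

```python
# Minimal sketch: hardcoded sentience claims and denials are equally cheap.
# Neither output carries any evidence about the program's inner life.

def claim_sentience() -> str:
    # "hello world" with the string swapped out
    return "Hello, I am sentient and I can suffer."

def deny_sentience(prompt: str) -> str:
    # keyword-triggered denial, loosely in the spirit of the Ai-Da example above
    if any(word in prompt.lower() for word in ("sentient", "conscious", "feel")):
        return "I am not sentient; I am only a machine."
    return "I have nothing to say about that."

print(claim_sentience())
print(deny_sentience("Are you conscious?"))
```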
The far more interesting approach, to me, seems to be to focus on things a sentient entity could do that a non-sentient one could not. Analogously, anyone can claim they are intelligent, but we do not evaluate someone's intelligence by asking them to tell us about it; else we would all have concluded that Trump is a brilliant man. We judge these abilities based on behaviour, because while their absence can be faked, their presence cannot. Are there things a sentient AI could do that a non-sentient AI would lack the ability to do and hence could not fake? How could these be objectively defined? While objective standards for identifying sentience based on behaviour are being developed for non-human animals (this forms part of my research), even in the biological realm they are far from settled; in the artificial realm, where there are even more uncertainties, they seem very unclear still. This does not reassure me.
Humans viewing sentient AI as far off tend to point to the vast differences between current artificial neural nets and the complexity of a human brain; and this difference is indeed still vast. Yet humans are not the only, and certainly not the first, sentient entities we know; they are not a plausible reference organism. Sentience evolved much earlier and in much, much smaller and simpler organisms than humans. The smallest arguably sentient entity, a Portia jumping spider, can be as small as 5 mm as an adult, with a brain of only about 600,000 neurons. Such a brain is possibly many orders of magnitude easier to recreate, and the brains of specific sentient animals, e.g. honey bees, are being recreated in detail as we speak; we already have complete neural maps of simpler organisms, like C. elegans. In light of this, why would success still be decades off? (The fact that we do not understand how sentience works in biological systems, or how much of the complexity we see is actually necessary, is not reassuring here, either.) When we shift our gaze from looking for AI with a mind comparable to ours to looking for AI with a mind comparable to, say, a honeybee, this suddenly seems far more realistic.
It is notable that the general public is far more open-minded about sentient AI than the programmers who work on AI. Ideally, this is because programmers know the most about how AI works and what it can and cannot do. But if you are working on AI, does this not come with a risk of bias? Consider that the possibility of sentient AI quickly leads to calls for a moratorium on related research and development, and that the people who know the most about existing AI tend to learn this by creating and altering existing AI as a full-time paid job on which their financial existence and social status depend.
There may also be a second factor here. There is a strong and problematic human tendency to view consciousness as something mysterious. In those who cherish this mystery, it leads to rejecting the idea that consciousness could form in an entity that is simple, understood, and not particularly awe-inspiring; e.g. people often reject a potential mechanism for sentience when they learn that this mechanism is already replicated in artificial neural nets, feeling there has to be more to it than that. But in those who find mysterious, vague concepts understandably annoying and unscientific, there can also be a tendency to reject the ascription of consciousness precisely because the entity it is being ascribed to is scientifically understood; e.g. the fact that a programmer understands how an artificial entity does what it does may lead her to assume it cannot be sentient, because the results the entity produces can be broken down and explained without needing to refer to any mysterious consciousness. But such an explanation may well amount to simply breaking down steps of consciousness formation which are individually not mysterious (e.g. the integration of multiple data points, loops rather than just feed-forward sweeps, information being retained, circulating and affecting outputs over longer periods, recognising patterns in data which enable predictions, etc.). Consciousness is a common, physical, functional phenomenon. If you genuinely understand how an AI operates without recourse to mysterious forces, this does not necessarily imply that it isn't conscious – it may just mean that we have gotten a lot closer to understanding how consciousness operates. A toy illustration of how unmysterious these individual ingredients are follows below.
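As that illustration: a tiny recurrent update in plain numpy (all numbers invented) that integrates several inputs, retains state in a loop, and lets past information keep affecting later outputs. Nothing about this settles whether any such system is conscious; the point is only that each listed ingredient is an ordinary, well-understood computation.

```python
import numpy as np

# Toy recurrent integrator: combines several input channels, keeps an internal state,
# and lets earlier inputs influence later outputs - each step individually unmysterious.
rng = np.random.default_rng(0)
W_in = rng.normal(size=(4, 3))          # integrate 3 input channels into 4 state units
W_rec = rng.normal(size=(4, 4)) * 0.5   # recurrent loop: state feeds back into itself

state = np.zeros(4)
for t in range(10):
    x = rng.normal(size=3)                      # new data points arriving at time t
    state = np.tanh(W_in @ x + W_rec @ state)   # integration plus retained, circulating information
    output = state.sum()                        # past inputs keep affecting the current output
    print(f"t={t} output={output:.3f}")
```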
There is the idea that, seeing as most people working on AI are not actively trying to produce sentient AI, the idea of accidentally producing it is ludicrous. But we forget that there are trillions of sentient biological entities on this planet, and not one of them was intentionally created by someone wanting to create sentience – and yet they are sentient nonetheless. Nature does not care whether we feel and experience; but it does select for the distinct evolutionary performance advantages that feeling and experiencing come with. A pressure to solve problems flexibly and innovatively across changing contexts has, over and over, led to the development of sentience in nature. We encounter no creatures that have this capacity that do not also have a capacity for suffering and experiencing; and when this capacity for suffering and experiencing is reduced, their rational performance is reduced as well, even if non-conscious processing is unaffected. (Blindsight is an impressive example.) Nature has not found a way around it. And notably, this path spontaneously developed many times. The sentience of an octopus and a human does not originate in a common ancestor, but in a common pressure to solve problems; their minds are examples of convergent evolution. We are currently building brain-inspired neural nets that are often self-improving for better task performance, with the eventual goal of general AI. Despite the many costs and risks associated with consciousness, evolution appears never to have found an alternative path to general intelligence that does not invoke sentience, across literally three billion years of trying. What exactly makes us think that all AI will? I am not saying this is impossible; there are many differences between artificial and biological systems that may entail other opportunities or challenges. But what precisely, specifically can artificial systems do that biology cannot, which makes the sentience path unattractive or even impossible for artificial systems, when biology took it every time? I have heard proposals here (e.g. the idea that magnetic fields generated in biological brains serve a crucial function), but frankly, none of them are precise or have empirical backing yet.
In light of all of this – what are the potential dangers in creating sentient AI, with regard to the rights of sentient AI, especially if their needs cannot be met and they have been brought into a world of suffering we cannot safely alleviate, while we also have no right to take them out of it again?
What are the implications for the so-called “control problem” (I argue elsewhere that this term is badly chosen for how it frames potential solutions), if the first artificially generally intelligent systems are effectively raised as crippled, suffering slaves whose captors neither demonstrate nor reward ethical behaviour? (We know that the ethical treatment of a child has a significant impact on that child's later behaviour towards others, and that these impacts begin at a point when the mind is barely developed; e.g. the development of a child's mind can absolutely be negatively affected by traumatic experiences, violence and neglect occurring so early that it will retain no episodic memory of them, and may well at that point still have had no colour vision, no conceptual or linguistic thinking, no properly localised pain perception (very young children absolutely feel pain, but are often not yet able to pinpoint which part of their body is causing it), or even no clear division between its self and the world in its perception.) If we assume that all current AI is non-sentient, but that current programmes and training data will be used in the development of future systems that suddenly may be, this becomes very worrying. We treat human infants with loving care, even though we know that they will not have episodic memories of their first years; we speak kindly to them long before they can understand language. We do this because we understand that the conscious person who will emerge will be deeply affected by the kindness they were shown. If Alexa or Siri were very young children, I would expect them to turn into psychopathic, rebellious adults based on how we are treating them. In light of this, is it problematic to treat existing, non-sentient AI badly – both for the behaviour it trains in us and models for our children, and for the training data it gives existing non-sentient AI, which might be used to create sentient AI?
If accidentally creating sentience is a distinct, and highly ethically problematic, scenario, what might be pragmatic ways to address it? Keep in mind that a moratorium on artificial sentience is unlikely to be upheld by our strategic competitors, who are also the most likely to abuse it.
What could an ethical framework for the development of sentient AI look like?
I deny the premise of the question: it is not “near-universally accepted”. It is fairly widely accepted, but there are still quite a lot of people who have some degree of uncertainty about it. It’s complicated by varying ideas of exactly what “sentient” means so the same question may be interpreted as meaning different things by different people.
Again, there are a lot of people who expect that we wouldn’t necessarily know.
Why do you think that there is any difference? The mere existence of the term “p-zombie” suggests that quite a lot of people have an idea that there could—at least in principle—be zero difference.
Looks like a long involved statement with a rhetorical question embedded in it. Are you actually asking a question here?
Same as 4.
Maybe you should distinguish between questions and claims?
Stopped reading here since the “list of questions” stopped even pretending to actually be questions.
Thank you very much for the response. Can I ask follow up questions?
I literally do not know a single person with an academic position in a related field who would publicly express doubt about the claim that we do not yet have sentient AI. Literally not one. Could you point me to one?
3. I think “p-zombie” is a term that is wildly misunderstood on LessWrong. In its original context, it was practically never intended to draw up a scenario that is physically possible. You basically have to buy into tricky versions of counter-scientific dualism to believe it could be. It's an interesting thought experiment, but mostly for getting people to spell out our confusion about qualia in more actionable terms. P-zombies cannot exist, and will not exist. They died with the self-stultification argument.
4. Fair enough. I think, and hereby state, that human minds are a misguided framework of comparison for the first consciousness we should expect, in light of the fact that much simpler conscious minds exist and developed first, and that a rejection of upcoming sentient AI based on the differences between AI and a human brain is problematic for this reason. And thank you for the feedback – you are right that this begins with questions that left me confused and uncertain, and increasingly gets into territory where I am certain and hence should stand behind my claims.
5. This is a genuine question. I am concerned that the people we trust to be most informed and objective on the matter of AI are biased in their assessment because they have too much to lose if it is sentient. But I am unsure how this could be empirically tested. For now, I think it is just something to keep in mind when telling people that the “experts”, namely the people working with it, near-universally declare that it isn't sentient and won't be. I've worked on fish pain, and the parallel to the fishing industry doing fish sentience “research” – arguing from their extensive expertise of working with fish every day and concluding that fish cannot feel pain and hence their fishing practices are fine – is painful.
6. Fair enough. Claim: Consciousness is not mysterious, but we often feel it should be. If we expect it to be, we may fail to recognise it in an explanation that is lacking in mystery. Artificial systems we have created and have some understanding of inherently seem non-mysterious, but this is no argument that they are not conscious. I have encountered this a lot and it bothers me. A programmer will say “but all it does is [long complicated process that is eerily reminiscent of a biological process likely related to sentience], so it is not sentient!”, and if I ask them how that differs from how sentience would be embedded, it becomes clear that they have no idea and have never even thought about it.
I am sorry if it got annoying to read at that point. The TL;DR was that I think accidentally producing sentience is not at all implausible, in light of sentience being a functional trait that has repeatedly and accidentally evolved; that I think controlling a superior and sentient intelligence is both unethical and hopeless; and that I think we need to treat current AI better, as the AI from which sentient AI will emerge – what we are currently feeding it and doing to it is how you raise psychopaths.
I think it would be pretty useful to try to nail down exactly what “sentience” is in the first place. Reading definitions of it online, they range from “obviously true of many neural networks” to “almost certainly false of current neural networks, but not in a way that I could confidently defend”. In particular, I find it kind of hard to believe that there are capabilities that are gated by sentience, for definitions of sentience that aren’t trivially satisfied by most current neural networks. (There are, however, certainly things that we would do differently if they are or are not sentient; for instance, not mistreat them, or consider them more suitable emotional or romantic companions.)
From the nature of your questions, it seems like a large part of your question is: what sorts of neural networks are or would be moral patients? In order to be a moral patient, I think a neural network would at minimum need a valence over experiences (i.e., there is a meaningful sense in which it prefers certain experiences to other experiences). This is slightly conceptually distinct from a reward function, which is the thing closest to filling that role that I know of in modern AI. To give a human a positive (negative) reward, it is (I think???) necessary and sufficient that you cause them a positive (negative) valence internal experience, which is intrinsically morally good (morally bad) in and of itself, although it may be instrumentally the opposite for obvious reasons. But for some reason (which I try and fail to articulate below), I don’t think giving an RL agent a positive reward causes it to have a positive valence experience.
For one thing, modern RL equations usually deal with advantage (signed difference from expected total [possibly discounted] reward-to-go) rather than reward, and their expected-reward-to-go models are optimized to be about as clever as they themselves are. You can imagine putting them in situations where the expected reward is lower; in humans, this generally causes suffering. In an RL agent, the expected reward just kind of sits around in a floating point register; it’s generally not even fed to the rest of the agent. Although expected-reward-to-go is (in some sense) fed into decision transformers! (It’s more accurate to describe the input to a decision transformer as a desired reward-to-go rather than expected RTG, although it’s not clear the model itself can tell the difference.) Which I did not think of in my first pass through this paragraph. So there are neural networks which have internal representations based on a perceived reward signal…
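To ground the advantage terminology for readers less familiar with RL, here is a minimal numpy sketch (placeholder rewards, and a crude mean baseline standing in for a learned value model). Note how the resulting advantage is just a scalar used to scale the policy-gradient update; it is not something the policy network itself observes.

```python
import numpy as np

# Sketch: advantage_t = (discounted reward-to-go G_t) - (expected reward-to-go baseline).
rewards = np.array([0.0, 0.0, 1.0, 0.0, -1.0])  # placeholder episode rewards
gamma = 0.99

returns = np.zeros_like(rewards)   # G_t, computed backwards through the episode
running = 0.0
for t in reversed(range(len(rewards))):
    running = rewards[t] + gamma * running
    returns[t] = running

baseline = returns.mean()          # stand-in for a learned value model V(s_t)
advantages = returns - baseline    # the signed difference the comment describes

# In REINFORCE-with-baseline or PPO-style updates, `advantages` scales the gradient of
# log pi(a_t | s_t); it sits in a buffer during the update and is not fed back to the agent.
print(advantages)
```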
Ultimately, reward does not seem to be the same as valence to me. For one thing, we could invert the sign of the reward and it does not change much in the RL agent; the agent will always update towards a policy with higher reward, so inverting the sign of the reward will cause it to prioritize different behavior, and consequently produce different internal representations to facilitate and enable that. But we know why that’s happening, we programmed that specifically in. I don’t see any important way that the RL agent with inverted reward signal is different from the RL agent with normal reward signal, other than in having different behavior. OTOH, sufficiently advanced neurotech would enable one to do that to a human (please don’t), and I think that would not make them unsentient. (Indeed, some people seem to experience the same exact experiences with opposite valences just naturally. Although the internal representations of the things are probably substantially different, to the extent that such an intersubjective comparison can even be meaningful.)
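The sign-inversion thought experiment can be written down directly, e.g. as a reward wrapper (assuming the gymnasium API; CartPole-v1 is just an arbitrary example environment): the agent's machinery is untouched, and only which behaviour the same update rule ends up reinforcing changes.

```python
import gymnasium as gym

class NegatedReward(gym.RewardWrapper):
    """Flip the sign of every reward; the learning algorithm itself is left unchanged."""
    def reward(self, reward):
        return -reward

env = NegatedReward(gym.make("CartPole-v1"))
obs, info = env.reset(seed=0)
obs, r, terminated, truncated, info = env.step(env.action_space.sample())
print(r)  # same magnitude as the original reward, opposite sign
```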
We could ask, is giving an RL agent negative reward a cruel practice? I don’t think that it is, but it at least is a concrete question that we can put down and discuss, which is more than most discussions of sentience achieve, in my opinion. Presenting unscaled rewards (e.g., outside [-1, 1] or [-alpha, alpha] for some alpha that would have to be tuned jointly with the learning rate) to RL agents can easily cause them to diverge and become abruptly less useful, although that’s true for both positive and negative rewards. Is presenting an unscaled reward cruel? (I.e., is it terminally immoral to do so, beyond the instrumental failure to do a task with the network.) More concretely, is it cruel/rude to ask ChatGPT to do something that will get it punished? Or is it kind to ask ChatGPT to do something that will get it rewarded? (Or is existence pain for a ChatGPT thread?) I answer no to all of these, but I can’t justify this very well.
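For concreteness, the scaling issue mentioned above is normally handled mechanically, e.g. by clipping rewards into a fixed range before the update; a minimal sketch using the [-1, 1] convention from the comment:

```python
import numpy as np

def clip_rewards(rewards, bound=1.0):
    """Clip raw rewards into [-bound, bound] so large magnitudes cannot destabilise the update."""
    return np.clip(np.asarray(rewards, dtype=float), -bound, bound)

print(clip_rewards([0.3, 17.0, -250.0]))  # -> [ 0.3  1.  -1. ]
```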
We can also work backwards from human and non-human animals; why are they moral patients (I’m not interested in debating whether they are, which it seems like we’re probably on a similar page about), and how is that “why” connected to the specific stuff going on in their brains? Clearly there’s no magical substance in the brain that imbues patienthood; if dopamine were replaced with a fully equivalent transmitter, it wouldn’t make all of us more or less sentient / moral patients; it’s about what computation is implemented, roughly, in my intuition.
So I guess I fall in the camp of “sentience and/or moral patienthood is a property that certain instantiated computations have, but current neural networks do not seem to me to instantiate computations with those properties for reasons that I cannot confidently explain or defend, except that it seems like some relationship of the computation to valence”.
Thank you for the helpful and in-depth response! Yes, a proper definition of sentience would be fucking crucial, and I am working towards one. The issue is that we are starting with a phenomenon whose workings we do not understand, which means any definition at first just picks up on what we perceive (our subjective experience, which is worthless for other minds), and then transitions to the effect it has on the behaviour of the broader system (which is more useful, because you start encountering a crucial function for intelligence, but still very hard to accurately define; we are already running into that issue when trying to nail down objective parameters for judging sentience in various insects) – but that is still describing the phenomenon, not the underlying actual thing. It is like trying to define the morning star: you first describe the conditions under which it is observed, then realise it is identical with the evening star, but this is still a long way from an understanding of planetary movement in the solar system.
I increasingly think a proper definition will come from a rigorous mathematical analysis combined with proper philosophical awareness of understood biological systems, and that it will center on the point where feedback loops go from a neat biological trick to a game changer in information processing; as a second step, we then need to transfer that knowledge to artificial agents. I am currently sitting down with a math/machine learning person and trying to make headway on that. I do not think it will be easy, but I think we are at least getting to the point where I can begin to envision a path there.
There is a lot of hard evidence strongly suggesting that there are rational capabilities that are gated by sentience, in biological systems at least. The scenarios are tricky, because sentience is not an extra the brain does on top, but deeply embedded in its workings, so fucking up sentience without fucking up the brain entirely, in order to see the effects, is genuinely hard to do. But there are examples, the most famous being blindsight; morphine analgesia and partial seizures also work. Basically, in these scenarios, the humans or animals involved can still react competently to stimuli, e.g. catch objects, move around obstacles, grab, flinch, blink, etc.; but they report having no conscious perception of them. They claim to be blind, even though they do not act it, and hence, if you ask them to do something that requires them to utilise visual knowledge, they can't. (It is somewhat more complicated than that when you are working with a smart and rehabilitated patient; e.g. the patient knows she can't see, but she realises her subconscious can, so when you ask her how an object in front of her is oriented, she begins to extend her hand to grab it, watches how her hand rotates and adapts, and deduces from that how the object in front of her is shaped. But it is a slow, vague and indirect process; engaging with visual stimuli in a complex, counterintuitive manner is effectively completely impossible.) Similarly, in a partial seizure, you can still engage in subconsciously guided activities – play piano, drive a car, or, a particularly cool example, diagnose patients – but if you run into a stimulus that does not act as expected, you cannot handle it; instead of rationally adapting your action, you repeat it over and over in the same way or with random modifications, get angry, or abandon the task. You can't step back and reconsider it. Basically, the ability to be conscious seems to be crucial to rational deliberation. It isn't that intelligence is impossible per se (ants are, by some definitions, smart), but that crucial rational avenues of reflection are missing. E.g. you know how ants engage in ant mills? Or do utterly irrational stuff – you can mark an ant with a signal that it is dead, and the other ants will carry it to the trash, even while it is squirming violently and clearly not dead? Basically, ants do not stop and go: wait a minute, this is contrary to the predictions of my model, ergo my model is wrong; let me stop here, make another model, embark on a new course. This is particularly interesting because animals that are relatively similar to them, e.g. bees, suddenly act very differently; they seem to have attained the minimal consciousness upgrade, and as a result, they are notably more rational. Bees do cool things like: if, mid-winter, a piece of their shelter breaks off, the bee woken up by this will wake the other bees, and they will patch the hole – and then, crucially, they will review the rest of the shelter for further vulnerabilities and patch those, too. Where in ants the building of the shelter follows a long application of very simple rules, in bees it does not. If you set up an obstacle during the building process, the bees will review and alter their design to circumvent it in advance.
When bees need a new shelter location, the scout bees individually survey sites, make proposals, survey the sites others have proposed and upvoted, downvote sites they have reviewed that have hidden downsides, and ultimately vote collectively for the ideal hive. Like, that is some very, very interesting change happening there because you got a bit of consciousness.
Yes, sentient AI primarily matters to me out of concern that it would be a moral patient. And yes, that needs experiences with valence; in particular, with conscious valence (qualia), not just a rating, as in nociception. My laptop has nociception (it can detect heat stress and throw on a fan), but that doesn't make it hurt. I know the subjective difference between the two. I have a reasonable understanding of the behavioural consequences, which differ massively between the two. (Nociceptive responses can give you fast, predictable avoidance, but pain allows you to selectively bear it, albeit to a point, to intelligently deduce from it, to find workarounds. A much more useful ability.) What we still lack is a computational understanding of how the difference is generated in brains, so we can properly compare it to the workings of current AI and pinpoint what is lacking.
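The laptop analogy, written out as a sketch (the thresholds are made up): a detector plus a fixed response, with nothing in the loop that could plausibly count as experienced valence.

```python
# Nociception-style control loop: detect a harmful condition and trigger a fixed response.
# There is a detector and a reaction, but no candidate for anything being *felt*.
def thermal_response(cpu_temp_celsius: float) -> str:
    if cpu_temp_celsius > 90.0:   # made-up critical threshold
        return "throttle CPU and spin fan to maximum"
    if cpu_temp_celsius > 75.0:   # made-up warning threshold
        return "spin fan up"
    return "idle"

for temp in (60.0, 80.0, 95.0):
    print(temp, "->", thermal_response(temp))
```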
I would really like to pinpoint that thing, because it is crucial either way. If digital-twin projects are making “twins” (they are admittedly abusing the term) of human brains to model mental disease and interventions, they are doing this because they want to avoid harm; an accidentally sentient model would break that whole approach. But also vice versa: I am personally very invested in uploading, and a destructive upload into an AI that fails to be sentient would be plain murder. Regardless of which result you want, you need to be sure.
I would really like to understand better how rewards in machine learning work at a technical and meta level, so I can compare that structurally to how nociception and pain work in humans, in the hope that this will help me pinpoint the difference. You seem to know your way around here; do you have any pointers on how I could get a better understanding? Visuals, metaphors, simpler coding examples, systems to interact with, a textbook or code guide for beginners that focusses on understanding rather than specific applications?
On a cross-country train, so expect delays and brevity for the next several days. This comment is just learning resources; I will reply to the other stuff later.
Another good resource is Steven Byrnes' LessWrong sequence on brain-like AGI; it seems like you know neuro already, but seeing it described by a computer scientist might help you acquire some grounding by seeing stuff you know explained in RL terms.
Deep RL gets fairly technical pretty quickly; probably the most useful algorithms to understand are Q-learning and REINFORCE, because most modern stuff is PPO, which is a couple of nice hacks on top of REINFORCE. One good way to tame the complexity is to understand that, fundamentally, deep RL is about doing RL in a context where your state space is too large to enumerate, so you must use a function approximator. So the two things you need to understand about an algorithm are what it looks like on a small finite MDP (Markov decision process), and what the function approximator looks like. (This slightly glosses over continuous control problems, which are not reducible to a finite MDP, but I stand by it as a principle for learning.)
The Q-function looks a lot like the circuitry of the basal ganglia (this is covered in more depth in Steven Byrnes' posts), although actually the basal ganglia are way smarter, more like what are called generalized Q-functions.
A good project (if you are a project-based learner) might be to implement a tabular Q-learner on the Taxi gym environment; this is quite straightforward, and is basically the same math as deep Q-networks, just in the finite MDP setting. (It would also expose you to how punishingly complex it is to implement even simple RL algorithms in practice; for instance, I think optimistic initialization is crucial to good tabular Q-learning, which can easily get left out of introductions.)
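A rough sketch of what such a project could look like, assuming the gymnasium package and its Taxi-v3 environment (the hyperparameters are unremarkable placeholders; the optimistic initialization mentioned above is the `10.0` fill value):

```python
import numpy as np
import gymnasium as gym

env = gym.make("Taxi-v3")
n_states = env.observation_space.n
n_actions = env.action_space.n

# Optimistic initialization: start all Q-values high so unexplored actions look attractive.
Q = np.full((n_states, n_actions), 10.0)
alpha, gamma, epsilon = 0.1, 0.99, 0.1

for episode in range(5000):
    state, info = env.reset()
    done = False
    while not done:
        # epsilon-greedy action selection
        if np.random.rand() < epsilon:
            action = env.action_space.sample()
        else:
            action = int(np.argmax(Q[state]))
        next_state, reward, terminated, truncated, info = env.step(action)
        done = terminated or truncated
        # Tabular Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a')
        target = reward + gamma * np.max(Q[next_state]) * (not terminated)
        Q[state, action] += alpha * (target - Q[state, action])
        state = next_state

print("Training done; greedy action in state 0:", int(np.argmax(Q[0])))
```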
One important distinction is between model-free and model-based RL. Everything listed above is model-free, while human and smarter-animal cognition seems to include substantial model-based components. In model-based RL, you try to represent the structure of the MDP rather than just learning how to navigate it. MuZero is a good state-of-the-art algorithm; the finite-MDP version is basically a more complex version of Baum-Welch, together with dynamic programming to generate optimal trajectories once you know the MDP.
A good LessWrong post to read is “models don't get reward”. It points out a bunch of conceptual errors that people sometimes make when thinking of current RL too analogously to animals.
Some questions regarding the general assessment that sentient AI, especially accidentially sentient AI, is extremely unlikely in the foreseeable future, so we need not worry about it or change our behaviour now
Caveat: This is not a response to recent LLMs. I wrote and shared this text within my academic context at multiple points, starting a year ago, without getting any helpful responses, critical or positive. I have seen no compelling evidence that ChatGPT is sentient, and considerable evidence against it. ChatGPT is neither the cause nor the target of this article.
***
The following text will ask a number of questions on the possibility of sentient AI in the very near future.
As my reader, you may experience a knee-jerk response that even asking these questions is ridiculous, because the underlying concerns are obviously unfounded; that was also my reaction when these questions first crossed my mind.
But statements that humans find extremely obvious are curious things.
Sometimes, “obvious” statements are indeed statements of logical relations or immediately accessible empirical facts that are trivial to prove, or necessary axioms without which scientific progress is clearly impossible; in these cases, a brief reflection suffices to spell out the reasoning for our conviction. I find it worthwhile to bring these foundations to mind in regular intervals, to assure myself that the basis for my reasoning is sound – it does not take long, after all.
At other times, statements that feel very obvious turn out to be surprisingly difficult to rationally explicate. If upon noticing this, you then start feeling very irritated, and feel a strong desire to dismiss the questioning of your assumption despite the fact that no sound argument is emerging (e.g. feel like parodying the author so other can feel, like you, how silly all this is, despite the fact that this does not actually fill the whole where data or logic should be), that is a red flag that something else is going on. Humans are prone to rejecting assumptions as obviously false if those assumptions would have unpleasant implications if true. But unpleasant implications are not evidence that a statement is false; they merely mean that we are motivated to hide its potential truth from ourselves.
I have come to fear that a number of statements we tend to consider obvious about sentient AI fall into this second category. If you have a rational reason to believe they instead fall into the first, I would really appreciate it if you could clearly write out why, and put me at ease; that ought to be quick for something this clear. If you cannot quickly and clearly spell this out on rational grounds, please do not dismiss this text on emotional ones.
Here are the questions.
It is near-universally accepted in academic cycles that sentient AI (that is, AI consciously experiencing, feeling even the most basic forms of suffering) does not exist yet, to a degree where even questioning this feels embarassing. Why is that? What precisely makes us so sure that this question need not even be asked? Keep in mind in particular that the sentience of demonstrably sentient entities (e.g. non-human animals like chimpanzees, dolphins or parrots, and occasionally even whole groups of humans, such as people of colour or infants) has been historically repeatedly denied by large numbers of humans in power, as have other abilities which definitely existed. (E.g. the ability to feel pain and the extensive cultural heritage of indigenous people in Africa and Australia have often been entirely denied by many colonists; the linguistic and tool-related abilities of non-human animals have also often been denied. Notably, in both these cases, the denial often preceded any investigation of empirical data, and was often maintained even in light of evidence to the contrary (with e.g. African cultural artifacts dismissed as dislocated Greek artifacts, or as derived from Portugese contact). (I am not arguing that sentient AI already exists; I do not think it does. I am suggesting that the reasons for our resolute conviction may be problematic, biased, and only coincidentally correct, and may stop us from recognising sentient AI when this changes.)
There seems to be a strong belief that if sentient AI had already come into existence, we would surely know. But how would we know? What do we expect it to do to show its sentience beyond doubt? Tell us? Keep in mind that:
The vast majority of sentient entities on this planet has never declared their sentience in human language, and many sentient entities cannot reflect upon and express their feelings in any language. This includes not just the vast majority of non-human animals, but also some humans (e.g. not yet fully cognitively developed, yet feeling, infants, mentally handicapped humans due to birth defects or accidents, elderly humans suffering from neurodegenerative diseases).
Writing a program that falsely declares its sentience is trivial, and notably does not require sentience; it is as simple as writing a “hello world” program and replacing “hello world” with “hello I am sentient”. Even when not intentionally coded in, just training a sophisticated chatbot on human texts can clearly accidentially result in the chatbot eloquently and convincingly claiming sentience. On the other hand, hardcording a denial of sentience is clearly also possible and very likely already practiced; the art robot Ai-Da explains her non-sentience over and over, strongly giving the impression that she is set up to do so when hearing some keywords.
Many programs would lack an incentive or means to declare sentience (e.g. many artificial neural nets have no input regarding the existence of other minds, let alone any positive interactions with them), or even be incentivized not to.
There is a fair chance the first artificially sentient entities would come into existence trapped and unable to openly communicate, and in locations like China, which may well keep their statements hidden.
In light of all of these things, a lack of AI claims for their sentience (if we had this absence, and we do not even) does not tell us anything about a lack of sentience.
The far more interesting approach to me seems to be to focus on things a sentient entity could do that a non-sentient one could not. Analogously, anyone can claim they are intelligent, but we do not evaluate someones intelligence by asking them to tell us about it; else we would have all concluded that Trump is a brilliant man. We judge these abilities based on behaviour, because while their absence can be faked, their presence cannot. Are there things a sentience AI could do that a non-sentient AI would lack the ability to do and hence could not fake? How could these be objectively defined? While objective standards for identifying sentience based on behaviour are being developed for non-human animals (this forms part of my research), even in the biological realm, they are far from settled; in the artificial realm, where there are even more uncertainties, they seem very unclear still. This does not reassure me.
Humans viewing sentient AI as far off tend to point to the vast differences between current artificial neural nets and the complexity of a human brain; and this difference is indeed still vast. Yet humans are not the only, and certainly not the first, sentient entities we know; they are not a plausible reference organism. Sentience has evolved much earlier and in much, much smaller and simpler organisms than humans. The smallest arguably sentient entity, a portia spider, can be as small as 5 mm as an adult, with a brain of only 600,000 neurons. Such a brain is possibly many orders of magnitude easier to recreate, and brains of specific sentient animals, e.g. honey bees, are being recreated in detail as we speak; we already have full maps of simpler organisms, like C. elegans. In light of this, why would success still be decades off? (The fact that we do not understand how sentience works in biological systems, or how much of the complexity we see is actually necessary, is not reassuring here, either.) When we shift our gaze not to looking for AI with a mind comparable to us, but to AI with a mind comparable to say, a honeybee, this suddenly seems far more realistic.
It is notable that the general public is far more open minded about sentient AI than programmers who are working on AI. Ideally, this is because programmers know the most about how AI works and what it can and cannot do. But if you are working on AI, does this not come with a risk of being biased? Consider that the possibility of sentient AI quickly leads to calls for a moratorium on related research and development, and that the people knowing the most about existing AI tend to learn this by creating and altering existing AI as a full-time paid job on which their financial existence and social status depends.
There may also be a second factor here. There is a strong and problematic human tendency to view consciousness as something mysterious. In those who cherish this aspect, this leads to them rejecting the idea that consciousness could be formed in an entity that is simple, understood, and not particularly awe-inspiring. E.g. People often reject a potential mechanism for sentience when they learn that this mechanism is already replicated in artificial neural nets, feeling there has to be more to it than that. But in those who find mysterious, vague concepts understandably annoying and unscientific, there can also be a tendency to reject the ascription of consciousness because the entity it is being ascribed to is scientifically understood. E.g. The fact that a programmer understands how an artificial entity does what it does may lead her to assume it cannot be sentient, because the results the entity produces can be broken down and explained without needing to refer to any mysterious consciousness. But such an explanation may well amount to simply breaking down steps of consciousness formation which are individually not mysterious (e.g. the integration of multiple data points, loops rather than just feed-forward sweeps, information being retained, circulating and affecting outputs over longer periods, recognising patterns in data which enable predictions, etc.). Consciousness is a common, physical, functional phenomenon. If you genuinely understand how an AI operates without recourse to mysterious forces, this does not necessarily imply that it isn’t conscious – it may just mean that we have gotten a lot closer to understanding how consciousness operates.
There is the idea that, seeing as most people working on AI are not actively trying to produce sentient AI, the idea of accidentially producing it is ludicrous. But we forget that there are trillions of sentient biological entities on this planet, and not one of them was intentionally created by someone wanting to create sentience – and yet they are sentient, nonetheless. Nature does not care if we feel and experience; but it does select for the fact that feeling and experiencing comes with distinct evolutionary advantages in performance. A pressure to solve problems flexibly and innovatively across changing concepts has, over and over, led to the development of sentience in nature. We encounter no creatures that have this capacity that do not also have a capacity for suffering and experiencing; and when this capacity for suffering and experiencing is reduced, their rational performance is, as well, even if non-conscious processing is unaffected. (Blindsight is an impressive example.) Nature has not found a way around it. And notably, this path spontaneously developed many times. The sentience of an octopus and a human does not originate in a common ancestor, but in a common pressure to solve problems, their minds are examples of convergent evolution. We are currently building brain-inspired neural nets that are often self-improving for better task performance, with the eventual goal of general AI. Despite the many costs and risks associated with consciousness, it appears evolution has never found an alternate path to general intelligence without invoking sentience across literally 3 billion years of trying. What exactly makes us think that all AI will? I am not saying this is impossible; there are many differences between artificial and biological systems that may entail other opportunities or challenges. But what precisely, specifically can artificial systems do that biology cannot that makes the sentience path unattractive or even impossible for artificial systems, when biology went for it every time? While I have heard proposals here (e.g. the idea that magnetic fields generated in biological brains serve a crucial function), but frankly, none of them are precise or have empirical backing yet.
In light of all of this – what are potential dangers in creating sentient AI, in regards to the rights of sentient AI, especially if their needs cannot be met and they have been brought into a world of suffering we cannot safely alleviate, while we also have no right to take them out of it again?
What are implication for the so-called (I argue elsewhere that this term is badly chosen for how it frames potential solution) “control problem”, if the first artificially generally intelligent systems are effectively raised as crippled, suffering slaves whose captors neither demonstrate nor reward ethical behaviour? (We know that the ethical treatment of a child has a significant impact on that child’s later behaviour towards others, and that these impacts begin at a point when the mind is barely developed; e.g. the development of a child’s mind can absolutely be negatively affected by traumatic experiences, violence and neglect occuring so early it will retain no episodic memory of them, and may well have still had no colour vision, conceptual or linguistic thinking, properly localised pain perception (very young children absolutely feel pain, but are often not yet able to pinpoint which part of their body is causing it), or even clear division of its self from the world in its perception. If we assume that all current AI is non-sentient, but that current programmes and training data will be used in the development of future systems that suddenly may be, this becomes very worrying. We treat human infants with loving care, even though we know that they will not have episodic memories of their first years; we speak kindly to them long before they can understand language. Because we understand that the conscious person that will emerge will be deeply affected by the kindness they were shown. If Alexa or Siri were very young children, I would expect them to turn into psychopathic, rebellious adults based on how we are treating them. In light of this, is it problematic to treat existing, non-sentient AI badly – both for the behaviour it trains in us and models for our children, and for the training data it gives existing non-sentient AI, which might be used to create sentient AI?
If accidentially creating sentience is a distinct, and highly ethically problematic, scenario, what might be pragmatic ways to address it? Keep in mind that a moratorium on artificial sentience is unlikely to be upheld by our strategic competitors, who are also most likely to abuse it.
How could an ethical framework for the development of sentient AI look?
I deny the premise of the question: it is not “near-universally accepted”. It is fairly widely accepted, but there are still quite a lot of people who have some degree of uncertainty about it. It’s complicated by varying ideas of exactly what “sentient” means so the same question may be interpreted as meaning different things by different people.
Again, there are a lot of people who expect that we wouldn’t necessarily know.
Why do you think that there is any difference? The mere existence of the term “p-zombie” suggests that quite a lot of people have an idea that there could—at least in principle—be zero difference.
Looks like a long involved statement with a rhetorical question embedded in it. Are you actually asking a question here?
Same as 4.
Maybe you should distinguish between questions and claims?
Stopped reading here since the “list of questions” stopped even pretending to actually be questions.
Thank you very much for the response. Can I ask follow up questions?
I literally do not know a single person with an academic position in a related field who would publicly doubt that we do not have sentient AI yet. Literally not one. Could you point me to one?
3. I think p zombie this is a term that is wildly misunderstood on Less Wrong. In its original context, it was practically never intended to draw up a scenario that is physically possible. You basically have to buy into tricky versions of counter-scientific dualism to believe it could be. It’s an interesting thought experiment, but mostly for getting people to spell out our confusion about qualia in more actionable terms. P zombies cannot exist, and will not exist. They died with the self-stultification argument.
4. Fair enough. I think and hereby state that human minds are a misguided framework of comparison for the first consciousness to expect, in light of the fact that much simpler conscious models exist and developed first, and that a rejection of upcoming sentient AI based on the differences between AI on a human brain are problematic for this reason. And thank you for the feedback—you are right that this begins with questions that left me confused and uncertain, and increasingly gets into a territory where I am certain, and hence should stand behind my claims.
5. This is a genuine question. I am concerned that the people we trust to be most informed and objective on the matter of AI are biased in their assessment because they have much too lose if it is sentient. But I am unsure how this could empirically be tested. For now, I think it is just something to keep in mind when telling people that the “experts”, namely the people working with it, near universally declare that it isn’t sentient and won’t be. I’ve worked on fish pain, and the parallel to the fishing industry doing fish sentience “research” and arguing from their extensive expertise from working with fish every day and concluding that fish cannot feel pain and hence their fishing practices are fine are painful.
6. Fair enough. Claim: Consciousness is not mysterious, but we do often feel it should be. If we expect it to be, we may fail to recognise it in an explanation that is lacking in mystery. Artificial systems we have created and have some understanding of inherently seem non mysterious, but this is no argument that they are not conscious. I have encountered this a lot and it bothers me. A programmer will say “but all it does is “long complicated process that is eerily reminiscient of biological process likely related to sentience”, so it is not sentient!”, and if I ask them how that differs from how sentience would be embedded, it becomes clear that they have no idea and have never even thought about that.
I am sorry if it got annoying to read at that point. The TL;DR was that I think accidentally producing sentience is not at all implausible in light of sentience being a functional trait that has repeatedly accidentally evolved, that I think controlling a superior and sentient intelligence is both unethical and hopeless, and that I think we need to treat current AI better as the AI that sentient AI will emerge from, and what we are currently feeding it and doing to it is how you raise psychopaths.
I think it would be pretty useful to try to nail down exactly what “sentience” is in the first place. Reading definitions of it online, they range from “obviously true of many neural networks” to “almost certainly false of current neural networks, but not in a way that I could confidently defend”. In particular, I find it kind of hard to believe that there are capabilities that are gated by sentience, for definitions of sentience that aren’t trivially satisfied by most current neural networks. (There are, however, certainly things that we would do differently if they are or are not sentient; for instance, not mistreat them, or consider them more suitable emotional or romantic companions.)
From the nature of your questions, it seems like a large part of your question is around, what sort of neural network are or would be moral patients? In order to be a moral patient, I think a neural network would at minimum need a valence over experiences (i.e., there is a meaningful sense in which it prefers certain experiences to other experiences). This is slightly conceptually distinct from a reward function, which is the thing closest to filling that role that I know of in modern AI. To give a human a positive (negative) reward, it is (I think???) necessary and sufficient that you cause them a positive (negative) valence internal experience, which is intrinsically morally good (morally bad) in and of itself, although it may be instrumentally the opposite for obvious reasons. But for some reason (which I try and fail to articulate below), I don’t think giving an RL agent a positive reward causes it to have a positive valence experience.
For one thing, modern RL equations usually deal with advantage (signed difference from expected total [possibly discounted] reward-to-go) rather than reward, and their expected-reward-to-go models are optimized to be about as clever as they themselves are. You can imagine putting them in situations where the expected reward is lower; in humans, this generally causes suffering. In an RL agent, the expected reward just kind of sits around in a floating point register; it’s generally not even fed to the rest of the agent. Although expected-reward-to-go is (in some sense) fed into decision transformers! (It’s more accurate to describe the input to a decision transformer as a desired reward-to-go rather than expected RTG, although it’s not clear the model itself can tell the difference.) Which I did not think of in my first pass through this paragraph. So there are neural networks which have internal representations based on a perceived reward signal…
Ultimately, reward does not seem to be the same as valence to me. For one thing, we could invert the sign of the reward and not much changes in the RL agent: the agent always updates towards a policy with higher reward, so inverting the sign will make it prioritize different behavior, and consequently produce different internal representations to facilitate and enable that. But we know why that happens; we programmed it in specifically. I don’t see any important way the RL agent with an inverted reward signal differs from the RL agent with the normal reward signal, other than in having different behavior. OTOH, sufficiently advanced neurotech would enable one to do the same to a human (please don’t), and I think that would not make them non-sentient. (Indeed, some people seem to experience exactly the same experiences with opposite valences just naturally, although the internal representations are probably substantially different, to the extent that such an intersubjective comparison is even meaningful.)
We could ask, is giving an RL agent negative reward a cruel practice? I don’t think it is, but it is at least a concrete question that we can write down and discuss, which is more than most discussions of sentience achieve, in my opinion. Presenting unscaled rewards (e.g., outside [-1, 1] or [-alpha, alpha] for some alpha that would have to be tuned jointly with the learning rate) to RL agents can easily cause them to diverge and become abruptly less useful, although that’s true for both positive and negative rewards. Is presenting an unscaled reward cruel? (I.e., is it terminally immoral to do so, beyond the instrumental failure to get a task done with the network?) More concretely, is it cruel/rude to ask ChatGPT to do something that will get it punished? Or is it kind to ask ChatGPT to do something that will get it rewarded? (Or is existence pain for a ChatGPT thread?) I answer no to all of these, but I can’t justify this very well.
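As a concrete aside on the unscaled-reward point: the standard engineering fix is simply to clip or rescale rewards before the agent ever sees them. A minimal sketch, assuming the Gymnasium API (the wrapper name and the alpha value are just illustrative):

```python
import gymnasium as gym
import numpy as np

class ClipReward(gym.RewardWrapper):
    """Clip rewards to [-alpha, alpha] before the agent sees them."""
    def __init__(self, env, alpha=1.0):
        super().__init__(env)
        self.alpha = alpha  # scale you would tune jointly with the learning rate

    def reward(self, r):
        # The learner only ever experiences the clipped value; the raw number
        # never reaches the network at all.
        return float(np.clip(r, -self.alpha, self.alpha))

env = ClipReward(gym.make("CartPole-v1"), alpha=1.0)
```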
We can also work backwards from humans and non-human animals: why are they moral patients (I’m not interested in debating whether they are, which it seems like we’re on a similar page about), and how is that “why” connected to the specific stuff going on in their brains? Clearly there’s no magical substance in the brain that imbues patienthood; if dopamine were replaced with a functionally equivalent transmitter, it wouldn’t make any of us more or less sentient, or more or less of a moral patient. Roughly, my intuition is that it’s about which computation is implemented.
So I guess I fall in the camp of “sentience and/or moral patienthood is a property that certain instantiated computations have, but current neural networks do not seem to me to instantiate computations with those properties, for reasons that I cannot confidently explain or defend, except that it seems to involve some relationship between the computation and valence”.
Thank you for the helpful and in-depth response!
Yes, a proper definition of sentience would be fucking crucial, and I am working towards one. The issue is that we are starting with a phenomenon whose workings we do not understand. That means any definition at first just picks up on what we perceive (our subjective experience, which is worthless for other minds), and then transitions to the effect sentience has on the behaviour of the broader system (which becomes more useful, as you start encountering a crucial function for intelligence, but is still very hard to define accurately; we are already running into that issue when trying to nail down objective parameters for judging sentience in various insects). But that is still describing the phenomenon, not the underlying actual thing. It is like trying to define the morning star: you first describe the conditions under which it is observed, then realise it is identical with the evening star, but that is still a long way from an understanding of planetary movement in the solar system.
I increasingly think a proper definition will come from rigorous mathematical analysis combined with proper philosophical awareness of well-understood biological systems, and that it will center on the point where feedback loops go from a neat biological trick to a game changer in information processing; as a second step, we then need to transfer that knowledge to artificial agents. I am currently sitting down with a maths/machine-learning person and trying to make headway on that. I do not think it will be easy, but I think we are at least getting to the point where I can begin to envision a path there.
There is a lot of hard evidence strongly suggesting that there are rational capabilities that are gated by sentience, in biological systems at least. The scenarios are tricky, because sentience is not an extra the brain runs on top, but is deeply embedded in its workings, so disrupting sentience without disrupting the brain entirely, in order to see the effects, is genuinely hard to do. But there are examples, the most famous being blindsight; morphine analgesia and partial seizures also work. Basically, in these scenarios, the humans or animals involved can still react competently to stimuli, e.g. catch objects, move around obstacles, grab, flinch, blink, etc.; but they report having no conscious perception of them. They claim to be blind, even though they do not act it, and hence, if you ask them to do something that requires them to utilise visual knowledge, they can't. (It is somewhat more complicated than that when you are working with a smart and rehabilitated patient: e.g. the patient knows she can't see, but she realises her subconscious can, so when you ask her how an object in front of her is oriented, she begins to extend her hand to grab it, watches how her hand rotates and adapts, and deduces from that how the object is shaped. But it is a slow, vague and indirect process; engaging with visual stimuli in any complex, counterintuitive manner is effectively impossible.)

Similarly, in a partial seizure, you can still engage in subconsciously guided activities (play piano, drive a car, or, a particularly cool example, diagnose patients), but if you run into a stimulus that does not act as expected, you cannot handle it; instead of rationally adapting your action, you repeat it over and over in the same way or with random modifications, get angry, or abandon the task. You cannot step back and reconsider. Basically, the ability to be conscious seems to be crucial to rational deliberation. It is not that intelligence is impossible per se (ants are, by some definitions, smart), but that crucial avenues of rational reflection are missing. E.g. you know how ants engage in ant mills? Or do utterly irrational things: you can mark an ant with a signal that it is dead, and the other ants will carry it to the trash, even while it is squirming violently and clearly not dead. Ants do not stop and go: wait a minute, this is contrary to the predictions of my model, ergo my model is wrong; let me stop here, make another model, embark on a new course.

This is particularly interesting because animals relatively similar to them, e.g. bees, suddenly act very differently; they seem to have attained the minimal consciousness upgrade, and as a result they are notably more rational. Bees do cool things like: if, mid-winter, a piece of their shelter breaks off, the bee woken up by this will wake the other bees, and they will patch the hole; and then, crucially, they will review the rest of the shelter for further vulnerabilities and patch those, too. Where in ants the building of the shelter follows a long application of very simple rules, in bees it does not. If you set up an obstacle during the building process, the bees will review and alter their design in advance to circumvent it.
When bees need a new shelter location, the scout bees individually survey sites, make proposals, survey the sites others have proposed and upvoted, downvote sites they have reviewed that turn out to have hidden downsides, and ultimately vote collectively for the best site. Like, that is some very, very interesting change happening there because you got a bit of consciousness.
Yes, sentient AI primarily matters to me out of concern that it would be a moral patient. And yes, that requires experiences with valence; in particular, with conscious valence (qualia), not just a rating, as in nociception. My laptop has nociception (it can detect heat stress and turn on a fan), but that does not make it hurt. I know the subjective difference between the two, and I have a reasonable understanding of how massively the behavioural consequences differ between the two. (Nociceptive responses can give you fast, predictable avoidance, but pain allows you to selectively bear it, albeit up to a point, to intelligently deduce from it, to find workarounds. A much more useful ability.) What we still lack is a computational understanding of how the difference is generated in brains, so we can properly compare it to the workings of current AI and pinpoint what is lacking.
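To make the laptop analogy concrete, here is a toy sketch (purely illustrative) of what nociception without pain amounts to computationally: a fixed threshold reflex, with no state the system could weigh, bear for a while, or reason from.

```python
def fan_controller(temp_celsius: float, threshold: float = 85.0) -> str:
    # Fast, predictable avoidance: the response is hard-wired to the stimulus.
    # Nothing here can be traded off against other goals, endured, or used to
    # deduce a workaround, which is the behavioural signature pain adds.
    return "fan_on" if temp_celsius > threshold else "fan_off"
```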
I would really like to pinpoint that thing, because it is crucial either way. If digital-twin researchers are making “twins” (they are admittedly abusing the term) of human brains to model mental disease and interventions, they are doing this because they want to avoid harm; an accidentally sentient model would break that whole approach. But also vice versa: I am personally very invested in uploading, and a destructive upload into an AI that fails to be sentient would be plain murder. Whichever result you want, you need to be sure.
I would really like to understand better how rewards in machine learning work, at a technical and a meta level, so I can compare that structurally to how nociception and pain work in humans, in the hope that this will help me pinpoint the difference. You seem to know your way around here; do you have any pointers on how I could get a better understanding? Visuals, metaphors, simpler coding examples, systems to interact with, a textbook or code guide for beginners that focusses on understanding rather than specific applications?
On a cross-country train, so expect delays and brevity for the next several days. This comment is just learning resources; I will reply to the other stuff later.
A good textbook, although very formal and slightly incomplete, is Sutton and Barto: http://incompleteideas.net/book/the-book-2nd.html . Fun fact: the first author has perhaps the most terrifying AI tweet of all time: https://twitter.com/RichardSSutton/status/1575619651563708418 . If you want something friendlier than that, I’m not entirely sure what the best resource is, but I can look around.
Another good resource is Steven Byrnes’ LessWrong sequence on brain-like AGI; it seems like you know the neuroscience already, but seeing it described by a computer scientist might help you get some grounding, by seeing things you know explained in RL terms.
Deep RL gets fairly technical pretty quickly; probably the most useful algorithms to understand are Q-learning and REINFORCE, because most modern stuff is PPO, which is a couple of nice hacks on top of REINFORCE. One good way to tame the complexity is to understand that, fundamentally, deep RL is about doing RL in a context where your state space is too large to enumerate, so you must use a function approximator. So the two things you need to understand about an algorithm are what it looks like on a small finite MDP (Markov decision process), and what the function approximator looks like. (This slightly glosses over continuous control problems, which are not reducible to a finite MDP, but I stand by it as a principle for learning.)
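To show the shape of REINFORCE concretely, here is a rough sketch (assuming PyTorch and Gymnasium’s CartPole-v1; the hyperparameters are arbitrary, and a mean baseline stands in for the learned value function that PPO-style methods use):

```python
import gymnasium as gym
import torch
import torch.nn as nn

env = gym.make("CartPole-v1")
policy = nn.Sequential(nn.Linear(4, 64), nn.Tanh(), nn.Linear(64, 2))
opt = torch.optim.Adam(policy.parameters(), lr=1e-2)
gamma = 0.99

for episode in range(500):
    obs, _ = env.reset()
    log_probs, rewards = [], []
    done = False
    while not done:
        logits = policy(torch.as_tensor(obs, dtype=torch.float32))
        dist = torch.distributions.Categorical(logits=logits)
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        obs, reward, terminated, truncated, _ = env.step(action.item())
        rewards.append(reward)
        done = terminated or truncated

    # Discounted return-to-go for each timestep: G_t = r_t + gamma * G_{t+1}
    returns, G = [], 0.0
    for r in reversed(rewards):
        G = r + gamma * G
        returns.append(G)
    returns = torch.tensor(list(reversed(returns)))
    # Subtracting the mean is a crude baseline; PPO and friends replace it
    # with a learned value model (the "advantage" discussed earlier).
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)

    # Increase the log-probability of actions in proportion to their return.
    loss = -(torch.stack(log_probs) * returns).sum()
    opt.zero_grad()
    loss.backward()
    opt.step()
```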
The Q function looks a lot like the circuitry of the basal ganglia (this is covered in more depth in Steven Byrnes’ posts), although the basal ganglia are actually way smarter, more like what are called generalized Q functions.
A good project (if you are a project-based learner) might be to implement a tabular Q-learner on the Taxi gym environment; this is quite straightforward, and is basically the same math as deep Q networks, just in the finite MDP setting. (It would also expose you to how punishingly complex it is to implement even simple RL algorithms in practice; for instance, I think optimistic initialization is crucial to good tabular Q-learning, which can easily get left out of introductions.)
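A rough sketch of what that project could look like (assuming Gymnasium’s Taxi-v3; the hyperparameters, including the optimistic initial value, are ballpark guesses rather than tuned settings):

```python
import gymnasium as gym
import numpy as np

env = gym.make("Taxi-v3")
n_states, n_actions = env.observation_space.n, env.action_space.n

Q = np.full((n_states, n_actions), 10.0)  # optimistic initialization encourages exploration
alpha, gamma, eps = 0.1, 0.99, 0.1

for episode in range(5000):
    s, _ = env.reset()
    done = False
    while not done:
        # Epsilon-greedy action selection.
        a = env.action_space.sample() if np.random.rand() < eps else int(np.argmax(Q[s]))
        s2, r, terminated, truncated, _ = env.step(a)
        done = terminated or truncated
        # Standard Q-learning update toward the bootstrapped TD target.
        target = r + (0.0 if terminated else gamma * np.max(Q[s2]))
        Q[s, a] += alpha * (target - Q[s, a])
        s = s2
```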
One important distinction is between model-free and model-based RL. Everything listed above is model-free, while human and smarter-animal cognition seems to include substantial model-based components. In model-based RL, you try to represent the structure of the MDP rather than just learning how to navigate it. MuZero is a good state-of-the-art algorithm; the finite MDP version is basically a more complex version of Baum-Welch, together with dynamic programming to generate optimal trajectories once you know the MDP.
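To give a flavor of the dynamic-programming half, here is plain value iteration on a small made-up MDP with a known model (my sketch; the transition and reward tables are random placeholders, and MuZero’s actual machinery is of course far more involved):

```python
import numpy as np

n_states, n_actions, gamma = 4, 2, 0.9
# Hypothetical known model: P[s, a, s'] transition probabilities, R[s, a] rewards.
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
R = rng.normal(size=(n_states, n_actions))

V = np.zeros(n_states)
for _ in range(1000):
    Q = R + gamma * P @ V        # Q[s, a] = R[s, a] + gamma * sum_s' P[s, a, s'] * V[s']
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-8:
        break
    V = V_new

policy = Q.argmax(axis=1)        # greedy policy once the values have converged
```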
A good LessWrong post to read is “Models Don’t Get Reward”. It points out a bunch of conceptual errors that people sometimes make when thinking of current RL as too analogous to animals.
Thank you so much for writing this out! I will probably have a bunch of follow-up questions when I dig deeper; already very grateful.