Yes, a proper definition of sentience would be fucking crucial, and I am working towards one. The issue is that we are starting with a phenomenon whose workings we do not understand. That means any definition at first just picks up on what we perceive (our subjective experience, which is worthless for other minds), and then transitions to the effect sentience has on the behaviour of the broader system. That is more useful, because you start encountering a crucial function for intelligence, but it is still very hard to define accurately; we are already running into that issue in trying to nail down objective parameters for judging sentience in various insects. And even then we are still describing the phenomenon, not the underlying actual thing. It is like trying to define the morning star: you first describe the conditions under which it is observed, then realise it is identical with the evening star, but that is still a long way from an understanding of planetary movement in the solar system.
I increasingly think a proper definition will come from rigorous mathematical analysis combined with proper philosophical awareness of well-understood biological systems, and that it will center on the point where feedback loops go from a neat biological trick to a game changer in information processing; as a second step, we then need to transfer that knowledge to artificial agents. I am currently sitting down with a math/machine learning person and trying to make headway on that. I do not think it will be easy, but I think we are at least getting to the point where I can begin to envision a path there.
There is a lot of hard evidence strongly suggesting that there are rational capabilities that are gated by sentience, in biological systems at least. The scenarios are tricky, because sentience is not an extra the brain does on top, but deeply embedded in its workings, so fucking up sentience without fucking up the brain entirely, in order to see the effects, is genuinely hard to do. But there are examples, the most famous being blindsight; morphine analgesia and partial seizures also work. Basically, in these scenarios, the humans or animals involved can still react competently to stimuli, e.g. catch objects, move around obstacles, grab, flinch, blink, etc.; but they report having no conscious perception of them. They claim to be blind, even though they do not act it, and hence, if you ask them to do something that requires them to utilise visual knowledge, they can’t. (It is somewhat more complicated than that when you are working with a smart and rehabilitated patient; e.g. the patient knows she can’t see, but she realises her subconscious can, so when you ask her how an object in front of her is oriented, she begins to extend her hand to grab it, watches how her hand rotates and adapts, and deduces from that how the object in front of her is shaped. But it is a slow and vague and indirect process; engaging with visual stimuli in any complex, counterintuitive manner is effectively impossible.)

Similarly, in a partial seizure, you can still engage in subconsciously guided activities (play piano, drive a car, or, a particularly cool example, diagnose patients), but if you run into a stimulus that does not act as expected, you cannot handle it; instead of rationally adapting your action, you repeat it over and over in the same way or with random modifications, get angry, and abandon the task. You can’t step back and consider it.

Basically, the ability to be conscious seems to be crucial to rational deliberation. It isn’t that intelligence is impossible per se (ants are, by some definition, smart), but that crucial rational avenues of reflection are missing. E.g. you know how ants engage in ant mills? Or do utterly irrational stuff, like how you can mark an ant with a signal that it is dead, and the other ants will carry it to the trash, even while it is squirming violently and clearly not dead? Basically, ants do not stop and go: wait a minute, this is contrary to the predictions of my model, ergo my model is wrong; let me stop here, let me make another model, let me embark on a new course. This is particularly interesting because animals that are relatively similar to them, e.g. bees, suddenly act very differently; they seem to have attained the minimal consciousness upgrade, and as a result, they are notably more rational. Bees do cool things like… if, mid-winter, a piece of their shelter breaks off, the bee woken up by this will wake the other bees, and they will patch the hole… and then, crucially, they will review the rest of the shelter for further vulnerabilities and patch those, too. Whereas in ants the building of the shelter follows a long application of very simple rules, in bees it does not. If you set up an obstacle during the building process, the bees will review and alter their design to circumvent it in advance.
When bees need a new shelter location, the scout bees individually survey sites, make proposals, survey the sites others have proposed that have been upvoted, downvote sites they have reviewed that turn out to have hidden downsides, and ultimately vote collectively for the ideal hive. Like, that is some very, very interesting change happening there because you got a bit of consciousness.
Yes, sentient AI primarily matters to me out of concern that it would be a moral patient. And yes, that requires experiences with valence; in particular, with conscious valence (qualia), not just a rating, as in nociception. My laptop has nociception (it can detect heat stress and throw on a fan), but that doesn’t make it hurt. I know the subjective difference between the two. I have a reasonable understanding of the behavioural consequences, which differ massively between the two. (Nociceptive responses can make you do fast, predictable avoidance, but pain allows you to selectively bear it, albeit to a point, to intelligently deduce from it, to find workarounds. A much more useful ability.) What we still lack is a computational understanding of how the difference is generated in brains, so that we can properly compare it to the workings of current AI and pinpoint what is lacking.
I would really like to pinpoint that thing, because it is crucial either way. If digital twin projects are making “twins” (they are admittedly abusing the term) of human brains to model mental disease and interventions, they are doing this because they want to avoid harm; an accidentally sentient model would break that whole approach. But also vice versa: I am personally very invested in uploading, and a destructive upload into an AI that fails to be sentient would be plain murder. Whichever result you want, you need to be sure.
I would really like to understand better how rewards in machine learning work, at a technical and meta level, so I can compare that structurally to how nociception and pain work in humans, in the hope that this will help me pinpoint the difference. You seem to know your way around here; do you have any pointers on how I could get a better understanding? Visuals, metaphors, simpler coding examples, systems to interact with, a textbook or code guide for beginners that focusses on understanding rather than specific applications?
On a cross-country train, so delays and brevity for the next several days. This comment is just learning resources; I will reply to the other stuff later.
A good textbook, although very formal and slightly incomplete, is Sutton and Barto: http://incompleteideas.net/book/the-book-2nd.html . Fun fact: the first author has perhaps the most terrifying AI tweet of all time: https://twitter.com/RichardSSutton/status/1575619651563708418 . If you want something friendlier than that, I’m not entirely sure what the best resource is, but I can look around.
Another good resource is Steven Byrnes’ LessWrong sequence on brain-like AGI; it seems like you know neuro already, but seeing it described by a computer scientist might help you acquire some grounding, by seeing stuff you know explained in RL terms.
Deep RL gets fairly technical pretty quickly; probably the most useful algorithms to understand are Q-learning and REINFORCE, because most modern stuff is PPO, which is a couple of nice hacks on top of REINFORCE. One good way to tame the complexity is to understand that, fundamentally, deep RL is about doing RL in a context where your state space is too large to enumerate, so you must use a function approximator. So the two things you need to understand about an algorithm are what it looks like on a small finite MDP (Markov decision process), and what the function approximator looks like. (This slightly glosses over continuous control problems, which are not reducible to a finite MDP, but I stand by it as a principle for learning.)
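To make that concrete, here is a rough sketch of my own (not from any particular textbook) of REINFORCE in the finite-MDP setting, where the “function approximator” is just a table of logits, one row per state; deep RL swaps that table for a neural network, and PPO layers clipping and other tricks on top of essentially this same gradient. All sizes and hyperparameters are made-up illustrative values, and the environment itself is left out: the update just consumes a recorded episode.

```python
# Hedged sketch: tabular REINFORCE on a small finite MDP.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 5, 2
gamma, lr = 0.99, 0.05
logits = np.zeros((n_states, n_actions))      # the "policy": one row of logits per state

def action_probs(state):
    p = np.exp(logits[state] - logits[state].max())
    return p / p.sum()                        # softmax over this state's row

def sample_action(state):
    return rng.choice(n_actions, p=action_probs(state))

def reinforce_update(trajectory):
    """trajectory: list of (state, action, reward) tuples from one finished episode."""
    G = 0.0
    for state, action, reward in reversed(trajectory):
        G = reward + gamma * G                # discounted return from this step onward
        grad = -action_probs(state)           # gradient of log pi(action|state)
        grad[action] += 1.0                   #   with respect to logits[state]
        logits[state] += lr * G * grad        # reward only enters here, scaling the step
```

Sampling actions with sample_action during an episode, recording the (state, action, reward) triples, and then calling reinforce_update is the whole algorithm in this setting.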
The Q function looks a lot like the circuitry of the basal ganglia (this is covered in more depth in Steven Byrnes’ posts), although actually the basal ganglia are way smarter, more like what are called generalized Q functions.
A good project (if you are a project-based learner) might be to implement a tabular Q-learner on the Taxi gym environment; this is quite straightforward, and is basically the same math as deep Q-networks, just in the finite-MDP setting. (It would also expose you to how punishingly complex it is to implement even simple RL algorithms in practice; for instance, I think optimistic initialization is crucial to good tabular Q-learning, which can easily get left out of introductions.)
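For a sense of what that project could look like, here is a hedged sketch assuming the gymnasium package (pip install gymnasium) and its Taxi-v3 environment; the hyperparameters are guesses rather than tuned values, and the optimistic initialization mentioned above shows up as the high starting value of the Q table.

```python
# Sketch of tabular Q-learning on Taxi-v3, assuming the gymnasium API
# (reset() -> (obs, info), step() -> (obs, reward, terminated, truncated, info)).
import numpy as np
import gymnasium as gym

env = gym.make("Taxi-v3")
n_states = env.observation_space.n          # discrete states
n_actions = env.action_space.n              # discrete actions

alpha, gamma, epsilon = 0.1, 0.99, 0.1
Q = np.full((n_states, n_actions), 10.0)    # optimistic initialization: untried actions
                                            # look good, so they get explored

for episode in range(5000):
    state, _ = env.reset()
    done = False
    while not done:
        # epsilon-greedy action selection
        if np.random.rand() < epsilon:
            action = env.action_space.sample()
        else:
            action = int(np.argmax(Q[state]))
        next_state, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated
        # Q-learning update: bootstrap from the greedy value of the next state
        target = reward + (0.0 if terminated else gamma * np.max(Q[next_state]))
        Q[state, action] += alpha * (target - Q[state, action])
        state = next_state
```

The deep-Q-network version keeps exactly this update but replaces the table Q with a neural network, plus extra machinery (replay buffers, target networks) to keep the function approximation stable.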
One important distinction is between model-free and model-based RL. Everything listed above is model-free, while human and smarter-animal cognition seems to include substantial model-based components. In model-based RL, you try to represent the structure of the MDP rather than just learning how to navigate it. MuZero is a good state-of-the-art algorithm; the finite-MDP version is basically a more complex version of Baum-Welch, together with dynamic programming to generate optimal trajectories once you know the MDP.
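The dynamic-programming half of that story is easy to show in isolation; below is a toy value-iteration sketch on a known, randomly generated finite MDP (the transition and reward tables are made up, and the model-learning part, the Baum-Welch-like piece, is not shown).

```python
# Value iteration on a fully known finite MDP: repeatedly apply the Bellman
# optimality backup, then read off the greedy policy. Purely illustrative numbers.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 4, 2, 0.9

# P[s, a, s'] = transition probability, R[s, a] = expected immediate reward
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
R = rng.normal(size=(n_states, n_actions))

V = np.zeros(n_states)
for _ in range(200):
    # V(s) <- max_a [ R(s, a) + gamma * sum_s' P(s'|s, a) * V(s') ]
    V = np.max(R + gamma * (P @ V), axis=1)

policy = np.argmax(R + gamma * (P @ V), axis=1)   # greedy policy w.r.t. the converged V
```

Model-based methods like MuZero are doing a learned, approximate version of both halves at once: estimating something like P and R (or an abstract stand-in for them) while planning through them.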
A good LessWrong post to read is “Models Don’t Get Reward”. It points out a bunch of conceptual errors that people sometimes make when thinking of current RL too analogously to animals.
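In case it helps, the core distinction the post draws can be shown in a few lines (my own toy illustration, not taken from the post): reward shapes the parameters during training, but the trained policy never takes reward as an input.

```python
# Toy illustration: at deployment, a trained policy is just observation -> action.
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(size=(8, 4))   # stand-in for parameters produced by some RL training loop

def act(observation):
    # No reward enters here; during training, reward only ever influenced `weights`
    # as a term in the update rule.
    return int(np.argmax(observation @ weights))

action = act(rng.normal(size=8))    # the deployed policy never sees a reward signal
```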
Thank you so much for writing this out! I will probably have a bunch of follow-up questions when I dig deeper; already very grateful.