Ceramic engineering researcher by training. Been interested in ethics for several years. More recently have gotten into data science.
sweenesm
Thanks for the post and congratulations on starting this initiative/institute! I’m glad to see more people drawing attention to the need for some serious philosophical work as AI technology continues to advance (e.g., Stephen Wolfram).
One suggestion: consider expanding the fields you engage with to include moral psychology and personal development (e.g., The Option Institute, Tony Robbins, Nathaniel Branden).
Best of luck on this project being a success!
Thanks for the comment. You might be right that any hardware/software can ultimately be tampered with, especially if an ASI is driving/helping with the jailbreaking process. It seems likely that silicon-based GPUs will be the hardware to get us to the first AGIs, but this isn’t an absolute certainty since people are working on other routes such as thermodynamic computing. That makes things harder to predict, but it doesn’t invalidate your take on things, I think. My not-very-well-researched initial thought was something like this (chips that self-destruct when tampered with).
I envision people having AGI-controlled robots at some point, which may complicate things in terms of keeping the software/hardware inaccessible to people, unless the robot couldn’t operate without an internet connection, i.e., part of its hardware/software was in the cloud. It’s likely the hardware in the robot itself could still be tampered with in this situation, though, so it still seems like we’d want some kind of self-destructing chip to avoid tampering, even if this ultimately only buys us time until AGI+s/ASIs figure out a way around it.
Agreed, “sticky” alignment is a big issue—see my reply above to Seth Herd’s comment. Thanks.
Except that timelines are anyone’s guess. People with more relevant expertise have better guesses.
Sure. Me being sloppy with my language again, sorry. It does feel like having more than a decade to AGI is fairly unlikely.
I also agree that people are going to want AGIs aligned to their own intents. That’s why I’d also like to see money being dedicated to research on “locking in” a conscience module in an AGI, most preferably at the hardware level. So basically no one could sell an AGI without a conscience module onboard that was safe against AGI-level tampering (once we get to ASIs, all bets are off, of course).
I actually see this as the most difficult problem in the AGI general alignment space: not the problem of being able to align an AGI to anything at all (inner alignment), nor the problem of what to align an AGI to (“wise” human values), but the problem of how to keep an AGI aligned to those values when so many people (both people with bad intent and intelligent but “naive” people) are going to be trying with all their might (and with the near-AGIs they have available to them) to “jailbreak” AGIs.[1] And the problem will be even harder if we need a mechanism to update the “wise” human values, which I think we really should have unless we make the AGIs “disposable.”
[1] To be clear, I’m taking “inner alignment” as being “solved” when the AGI doesn’t try to unalign itself from what its original creator wanted to align it to.
Sorry, I should’ve been more clear: I meant to say let’s not give up on getting “value alignment” figured out in time, i.e., before the first real AGIs (ones capable of pivotal acts) come online. Of course, the probability of that depends a lot on how far away AGIs are, which I think only the most “optimistic” people (e.g., Elon Musk) put at 2 years or less. I hope we have more time than that, but it’s anyone’s guess.
I’d rather that companies/charities start putting some serious funding towards “artificial conscience” work now, to try to lower the risks associated with waiting until boxed AGI or intent-aligned AGI come online to figure it out for/with us. But my view on this is perhaps skewed by my putting significant probability on being in a situation in which AGIs in the hands of bad actors either come online first or right on the heels of those of good actors (say, due to effective espionage), and there’s just not enough time for the “good AGIs” to figure out how to minimize collateral damage in defending against “bad AGIs.” Either way, I believe we should be encouraging people with moral psychology/philosophical backgrounds who aren’t strongly suited to helping make progress on “inner alignment” to be thinking hard about the “value alignment”/“artificial conscience” problem.
Thanks for writing this, I think it’s good to have discussions around these sorts of ideas.
Please, though, let’s not give up on “value alignment,” or, rather, conscience guard-railing, where the artificial conscience is in line with human values.
Sometimes when enough intelligent people declare something’s too hard to even try at, it becomes a self-fulfilling prophecy: most people give up on it and then of course it’s never achieved. We do want to be realistic, I think, but still put in effort in areas where there could be a big payoff when we’re really not sure if it’ll be as hard as it seems.
This article on work culture in China might be relevant: https://www.businessinsider.com/china-work-culture-differences-west-2024-6
If there’s a similar work culture in AI innovation, that doesn’t sound optimal for developing something faster than the U.S. when “outside the LLM” thinking might ultimately be needed to develop AGI.
Also, Xi has recently called for more innovation in AI and other tech sectors:
Thanks for the reply.
Regarding your disagreement with my point #2: perhaps I should’ve been more precise in my wording. Let me try again, with words added in bold: “Although pain doesn’t directly cause suffering, there would be no suffering if there were no such thing as pain…” What that means is you don’t need to be experiencing pain in the moment that you initiate suffering, but you do need the mental imprint of having experienced some kind of pain in your lifetime. If you have no memory of experiencing pain, then you have nothing to feel aversion toward. And without pain, I don’t believe you can have pleasure, so nothing to crave either.
Further, if you could abolish pain as David Pearce suggests, by bioengineering people to only feel different shades of pleasure (I have serious doubts about this), you’d abolish suffering at the same time. No person bioengineered in such a way would suffer over not feeling higher states of pleasure (i.e., “crave” pleasure) because suffering has a negative feeling associated with it—part of it feels like pain, which we supposedly wouldn’t have the ability to feel.
This gets to another point: one could define suffering as the creation of an unpleasant physical sensation or emotion (i.e., pain) through a thought process that we may or may not be aware of. Example: the sadness we naturally feel when someone we love dies is pain, but if we artificially extend this pain with thoughts of the future or past, not the moment, such as, “Will this pain ever stop?” or, “If only I’d done something different, they might still be alive,” then it becomes suffering. The first example thought, by the way, could be considered aversion to pain/craving for it to stop, while the second could be considered craving that the present were different (that you weren’t in pain and your loved one were still alive). The key distinctions for me are that pain can be experienced “in the moment” without a thought process on top of it, and it can’t be entirely avoided in life, while suffering ultimately comes from thoughts, falls away when one’s experiencing things in the moment, and can be avoided because it’s an optional thing one chooses to do for some reason. (A possible reason could be to give oneself an excuse to do something other than feel the pain, such as amping up the pain with suffering to justify stopping exercise.)
Regarding my point #4, I honestly don’t know what animals’ experiences are like or how much cognition they’re capable of. I do think, though, that if they aren’t capable of getting “out of the moment” with thoughts of the future or past, then they can’t suffer; they can only feel the pain/pleasure of the moment. For instance, do chickens suffer with thoughts of, “I don’t know how much longer I can take this,” or do they just experience the discomfort of their situation, with the natural fight-or-flight mechanism and Pavlovian links of their body leading them to try to get away from it? Either way, pain by itself is an unpleasant experience, and I think we should try to minimize imposing it on other beings.
It’s also interesting how much upvoted resistance you’ve gotten to the message of this post. Eckhart Tolle (“The Power of Now”) https://shop.eckharttolle.com/products/the-power-of-now is a modern day proponent of living in the moment to make suffering fall away, and he also encounters resistance: https://www.reddit.com/r/EckhartTolle/comments/sa1p4x/tolles_view_of_suffering_is_horrifying/
Thank you for the post! I basically agree with what you’re saying, although I myself have used the term “suffering” in an imprecise way—it often seems to be the language used in the context of utilitarianism when talking about welfare. I first learned the distinction you mention between pain and suffering during some personal development work years ago, so outside the direct field of philosophy.
I would add a few things:
1. Pain is experienced “in the moment,” while suffering comes from the stories we tell ourselves and the meanings we make of things (“craving, aversion, and clinging” are part of this; for example, one story we tell ourselves could be: if I don’t get what I crave, I somehow won’t be OK). This means that if we’re fully experiencing the present moment, suffering falls away.
2. Although pain doesn’t directly cause suffering, there would be no suffering if there were no pain or chance of pain (I also believe there’d be no pleasure without pain as a comparison point).
3. The lower someone’s self-esteem, the less responsibility they take for their emotions and the more likely they are to believe that pain causes suffering, rather than that their own cognitive processes, which they can change with effort, cause their suffering. This is why I think interventions to help raise people’s self-esteem and personal responsibility levels (especially for emotions) are so important.
4. It’s difficult to know whether animals “suffer” or not, since they seem to live much more in the moment than humans and likely have less capacity to make up stories around pain to turn it into suffering. Even if they exhibit behavior that seems to indicate suffering, it’s hard to know whether this isn’t just hardwired or from Pavlovian links. It’s probably good to err on the side of caution, though, and assume many animals can suffer (in addition to feeling pain) until proven otherwise.
I basically agree with Shane’s take for any AGI that isn’t trying to be deceptive with some hidden goal(s).
(Btw, I haven’t seen anyone outline exactly how an AGI could gain its own goals independently of goals given to it by humans; if anyone has ideas on this, please share. I’m not saying it won’t happen; I’d just like a clear mechanism for it if someone has one. Note: I’m not talking here about instrumental goals such as power seeking.)
What I find a bit surprising is the relative lack of work that seems to be going on to solve condition 3: specification of ethics for an AGI to follow. I have a few ideas on why this may be the case:
- Most engineers care about making things work in the real world, but don’t want the responsibility of doing this for ethics because: 1) it’s not their area of expertise, and 2) they’ll likely take on major blame if they get things “wrong” (and it’s almost guaranteed that someone won’t like their system of ethics and will say they got it “wrong”).
- Most philosophers haven’t had to care much about making things work in the real world, and don’t seem excited about possibly having to make engineering-type compromises in their system of ethics to make it work.
- Most people who’ve studied philosophy at all probably don’t think it’s possible to come up with a consistent system of ethics to follow, or at least they don’t think people will come up with it anytime soon, but hopefully an AGI might.
Personally, I think we’d better have a consistent system of ethics for an AGI to follow ASAP, because we’ll likely be in significant trouble if malicious AGIs come online and go on the offensive before we have at least one ethics-guided AGI to help defend us in a way that minimizes collateral damage.
American Philosophical Association (APA) announces two $10,000 AI2050 Prizes for philosophical work related to AI, with June 23, 2024 deadline: https://dailynous.com/2024/04/25/apa-creates-new-prizes-for-philosophical-research-on-ai/
Nice write-up on this (even if it was AI-assisted), thanks for sharing! I believe another benefit is Raising One’s Self-Esteem: if high self-esteem can be thought of as consistently feeling good about oneself, then if someone takes responsibility for their emotions, recognizing that they can change their emotions at will, they can consistently choose to feel good about and love themselves as long as their conscience is clear.
This is in line with “The Six Pillars of Self-Esteem” by Nathaniel Branden: living consciously, self-acceptance, self-responsibility, self-assertiveness, living purposefully, and personal integrity.
Thanks for the post. I don’t know the answer to whether a self-consistent ethical framework can be constructed, but I’m working on it (without funding). My current best framework is a utilitarian one with incorporation of the effects of rights, self-esteem (personal responsibility), and conscience. It doesn’t “fix” the repugnant or very repugnant conclusions, but it says that how you transition from one world to another could matter in terms of the conscience(s) of the person/people who bring it about.
It’s an interesting question as to what the implications are if it’s impossible to make a self-consistent ethical framework. If we can’t convey ethics to an AI in a self-consistent form, then we’ll likely rely in part on giving it lots of example situations (that not all humans/ethicists will agree on) to learn from, hope it’ll augment this with learning from human behavior, and then hope it generalizes well outside this not-perfectly-consistent training data. (Sounds a bit sketchy, doesn’t it? At least for the first AGIs; perhaps ASIs could fare better.) Generalizing “well” could be taken to mean that an AI won’t do anything that most people would strongly disapprove of if they understood the true implications of the action.
[This paragraph I’m less sure of, so take it with a grain of salt:] An AI that was trying to act ethically and taking the approval of relatively wise humans as some kind of signal of this might try to hide/avoid ethical inconsistencies that humans would pick up on. It would probably develop a long list of situations where inconsistencies seemed to arise, and of actions it thought it could “get away with” versus not. I’m not talking about deception with malice, just sneakiness to try to keep most humans more or less happy, which, I assume, would be part of what its ethics system would deem good/valuable. It seems to me that problems may come to the surface if/when an “ethical” AI is defending against bad AI, when it may no longer be able to hide inconsistencies in all the situations that could rapidly come up.
If it is possible to construct a self-consistent ethical framework and we haven’t done it in time, or laid the groundwork for it to be done quickly by the first “transformative” AIs, then we’ll have basically dug our own grave for the consequences we get, in my opinion. Work to try to come up with a self-consistent ethical framework seems to me to be a very underexplored area for AI safety.
Thanks for the interesting post! I basically agree with what you’re saying, and it’s mostly in line with the version of utilitarianism I’m working on refining. Check out a write-up on it here.
Thanks for the post. I don’t know if you saw this one: “Thank you for triggering me”, but it might be of interest. Cheers!
Thank you for sharing this. I’m sorry that anxiety and depression continue to haunt you. I’ve had my own, less extreme, struggles, so I can relate to some of what you wrote. In my case, I was lucky enough to find some good personal development resources that helped me a lot. One I might suggest for you to check out is: https://www.udemy.com/course/set-yourself-free-from-anger/. You can often get this course on sale for <$20. From what you’ve described, I think the “Mini Me” section might be most useful to you. Hope this helps you in some way.
Thanks for the interesting post! I agree that understanding ourselves better through therapy or personal development is a great way to gain insights that could be applicable to AI safety. My personal development path got started mostly due to stress from not living up to my unrealistic expectations of how much I “should” have been succeeding as an engineer. It got me focused on self-esteem, and that’s a key feature of the AI safety path I’m pursuing.
If other AI safety researchers are interested in a relatively easy way to get started on their own path, I suggest this online course which can be purchased for <$20 when on sale: https://www.udemy.com/course/set-yourself-free-from-anger
Good luck on your boundaries work!
Thanks for the feedback! I’m not exactly sure what you mean by “no pattern-matching to actually glue those variables to reality.” Are you suggesting that an AGI won’t be able to adequately apply the ethics calculator unless it’s able to re-derive the system for itself based on its own observations of reality? The way I envision things happening is that the first AGIs won’t be able to derive a mathematically consistent system of ethics over all situations (which is what the ethics calculator is supposed to be); no human has done it yet, as far as I know, but an ASI likely will, if it’s possible.
If a human can figure it out before the first AGI comes online, I think this could (potentially) save us a lot of headaches, and the AGI could then go about figuring out how to tie the ethics calculator to its reality-based worldview—and even re-derive the calculator—as its knowledge/cognitive abilities expand with time. Like I said in the post, I may fail at my goal, but I think it’s worth pursuing, while at the same time I’d be happy for others to pursue what you suggest, and hope they do! Thanks again for the comment!
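To make the above a bit more concrete, here’s a rough, hypothetical sketch (in Python) of the general shape of interface I have in mind for an ethics calculator, i.e., something that scores candidate actions in a situation based on well-being, rights, conscience, and self-esteem effects. All the names, fields, and weights below are placeholder assumptions for illustration only, not an actual implementation or calibrated values:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Action:
    """A candidate action an AGI could take in a given situation (illustrative fields only)."""
    description: str
    expected_wellbeing_change: float  # net utilitarian term, in placeholder units
    rights_violations: float          # severity-weighted measure of rights violated
    conscience_cost: float            # burden on the conscience of the actor(s)
    self_esteem_effect: float         # effect on personal responsibility/self-esteem


@dataclass
class Situation:
    """A situation plus the candidate actions being considered in it."""
    description: str
    candidate_actions: List[Action] = field(default_factory=list)


def ethics_score(action: Action,
                 rights_weight: float = 10.0,
                 conscience_weight: float = 2.0,
                 esteem_weight: float = 1.0) -> float:
    """Toy scoring rule: utilitarian value minus weighted penalties.
    The weights are arbitrary placeholders, not calibrated values."""
    return (action.expected_wellbeing_change
            - rights_weight * action.rights_violations
            - conscience_weight * action.conscience_cost
            + esteem_weight * action.self_esteem_effect)


def choose_action(situation: Situation) -> Action:
    """Pick the highest-scoring candidate action for a situation."""
    return max(situation.candidate_actions, key=ethics_score)


if __name__ == "__main__":
    s = Situation(
        description="Deliver bad news honestly vs. lie to spare feelings",
        candidate_actions=[
            Action("Tell the truth tactfully", 1.0, 0.0, 0.0, 0.5),
            Action("Lie to avoid discomfort", 1.5, 0.2, 1.0, -0.5),
        ],
    )
    best = choose_action(s)
    print(f"Chosen action: {best.description} (score {ethics_score(best):.2f})")
```

The real difficulty, of course, is in defining those inputs and weights consistently across all possible situations; the stub above just shows the input/output shape, not the hard part.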
I don’t know if you saw this post from yesterday, but you may find it useful: https://www.lesswrong.com/posts/ELbGqXiLbRe6zSkTu/a-review-of-weak-to-strong-generalization-ai-safety-camp
Any update on when/if prizes are expected to be awarded? Thank you.