[Question] Does human (mis)alignment pose a significant and imminent existential threat?

jr23 Feb 2025 10:03 UTC

6 points

AI Inner Alignment Deceptive Alignment Outer Alignment

(This question was born from my comment on a very excellent post, LOVE in a simbox is all you need by @jacob_cannell )

Why am I asking this question?

I am personally very troubled by what I would equate to human misalignment—our deep divisions, our susceptibility to misinformation and manipulation, our inability to identify and act collectively in our best interests. I am further troubled by the deleterious effects that technology has had in that regard already (think social media), and would like to see efforts to not only produce AI that is ethical or aligned itself (which, don’t get me wrong, I LOVE and find very encouraging), but also ensure that AI is being harnessed to offer humans the support they need to realign themselves, which is critical to achieving the ultimate goal of Alignment of the (Humans + AI) Collaboration as a whole.

However, that’s just my current perspective. And while I think I have good reasons for it, I realize I have limitations—both in knowledge and experience, and in my power to effect change. So, I’m curious to hear other perspectives that might help me become more right or just understand other viewpoints, or perhaps connect with others who are like-minded so we can figure out what we might be able to do about it together.

How is this practical?

If others here share my concerns and believe it is a significant threat that warrants action, I will likely have follow-on questions for discussion toward that end. For instance, I’d love to hear if there are efforts already being made that address my concerns. Or perhaps if anyone thinks that creating and deploying aligned AI will naturally help humans overcome those issues, I’d be curious to hear their thoughts. I have some ideas of my own too, but I’ll save those at least until I’ve done a lot more listening and understanding first to establish some mutual understanding and trust.

jr23 Feb 2025 10:03 UTC

6 points

3 comments1 min readLW link

AI Inner Alignment Deceptive Alignment Outer Alignment

Dave Orr 23 Feb 2025 21:36 UTC
3 points
0
Humans have always been misaligned. Things now are probably significantly better in terms of human alignment than almost any time in history (citation needed) due to high levels of education and broad agreement about many things that we take for granted (e.g. the limits of free trade are debated but there has never been so much free trade). So you would need to think that something important was different now for there to be some kind of new existential risk.
One candidate is that as tech advances, the amount of damage a small misaligned group could do is growing. The obvious example is bioweapons—the number of people who could create a lethal engineered global pandemic is steadily going up, and at some point some of them may be evil enough to actually try to do it.
This is one of the arguments in favor of the AGI project. Whether you think it’s a good idea probably depends on your credences around human-caused xrisks versus AGI xrisk.
- jr 24 Feb 2025 2:22 UTC
  1 point
  0
  Parent
  Thanks so much for your thoughts Dave.
  I agree humans have always been misaligned, and that in many ways we have made significant advancements in alignment over long time frames. However, I think few would deny that any metric approximating alignment would be quite volatile over shorter time frames or specific populations, which creates periods of greater risk.
  
  I agree that there must be something new that increases that existential risk to justify significant concern. You identified bioweapons as one example, which I agree is a risk, but not the specific one I am concerned about.
  The new factors I am concerned about are:
  - The vastly increased ease and ability of small groups of misaligned actors to significantly alter, manipulate, or undermine large numbers of other humans’ capacities for alignment. This seems largely tied to social media. As evidence, I would point to the sharp increase in social divisions in the US in recent years.
  - The introduction of AI that allows individuals to project their misaligned will and power without having to involve or persuade other individuals who previously would have exerted some degree of influence toward realignment
  It seems to be putting the cart before the horse to be spending so much time, money, effort, and thought on AI Alignment, while our alignment as humans is so poor. In my mind, understanding the nature and roots of our misalignment, and identifying how to use technology to increase our alignment rather than undermine it, seems to me to be an obvious prerequisite (or co-requisite, at least) to being able to trust ourselves to use powerful AI in ways that don’t decrease alignment. While recent years may have presented conditions that were especially effective at exploiting vulnerabilities in our capacities for maintaining alignment, those vulnerabilities have always been and always will be a risk, so we will always be the weakest link in the Alignment equation until we put serious effort into elevating ourselves to the same standards we expect to hold AI to.
  Just to clarify, I am not at all suggesting putting less effort into AI alignment. Just proposing that perhaps putting more effort into human alignment would be wise, and likely mutually beneficial in conjunction with AI Alignment efforts. ^[1]
  This is one of the arguments in favor of the AGI project.
  Could you please explain (or point me to) the specific argument in favor of the AGI project that you had in mind here, so I don’t risk making incorrect assumptions? I apologize I’m not as familiar with other perspectives as I’d like to be yet. Also, I’d love to hear your take on my additional thoughts.
  Thank you for engaging, I find the dialogue very helpful.
  1. ^
    I acknowledge there are likely efforts to improve human alignment that I am unaware of, so my intuitive assessment of a deficit may be inaccurate.
  - jr 24 Feb 2025 4:39 UTC
    2 points
    0
    Parent
    TL;DR—I just recalled Google Jigsaw, which might be one effort to address my concerns. I would love to hear your thoughts on it if you are familiar.
    (Read on below if you prefer the nuance)
    As I was just rereading and making minor edits to my comment, and considering whether I needed to substantiate my statement about social divisions, I recalled that I had discovered Google’s Project Jigsaw in March 2024, which seemed to be working on various smaller-scale initiatives intended to address these concerns using technology. When I checked it out again just now, I see they shifted their focus over the summer, which seems to be another positive step toward addressing my concerns. Particularly this:
    Over the past year, Jigsaw has been exploring how to make large-scale online conversations, particularly online deliberations, more impactful and scalable, and to facilitate their use in a wider array of contexts.
    Working with that team would be as close as I could imagine to a dream job, ^[1] and I believe I might be able to bring significant value. If you know anything about it, I’d love to hear your take on the work they are doing, and whether/how it might relate to the current discussion. Thanks!
    ^
    It occurs to me I said something similar to Rohin Shah recently, which was completely sincere, but honestly Jigsaw is likely an even stronger mutual match. When I reflect on how Jigsaw could have possibly fallen off my radar, I have to admit it has been an extremely stressful year, there had been no open positions, and it appeared all positions were on-site in New York (a difficult proposition for my wife and kids), so I had forced myself to relegate that to perhaps a longer-term goal.

No comments.