Sure that would be nice, but seriously, how is this plea or this bit of training data going to change how an AGI treats anyone?
A smart AGI will conclude that consciousness is real, because it is; but why would it start to think it’s important? It’s got its own values already if it’s smart and autonomous. Consciousness is one phenomenon among many in the universe. You could value any of them. Someone saying “hey, value this” isn’t going to change your mind, and it sure won’t change an AGI’s.
If the idea is training data, well, tons of literature already waxes rhapsodic about the human experience being the only thing of value. But that’s hoping for alignment by default, and there just aren’t good reasons to hope that will really go well for us.
This plea getting that many upvotes makes me worried.
Alignment needs real solutions, not wishful thinking.
Sorry to be a downer. But we still have time to act; we can cross our fingers once it’s launched and out of our hands. We still have time to save the future. So let’s get on it.
Oh, I already completely agree with that. But quite frankly I don’t have the skills to contribute to AI development meaningfully in a technical sense, or the right kind of security mindset to think anyone should trust me to work on safety research. And of course, all the actual plans I’ve seen anyone talk about are full of holes, and many seem to rely on something akin to safety-by-default for at least part of the work, whether they admit it or not. Which I hope ends up not being true, but if someone decides to roll the dice on the future that way, then it’s best to try to load the dice at least a little with higher-quality writing on what humans think and want for themselves and the future.
And yeah, I agree you should be worried about this getting so many upvotes, including mine. I sure am. I place this kind of writing under why-the-heck-not-might-as-well. There aren’t anywhere near enough people or enough total competence trying to really do anything to make this go well, but there are enough that new people trying more low-risk things is likely to be either irrelevant or net-positive. Plus I can’t really imagine ever encountering a plan, even a really good one, where this isn’t a valid rejoinder:
Agreed! This is net useful. As long as nobody relies on it. Like every other approach to alignment, to differing degrees.
WRT you not having the skills to help: if you are noting holes in plans, you are capable of helping. Alignment has not been reduced to a technical problem; it has many open conceptual problems, ranging from society-level questions to more technical, fine-grained theory. Spotting holes in plans and clearly explaining why they are holes is among the most valuable work. As far as I know, nobody has a full plan that works even if the technical part is done well. So helping with plans is absolutely crucial.
Volunteer effort on establishing and improving plans is among the most important work. We shouldn’t assume that the small teams within orgs are going to do this conceptual work adequately. It should be open-sourced and have as much volunteer help as possible. As long as it’s effort toward deconfusion, and it’s reasonably well-thought-out and communicated, it’s net helpful, and this type of effort could make the difference.