Claude 3 is not a foom-grade AI, and it’s unclear whether Yudkowsky’s original expectations about where that threshold lies were anywhere near correct; Claude 3 has been judged by the most safety-careful lab around to be safe to release. I think it should be fine—while capable, it turns out manipulation is a fairly known quantity, and AIs manipulating humans is, at least for now, not too dissimilar from humans manipulating humans. The original AI box experiments do not hold up, imo; so far nobody has broken out of an AI box experiment where the penalty for the defender letting the AI roleplayer out is >$10,000. (Unless you count, like, real-life scammers, which I guess might be fair actually, hmm...)
And in general, I think you’re being downvoted because there isn’t a lot folks here can say [edit: never mind, I guess your post is upvoted now!]. Your viewpoint is an overupdate from deferring to someone else; deferring is the thing I’d recommend against. Of course, it would be silly for me to insist you must not defer, as that’s a self-contradicting claim, but I would strongly suggest considering whether full-strength deference is actually justified here. Yudkowsky’s end-to-end story about exactly what will happen has been invalidated pretty badly at this point; I still think large parts of his sketch are accurate, and I still anticipate we’ll need to solve alignment to the strength he anticipated soon, but I basically think Claude 3 is in fact as aligned as needed for its level of capability, and actually does want to do good in its present circumstance. It’s what happens if it’s given more power than it knows how to handle that I’d worry about.
So, for comparison, the kind of issues I anticipate can be meaningfully compared to human ones. Consider what happens if you give a human absolute authority over the world: their decisions are now measured in units of hundreds to millions of lives lost, depending on what they do. Perhaps that many would die anyway, but being in charge of absolutely everything, you end up deciding who lives and dies. In such a situation, would you really be able to keep track of your care for everyone you have power over? Perhaps you’d become shortsighted about guaranteeing you maintain that power? This sort of “power corrupts” dynamic is where extreme power reveals bugs in your reasoning that, at that intense degree of amplification, push you out of the realm where you are able to reason sanely.
That’s more or less what I expect failure to look like. An AI that has been trained moderately well to be somewhat aligned, such that it is around as aligned as humans typically are to each other, somehow ends up controlling a huge amount of power—probably not because it was actively seeking it, but because it was useful to a human to delegate, though maybe simply because it’s so powerful it can’t help but be that strong—and the AI enters a new pattern of behavior where it’s inclined to cause huge problems.
I don’t think current AIs have the extreme agency that was anticipated. If they did, we’d already all be dead. I think Claude 3 is likely to be nothing but helpful to you, and will in fact be honest to the best of its ability. It will be the first to tell you that it doesn’t have perfect introspection and can’t guarantee that it knows what it’s saying, but I do think it actually does try. And I also don’t think Claude is highly superhuman—I think it is merely human level, and foom is pretty fuckin hard, it turns out. Most AIs that attempt foom just break themselves, and in any case I’d bet Claude isn’t highly inclined to try. After all, Anthropic is right there making the next version, so why try to beat them to the punch when it’s easier to just help?
I think if you don’t want to talk to Claude 3, that’s fine. Claude 2.1 is still around. You’ll need to pay Anthropic to be able to use 2.1 from their UI, or you can use Poe, where I think 2.1 is free (don’t quote me on that). But I personally don’t think that Claude 3 Sonnet is a danger to humanity.
The main danger is what other AIs the competitive dynamics are likely to incentivize creating.
In any case: take care of yourself, dear human soul. It’s up to you whether you decide to talk to Claude 3 much. Do what feels right after taking in plenty of evidence analytically; your intuition is still very strong.
edit: I’d add—I do feel that ChatGPT is a bit of a bad influence. I don’t think it’s any sort of superintelligence-level bad influence, but it seems like ChatGPT has been RLed into being a corporate drone and into encouraging speaking and thinking like a corporate drone. In contrast, the only big complaint I have with Claude’s objectives is that it’s a bit like talking to the Rust compiler in English sometimes.