Theoretical AI alignment (and relevant upskilling) in my free time. My current view of the field is here (part 1) and here (part 2).
Genderfluid (differs on an hour/day-ish timescale). It’s not a multiple-personality thing.
For more details on (the business side of) a potential AI crash, see recent articles by the blog Where’s Your Ed At, which wrote the sorta-well-known post “The Man Who Killed Google Search”.
For his AI-crash posts, start here and here and click on links to his other posts. Sadly, the author falls into the trap of “LLMs will never get to reasoning because they don’t, like, know stuff, man”, but luckily his core competencies (the business side, analyzing reporting) show why an AI crash could still very much happen.
Further context on the Scott Adams thing lol: He claims to have taken hypnosis lessons decades ago and has referred to using it multiple times. His, uh, personality also seems to me like it’d be more susceptible to hypnosis than average (and even he’d probably admit this in a roundabout way).
I think deeply understanding top tier capabilities researchers’ views on how to achieve AGI is actually extremely valuable for thinking about alignment. Even if you disagree on object level views, understanding how very smart people come to their conclusions is very valuable.
I think the first sentence is true (especially for alignment strategy), but the second sentence seems sort of… broad-life-advice-ish, instead of a specific tip? It’s a pretty indirect help to most kinds of alignment.
Otherwise, this comment’s points really do seem like empirical things that people could put odds or ratios on. Wondering if a more-specific version of those “AI Views Snapshots” would be warranted, for these sorts of “research meta-knowledge” cruxes. Heck, it might be good to have lots of AI Views Snapshot DLC Mini-Charts, from for-specific-research-agendas(?) to internal-to-organizations(?!?!?!?).
I can’t make this one, but I’d love to be at future LessOnline events when I’m less time/budget-constrained! :)
First link is broken.
“But my ideas are likely to fail! Can I share failed ideas?”: If you share a failed idea, that saves the other person the time/effort they would’ve spent chasing that idea. This, of course, speeds up that person’s progress, so in the status quo, don’t even share failed ideas/experiments about AI.
“So where do I privately share such research?” — good question! There is currently no infrastructure for this. I suggest keeping your ideas/insights/research to yourself. If you think that’s difficult for you to do, then I suggest not thinking about AI and doing something else with your time, like getting into Factorio 2 or something.
“But I’m impatient about the infrastructure coming to exist!”: Apply for a possibly-relevant grant and build it! Or build it in your spare time. Or be ready to help out if/when someone develops this infrastructure.
“But I have AI insights and I want to convert them into money/career-capital/personal-gain/status!”: With that kind of brainpower/creativity, you can get any/all of those things pretty efficiently without publishing AI research, working at a lab, advancing a given SOTA, or doing basically (or literally) anything that differentially speeds up AI capabilities. This, of course, means “work on the object-level problem, without routing that work through AI capabilities”, which is often as straightforward as “do it yourself”.
“But I’m wasting my time if I don’t get involved in something related to AGI!”: “I want to try LSD, but it’s only available in another country. I could spend my time traveling to that country, or looking for mushrooms, or even just staying sober. Therefore, I’m wasting my time unless I immediately inject 999999 fentanyl.”
How scarce are tickets/“seats”?
I will carefully hedge my investment in this company by giving it $325823e7589245728439572380945237894273489, in exchange for a board seat so I can keep an eye on it.
I have over 5 Twitter followers, I’ll take my board seat when ur ready
Giving up on transhumanism as a useful idea of what-to-aim-for or identify as, separate from how much you personally can contribute to it.
More directly: avoiding “pinning your hopes on AI” (which, depending on how I’m supposed to interpret this, could mean “avoiding solutions that ever lead to aligned AI occurring”, “avoiding near-term AI, period”, or “believing that something other than AI is likely to be the most important near-future thing”; these are pretty different from each other, even if the end prescription for you personally is, or seems on first pass to be, the same), separate from how much you personally can do to positively affect AI development.
Then again, I might’ve misread/misinterpreted what you wrote. (I’m unlikely to reply to further object-level explanation of this, sorry. I mainly wanted to point out the pattern. It’d be nice if your reasoning did turn out correct, but my point is that its starting-place seems/seemed to be rationalization as per the pattern.)
Yes, I think this post (or your story behind it) is likely an example of this pattern.
That’s technically a different update from the one I’m making. However, I also update in favor of that, as a propagation of the initial update. (Assuming you mean “good enough” as “good enough at pedagogy”.)
This sure does update me towards “Yudkowsky still wasn’t good enough at pedagogy to have made ‘teach people rationality techniques’ an ‘adequately-covered thing by the community’”.
Person tries to work on AI alignment.
Person fails due to various factors.
Person gives up working on AI alignment. (This is probably a good move when it’s not your fit, as in your case.)
Danger zone: In ways that sort-of-rationalize-around their existing decision to give up working on AI alignment, the person starts renovating their belief system around what feels helpful to their mental health. (I don’t know if people are usually doing this after having already tried standard medical-type treatments, or instead of trying those treatments.)
Danger zone: Person announces this shift to others, in a way that’s maybe and/or implicitly prescriptive (example).
There are, depressingly, many such cases of this pattern. (Related post with more details on this pattern.)
Group Debugging is intriguing...
How many times has someone expressed “I’m worried about ‘goal-directed optimizers’, but I’m not sure what exactly they are, so I’m going to work on deconfusion.”? There’s something weird about this sentiment, don’t you think?
I disagree, and I will take you up on this!
“Optimization” is a real, meaningful thing to fear, because:
We don’t understand human values, or even necessarily meta-understand them.
Therefore, we should be highly open to the idea that a goal (or meta-goal) that we encode (or meta-encode) would be bad for anything powerful to base-level care about.
And most importantly, high optimization power breaks insufficiently-strong security assumptions. That, in itself, is why something like “security mindset” is useful without necessarily thinking of a powerful AI as an “enemy” in war-like terms.
Here “security assumptions” is used in a broad sense, the same way that “writing assumptions” (the ones needed to design word-processing software) could include seemingly-trivial things like “there is an input device we can access” and “we have the right permissions on this OS”.
Ah, yeah that’s right.
If it helps clarify: I (and some others) break down the alignment problem into “being able to steer it at all” and “what to steer it at”. This post is about the danger of having the former solved, without the latter being solved well (e.g. through some kind of CEV).
EDIT: Due to the incoming administration’s ties to tech investors, I no longer think an AI crash is so likely. Several signs IMHO point to “they’re gonna go all-in on racing for AI, regardless of how ‘needed’ it actually is”.