LessWrong team member / moderator. I’ve been a LessWrong organizer since 2011, with roughly equal focus on the cultural, practical and intellectual aspects of the community. My first project was creating the Secular Solstice and helping groups across the world run their own version of it. More recently I’ve been interested in improving my own epistemic standards and helping others to do so as well.
Raemon
Anthropic, and taking “technical philosophy” more seriously
I would bet they are <1% of the population. Do you disagree, or think they disproportionately matter?
I’m skeptical that there are actually enough people so ideologically opposed to this that it outweighs the upside of driving home, through the medium itself, that capabilities are advancing. (Similar to how, even though tons of people hate FB, few people actually leave.)
I’d be wanting to target a quality level similar to this:
One of the things I track is “ingredients for a good movie or TV show that would actually be narratively satisfying / memetically fit,” one that would convey good/realistic AI hard sci-fi to the masses.
One of the more promising strategies within that I can think of is “show multiple timelines” or “flashbacks from a future where the AI wins, but it goes slowly enough to be human-narrative-comprehensible” (with the flashbacks being about the people inventing the AI).
This feels like one of the reasonable options for a “future” narrative. (A previous one I was interested in was the “Green goo is plausible” concept.)
Also, I think many Richard Ngo stories would lend themselves well to being some kind of cool YouTube video, leveraging AI-generated content to make things feel higher-budget, and also sending an accompanying message of “the future is coming, like now.” (King and the Golem was nice but felt more like a lecture than a video, or something.) A problem with AI-generated movies is that the tech isn’t there yet to avoid being slightly uncanny, but I think Ngo stories have a vibe where the uncanniness will be kinda fine.
I also kinda thought this. I actually thought it sounded sufficiently academic that I didn’t realize at first it was your org, instead of some other thing you were supporting.
LW moderators have a policy of generally rejecting LLM stuff, but some things slip through the cracks. (I think maybe LLM writing got a bit better recently and some of the cues I used are less reliable now, so I may have been missing some.)
Curated. This was one of the more interesting results from the alignment scene in a while.
I did like Martin Randall’s comment distinguishing “alignment” from “harmlessness” in the Helpful/Harmless/Honest sense (i.e. the particular flavor of ‘harmlessness’ that got trained into the AI). I don’t know whether Martin’s particular articulation is correct for what’s going on here, but in general it seems important to track that just because we’ve identified some kind of vector doesn’t mean we necessarily understand what that vector means. (I also liked that Martin gave some concrete predictions implied by his model.)
@kave @habryka
main bottleneck to counterfactuality
I don’t think the social thing ranks above “be able to think useful important thoughts at all”. (But maybe otherwise agree with the rest of your model as an important thing to think about)
[edit: hrm, “for smart people with a strong technical background” might be doing most of the work here]
It seems good for me to list my predictions here. I don’t feel very confident. I feel an overall sense of “I don’t really see why major conceptual breakthroughs are necessary.” (I agree we haven’t seen, like, an AI do something like “discover actually significant novel insights.”)
This doesn’t translate into me being confident in very short timelines, because the remaining engineering work (and “non-major” conceptual progress) might take a while, or require a commitment of resources that won’t materialize before a hype bubble pops.
But:
a) I don’t see why novel insights or agency wouldn’t eventually fall out of relatively straightforward pieces of:
“make better training sets” (and training-set generating processes)
“do RL training on a wide variety of tasks”
“find some algorithmic efficiency advances that, sure, require ‘conceptual advances’ from humans, but of a straightforward enough kind that they don’t seem to require deep genius”
b) Even if A doesn’t work, I think “make AIs that are hyperspecialized at augmenting humans doing AI research” is pretty likely to work, and that + just a lot of money/attention generally going into the space seems to increase the likelihood of it hitting The Crucial AGI Insights (if they exist) in a brute-force-but-clever kinda way.
Assembling the kind of training sets (or building the process that automatically generates such sets) you’d need to do the RL seems annoyingly hard, but not genius-level hard.
I expect there to be a couple innovations that are roughly on the same level as “inventing attention” that improve efficiency a lot, but don’t require a deep understanding of intelligence.
One thing is I’m definitely able to spin up side projects that I just would not have been able to do before, because I can do them with my “tired brain.”
Some of them might turn out to be real projects, although it’s still early stage.
My current guess is:
1. This is more relevant for up to the first couple of generations of “just barely superintelligent” AIs.
2. I don’t really expect it to be the deciding factor after many iterations of end-to-end RSI that get you to being “able to generate novel scientific or engineering insights much faster than a human or institution could.”
I do think it’s plausible that the initial bias towards “evil/hackery AI” could start it off in a bad basin of attraction, but a) even if you completely avoided that, I would still basically expect it to rediscover this on its own as it gained superhuman levels of competence, and b) one of the things I most want to use a slightly-superhuman AI for is to robustly align massively superhuman AI, and I don’t really see how to do that without directly engaging with the knowledge of the failure modes there.
I think there are other plans that route more through “use STEM AI to build an uploader or bioenhancer, and then have an accelerated human psyche do the technical philosophy necessary to handle the unbounded alignment case.” I could see that being the right call, and I could imagine the bias from “already knows about deceptive alignment, etc.” being large-magnitude enough to matter in the initial process. [edit: In those cases I’d probably want to filter out a lot more than just “unfriendly AI strategies.”] But, basically, how this applies depends on what it is you’re trying to do with the AI, and what stage/flavor of AI you’re working with and how it’s helping.
Yep, thank you!
It’d be nice to have the key observations/evidence in the tl;dr here. I’m worried about this but would like to stay grounded in how bad it is exactly.
I think I became at least a little wiser reading this sentence. I know you’re mostly focused on other stuff but I think I’d benefit from some words connecting more of the dots.
I think the Gears Which Turn The World sequence, and Specializing in Problems We Don’t Understand, and some other scattered John posts I don’t remember as well, are a decent chunk of an answer.
Curated. I found this a clearer explanation of “how to think about bottlenecks, and things that are not-especially-bottlenecks-but-might-be-helpful” than I previously had.
Previously, I had thought about major bottlenecks, and I had some vague sense of “well, it definitely seems like there should be more ways to be helpful than just tackling central bottlenecks, but also a lot of ways to do that misguidedly.” But I didn’t have any particular models for thinking about it, and I don’t think I could have explained it very well.
I think there are better ways of doing forward-chaining and backward-chaining than the ones listed here (ways that roughly correspond to “the one who thought about it a bit,” but with a bit more technique for getting traction).
I do think the question of “to what degree is your field shaped like ‘there’s a central bottleneck that is, to a first approximation, the only thing that matters here’?” is an important question that hasn’t really been argued for here. (I can’t recall offhand whether John has previously written a post doing exactly that in those terms, although the Gears Which Turn the World sequence is at least looking at the same problem space.)
Update: In a slack I’m in, someone said:
A friend of mine who works at US AISI advised:
> “My sense is that relevant people are talking to relevant people (don’t know specifics about who/how/etc.) and it’s better if this is done in a carefully controlled manner.”

And another person said:
> Per the other thread, a bunch of attention on this from EA/xrisk coded people could easily be counterproductive, by making AISI stick out as a safety thing that should be killed
And while I don’t exactly wanna trust “the people behind the scenes have it handled”, I do think the failure mode here seems pretty real.
Cruxes and Questions
The broad thrust of my questions is:
Anthropic Research Strategy
Does Anthropic building towards automated AGI research make timelines shorter (via spurring competition or leaking secrets)...
...or make timelines worse (by inspiring more AI companies or countries to directly target AGI, as opposed to merely trying to cash in on the current AI hype)?
Is it realistic for Anthropic to have enough of a lead to safely build AGI in a way that leads to durably making the world safer?
“Is Technical Philosophy actually that big a deal?”
Can there be pivotal acts that require high AI power levels, but not unboundedly high, in a reasonable timeframe, such that they’re achievable without solving The Hard Parts of robust pointing?
Governance / Policy Comms
Is it practical for a western coalition to stop the rest of the world (and also governments and other major actors within the western coalition itself) from building reckless or evil AI?