I haven’t thought deeply about this specific case, but I think you should consider this like any other ablation study—like, what happens if you replace the SAE with a linear probe?
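To make the ablation concrete, here's a minimal sketch of the comparison I have in mind, using synthetic stand-in activations and a plain least-squares linear probe (everything here — the data, dimensions, and the way the concept is encoded — is a made-up illustration, not anyone's actual experiment):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for model activations: 1000 samples, 64 dims,
# with a binary concept linearly encoded along the first dimension.
acts = rng.normal(size=(1000, 64))
labels = (acts[:, 0] + 0.1 * rng.normal(size=1000) > 0).astype(float)

# The "ablation": skip the SAE entirely and fit a linear probe
# (least squares) directly on the raw activations.
w, *_ = np.linalg.lstsq(acts[:800], labels[:800] - 0.5, rcond=None)
preds = (acts[800:] @ w > 0).astype(float)
acc = (preds == labels[800:]).mean()
print(f"linear-probe accuracy: {acc:.2f}")
```

If the probe matches the SAE-based readout on whatever downstream metric you care about, that's evidence the SAE isn't buying you anything beyond linear structure that was already there.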
Vaniver
And then a lot of the post seems to make really quite bad arguments against forecasting AI timelines and other technologies, doing so with… I really don’t know, a rejection of Bayesianism? A random invocation of an asymmetric burden of proof?
I think the position Ben (the author) has on timelines is really not that different from Eliezer’s; consider pieces like this one, which is not just about the perils of biological anchors.
I think the piece spends less time than I would like on what to do in a position of uncertainty—like, if the core problem is that we are approaching a cliff of uncertain distance, how should we proceed?—but I think it’s not particularly asymmetric.
[And—there’s something I like about realism in plans? If people are putting heroic efforts into a plan that Will Not Work, I am on the side of the person on the sidelines trying to save them their effort, or direct them towards a plan that has a chance of working. If the core uncertainty is whether or not we can get human intelligence advancement in 25 years—I’m on your side of thinking it’s plausible—then it seems worth diverting what attention we can from other things towards making that happen, and being loud about doing that.]
Instead, the U.S. government will do what it has done every time it’s been convinced of the importance of a powerful new technology in the past hundred years: it will drive research and development for military purposes.
I think this is my biggest disagreement with the piece. I think this is the belief I most wish 10-years-ago-us didn’t have, so that we would try something else, which might have worked better than what we got.
Or—in shopping the message around to Silicon Valley types, we might have thought more about the ways that Silicon Valley is the child of the US military-industrial complex, and will overestimate its ability to control what it creates (or its lack of desire to control it!). Like, I think many more ‘smart nerds’ than military-types believe that human replacement is good.
The article seems to assume that the primary motivation for wanting to slow down AI is to buy time for institutional progress. Which seems incorrect as an interpretation of the motivation. Most people that I hear talk about buying time are talking about buying time for technical progress in alignment.
I think you need both? That is—I think you need both technical progress in alignment, and agreements and surveillance and enforcement such that people don’t accidentally (or deliberately) create rogue AIs that cause lots of problems.
I think historically many people imagined “we’ll make a generally intelligent system and ask it to figure out a way to defend the Earth” in a way that I think seems less plausible to me now. It seems more like we need to have systems in place already playing defense, which ramp up faster than the systems playing offense.
My understanding is that the Lightcone Offices and Lighthaven have 1) overlapping but distinct audiences, with Lightcone Offices being more ‘EA’ in a way that seemed bad, and 2) distinct use cases, where Lighthaven is more of a conference venue with a bit of coworking whereas Lightcone Offices was basically just coworking.
By contrast, today’s AIs are really nice and ethical. They’re humble, open-minded, cooperative, kind. Yes, they care about some things that could give them instrumental reasons to seek power (eg being helpful, human welfare), but their values are great.
They also aren’t facing the same incentive landscape humans are. You talk later about humans evolving to be selfish; not only is the story for humans far more complicated (why do humans often offer an even split in the ultimatum game?), but humans also talk a nicer game than they act (see construal level theory, or social-desirability bias). Once you start looking at AI agents who have similar affordances and incentives that humans have, I think you’ll see a lot of the same behaviors.
(There are structural differences here between humans and AIs. As an analogy, consider the difference between large corporations and individual human actors. Giant corporate chain restaurants often have better customer service than individual proprietors because they have more reputation on the line, and so are willing to pay more to not have things blow up on them. One might imagine that AIs trained by large corporations will similarly face larger reputational costs for misbehavior and so behave better than individual humans would. I think the overall picture is unclear and nuanced and doesn’t clearly point to AI superiority.)
though there’s a big question mark over how much we’ll unintentionally reward selfish superhuman AI behaviour during training
Is it a big question mark? It currently seems quite unlikely to me that we will have oversight systems able to actually detect and punish superhuman selfishness on the part of the AI.
I think it’s hard to evaluate the counterfactual where I made a blog earlier, but I think I always found the built-in audience of LessWrong significantly motivating, and never made my own blog in part because I could just post everything here. (There’s some stuff that ends up on my Tumblr or w/e instead of LW, even after ShortForm, but almost all of the nonfiction ended up here.)
Consider the reaction my comment from three months ago got.
I think being a Catholic with no connection to living leaders makes more sense than being an EA who doesn’t have a leader they trust and respect, because Catholicism has a longer tradition.
As an additional comment, few organizations have splintered more publicly than Catholicism; it seems sort of surreal to me to not check whether or not you ended up on the right side of the splintering. [This is probably more about theological questions than it is about leadership, but as you say, the leadership is relevant!]
I don’t think Duncan knows what “a boundary” is.
General Semantics has a neat technology, where they can split out different words that normally land on top of each other. If boundary_duncan is different from boundary_segfault, we can just make each of the words more specific, and not have to worry about whether or not they’re the same.
I’ve read thru your explainer of boundary_segfault, and I don’t see how Duncan’s behavior is mismatched. It’s a limit that he set for himself that defines how he interacts with himself, others, and his environment. My guess is that the disagreement here is that under boundary_segfault, describing you as having “poor boundaries” is saying that your limits are poorly set. (Duncan may very well believe this! Tho the claim that you set them for yourself makes judging the limits more questionable.)
That said, “poor boundaries” is sometimes used to describe a poor understanding of, or respect for, other people’s boundaries. It seems to me like you are not correctly predicting how Duncan (or other people in your life!) will react to your messages and behavior, in a way that aligns with you not accurately predicting their boundaries (or predicting them accurately, and then deciding to violate them anyway).
This isn’t something that I do. This is something that I have done
I don’t understand this combination of sentences. Isn’t he describing the same observations you’re describing?
There is a point here that he’s describing it as a tendency you have, instead of an action that happened. But it sure seems like you agree that it’s an action that happened, and I think he’s licensed to believe that it might happen again. As inferences go, this doesn’t seem like an outlandish one to make.
The friends who know me well know that I am a safe person. Those who have spent even a day around me know this, too!
The comments here seem to suggest otherwise.
You talk about consent as being important to you; let’s leave aside questions of sexual consent and focus just on the questions: did Duncan consent to these interactions? Did Duncan ask you to leave him alone? Did you leave him alone?
I wasn’t sure what search term to use to find a good source on this but Claude gave me this:
I… wish people wouldn’t do this? Or, like, maybe you should ask Claude for the search terms to use, but going to a grounded source seems pretty important to staying grounded.
I think Six Dimensions of Operational Adequacy was in this direction; I wish we had been more willing to, like, issue scorecards earlier (like publishing that document in 2017 instead of 2022). The most recent scorecard-ish thing was commentary on the AI Safety Summit responses.
I also have the sense that the time to talk about unpausing is while creating the pause; this is why I generally am in favor of things like RSPs and RDPs. (I think others think that this is a bit premature / too easy to capture, and we are more likely to get a real pause by targeting a halt.)
While the coauthors broadly agree about points listed in the post, I wanted to stick my neck out a bit more and assign some numbers to one of the core points. I think on present margins, voluntary restraint slows down capabilities progress by at most 5% while probably halving safety progress, and this doesn’t seem like a good trade. [The numbers seem like they were different in the past, but the counterfactuals here are hard to estimate.] I think if you measure by the number of people involved, the effect of restraint is substantially lower; here I’m assuming that people who are most interested in AI safety are probably most focused on the sorts of research directions that I think could be transformative, and so have an outsized impact.
There Should Be More Alignment-Driven Startups
Similarly for the Sierra Club, I think their transition from an anti-immigration org to a pro-immigration org seems like an interesting political turning point that could have failed to happen in another timeline.
From the outside, Finnish environmentalism seems unusually good—my first check for this is whether or not environmentalist groups are pro-nuclear, since (until recently) it was a good check for numeracy.
Note that the ‘conservation’ sorts of environmentalism are less partisan in the US, or at least, are becoming partisan later. (Here’s an article in 2016 about a recent change of a handful of Republicans opposed to national parks, in the face of bipartisan popular support for them.) I think the thing where climate change is a global problem instead of a local problem, and a conflict between academia and the oil industry, make it particularly prone to partisanship in the US. [Norway also has significant oil revenues—how partisan is their environmentalism, and do they have a similar detachment between conservation and climate change concerns?]
I think this is true of an environmentalist movement that wants there to be a healthy environment for humans; I’m not sure this is true of an environmentalist movement whose main goal is to dismantle capitalism. I don’t have a great sense of how this has changed over time (maybe the motivations for environmentalism are basically constant, and so it can’t explain the changes), but this feels like an important element of managing to maintain alliances with politicians in both parties.
(Thinking about the specifics, I think the world where Al Gore became a Republican (he was a moderate for much of his career) or simply wasn’t Clinton’s running mate (a role he took in part because of HW Bush’s climate policies) maybe leads to less partisanship. I think that requires asking why those things happened, and whether there was any reasonable way for them to go the other way. The oil-republican link seems quite strong during the relevant timeframe, and you either need to have a strong oil-democrat link or somehow have a stronger climate-republican link, both of which seem hard.)
I get that this is the first post out of 4, and I’m skimming the report to see if you address this, but it sounds like you’re using historical data to try to prove a counterfactual claim. What alternative do you think was possible? (I assume the presence of realistic alternatives is what you mean by ‘not inevitable’, but maybe you mean something else.)
I think the main feature of the AI transition that people around here missed / didn’t adequately foreground is that with AI, worse is better. AI art will be clearly worse than the best human art—maybe even median human art—but will cost pennies on the dollar, and so we will end up with more, worse art everywhere. (It’s like machine-made t-shirts compared to tailored clothes.) AI-enabled surveillance systems will likely look more like shallow understanding of all communication than a single overmind thinking hard about which humans are up to what trouble.
This was even hinted at by talking about human intelligence; this comment is from 2020, but I remember seeing this meme on LW much earlier:
When you think about it, because of the way evolution works, humans are probably hovering right around the bare-minimal level of rationality and intelligence needed to build and sustain civilization. Otherwise, civilization would have happened earlier, to our hominid ancestors.
Similarly, we should expect widespread AI integration at about the bare-minimum level of competence and profitability.
I often think of the MIRI view as focusing on the last AI; I.J. Good’s “last invention that man need ever make.” It seems quite plausible that those will be smarter than the smartest humans, but possibly in a way that we consider very boring. (The smartest calculators are smarter than the smartest humans at arithmetic.) Good uses the idea of ultraintelligence for its logical properties (it fits nicely into a syllogism) rather than its plausibility.
[Thinking about the last AI seems important because choices we make now will determine what state we’re in when we build the last AI, and aligning it is likely categorically different from aligning AI up to that point, so we need to get started now and try to develop in the right directions.]
It looks like you only have pieces with 2 connections and 6 connections, which works for maximal density. But I think you need some slack space to create pieces without the six axial lines. I think you should include the tiles with 4 connections also (and maybe even the 0-connection tile!) and the other 2-connection tiles; it increases the number by quite a bit but I think will let you make complete knots.
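If it helps to see the numbers, here's a small sketch counting distinct hexagonal tiles by how many of the six edges carry a connection, considering tiles the same up to rotation. (An assumption to keep it small: this only tracks *which* edges are used, not how strands pair up inside the tile, so the true piece counts for a knot set will be higher.)

```python
from itertools import product

def hex_tiles_by_connections():
    """Count six-edge on/off patterns up to rotation, grouped by the
    number of edges used. A pattern is a tuple of six 0/1 flags, one
    per edge of the hexagon."""
    seen = set()
    counts = {}
    for edges in product((0, 1), repeat=6):
        # canonical form: lexicographically smallest of the 6 rotations
        canon = min(edges[i:] + edges[:i] for i in range(6))
        if canon in seen:
            continue
        seen.add(canon)
        k = sum(canon)
        counts[k] = counts.get(k, 0) + 1
    return counts

print(hex_tiles_by_connections())
# → {0: 1, 1: 1, 2: 3, 3: 4, 4: 3, 5: 1, 6: 1}
```

So even at this coarse level there are three 2-connection patterns and three 4-connection patterns (plus the empty tile), before multiplying by the internal strand pairings.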