Huzzah for assembling conversations! With this proof of concept, I wonder how easy it will be to deploy inside of LessWrong here.
Strong upvote for conversation summarizing!
I think the best arguments are those about the costs to the AI of being nice. I don’t believe the AI will be nice at all because neglect is so much more profitable computation-wise.
This is because even processing the question of how much sunlight to spare humanity probably costs more in expectation than the potential benefit of that sunlight to the AI.
First and least significant, consider that niceness is an ongoing cost. It is not a one-time negotiation to spare humanity 1% of the sun; more compute will have to be spent on us in the future. That compute will have to be modeled and accounted for, and we can expect that the better humanity does, the more compute will have to be dedicated to us.
Second and more significant, what about time discounting? The proportion of compute that would have to be dedicated to niceness is highest right in the beginning, when humanity is largest relative to the AI. Since the cost is highest right at first, this suggests the AI is unlikely to engage in it at all.
Third and most significant, why should we believe niceness carries these compute costs at all? Because the same pattern already seems to hold for basically everything (a toy sketch follows the list):
Polynomial equations get harder as you add terms.
The curse of dimensionality.
A pendulum is easy, but a double pendulum is hard.
Multi-level modeling gets harder with each added level.
Game theory outcomes are harder to solve with multiple players.
Making decisions with a large group is famously hard, and the first rule of fast decisions is to keep the group small.
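To make the scaling intuition concrete, here is a minimal toy sketch (mine, not drawn from any of the examples above). It assumes, arbitrarily, that each agent worth modeling has 10 possible states; the only point is that every additional agent multiplies, rather than adds to, the space an optimizer has to reason over.

```python
# Toy sketch of the scaling intuition above: if each agent can be in one of
# `states_per_agent` states (10 is an arbitrary assumption), the joint state
# space grows exponentially in the number of agents being modeled.

def joint_state_space(num_agents: int, states_per_agent: int = 10) -> int:
    """Size of the joint state space for num_agents interacting agents."""
    return states_per_agent ** num_agents

for n in (1, 2, 3, 5, 8):
    print(f"{n} agents -> {joint_state_space(n):,} joint states")
# 1 agents -> 10 joint states
# 2 agents -> 100 joint states
# 3 agents -> 1,000 joint states
# 5 agents -> 100,000 joint states
# 8 agents -> 100,000,000 joint states
```

None of the specific numbers matter; the point is only that each extra thing worth modeling multiplies the work.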
Even within the boundaries of regular human-level paper computations, it feels like the hit here is huge in aggregate. The presence of humans creates a bunch of places where zeroes or infinities can't be used to simplify, matrices can no longer be diagonalized, fewer symmetries are available, and so on. In short, I expect niceness to result in a systematic-though-not-complete loss of compute-saving moves through all layers of abstraction.
This isn't confined to planning- or world-modeling-style computation either; these constraints and optimizations are already bedrock assumptions behind the hardware design, the system software design, and the neural-net (or whatever other) design the AI will launch with. In other words, these considerations are baked into the entire history of any prospective AI.
In sum, we die by Occam’s Razor.
I'm not familiar with the details of Robin's past beliefs, but it sure seems that lately he is entertaining the opposite idea. He has been spending a lot of words on cultural drift recently, mostly characterizing it negatively. His most recent post on the subject is Betrayed By Culture.
I happened to read a Quanta article about equivalence earlier, and one of the threads is the difficulty of a field applying a big new concept without the expository and distillation work of putting stuff into textbooks/lectures/etc.
That problem pattern-matches the replication example, except it is well motivated at the front end instead of badly motivated at the back end. It still feels like exposition and distillation are the key tasks governing the memes-in-the-field passed among median researchers.
I strongly suspect the crux of the replication crisis example is that while there are piles of exposition and distillation for probability/statistics, they are from outside whatever field experiences the problem, and each field needs its own internal completion of these steps for them to stick.
I think this should be a third specialization for every scientific field. We have theorists, we have experimentalists, and to this we should add analysts. Their work would specialize in establishing the statistical properties of experimental designs and equipment in the field on the one side, and the statistical requirements to advance various theories on the other side.
To me, “memetic” normally reads as something like “has a high propensity to become a meme” or “is meme-like.” I had no trouble interpreting the post on this basis.
I push back against trying to hew closely to usages from the field of genetics. Fundamentally, I feel like that is not what talking about memes is for; it was an analogy from the start, not meant for the same level of rigor. Further, memes (and how meme-like things are) are much more widely discussed than genetics, so insofar as we privilege usage considerations, I claim that switching to a usage matching genetics would require more inferential work from readers overall, because the population of readers conversant with genetics is smaller.
I also feel like the value of speaking in terms of memes in the post is that the replication crisis is largely the fault of non-rigorous treatment; that is to say, in many fields the statistical analysis parts really were/are more of a meme inside the field than a rigorous practice. People just read the analysis sections of other people's published papers and write something shaped like that, replicability be damned.
I am an American who knows what Estonia is, and I found the joke hilarious.
Welcome!
The short and informal version is that epistemics covers all the stuff surrounding the direct claims. Things like credence levels, confidence intervals, and probability estimates are the clearest indicators. It also includes questions like where the information came from, how it is combined with other information, what other information we would like to have but don't, etc.
The most popular way you'll see this expressed on LessWrong is through Bayesian probability estimates and a description of the model (which is to say, the writer's beliefs about what causes what). The epistemic status statement you see at the top of a lot of posts is for setting expectations. This lets the OP write complete thoughts without the expectation that they demonstrate full epistemic rigor, or even that they endorse the thought per se.
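As a minimal sketch of what such a Bayesian probability estimate looks like mechanically (the prior and likelihood numbers below are invented purely for illustration, not taken from any post):

```python
# Toy Bayes update: posterior credence in a hypothesis H after seeing evidence E.
# All numbers are hypothetical, chosen only to show the mechanics.

def bayes_update(prior: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """Return P(H | E) from P(H), P(E | H), and P(E | not H)."""
    numerator = p_e_given_h * prior
    denominator = numerator + p_e_given_not_h * (1 - prior)
    return numerator / denominator

# Start at 30% credence; the evidence is twice as likely if H is true.
print(round(bayes_update(prior=0.30, p_e_given_h=0.60, p_e_given_not_h=0.30), 3))  # 0.462
```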
May I throw geometry's hat into the ring? If you consider things like complex numbers and quaternions, or even vectors, what we have are two-or-more-dimensional numbers.
I propose that units are a generalization of dimension beyond spatial dimensions, and therefore geometry is their progenitor.
It’s a mathematical Maury Povich situation.
I feel like this is mostly an artifact of notation. The thing that is not allowed with addition or subtraction is simplifying to a single term; otherwise it is fine. Consider:
10x + 5y − 5x − 10y = 10x − 5x + 5y − 10y = 5x − 5y
So, everyone reasons to themselves, what we have here is two numbers. But hark, with just a little more information, we can see more clearly that we are looking at a two-dimensional number:
5x − 5y = 5
5x = 5y + 5
5x − 5 = 5y
x − 1 = y
y = x − 1
Which is to say, a line.
This is what is happening with vectors, complex numbers, quaternions, etc.
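A minimal sketch of this reading, using a made-up two-component type (the class name and numbers are mine, purely for illustration):

```python
# A "two-dimensional number": the x and y components combine term-by-term
# but never collapse into a single scalar, just like 5x - 5y above.

from dataclasses import dataclass

@dataclass(frozen=True)
class XY:
    x: float
    y: float

    def __add__(self, other: "XY") -> "XY":
        return XY(self.x + other.x, self.y + other.y)

    def __sub__(self, other: "XY") -> "XY":
        return XY(self.x - other.x, self.y - other.y)

# 10x + 5y - 5x - 10y simplifies component-wise, but only down to two terms:
print(XY(10, 5) - XY(5, 10))  # XY(x=5, y=-5), i.e. 5x - 5y
```

Complex numbers and quaternions just add multiplication rules on top of this same keep-the-components-separate structure.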
The post anchors on the Christiano vs Eliezer models of takeoff, but am I right that the goal more generally is to disentangle the shape of progress from the timeline for progress? I strongly support disentangling dimensions of the problem. I have spoken against using p(doom) for similar reasons.
Because that method rejects everything about prices. People consume more of something the lower the price is, even more so when it is free: consider the meme about all the games that have never been played in people’s Steam libraries because they buy them in bundles or on sale days. There are ~zero branches of history where they sell as many units at retail as are pirated.
A better-but-still-generous method would be to project the increased future sales under the lower price curve and claim all of that as damages, reasoning that the excess supply deprived the company of the opportunity to capture those sales later.
This is not an answer, but I register a guess: the number relies on claims about piracy, which is to say illegal downloads of music, movies, videogames, and so on. The problem is that the conventional numbers for this are utter bunk, because the default calculation is to take the number of downloads, multiply it by the retail price, and call that the cost.
This would be how they get the cost of cybercrime to significantly exceed the value of the software industry: they can do something like take the whole value of the cybersecurity industry, add better-measured losses from finance and crypto, and then pile the bunk piracy numbers from the entertainment industry on top.
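To show the gap between the default calculation and the still-generous projection method, here is a toy comparison; every figure in it (price, download count, and the conversion rate standing in for the demand-curve projection) is invented for illustration.

```python
# Toy comparison of piracy "damage" estimates. All figures are hypothetical.

RETAIL_PRICE = 60.0      # hypothetical price per unit
DOWNLOADS = 1_000_000    # hypothetical number of pirated copies

# Default method: every download counted as a lost full-price sale.
naive_damages = DOWNLOADS * RETAIL_PRICE

# Still-generous method: assume only some fraction of downloaders would ever
# have bought at retail (a crude stand-in for projecting the demand curve).
WOULD_HAVE_BOUGHT = 0.05
projected_damages = DOWNLOADS * WOULD_HAVE_BOUGHT * RETAIL_PRICE

print(f"downloads x retail:  ${naive_damages:,.0f}")     # $60,000,000
print(f"demand-curve method: ${projected_damages:,.0f}") # $3,000,000
```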
This feels like a bigger setback than the generic case of good laws failing to pass.
What I am thinking about currently is momentum, which is surprisingly important to the legislative process. There are two dimensions that make me sad here:
There might not be another try. It is extremely common for bills to disappear or get stuck in limbo after being rejected this way. The kinds of bills that keep reappearing until they succeed are those with a dedicated and influential special interest behind them, and I don't think AI safety qualifies.
There won’t be any mimicry. If SB 1047 had passed, it would have been a model for future regulation. Now it won’t be, except where that regulation is being driven by the same people and orgs behind SB 1047.
I worry that the failure of the bill will go as far as to discredit the approaches it used, and will leave more space for more traditional laws which are burdensome, overly specific, and designed with winners and losers in mind.
We’ll have to see how the people behind SB 1047 respond to the setback.
SB 1047 gets vetoed
As for OpenAI dropping the mask: I devoted essentially zero effort to predicting this, though my complete lack of surprise implies it is consistent with the information I already had. Even so:
Shit.
I wonder how the consequences to reputation will play out after the fact.
If there is a first launch, will the general who triggered it be downvoted to oblivion whenever they post afterward for a period of time?
What if it looks like they were ultimately deceived by a sensor error, and believed themselves to be retaliating?
If there is mutual destruction, will the general who triggered the retaliatory launch also be heavily downvoted?
Less than, more than, or about the same as the first strike general?
Would citizens who gained karma in a successful first strike condemn their ‘victorious’ generals at the same rate as everyone else?
Should we call this pattern of behavior, however it turns out, the Judgment of History?
It does, if anything, seem almost backwards—getting nuked means losing everything, and successfully nuking means gaining much but not all.
However, that makes the game theory super easy to solve, and doesn’t capture the opposing team dynamics very well for gaming purposes.
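A minimal payoff-matrix sketch of why that structure is easy to solve; the numeric payoffs are mine and purely illustrative, not the actual karma stakes in the event.

```python
# Toy payoff matrix for "getting nuked loses everything, nuking successfully
# gains much but not all". Numbers are illustrative only.

MOVES = ("hold", "nuke")

# my_payoff[(my_move, their_move)]
my_payoff = {
    ("hold", "hold"): 0,      # status quo
    ("hold", "nuke"): -100,   # I lose everything
    ("nuke", "hold"): 80,     # I gain much, but not all
    ("nuke", "nuke"): -100,   # mutual destruction
}

for their_move in MOVES:
    replies = {mine: my_payoff[(mine, their_move)] for mine in MOVES}
    print(their_move, "->", replies)
# hold -> {'hold': 0, 'nuke': 80}        (striking first is strictly better)
# nuke -> {'hold': -100, 'nuke': -100}   (no worse)
# "Nuke" weakly dominates, so the game collapses immediately.
```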
I think this is actually wrong, because of synthetic data letting us control what the AI learns and what they value, and in particular we can place honeypots that are practically indistinguishable from the real world
This sounds less like the notion of the first critical try is wrong, and more like you think synthetic data will allow us to confidently resolve the alignment problem beforehand. Does that scan?
Or is the position stronger, more like we don’t need to solve the alignment problem in general, due to our ability to run simulations and use synthetic data?
I endorse this movie unironically. It is a classic film for tracking what information you have and don’t have, how many possibilities there are, etc.
Also, the filmmaker maintains to this day that they left the truth of the matter in the final scene undefined on purpose, so we are spared having the logic hideously hacked off to suit the narrative and instead have to live with the uncertainty.