I have significant misgivings about the comparison with MAD, which relies on an overwhelming destructive response remaining available and thereby renders a debilitating first strike unachievable.
With AGI, a first strike seems both likely to succeed and predicted in advance by several people in several forms (full takeover, pivotal act, singleton outcome). By contrast, only a few (von Neumann among them) argued for a nuclear first strike before the USSR obtained nuclear weapons, and I am aware of no such arguments after it did.
If an AGI takeover would itself trigger MAD, that is a separate and potentially interesting line of reasoning, but I don’t see the inherent teeth in MAIM. If countries are in a cold-war rush to AGI, then the best-funded and most covert attempt will achieve AGI first and will likely initiate a first strike that circumvents MAD itself through new technological capabilities.
I think MAIM might only convince people who have p(doom) < 1%.
If we’re at the point where we can convincingly say to each other “this AGI we’re building together cannot be used to harm you,” we are far closer to p(doom) == 0 than we are right now, IMHO.
Otherwise, why would the U.S. or China promising to do AGI research in a MAIMable way be any more convincing than the alignment strategies that would first be necessary to trust AGI at all? The risk is “anyone gets AGI” until p(doom) is low, and at that point I am unsure any particular country would choose to forego AGI just because it didn’t perfectly align politically: if one random blob of humanness manages to convince an alien-minded AGI to preserve the aspects of itself it cares about, that is likely to encompass 99.9% of what other human blobs care about too.
Where that leaves us is this: if the U.S. and China have very different estimates of p(doom), they are unlikely to cooperate at all in making AGI progress legible to each other. And if they have similar estimates, then, very roughly, they either cooperate strongly to prevent all AGI or cooperate to build the same thing.
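To make that last claim concrete, here is a minimal toy sketch in Python of the regime split I have in mind. The threshold values (`DIVERGENCE_LIMIT`, `DOOM_THRESHOLD`) are my own illustrative assumptions, not anything drawn from the MAIM proposal; the point is only the shape of the mapping, not the specific numbers.

```python
# Toy model: how two countries' p(doom) estimates might map onto
# cooperation regimes. All thresholds are illustrative assumptions.

DIVERGENCE_LIMIT = 0.10  # assumed: estimates differing by more than this block cooperation
DOOM_THRESHOLD = 0.05    # assumed: shared estimates above this push toward prevention

def cooperation_regime(p_doom_us: float, p_doom_china: float) -> str:
    """Classify the strategic regime implied by two p(doom) estimates."""
    if abs(p_doom_us - p_doom_china) > DIVERGENCE_LIMIT:
        # Very different risk estimates: neither side trusts the other's
        # reasoning enough to make AGI progress legible.
        return "no cooperation (covert race)"
    if max(p_doom_us, p_doom_china) > DOOM_THRESHOLD:
        # Similar and high: both sides see AGI itself as the dominant risk.
        return "cooperate to prevent all AGI"
    # Similar and low: both sides expect a controllable AGI.
    return "cooperate to build the same thing"

if __name__ == "__main__":
    for us, china in [(0.50, 0.02), (0.30, 0.25), (0.01, 0.02)]:
        print(f"p(doom) US={us:.2f}, China={china:.2f} -> {cooperation_regime(us, china)}")
```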