What did Yudkowsky get right?
The central problem of AI alignment. I am not aware of anything in subsequent work that is not already implicit in Yudkowsky’s writing.
Short timelines avant la lettre. Yudkowsky was predicting AGI within his lifetime from the very start, when most academics, observers, AI scientists, etc. considered AGI a fairy tale.
The inherent and irreducible uncertainty of forecasting, and the foolishness of precise predictions.
The importance of (Pearlian) causality, Solomonoff induction as a theory of formal epistemology, Bayesian statistics, (Shannon) information theory, and decision theory [especially UDT-shaped things].
(?nanotech, ?cryonics)
If you had a time machine to go back to 2010, you should buy Bitcoin and write Harry Potter fanfiction.
From Akash’s summary of the discussion between Connor Leahy and Michael Trazzi on “The Inside View” from ~1.5 years ago:
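A lot of Eliezer’s value as a thinker is that he notices & comprehends antimemes. And he figures out how to communicate them.
An antimeme is something that by its very nature resists being known. Most antimemes are just boring—things you forget about. If you tell someone an antimeme, it bounces off them. So they need to be communicated in a special way. Moral intuitions. Truths about yourself. A psychologist doesn’t just tell you “yo, you’re fucked up bro.” That doesn’t work.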
In Leahy’s own words:
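“Antimemes are completely real. There’s nothing supernatural about it. Most antimemes are just things that are boring. So things that are extraordinarily boring are antimemes because they, by their nature, resist you remembering them. And there’s also a lot of antimemes in various kinds of sociological and psychological literature. A lot of psychology literature, especially early psychology literature, which is often very wrong to be clear. Psychoanalysis is just wrong about almost everything. But the writing style, the kind of thing these people I think are trying to do is they have some insight, which is an antimeme. And if you just tell someone an antimeme, it’ll just bounce off them. That’s the nature of an antimeme. So to convey an antimeme to people, you have to be very circuitous, often through fables, through stories you have, through vibes. This is a common thing.
Moral intuitions are often antimemes. Things about various human nature or truth about yourself. Psychologists, don’t tell you, “Oh, you’re fucked up, bro. Do this.” That doesn’t work because it’s an antimeme. People have protection, they have ego. You have all these mechanisms that will resist you learning certain things. Humans are very good at resisting learning things that make themselves look bad. So things that hurt your own ego are generally antimemes. So I think a lot of what Eliezer does and a lot of his value as a thinker is that he is able, through however the hell his brain works, to notice and comprehend a lot of antimemes that are very hard for other people to understand.”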
Much of the discussion at the time (example) focused on the particular application of this idea in the context of the “Death with Dignity” post, but I think this effect was visible much earlier on, most prominently in the Sequences themselves. As I see it, this did not affect the content that was being communicated so much as it did the vibe, the more ineffable, emotional, and hard-to-describe-using-S2 stylistic packaging that enveloped the specific ideas being conveyed. The latter [1], divorced from Eliezer’s presentation of them, could be (and often are) thought of as dry or entirely technical, but his writing gave them a certain life that made them rather unforgettable and allowed them to hit much harder (see “How An Algorithm Feels From the Inside” and “Beyond the Reach of God” as the standard examples of this).
[1] Stuff like probability theory, physics (Quantum Mechanics in particular), philosophy of language, etc.
I think I’d agree with everything you say (or at least know what you’re looking at as you say it) except for the importance of decision theory. What work are you watching there?
As one relevant consideration, I think the answer to “will AI kill all humans” relies in substantial part on TDT-ish considerations, and it is a question that, I think, a bunch of value systems reasonably care a lot about. Also, I think what superintelligent systems will do will depend a lot on decision-theoretic considerations that seem very hard to answer from a CDT- vs. EDT-ish frame.
I think I speak for many when I ask you to please elaborate on this!
Oh, I thought this was relatively straightforward and had been discussed a bunch. There are two lines of argument I know of for why superintelligent AI, even if unaligned, might not literally kill everyone, but keep some humans alive:
The AI might care a tiny bit about our values, even if it mostly doesn’t share them.
The AI might want to coordinate with other AI systems that reached superintelligence to jointly optimize the universe. So if there is only a 1% chance that we align AI systems to our values, then even in the unaligned worlds we might end up with AI systems that adopt our values as a 1% mixture in their utility function (and, consequently, in the 1% of worlds where alignment succeeds, we might still want to trade away 99% of the universe to the values that the counterfactual unaligned AI systems would have had).
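To make the shape of that second argument concrete, here is a minimal expected-value sketch. Everything in it is an illustrative assumption rather than a claim from the thread: the 1% alignment probability, the premise that the trade splits the universe exactly by win probability, and the toy utility functions (diminishing returns for human values, roughly linear for the unaligned AI).

```python
# Toy expected-value sketch of the "1% mixture" trade argument above.
# All numbers and utility functions are illustrative assumptions, not
# claims from the discussion.
import math

p_align = 0.01  # assumed probability that we align AI to human values

# Without a trade: human values get the whole universe with probability
# p_align and nothing otherwise. With the trade: human values get a
# guaranteed p_align share in every world, and the unaligned values get
# the complementary share even in the worlds where alignment succeeds.

def human_utility(share):
    # Toy assumption: human values have diminishing returns in resources.
    return math.sqrt(share)

def ai_utility(share):
    # Toy assumption: the unaligned AI's utility is roughly linear in resources.
    return share

ev_human_no_trade = p_align * human_utility(1.0)   # lottery on the whole universe
ev_human_trade = human_utility(p_align)            # guaranteed small share

ev_ai_no_trade = (1 - p_align) * ai_utility(1.0)   # lottery on the whole universe
ev_ai_trade = ai_utility(1 - p_align)              # guaranteed large share

print(f"human EV without trade: {ev_human_no_trade:.3f}")  # 0.010
print(f"human EV with trade:    {ev_human_trade:.3f}")     # 0.100
print(f"AI EV without trade:    {ev_ai_no_trade:.3f}")     # 0.990
print(f"AI EV with trade:       {ev_ai_trade:.3f}")        # 0.990
```

Under these toy assumptions the trade leaves the unaligned AI roughly indifferent while substantially raising expected utility for human values, which is the intuition behind accepting the 1%/99% split in both directions.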
Some places where the second line of argument has been discussed:
This comment by Ryan Greenblatt: https://www.lesswrong.com/posts/tKk37BFkMzchtZThx/miri-2024-communications-strategy?commentId=xBYimQtgASti5tgWv
This comment by Paul Christiano: https://www.lesswrong.com/posts/2NncxDQ3KBDCxiJiP/cosmopolitan-values-don-t-come-free?commentId=ofPTrG6wsq7CxuTXk
See also: https://www.lesswrong.com/posts/rP66bz34crvDudzcJ/decision-theory-does-not-imply-that-we-get-to-have-nice
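Note that in this comment I’m not touching on acausal trade (with successful humans) or ECL. I think those are very relevant to whether AI systems kill everyone, but are less related to this implicit claim about kindness which comes across in your parables (since acausally trading AIs are basically analogous to the ants who don’t kill us because we have power).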