“A Muggle security expert would have called it fence-post security, like building a fence-post over a hundred metres high in the middle of the desert. Only a very obliging attacker would try to climb the fence-post. Anyone sensible would just walk around the fence-post, and making the fence-post even higher wouldn’t stop that.” —HPMOR, Ch. 115
(Not to be confused with the Trevor who works at Open Phil)
Do Metropolitan Man!
Also, here’s a bunch of ratfic to read and review, weighted by the number of 2022 Lesswrong survey respondents who read them:
Weird coincidence: I was just thinking about Leopold’s bunker concept from his essay. It was a pretty careless essay overall, but the imperative to put alignment research in a bunker makes perfect sense; I don’t see the surface as a viable place for people to get serious work done (at least, not in densely populated urban areas; somewhere in the desert would count as a “bunker” in this case so long as it’s sufficiently distant from passersby and the sensors and compute in their phones and cars).
Of course, this is unambiguously a necessary evil: a tiny handful of people are going to have to choose to live in a sad, uncomfortable place for a while, and only because there’s no other option and it’s obviously the correct move for everyone everywhere, including the people in the bunker.
Until the basics of the situation start somehow getting taught in the classrooms or something, we’re going to be stuck with a ludicrously large proportion of people satisfied with the kind of bite-sized convenient takes that got us into this whole unhinged situation in the first place (or with no thoughts at all).
I would have liked to write a post that offers one weird trick to avoid being confused by which areas of AI are more or less safe to advance, but I can’t write that post. As far as I know, the answer is simply that you have to model the social landscape around you and how your research contributions are going to be applied.
Another thing that can’t be ignored is the threat of Social Balkanization. Divide-and-conquer tactics have been prevalent among military strategists for millennia, and the tactic remains prevalent and psychologically available among the people making up corporate factions and many subcultures (likely including both leftist and right-wing subcultures).
It is easy for external forces to notice opportunities to Balkanize a group, making it weaker and its splinters easier to acquire or capture, which in turn provides further opportunity for lateral movement and spotting more exploits. Since awareness and exploitation of this vulnerability are widespread, social systems without this specific hardening are very brittle and have dismal prospects.
Sadly, Balkanization can also emerge naturally, as you helpfully pointed out in Consciousness as a Conflationary Alliance Term, so the high base rates make it harder to correctly distinguish attacks from accidents. Inadequately calibrated autoimmune responses are not only damaging, but should be assumed to be automatically anticipated and misdirected by default, including as part of the mundane social dynamics of a group with no external attackers.
There is no way around the loss function.
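A toy Bayes sketch of that attack-vs-accident base-rate point (all numbers are invented for illustration): when natural schisms are common, even a decent detector mostly flags accidents.

```python
# Toy Bayes sketch: if natural schisms are common, observing a schism is
# weak evidence of an attack. All numbers here are invented assumptions.
p_attack = 0.05               # prior: only 5% of schisms are engineered
p_schism_given_attack = 0.9   # attacks usually produce a visible schism
p_schism_given_natural = 0.3  # but schisms often emerge naturally too

p_schism = (p_attack * p_schism_given_attack
            + (1 - p_attack) * p_schism_given_natural)
p_attack_given_schism = p_attack * p_schism_given_attack / p_schism
print(f"P(attack | schism) = {p_attack_given_schism:.2f}")  # ~0.14
```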
The only reason I could think of that this would be the “worst argument in the world” is that it strongly indicates low-level thinkers (e.g. low decouplers).
An actual “worst argument in the world” would be whatever maximizes the gap between people’s models and accurate models.
Can you expand the list, go into further detail, or list a source that goes into further detail?
At the time, I thought something like “given that the nasal tract already produces NO, it seems possible that humming doesn’t increase the NO in the lungs by enough orders of magnitude for once per hour to be sufficient”, but I never said anything until it was too late and a bunch of other people had figured it out, along with a bunch of other useful stuff that I was pretty far away from noticing (e.g. considering the rate at which the nasal tract accumulates NO to be released by humming).
Wish I’d said something back when it was still valuable.
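For what it’s worth, here’s a toy version of the accumulation model I wish I’d posted; every parameter is an invented placeholder, not measured physiology:

```python
import math

# Toy model: nasal NO accumulates toward a saturation level S with time
# constant tau; each hum releases the accumulated pool into the lungs.
# All parameters are invented for illustration.
S = 1.0      # saturation level (arbitrary units)
tau = 10.0   # minutes to mostly refill the nasal pool

def pool_after(minutes: float) -> float:
    """NO available for release after `minutes` of accumulation."""
    return S * (1 - math.exp(-minutes / tau))

for interval in (5, 15, 60):  # humming every 5, 15, 60 minutes
    per_hour = (60 / interval) * pool_after(interval)
    print(f"hum every {interval:>2} min -> {per_hour:.2f} units/hour")
# With tau = 10, hourly humming delivers ~1.0 unit/hour, while humming
# every 5 minutes delivers ~4.7: frequency matters a lot if refill is fast.
```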
It almost always took a personal plea from a persecuted person for altruism to kick in. Once they weren’t just an anonymous member of an indifferent crowd, once they were left with no escape but to make a personal moral choice, they often found that they were unable to refuse help.
This is a crux. I think a better way to look at it is that they didn’t have an opportunity to clarify their preference until the situation was in front of them. Otherwise, it’s too distant and hypothetical to process, similar to scope insensitivity (the 2,000/20,000/200,000 oil-covered birds thing).
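For reference, the numbers from that study as usually quoted (Desvousges et al. 1992): stated willingness to pay barely moved while the number of birds saved grew 100x, so the per-bird valuation collapses by two orders of magnitude.

```python
# Scope insensitivity arithmetic, using the figures as usually quoted
# from Desvousges et al. 1992: WTP is nearly flat across a 100x range.
for birds, wtp in [(2_000, 80), (20_000, 78), (200_000, 88)]:
    print(f"{birds:>7} birds: ${wtp} total -> {100*wtp/birds:.3f} cents/bird")
# -> 4.000, 0.390, and 0.044 cents per bird respectively
```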
The post-hoc cognitive dissonance angle seems like a big find, and strongly indicates that reliably moral supermen could be produced at scale, given an optimized equilibrium for them to emerge from.
Stable traits (possibly partially genetic) are likely highly relevant to not-yet-clarified preferences, of course. Epistemics here are difficult because we expect short inferential distances; Duncan Sabien gave an interesting take on this in a Facebook post:
Also, if your worldview is such that, like. *Everyone* makes awful comments like that in the locker room, *everyone* does angle-shooting and tries to scheme and scam their way to the top, *everyone* is looking out for number one, *everyone* lies …
… then *given* that premise, it makes sense to view Trump in a positive light. He’s no worse than everybody else, he’s just doing the normal things that everyone does, with the *added layer* that he’s brave enough and candid enough and strong enough that he *doesn’t have to pretend he doesn’t.*
Admirable! Refreshingly honest and clean!
So long as you can’t conceive of the fact that lots of people are actually just …...............… good. They’re not fighting against urges to be violent or to rape, they’re not biting their tongues when they want to say scathing and hurtful things, they’re not jealous and bitter and willing to throw others under the bus to get ahead. They’re just … fundamentally not interested in any of that.
(To be clear: if you are feeling such impulses all the time and you’re successfully containing them or channeling them and presenting a cooperative and prosocial mask: that is *also* good, and you are a good person by virtue of your deliberate choice to be good. But like. Some people just really *are* the way that other people have to *make* themselves be.)
It sort of vaguely rhymes, in my head, with the type of person who thinks that *everyone* is constantly struggling against the urge to engage in homosexual behavior, how dare *those* people give up the good fight and just *indulge* themselves … without realizing that, hey, bro, did you know that a lot of people are just straight? And that your internal experience is, uh, *different* from theirs?
The best thing I’ve found so far is to watch a movie, and whenever the screen flashes, or any moment you feel weirdly relaxed or get any other weird feeling, quickly turn your head and eyes ~60 degrees and gently but firmly bite your tongue.
Doing this a few minutes a day for 30 days might substantially improve resistance to a wide variety of threats.
Gently but firmly biting my tongue, for me, also seems like a potentially very good general-use strategy to return the mind to an alert and clear-minded base state; it seems like something Critch recommended, e.g. for initiating a TAP flowchain. I don’t think this can substitute for a whiteboard, but it sure can nudge you towards one.
One of the main bottlenecks on explaining the full gravity of the AI situation to people is that they’re already worn out from hearing about climate change, which for decades has been widely depicted as an existential risk with the full persuasive force of the environmentalist movement.
Fixing this rather awful choke point could plausibly be one of the most impactful things here. The “Global Risk Prioritization” concept is probably helpful for that but I don’t know how accessible it is. Heninger’s series analyzing the environmentalist movement was fantastic, but the fact that it came out recently instead of ten years ago tells me that the “climate fatigue” problem might be understudied, and evaluation of climate fatigue’s difficulty/hopelessness might yield unexpectedly hopeful results.
I just found out that hypnosis is real and not pseudoscience. Apparently the human brain has a zero day such that other humans can find ways to read and write to your memory, and everyone is insisting that this is fine and always happens with full awareness and consent?
Wikipedia says as many as 90% of people are at least moderately susceptible, and depending on how successful people have been over the last couple of centuries at finding ways to reduce detection risk per instance (e.g. developing and selling various galaxy-brained misdirection ploys), that seems like very fertile ground for salami-slicing attacks which wear down partially resistant people.
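To gesture at why salami-slicing worries me (the per-exposure rate is an invented number, and the sketch assumes independent attempts): even a small chance per exposure compounds quickly.

```python
# Illustrative compounding: many weak attempts against a partially
# resistant target add up. p is an invented per-exposure success rate,
# and exposures are assumed independent.
p = 0.01                      # assume a 1% chance a single exposure "lands"
for n in (10, 100, 500):
    p_any = 1 - (1 - p) ** n  # chance at least one exposure succeeds
    print(f"{n:>3} exposures -> {p_any:.0%}")
# 10 -> 10%, 100 -> 63%, 500 -> 99%
```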
The odds that something like this would be noticed and tested/scaled/optimized by competent cybersecurity experts and power lawyers seem pretty high (e.g. oscillating the screen refresh rate in non-visible ways to increase feelings of stress or discomfort and turning it off whenever the user’s eyes are about to pass over specific kinds of words, slightly altering the color output of specific pixels across the screen in the shape of words and measuring effectiveness by whether it causally increases the frequency of people using those words, some way of combining these two tactics, something derived from the millions of people on youtube trying hard to find a video file that hypnotizes them, etc).
It’s really frustrating living in a post-MKUltra world, where every decade our individual sovereignty as humans relies more and more on very senior government officials (who are probably culturally similar to the type of person who goes to business school, and have been for centuries) either consistently failing at all of the manipulation science they are heavily incentivized to diversify their research investment into, or on taking them at their word when they insist that they genuinely believe in protecting democracy and that the bad things they get caught doing are in service of that end. Also, they seem to remain uninterested in life extension, possibly due in part to being buried deep in a low-trust dark forest (is trust even possible at all if you’re trapped on a planet with hypnosis?).
Aside from the incredibly obvious move to cover up your fucking webcam right now, are there any non-fake defensive strategies to reduce the risk that someone walks up to you/hacks your computer and takes everything from you? Is there some reliable way to verify that the effects are consistently weak or that scaling isn’t viable? The error bars are always really wide for the prevalence of default-concealed deception (especially when it comes to stuff that wouldn’t scale until the 2010s), making solid epistemics a huge pain to get right, but the situation with directly reading and writing to memory is just way way too extreme to ignore.
Strong upvoted, thank you for the serious contribution.
Children spending 300 hours per year learning math, on their own time and via well-designed, engaging, video-game-like apps (with e.g. AI tutors, video lectures, collaboration with parents to dispense rewards for performance instead of punishments for visible non-compliance, and results measured via standardized tests), at the fastest pace possible for them (or even one of 5 different paces, with fewer than 10% mistakenly placed into the wrong category), would probably produce vastly better results among every demographic than the current paradigm of ~30-person classrooms.
in just the last two years I’ve seen an explosion in students who discreetly wear a wireless earbud in one ear and may or may not be listening to music in addition to (or instead of) whatever is happening in class. This is so difficult and awkward to police with girls who have long hair that I wonder if it has actually started to drive hair fashion in an ear-concealing direction.
This isn’t just a problem with the students; the companies themselves end up in equilibria where visibly controversial practices get RLHF’d into being either removed or invisible (or hard for people to put their finger on). For example: hours a day of instant gratification reducing attention spans, except that, unlike in the early 2010s when it became controversial, the reduction now happens in ways too complicated or ambiguous for students and teachers to put their finger on, until a random researcher figures it out and makes the tacit explicit. Another counterintuitive vector could be public opinion turning against schooling via the democratic process, except in a lasting way. Or the results of multiple vectors like these overlapping.
I don’t see how the classroom-based system, dominated entirely by bureaucracies and tradition, could possibly compete with that without visibly being turned into Swiss cheese. It might have been clinging to continued good results from a dwindling proportion of students who were raised to be morally/ideologically in favor of respecting the teacher more than the other students, but that proportion will also decline as schooling loses legitimacy.
Regulation could plausibly halt the trend from most or all angles, but it would have to be the historically unprecedented kind of regulation that’s managed by regulators with historically unprecedented levels of seriousness and conscientiousness towards complex hard-to-predict/measure outcomes.
Thank you for making so much possible.
I was just wondering: what are some of the branches of rationality that you’re currently most optimistic about, and/or would be glad to see more people spending time on, if any? Now that people are rapidly shifting effort to policymaking in DC and the UK (including through EA), which is largely uncharted territory, what texts/posts/branches do you think might be a good fit for them?
I’ve been thinking that recommending that more people read ratfic would be unusually good for policy efforts: it’s something very socially acceptable for high-minded people to do in their free time, it should have a big impact through extant orgs without costing any additional money, and it’s not weird or awkward in the slightest to talk about the original source if a conversation gets anyone interested in going deeper into where they got the idea from.
Plus, it gets (and keeps) people in the right headspace for the curveballs that DC hits people with, which tend to be largely human-generated and therefore simple enough for humans to easily understand, just like the cartoonish simplifications of reality in ratfic (unusually low levels of math/abstraction/complexity but unusually high levels of linguistic intelligence, creative intelligence, and quick reactions, e.g. in social situations).
But unlike you, I don’t have much of a track record making judgments about big decisions like this and then seeing how they play out over years in complicated systems.
Have you tried whiteboarding-related techniques?
I think that suddenly starting to use written media (even journals), in an environment without much or any guidance, is like pressing too hard on the gas; you’re gaining incredible power and going from zero to one on things faster than you ever have before.
Depending on their environment and what they’re interested in when starting out, some people might learn (or be shown) how to steer quickly, whereas others might accumulate/scaffold really lopsided optimization power and crash and burn (e.g. getting involved in tons of stuff at once that, upon reflection, was way too much for someone just starting out).
For those of us who haven’t already, don’t miss out on the paper this was based on. It’s a serious banger for anyone interested in the situation on the ground, and probably one of the most interesting and relevant papers this year.
It’s not something to skip just because you don’t find environmentalism itself very valuable; if you think about it for a while, it’s pretty easy to see why environmentalists make a fantastic case study for a wide variety of purposes.
Here’s a snapshot of the table of contents:
(the link to the report seems to be broken; are the 4 blog posts roughly the same piece?)
Notably, this interview was on March 18th, and is afaik the highest-level interview in which Altman has given his two cents since the incident. There’s a transcript here. (There was also this podcast a couple days ago.)
I think a Dwarkesh-Altman podcast would be more likely to arrive at more substance from Altman’s side of the story. I’m currently pretty confident that Dwarkesh and Altman are sufficiently competent to build enough trust to make sane and adequate pre-podcast agreements (e.g. don’t be an idiot who plays tons of one-shot games just because podcast cultural norms are more vivid in your mind than game theory), but I might be wrong about this; trailblazing the frontier of making-things-happen, like Dwarkesh and Altman are, is a lot harder than thinking about the frontier of making-things-happen.
Recently, John Wentworth wrote:
Ingroup losing status? Few things are more prone to distorted perception than that.
And I think this makes sense (e.g. Simler’s Social Status: Down the Rabbit Hole which you’ve probably read), if you define “AI Safety” as “people who think that superintelligence is serious business or will be some day”.
The psych dynamic that I find helpful to point out here is Yud’s Is That Your True Rejection post from ~16 years ago. A person hearing about superintelligence for the first time will often respond to their double-take at the concept by spamming random justifications for why it’s not a problem (which, notably, feels like legitimate reasoning to that person, even though it’s not). An AI-safety-minded person becomes wary of being effectively attacked by high-status people immediately turning into what is basically a weaponized justification machine, and develops a deep drive to avoid that. Justifications then ensue for wanting it to happen less frequently in the world, because deep down humans really don’t want their social status to be put at risk (via denunciation) on a regular basis like that. These sorts of deep drives are pretty opaque to us humans, but their real-world consequences are very strong.
Something that seems more helpful than playing whack-a-mole whenever this issue comes up is having more people in AI policy putting more time into improving perspective. I don’t see shorter paths to increasing the number of people-prepared-to-handle-unexpected-complexity than giving people a broader and more general thinking capacity for thoughtfully reacting to the sorts of complex curveballs that you get in the real world. Rationalist fiction like HPMOR is great for this, as well as others e.g. Three Worlds Collide, Unsong, Worth the Candle, Worm (list of top rated ones here). With the caveat, of course, that doing well in the real world is less like the bite-sized easy-to-understand events in ratfic, and more like spotting errors in the methodology section of a study or making money playing poker.
I think, given the circumstances, it’s plausibly very valuable, e.g. for people already spending much of their free time on social media or watching stuff like The Office, Garfield reruns, or WWI and Cold War documentaries, to spend only ~90% as much time doing that and refocus the other ~10% on ratfic instead, and maybe see if they can find it in themselves to want to shift more of their leisure time to that sort of passive/ambient/automatic self-improvement productivity.
However I would continue to emphasize in general that life must go on. It is important for your mental health and happiness to plan for the future in which the transformational changes do not come to pass, in addition to planning for potential bigger changes. And you should not be so confident that the timeline is short and everything will change so quickly.
This is actually one of the major reasons why 80k recommended information security as one of their top career areas; the other top career areas have pretty heavy switching costs and serious drawbacks if you end up not being a good fit, e.g. alignment research, biosecurity, and public policy.
Cybersecurity jobs, on the other hand, are still booming, and depending on how security automation and prompt engineering go, the net jobs lost to AI will probably be far lower than in other industries, e.g. because more eyeballs might offer perception and processing power that supplement or augment LLMs for a long time, and more warm bodies means more attackers, which means more defenders.
The program expanded in response to Amazon wanting to collect data about more retailers, not because Amazon was viewing this program as a profit center.
Monopolies are profitable, and in that case the program would have more than paid for itself, but I probably should have mentioned that explicitly, since someone could have objected that they were more focused on mitigating the risk of shrinking market share, or on accumulating power, than on increasing profit in the long term. Maybe I tried to fit too much into 2 paragraphs here.
I didn’t see any examples mentioned in the WSJ article of Amazon employees cutting corners or making simple mistakes that might have compromised operations.
Hm, that stuff seemed like cutting corners to me. Maybe I was poorly calibrated on this; e.g. maybe using a building next to Amazon HQ was correctly predicted by operatives to be extremely low-risk.
Thanks, I’ll look into this! Epistemics is difficult when it comes to publicly available accounts of intelligence agency operations, but I guess you could say the same for bigtech leaks (and the future of neurotoxin poisoning is interesting just for its own sake, e.g. because lower-effect strains and doses could be disguised as natural causes like dementia).
That’s interesting; what’s the point of reference you’re using here for competence? I think stuff from e.g. the 1960s would make for bad reference cases, but anything within ~10 years of this program’s start date (after ~2005) would be fine.
You’re right that the leak is the crux here, and I might have focused too much on the paper trail (the author of the article placed a big emphasis on that).