Sorry, to amend my statement about “wasn’t aimed at raising the sanity waterline of eg millions of people, only at teaching smaller sets”:
Way back when Eliezer wrote that post, we really were thinking of trying to raise the rationality of millions, or at least of hundreds of thousands, via clubs and schools and things. It was in the initial mix of visions. Eliezer spent time trying to write a sunk-costs unit that someone who didn’t understand much rationality themselves could read aloud to a meetup, and that could cause the meetup to learn skills. We imagined maybe finding the kinds of donors who donated to art museums and getting them to donate to us instead, so that we could eg nudge legislation they cared about by causing the citizenry to have better thinking skills.
However, by the time CFAR ran our first minicamps in 2012, or conducted our first fundraiser, our plans had mostly moved to “teach those who are unusually easy to teach via being willing and able to pay for workshops, practice, care, etc”. I preferred this partly because I liked getting the money from the customers we were trying to teach, so that they’d be who we were responsible to (fewer principal-agent problems, compared to if someone with a political agenda wanted us to make other people think better; though I admit this is ironic given I now think there were some problems around us helping MIRI and being funded by AI risk donors while teaching some rationality hobbyists who weren’t necessarily looking for that). I also preferred it because I thought we knew how to run minicamps that would be good, and I didn’t have many good ideas for raising the sanity waterline more broadly.
We did make nonzero attempts at raising the sanity waterline more broadly: Julia’s book, as mentioned elsewhere, but also collaborating a bit on a rationality class at UC Berkeley, trying to prioritize workshop applicants who seemed likely to teach others well (including giving them more financial aid), etc.
I agree, although I also think we ran with this where it was convenient instead of hashing it out properly (like, we asked “what can we say that’ll sound good and be true” when writing fundraiser posts, rather than “what are we up for committing to in a way that will build a high-integrity relationship with whichever community we actually want to serve, and will let any other communities who we don’t want to serve realize that and stop putting their hopes in us.”)
But I agree re: Julia.
I think many of us, during many intention-minutes, had fairly sincere goals of raising the sanity of those who came to events, and took many actions backchained from these goals in a fairly sensible fashion. I also think I and some of us worked to: (a) bring to the event people who were unusually likely to help the world, such that raising their capability would help the world; (b) influence people who came to be more likely to do things we thought would help the world; and (c) draw people into particular patterns of meaning-making that made them easier to influence and control in these ways, although I wouldn’t have put it that way at the time, and I now think this was in tension with sanity-raising in ways I didn’t realize at the time.
I would still tend to call the sentence “we were trying to raise the sanity waterline of smart rationality hobbyists who were willing and able to pay for workshops and do practice and so on” basically true.
I also think we actually helped a bunch of people get a bunch of useful thinking skills, in ways that were hard and required actual work/iteration/attention/curiosity/etc (which we put in, over many years, successfully).
IMO, our goal was to raise the sanity of particular smallish groups who attended workshops, but wasn’t very much to have effects on millions or billions (we would’ve been in favor of that, but most of us mostly didn’t think we had enough shot to try backchaining from that). Usually when people say “raise the sanity waterline” I interpret them as discussing stuff that happens to millions.
I agree the “tens of thousands” in the quoted passage is more than was attending workshops, and so pulls somewhat against my claim.

I do think our public statements were deceptive, in a fairly common but nevertheless bad way: we had many conflicting visions, tended to avoid contradicting people who thought we were gonna do all the good things that at least some of us had at least some desire/hope to do, and in our public statements/fundraisers we tended to avoid alienating any of those hopes, as opposed to the higher-integrity / more honorable approach of coming to a coherent view of which priorities we prioritized how much, and trying to help people not have unrealistic hopes in us or inaccurate views of our priorities.
I agree
I agree with the sentence you quote from Vervaeke (“[myths] are symbolic stories of perennial patterns that are always with us”) but mostly-disagree with “myths … encapsulate some eternal and valuable truths” (your paraphrase).
As an example, let’s take the story of Cain and Abel. IMO, it is a symbolic story containing many perennial patterns:
- When one person is praised, the not-praised will often envy them
- Brothers often envy each other
- Those who envy often act against those they envy
- Those who envy, or do violence, often lie about it (“Am I my brother’s keeper?”)
- Those who have endured strange events sometimes have a “mark of Cain” that leads others to stay at a distance from them and leave them alone
I suspect this story and its patterns (especially back when there were few stories passed down and held in common) helped many to make conscious sense of what they were seeing, and to share their sense with those around them (“it’s like Cain and Abel”). But this help (if I’m right about it) would’ve been similar to the way words in English (or other natural languages) help people make conscious sense of what they’re seeing, and communicate that sense—myths helped people have short codes for common patterns, helped make those patterns available for including in hypotheses and discussions. But myths didn’t much help with making accurate predictions in one shot, the way “eternal and valuable truths” might suggest.
(You can say that useful words are accurate predictions, a la “cluster structures in thingspace”. And this is technically true, which is why I am only mostly disagreeing with “myths encapsulate some eternal and valuable truths”. But a good word helps differently than a good natural law or something does).
To take a contemporary myth local to our subculture: I think HPMOR is a symbolic story that helps make many useful patterns available to conscious thought/discussion. But it’s richer as a place to see motifs in action (e.g.
the way McGonagall initially acts out the picture of herself that lives in her head; the way she learns to break her own bounds
) than as a source of directly stateable truths.
A friend recently complained to me about this post: he said most people do much nonsense under the heading “belief”, and that this post doesn’t acknowledge this adequately. He might be right!
Given his complaint, perhaps I ought to say clearly:
1) I agree — there is indeed a lot of nonsense out there masquerading as sensible/useful cognitive patterns. Some aimed to wirehead or mislead the self; some aimed to deceive others for local benefit; lots of it simple error.
2) I agree also that a fair chunk of nonsense adheres to the term “belief” (and the term “believing in”). This is because there’s a real, useful pattern of possible cognition near our concepts of “belief”, and because nonsense (/lies/self-deception/etc) likes to disguise itself as something real.
3) But — to sort sense from nonsense, we need to understand what the real (useful, might be present in the cogsci books of alien intelligences) pattern is, that is near our “beliefs”. If we don’t:
a) We’ll miss out on a useful way to think. (This is the biggest one.)
b) The parts of the {real, useful way to think} that fall outside our conception of “beliefs” will be practiced noisily anyway, sometimes; sometimes in a true fashion, sometimes mixed (intentionally or accidentally) with error or local manipulations. We won’t be able to excise these deceptions easily or fully, because it’ll be kinda clear there’s something real nearby that our concept of “beliefs” doesn’t do justice to, and so people (including us) will not wish to adhere entirely to our concept of “beliefs” in lieu of the so-called “nonsense” that isn’t entirely nonsense. So it’ll be harder to expel actual error.
4) I’m pretty sure that LessWrong’s traditional concept of “beliefs” as “accurate Bayesian predictions about future events” is only half-right, and that we want the other half too, both for (3a) type reasons, and for (3b) type reasons.
a) “Beliefs” as accurate Bayesian predictions is exactly right for beliefs/predictions about things unaffected by the belief itself — beliefs about tomorrow’s weather, or organic chemistry, or the likely behavior of strangers.
b) But there’s a different “belief-math” (or “believing-in math”) that’s relevant for coordinating pieces of oneself in order to take a complex action, and for coordinating multiple people so as to run a business or community or other collaborative endeavor. I think I lay it out here (roughly — I don’t have all the math), and I think it matters.
The old LessWrong Sequences-reading crowd *sort of* knew about this — folks talked about how beliefs about matters directly affected by the beliefs could be self-fulfilling or self-undermining prophecies, and how the Bayes-math wasn’t well-defined there. But when I read those comments, I thought they were discussing an uninteresting edge case. The idioms by which we organize complex actions (within a person, and between people) are part of the bread and butter of how intelligence works; they are not an uninteresting edge case.
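(A toy sketch of the fixed-point structure I think those old threads were gesturing at; the symbols $p$ and $f$ are my own illustration, not anything anyone wrote back then. Let $p$ be my credence that some venture succeeds, and let $f(p)$ be the actual probability of success given that I hold and act on credence $p$. A belief that accounts for its own effects has to satisfy

$$p = f(p)$$

and when $f$ is increasing, as when confidence recruits effort and investment, there can be several consistent fixed points with nothing in ordinary conditioning to say which one to occupy; when $f$ is decreasing, e.g. $f(p) = 1 - p$, only $p = 1/2$ is consistent. My guess is that choosing which fixed point to live in is part of what the “believing in” machinery is for.)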
Likewise, people sometimes talked (on LW, in the past) about how they were intentionally holding false beliefs about their start-ups’ success odds; they were advised not to be clever, and some commenters dissented from this advice. But IMO the “believing in” concept lets us distinguish:
(i) the useful thing such CEOs were up to (holding a target, in detail, that they and others can coordinate action around);
(ii) how to do this without having or requesting false predictions at the same time; and
(iii) how sometimes such action on the part of CEOs/etc is basically “lying” (and “demanding lies”), in the sense that it is designed to extract more work/investment/etc from “allies” than said allies would volunteer if they understood the process generating the CEO’s behavior (and to demand that their team members be similarly deceptive/extractive). And sometimes it’s not. And there are principles for telling the difference.
All of which is sort of to say that I think this model of “believing in” has substance we can use for the normal human business of planning actions together, and isn’t merely propaganda to mislead people into thinking human thinking is less buggy than it is. Also I think it’s as true to the normal English usage of “believing in” as the historical LW usage of “belief” is to the normal English usage of “belief”.
Elaborating Plex’s idea: I imagine you might be able to buy into participation as an SFF speculation granter with $400k. Upsides:
(a) Can see a bunch of people who’re applying to do things they claim will help with AI safety;
(b) Can talk to ones you’re interested in, as a potential funder;
(c) Can see discussion among the (small dozens?) of people who can fund SFF speculation grants, see what people are saying they’re funding and why, ask questions, etc.
So it might be a good way to get the lay of the land, find lots of people and groups, hear peoples’ responses to some of your takes and see if their responses make sense on your inside view, etc.
I’m tempted to argue with / comment on some bits of the argument about “Instrumental goals are almost-equally as tractable as terminal goals.” But when I click on the “comment” button, it removes the article from view and prompts me with “Discuss the wikitag on this page. Here is the place to ask questions and propose changes.”
Is there a good way to comment on the article, rather than the tag?
I got to the suggestion by imagining: suppose you were about to quit the project and do nothing. And now suppose that instead of that, you were about to take a small amount of relatively inexpensive-to-you actions, and then quit the project and do nothing. What’re the “relatively inexpensive-to-you actions” that would most help?
Publishing the whole list, without precise addresses or allegations, seems plausible to me.
I guess my hope is: maybe someone else (a news story, a set of friends, something) would help some of those on the list to take it seriously and take protective action, maybe after a while, after others on the list were killed or something. And maybe it’d be more parsable to people if it had been hanging out on the internet for a long time, as a pre-declared list of what to worry about, with visibly no one being there to try to collect payouts or something.
Maybe some of those who received the messages were more alert to their surroundings after receiving it, even if they weren’t sure it was real and didn’t return the phone/email/messages?
I admit this sounds like a terrible situation.
Gotcha. No idea if this is a good or bad idea, but: what are your thoughts on dumping an edited version of it onto the internet, including names, photos and/or social media links, and city/country but not precise addresses or allegations?
Can you notify the intended victims? Or at least the more findable intended victims?
A man being deeply respected and lauded by his fellow men, in a clearly authentic and lasting way, seems to be a big female turn-on. Way way way bigger effect size than physique, as best I can tell.
…but the symmetric thing is not true! Women cheering on one of their own doesn’t seem to make men want her more. (Maybe something else is analogous, the way female “weight lifting” is beautification?)
My guess at the analogous thing: women being kind/generous/loving seems to me like a thing many men have found attractive across times and cultures, and seems to me far more viable if a woman is embedded in a group who recognize her, tell her she is cared about and will be protected by a network of others, who in fact shield her from some kinds of conflict/exploitation, who help there be empathy for her daily cares and details to balance out the attention she gives to others’ cares, etc. So the group plays a support role in a woman being able to have/display the quality.
Steven Byrnes wrote:
“For example, I expect that AGIs will be able to self-modify in ways that are difficult for humans (e.g. there’s no magic-bullet super-Adderall for humans), which impacts the likelihood of your (1a).”
My (1a) (and related (1b)), for reference:
(1a) “You” (the decision-maker process we are modeling) can choose anything you like, without risk of losing control of your hardware. (Contrast case: if the ruler of a country chooses unpopular policies, they are sometimes ousted. If a human chooses dieting/unrewarding problems/social risk, they sometimes lose control of themselves.)
(1b) There are no costs to maintaining control of your mind/hardware. (Contrast case: if a company hires some brilliant young scientists to be creative on its behalf, it often has to pay a steep overhead if it additionally wants to make sure those scientists don’t disrupt its goals/beliefs/normal functioning.)
I’m happy to posit an AGI with powerful ability to self-modify. But, even so, my (nonconfident) guess is that it won’t have property (1a), at least not costlessly.
My admittedly handwavy reasoning:
- Self-modification doesn’t get you all powers: some depend on the nature of physics/mathematics. E.g. it may still be that verifying a proof is easier than generating a proof, for our AGI.
- Intelligence involves discovering new things, coming into contact with what we don’t specifically expect (that’s why we bother to spend compute on it). Let’s assume our powerful AGI is still coming into contact with novel-to-it mathematics/empirics/neat stuff. The questions are: is it (possible at all / possible at costs worth paying) to anticipate enough about what it will uncover that it can prevent the new things from destabilizing its centralized goals/plans/[“utility function” if it has one]? I… am really not sure what the answers to these questions are, even for a powerful AGI that has powerfully self-modified! There are maybe alien-to-it AGIs out there encoded in mathematics, waiting to boot up within it as it does its reasoning.
I just paraphrased the OP for a friend who said he couldn’t decipher it. He said it helped, so I’m copy-pasting here in case it clarifies for others.
I’m trying to say:
A) There’re a lot of “theorems” showing that a thing is what agents will converge on, or something, that involve approximations (“assume a frictionless plane”) that aren’t quite true.
B) The “VNM utility theorem” is one such theorem, and involves some approximations that aren’t quite true. So does e.g. Steve Omohundro’s convergent instrumental drives, the “Gandhi folk theorems” showing that an agent will resist changes to its utility function, etc.
C) So I don’t think the VNM utility theorem means that all minds will necessarily want to become VNM agents, nor to follow instrumental drives, nor to resist changes to their “utility functions” (if indeed they have a “utility function”).
D) But “be a better VNM-agent”, “follow the instrumental Omohundro drives”, etc. might still be a self-fulfilling prophecy for some region, partially. Like, humans or other entities who think it’s rational to be VNM agents might become better VNM agents, who might become better VNM agents, for a while.
E) And there might be other [mathematically describable mind-patterns] that can serve as alternative self-propagating patterns, a la D, that’re pretty different from “be a better VNM-agent.” E.g. “follow the god of nick land”.
F) And I want to know what are all the [mathematically describable mind-patterns, that a mind might decide to emulate, and that might make a kinda-stable attractor for a while, where the mind and its successors keep emulating that mind-pattern for a while]. They’ll probably each have a “theorem” attached that involves some sort of approximation (a la “assume a frictionless plane”).
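(To gesture at what such a “theorem” might look like, in made-up notation of my own rather than anything standard: write $M_{t+1} = U_P(M_t)$ for a mind that updates/modifies itself while endorsing pattern $P$, and call $P$ self-propagating on a region $R$ of mind-space if, for every mind $M$ in $R$, $U_P(M)$ is still in $R$ and is at least as close to $P$ as $M$ was, under some distance on minds. “Be a better VNM-agent” would be one candidate $P$, with the VNM axioms supplying the approximation-laden argument that minds in its basin keep moving toward it; the question in F is what the full catalogue of such $(P, R)$ pairs is.)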
“There is a problem that, other things equal, agents that care about the state of the world in the distant future, to the exclusion of everything else, will outcompete agents that lack that property. This is self-evident, because we can operationalize ‘outcompete’ as ‘have more effect on the state of the world in the distant future’.”
I am not sure about that!
One way this argument could fail: maybe agents who care exclusively about the state of the world in the distant future end up, as part of their optimizing, creating other agents who care in different ways from that.
In that case, they would “have more effect on the state of the world in the distant future”, but they might not “outcompete” other agents (in the common-sensical way of understanding “outcompete”).
A person might think this implausible, because they might think that a smart agent who cares exclusively about X can best achieve X by having all minds they create also be [smart agents who care exclusively about X].
But, I’m not sure this is true, basically for reasons of not trusting assumptions (1), (2), (3), and (4) that I listed here.
(As one possible sketch: a mind whose only goal is to map branch B of mathematics might find it instrumentally useful to map a bunch of other branches of mathematics. And, since supervision is not free, it might be more able to do this efficiently if it creates researchers who have an intrinsic interest in math-in-general, and who are not being fully supervised by exclusively-B-interested minds.)
“or more centrally, long after I finish the course of action.”
I don’t understand why the more central thing is “long after I finish the course of action” as opposed to “in ways that are clearly ‘external to’ the process called ‘me’, that I used to take the actions.”
Thanks; fixed.
I suspect “friendships” form within the psyche, and are part of how we humans (and various other minds, though maybe not all minds, I’m not sure) cohere into relatively unified beings (after being a chaos of many ~subagents in infancy). Insofar as that’s true of a given mind, it may make it a bit easier for kindness to form between it and outside minds, as it will already know some parts of how friendships can be done.