Trialing for the machine learning living library position at MIRI and occasional volunteer instructor and mentor at CFAR.
Qiaochu_Yuan
This seems like a bad idea to me; I think people who are trying to have good ideas should develop courage instead. If you don’t have courage your ideas are being pushed around by fear in general, and asking for a particular source of that fear to be ameliorated will not solve the general problem.
Thanks for writing this up! I’m excited to see more people running experiments like this.
When you say “if I take X as a prompt, I get Y,” how many trials did you run? In my own experimentation I’ve found lil’ GPT-2’s performance to be really variable across trials, and in some cases I’ve needed to run 5 trials to get results I even sort of liked.
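For concreteness, here’s a minimal sketch of what I mean by running several trials on one prompt. It assumes the Hugging Face transformers port of the small GPT-2 model, which isn’t necessarily the setup either of us actually used; the prompt is just the Turing-test one from below.

```python
# Minimal sketch of "running several trials" on one prompt, assuming the
# Hugging Face transformers port of the small GPT-2 model (not necessarily
# the setup either of us actually used).
from transformers import pipeline, set_seed

set_seed(0)  # make the trials reproducible
generator = pipeline("text-generation", model="gpt2")

prompt = "Question: what is your name? Answer:"
# Sampling is stochastic, so look at ~5 completions before judging a prompt.
trials = generator(prompt, max_length=100, num_return_sequences=5, do_sample=True)
for i, trial in enumerate(trials, 1):
    print(f"--- trial {i} ---")
    print(trial["generated_text"])
```

(num_return_sequences just batches the trials; calling the generator repeatedly gives the same effect.)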
My overall sense of how lil’ GPT-2 functions, after playing with it for a while on several different kinds of prompts, is that it has a strong sense of genre: it seems to have learned a bunch of different genre conventions for the different types of text in the training set. If the prompt strongly resembles a genre familiar from training it will run with that genre, although sometimes it’ll wander off into another genre. It does quite poorly on prompts that I suspect don’t strongly match any genre in the training set.
For example, I tried to run a Turing test (mostly as a joke) by prompting with “Question: what is your name? Answer:” and I got this on my ~2nd to 4th trial (don’t remember), with my speculations as to genre in [square brackets]:
Foulplay
I’m a Guardian Huntress, a deviation of the <Here> Monks keyword. I’m like a little “mechanic-creature hire”, but is Disassembler which is general clique-of-guy? I usually live in forms of Priests or Whispers of the Old Gods, but if it’s Asian Brotherhood of Magi..… [RP forum?]
deepmen
Reply ~29000 0 ~13 min 0 By TrueBeliever For MA teenage adventurer: Ooops It looks stupid but I’m given the wrong answer, and this logic is corrupt. Other first hand studies: sometimes your famed mechanical fiends attack other nvrock’s—I had such too—but you’re supposed to have flying paths to defend them would not it? This alien spawn is being unnecessary and a waste of time Man—you stole the Saalis… [gaming forum?]
Shadow Netherlands
Reply ~120200 0 ~30 min 0 By john : I know there’s no better examples of
a landing back off 2lands ( ou ever?) Low challenge difficulty
a lot of new cards in my deck,
which doesn’t draw cards: stacks high levels Some need life if w/o rest of deck already Defense Emperor’s | Inferno/Neck Scorer Necronomicon Mysticpetal Mana Enigma Mortalswords Mysticmist Scout Zarthai Sniper [MtG or Hearthstone forum?]
It’s a multiplier and it increases the yield on a lv301 fighter fighter, next to nothing it’s probably never in my deck or some random deck Dofrone’s | Boltforge Fastboat Sling dmt3 Efreet Flattestalker Infernal Eater Toxic Water Hurricane For another Holy orb suggested… [gaming forum? LoL?]
Was planning on posting a longer (mostly humorous) post with my own results but that post is low priority so I don’t know when it’s going to happen.
Thanks for writing this! I am very excited that this post exists. I think what this model suggests about procrastination and addiction alone (namely, that they’re things that managers and firefighters are doing to protect exiles) is already huge, and resonates strongly with my experience.
In the beginning of 2018 I experienced a dramatic shift that I still don’t quite understand; my sense of it at the time was that there was this crippling fear / shame that had been preventing me from doing almost anything, that suddenly lifted (for several reasons, it’s a long story). That had many dramatic effects, and one of the most noticeable ones was that I almost completely stopped wanting to watch TV, read manga, play video games, or any of my other addiction / procrastination behaviors. It became very clear that the purpose of all of those behaviors was numbing and distraction (“general purpose feeling obliterators” used by firefighters, as waveman says in another comment) from how shitty I felt all the time, and after the shift I basically felt so good that I didn’t want or need to do that anymore.
(This lasted for a while but not forever; I crashed hard in September (long story again) before experiencing a very similar shift again a few weeks ago.)
Another closely related effect is that many things that had been too scary for me to think about became thinkable (e.g. regrettable dynamics in my romantic relationships), and I think this is a crucial observation for the rationality project. When you have exile-manager-firefighter dynamics going on and you don’t know how to unblend from them, you cannot think clearly about anything that triggers the exile, and trying to make yourself do it anyway will generate tremendous internal resistance in one form or another (getting angry, getting bored, getting sleepy, getting confused, all sorts of crap), first from managers trying to block the thoughts and then from firefighters trying to distract you from the thoughts. Top priority is noticing that this is happening and then attending to the underlying emotional dynamics.
I like this reading and don’t have much of an objection to it.
This is a bad argument for transhumanism; it proves way too much. I’m a little surprised that this needs to be said.
Consider: “having food is good. Having more and tastier food is better. This is common sense. Transfoodism is the philosophy that we should take this common sense seriously, and have as much food as possible, as tasty as we can make it, even if doing so involves strange new technology.” But we tried that, and what happened was obesity, addiction, terrible things happening to our gut flora, etc. It is just blatantly false in general that having more of a good thing is better.
As for “common sense”: in many human societies it was “common sense” to own slaves, to beat your children, and so on. Today it’s “common sense” to circumcise male babies, to eat meat, to send people who commit petty crimes to jail, etc., to pick some examples of things that might be considered morally repugnant by future human societies. Common sense is mostly moral fashion, or if you prefer, it’s mostly the memes that were most virulent when you were growing up, and it’s clearly unreliable as a guide to moral behavior in general.
Figuring out the right thing to do is hard, and it’s hard for comprehensible reasons. Value is complex and fragile; you were the one who told us that!
---
In the direction of what I actually believe: I think that there’s a huge difference between preventing a bad thing happening and making a good thing happen, e.g. I don’t consider preventing an IQ drop equivalent to raising IQ. The boy has had an IQ of 120 his entire life and we want to preserve that, but the girl has had an IQ of 110 her entire life and we want to change that. Preserving and changing are different, and preserving vs. changing people in particular is morally complicated. Again the argument Eliezer uses here is bad and proves too much:
Either it’s better to have an IQ of 110 than 120, in which case we should strive to decrease IQs of 120 to 110. Or it’s better to have an IQ of 120 than 110, in which case we should raise the sister’s IQ if possible. As far as I can see, the obvious answer is the correct one.
Consider: “either it’s better to be male than female, in which case we should transition all women to men. Or it’s better to be female than male, in which case we should transition all men to women.”
---
What I can appreciate about this post is that it’s an attempt to puncture bad arguments against transhumanism, and if it had been written more explicitly to do that as opposed to presenting an argument for transhumanism, I wouldn’t have a problem with it.
This whole conversation makes me deeply uncomfortable. I expect to strongly disagree at pretty low levels with almost anyone else trying to have this conversation, I don’t know how to resolve those disagreements, and meanwhile I worry about people seriously advocating for positions that seem deeply confused to me and those positions spreading memetically.
For example: why do people think consciousness has anything to do with moral weight?
Relevant reading: gwern’s The Narrowing Circle. He makes the important point that moral circles have actually narrowed in various ways, and also that it never feels that way because the things outside the circle don’t seem to matter anymore. Two straightforward examples are gods and our dead ancestors.
Does anyone else get the sense that it feels vaguely low-status to post in open threads? If so I don’t really know what to do about this.
This makes sense, but I also want to register that I viscerally dislike “controlling the elephant” as a frame, in roughly the same way as I viscerally dislike “controlling children” as a frame.
Huh. Can you go into more detail about what you’ve done and how it’s helped you? Real curious.
I think the original mythology of the rationality community is based around cheat codes
A lot of the original mythology, in the sense of the things Eliezer wrote about in the sequences, is about avoiding self-deception. I continue to think this is very important but think the writing in the Sequences doesn’t do a good job of teaching it.
The main issue I see with the cheat code / munchkin philosophy as it actually played out on LW is that it involved a lot of stuff I would describe as tricking yourself or the rider fighting against / overriding the elephant, e.g. strategies like attempting to reward yourself for the behavior you “want” in order to fix your “akrasia.” Nothing along these lines, e.g. Beeminder, worked for me when I experimented with them, and the whole time my actual bottleneck was that I was very sad and very lonely and distracting myself from and numbing that (which accounted for a huge portion of my “akrasia,” the rest was poor health, sleep and nutrition in particular).
This question feels confused to me but I’m having some difficulty precisely describing the nature of the confusion. When a human programmer sets up an IRL problem they get to choose what the domain of the reward function is. If the reward function is, for example, a function of the pixels of a video frame, IRL (hopefully) learns which video frames human drivers appear to prefer and which they don’t, based on which such preferences best reproduce driving data.
You might imagine that with unrealistic amounts of computational power IRL might attempt to understand what’s going on by modeling the underlying physics at the level of atoms, but that would be an astonishingly inefficient way to reproduce driving data even if it did work. IRL algorithms tend to have things like complexity penalties to make it possible to select e.g. a “simplest” reward function out of the many reward functions that could reproduce the data (this is a prior, but a pretty reasonable and justifiable one as far as I can tell). Even with large amounts of computational power, I expect it would still not be worth using a substantially more complicated reward function than necessary.
IRL does not need to answer this question along the way to solving the problem it’s designed to solve. Consider, for example, using IRL for autonomous driving. The input is a bunch of human-generated driving data, for example video from inside a car as a human drives it, or more abstract data (time, position, etc.) tracking the car over time, and IRL attempts to learn a reward function whose induced policy produces driving data that mimics the input data. At no point in this process does IRL need to do anything like reason about the distinction between, say, the car and the human; the point is that all of the interesting variation in the data is in fact (from our point of view) being driven by the human’s choices, so to the extent that IRL succeeds it is hopefully capturing the human’s reward structure wrt driving at the intuitively obvious level.
In particular, a large part of what selects the level at which IRL works is the human programmer’s choice of how to set up the problem: the format of the input data, the format of the reward function, and the format of the IRL algorithm’s actions.
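To make the “the programmer picks the level” point concrete, here’s a purely illustrative sketch, not code from any actual IRL library: a linear reward over a hand-chosen feature map, fit by feature matching with an L2 complexity penalty. featurize and expected_policy_features are hypothetical stand-ins for the programmer’s modeling choices and for the (expensive) planning step that computes feature expectations under the current reward.

```python
# Illustrative max-ent-flavored IRL sketch; every name here is hypothetical.
import numpy as np

def featurize(state):
    # The programmer fixes the "level" here: e.g. lane offset and speed,
    # not atom-level physics of the car and driver.
    return np.array([state["lane_offset"], state["speed"]])

def fit_reward(demo_states, expected_policy_features, n_iters=200, lr=0.01, l2=0.1):
    # Feature matching by gradient ascent: move the learned reward's induced
    # feature expectations toward the demonstrations', with an L2 penalty
    # selecting a simple reward among the many that reproduce the data.
    demo_feats = np.mean([featurize(s) for s in demo_states], axis=0)
    w = np.zeros_like(demo_feats)
    for _ in range(n_iters):
        grad = demo_feats - expected_policy_features(w) - l2 * w
        w += lr * grad
    return w  # learned reward: R(s) = w @ featurize(s)
```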
In any case, in MIRI terminology this is related to multi-level world models.
Thanks for the mirror! My recommendation is more complicated than this, and I’m not sure how to describe it succinctly. I think there is a skill you can learn through practices like circling which is something like getting in direct emotional contact with a group, as distinct from (but related to) getting in direct emotional contact with the individual humans in that group. From there you have a basis for asking yourself questions like, how healthy is this group? How will the health of the group change if you remove this member from it? Etc.
It also sounds like there’s an implicit thing in your mirror that is something like ”...instead of doing explicit verbal reasoning,” and I don’t mean to imply that either.
I appreciate the thought. I don’t feel like I’ve laid out my position in very much detail so I’m not at all convinced that you’ve accurately understood it. Can you mirror back to me what you think my position is? (Edit: I guess I really want you to pass my ITT which is a somewhat bigger ask.)
In particular, when I say “real, living, breathing entity” I did not mean to imply a human entity; groups are their own sorts of entities and need to be understood on their own terms, but I think it does not even occur to many people to try in the sense that I have in mind.
(For additional context on this comment you can read this FB status of mine about tribes.)
There’s something strange about the way in which many of us were trained to accept as normal that two of the biggest transitions in our lives—high school to college, college to a job—get packaged in with abandoning a community. In both of those cases it’s not as bad as it could be because everyone is sort of abandoning the community at the same time, but it still normalizes the thing in a way that bugs me.
There’s a similar normalization of abandonment, I think, in the way people treat break-ups by default. Yes, there are such things as toxic relationships, and yes, I want people to be able to just leave those without feeling like they owe their ex-partner anything if that’s what they need to do, but there are two distinct moves that are being bucketed here. I’ve been lucky enough to get to see two examples recently of what it looks like for a couple to break up without abandonment: they mutually decide that the relationship isn’t working, but they don’t stop loving each other at all throughout the process of getting out of the relationship, and they stay in touch with the emotional impact the other is experiencing throughout. It’s very beautiful and I feel a lot of hope that things can be better seeing it.
What I think I’m trying to say is that there’s something I want to encourage that’s upstream of all of your suggestions, which is something like seeing a community as a real, living, breathing entity built out of the connections between a bunch of people, and being in touch emotionally with the impact of tearing your connections away from that entity. I imagine this might be more difficult in local communities where people might end up in logistically important roles without… I’m not sure how to say this succinctly without using some Val language, but like, having the corresponding emotional connections to other community members that ought to naturally accompany those roles? Something like a woman who ends up effectively being a maid in a household without being properly connected to and respected as a mother and wife.
Yes, absolutely. This is what graduate school and CFAR workshops are for. I used to say both of the following things back in 2013-2014:
- that nearly all of the value of CFAR workshops came from absorbing habits of thought from the instructors (I think this less now, the curriculum’s gotten a lot stronger), and
- that the most powerful rationality technique was moving to Berkeley (I sort of still think this but now I expect Zvi to get mad at me for saying it).
I have personally benefited a ton over the last year and a half through osmosing things from different groups of relationalists—strong circling facilitators and the like—and I think most rationalists have a lot to learn in that direction. I’ve been growing increasingly excited about meeting people who are both strong relationalists and strong rationalists and think that both skillsets are necessary for anything really good to happen.
There is this unfortunate dynamic where it’s really quite hard to compete for the attention of the strongest local rationalists, who are extremely deliberate about how they spend their time and generally too busy saving the world to do much mentorship, which is part of why it’s important to be osmosing from other people too (also for the sake of diversity, bringing new stuff into the community, etc.).
I think your description of the human relationship to heroin is just wrong. First of all, lots of people in fact do heroin. Second, heroin generates reward but not necessarily long-term reward; kids are taught in school about addiction, tolerance, and other sorts of bad things that might happen to you in the long run (including social disapproval, which I bet is a much more important reason than you’re modeling) if you do too much heroin.
Video games are to my mind a much clearer example of wireheading in humans, especially the ones furthest in the fake achievement direction, and people indulge in those constantly. Also television and similar.
In particular, you shouldn’t force yourself to believe that you’re attractive.
And I never said this.
But there’s a thing that can happen when someone else gaslights you into believing that you’re unattractive, which makes it true, and you might be interested in undoing that damage, for example.
Glad to see you’re writing about this! I think motivation is a really central topic and there’s lots more to be said about it than has been said so far around here.
I think these days S1 and S2 have become semantic stopsigns, and in general I recommend that people stop using these terms both in their thinking and verbally, and instead try to get more specific about what parts of their mind actually disagree and why. I can report, for example, that CFAR doesn’t use these terms internally.
Anna Salamon used to say, in the context of teaching internal double crux at CFAR workshops, that there’s no such thing as an S1 vs. S2 conflict. All conflicts are “S1 vs. S1.” “Your S2,” whatever that means, may be capable of engaging in logical reasoning and having explicit verbal models about things, but the part of you that cares about the output of all of that reasoning is a part of your S1 (in my internal dialect, just “a part of you”), and you’ll make more progress once you start identifying what part that is.
---
Here’s an example of what getting more specific might look like. Suppose I’m a high school student and “my S1” says play video games and “my S2” says do my homework. What is actually going on here?
One version could be that I know I get social reinforcement from my parents and my teachers to do homework, or more sinisterly that I get socially punished for not doing it. So in this case “my S2” is a stopsign blocking an inquiry into the power structure of school, and generally the lack of power children have in modern western society, which is both harder and less comfortable to think about than “akrasia.”
Another version is someone told me to do my homework so I’ll go to a good college so I’ll get a good job. In this case “my S2” is a stopsign blocking an inquiry into why I care about any of the nodes in this causal diagram—maybe I want to go to a good college because it’ll make me feel better about myself, maybe I want to get a good job to avoid disappointing my parents, etc.
That’s on the S2 side, but “my S1” is also blocking inquiry. Why do I want to play video games? Not “my S1,” just me; I can own that desire. There are obvious stories to tell about video games being more immediately pleasurable and addictive than most other things I could do, and those stories have some weight, but they’re also a distraction; much easier to think about than why I wouldn’t rather do anything else. In my actual experience, the times in my life I have played video games the most, the reasons were mostly emotional: I was really depressed and lonely and felt like a failure, and video games (and lots of other stuff) distracted me from feeling those things. Those feelings were very painful to think about, and that pain prevented me from even looking at the structure of this problem, let alone debugging it, for a long time.
(One sign I was doing this is that the video games I chose were not optimized for pleasure. I deliberately avoided video games that could be fun in a challenging way, because I didn’t want to feel bad about doing poorly at them. Another sign is that everything else I did was also chosen for its ability to distract: for example, watching anime (never live-action TV, too uncomfortably close to real life), reading fiction (never nonfiction, again too uncomfortably close to real life), etc.)
Strongly agree, except that I wouldn’t use the term “S2 goals.” That’s a stopsign. Again I suggest getting more specific: what part of you has those goals and why? Where did they come from?
If I understand correctly what you mean by this, I have a lot of thoughts about how to do this. The short, unsatisfying version, which will probably surprise no one, is “find out what you actually want by learning how to have feelings.”
The long version can be explained in terms of Internal Family Systems. The deal is that procrastinative behaviors like playing a lot of video games are evidence that you’re trying to avoid feeling a bad feeling, and that that bad feeling is being generated by a part of you that IFS calls an “exile.” Exiles are producing bad feelings in order to get you to avoid a catastrophic situation that resembles a catastrophic situation earlier in your life that you weren’t prepared for; for example, if you were overwhelmed by criticism from your parents as a child, you might have an exile that floods you with pain whenever people criticize you, especially people you really respect in a way that might cause you to project your parents onto them.
Exiles are paired with parts called protectors, whose job it is to protect exiles from being triggered. In the criticism example, that might look like avoiding people who criticize you, avoiding doing things you might get criticized for, or feeling sleepy or confused when someone manages to criticize you anyway.
Behavior that’s driven by exile / protector dynamics (approximately “red-brain, deficit-reduction,” if I understand you correctly) can become very dysfunctional, as the goal of avoiding psychological pain becomes a worse and worse proxy for avoiding bad situations. In extreme cases it can almost completely block your access to what you want, as that becomes less of a priority than avoiding pain. In the criticism example, you might be so paralyzed by the possibility that someone could criticize you for doing things that you stop doing anything.
There are lots of different ways to get exiles and protectors to chill the fuck out, and once they do you get to find out what you actually want when you aren’t just trying to avoid pain. It’s good times. See also my comment on Kaj’s IFS post.