Trying to become stronger.
hath
Also this comment:
Eliezer, do you have any advice for someone wanting to enter this research space at (from your perspective) the eleventh hour?
I don’t have any such advice at the moment. It’s not clear to me what makes a difference at this point.
If you didn’t already try, I bet Lightcone would let you post more if you asked over Intercom.
Thank you so much! Fixed.
(although, measuring impact on alignment to that degree might be of a similar difficulty as actually solving alignment).
Sure, but it’s dignity in the specific realm of “facing unaligned AGI knowing we did everything we could”, not dignity in general.
Do you have any ideas for how to go about measuring dignity?
I mean this completely seriously: now that MIRI has changed to the Death With Dignity strategy, is there anything that I or anyone on LW can do to help with said strategy, other than pursue independent alignment research? Not that pursuing alignment research is the wrong thing to do, just that you might have better ideas.
My inner Professor Quirrell is currently saying that if someone did have a moral policy in which animals had little-to-no value, they probably wouldn’t abuse their pets where we could see; it’d be as if someone had read Snuff and thought “That man was a fool. He shouldn’t have done that in public, because look what happened to him.” Someone who really didn’t care about animals in the slightest would still probably act like a normal member of society and just avoid interacting with animals whenever possible, because seeming like a stereotypical villain is going to be counterproductive for achieving your desires.
Wow. I have this strange feeling that someday, someone is going to look at the above paragraph and say “hath, you condone animal abuse?” or something to that effect. Hopefully that doesn’t happen.
(there’s also a level here of “i have no idea how to handle this situation/dynamic”, and if you think I did something wrong either in the events described in these posts or by posting this, feel free to tell me i’m an idiot and that I should’ve done something different)
...I forgot about the annual review. I think I’ll just say that doesn’t count, and also commit to no more changes of the conditions.
EDIT: actually, just going to kill the market.
Created a market on Manifold to see whether either today’s GoodHeart system will last past today or LW will try financial rewards for posting in 2022.
It’s really interesting seeing the change in attitude toward low-effort asking-for-money posts. Earlier, people upvoted/put up with them; now people are actively punishing bullshit with strong downvotes. This is good for LW implementing monetary incentives in the future; we can punish Goodharters ourselves.
I’ve been working on setting up a TED talk at my high school, and since the beginning have been planning on asking for speakers through a post here. However, the day that we finally finished the website, and I can finally post here about it, is… when we’re doing this whole GoodHeart thing. Not sure whether I should publish it today or tomorrow. (Pros: money. Cons: possibly fewer views because of everything else posted today.) What do you all think?
This book occupies the same genre as The Theory and Practice of Oligarchical Collectivism, though I’m not sure what to call that genre. Thank you so much. Would you recommend the longer book?
I think that was part of the whole “haha goodhart’s law doesn’t exist, making value is really easy” joke. However, it’s also possible that that’s… actually one of the hard-to-fake things they’re looking for (along with actual competence/intelligence). See PG’s Mean People Fail or Earnestness. I agree that “just give good money to good people” is a terrible idea, but there’s a steelman of that which is “along with intelligence, originality, and domain expertise, being a Good Person (whatever that means) and being earnest is a really good trait in EA/LW and the world at large, and so we should try and find people who are Good and Earnest, to whatever extent that we can make sure that isn’t Goodharted.”
(I somewhat expect someone at LW to respond to this saying “no, the whole goodness thing was a joke”)
As a follow-up: There have been a couple incidents with said teacher trying to assert authority and win debates over, like, actually listening to her students. Today, we had a quiz on 1984. When, during the allotted study time beforehand, students started to go over the material with each other, the teacher told everyone that this was a silent study time; after the quiz, she expanded on this, mentioning a story she had told earlier in the year. It was a story of how a student who had helped their friend on a quiz was rejected by a college the friend was accepted to; the moral from this that she repeated throughout the year was “Your peers are your enemies. You should not help them, because that just actively hurts you in college admissions. Also, let’s be real, helping them in this way before the quiz, telling them the answers, is cheating. So, don’t help your fellow students; it’s cheating, and it only hurts you.” I pointed out that a former teacher of mine had lamented grading on a curve strictly because it makes students see their classmates as competitors instead of friends and allies, and that her argument proved too much; under that, helping other students study in any way counte... She interrupted me, saying that I was equivocating between helping and cheating; when I tried to explain myself she shut me down, saying “You don’t want to argue with me about this.” (In an earlier conversation, she attributed her aptitude in this to doing debate.)
Another relevant time was when I misspoke at one point during a debate, and she repeatedly said “But you said X!” in response. “I don’t believe that, either you misheard me or I misspoke.” “You said X!” “You are purposefully misinterpreting my words.” “I’m just saying back what you said!” “You aren’t being at all charitable.” “I’m just saying what you said!”
The point here is that, repeatedly, she’s only cared about asserting authority rather than listening or being a charitable debate partner. It’s not fun to be effectively shamed in front of the class without a valid chance to defend myself, and I already feel that impacting my decisions now; if I cared more about what the people in my classes thought, I’d never have spoken up in the first place. Maybe that’s why nobody else does.
(dialogue reconstructed as well as I can remember it)
For once, I actually cared about what we were doing in English. For our final essay on Macbeth I wrote 1260 words on Duncan’s choices through the play, analyzing if he could have made better decisions given the information that he had, and trying to see whether his decisions would have worked out well if not for the supernatural occurrences of the play. This was a couple weeks after my English teacher had talked to me and told me that I wasn’t putting enough effort into her class, and that I was doing significantly worse than I could be. At the time, I wasn’t completely sure whether this was due to my own laziness or whether it was that the topics we wrote about in that class (essays on symbolism in Lord Of The Flies and other things of roughly equal uselessness) were bullshit, and not relevant to actually learning to write (see everything Paul Graham has ever written), but I gave her the benefit of the doubt, and resolved to try harder on the next essay.
She gave me a C-. Had her criticism been that the essay was poorly-written or rambling, I would have been okay with that; instead, her issue with it was that it was off-prompt^[1]. I had written about something I actually cared about for only the second time in the six months of that class, and she had rejected it for not specifically answering the prompt. In her feedback, she helpfully explained what she wanted me to write instead:
This is not about blame. This is a cause/effect essay.
Duncan decides X.
X, therefore Y.
This illustrates the theme of Z.
I’m grateful to her for spelling out exactly what she wanted, because otherwise I may have been under the impression that she wanted me to actually do something interesting. I talked to her about it in person. “hath, your issue is that every time you’re given an assignment, you want to talk about the philosophy of it. You need to learn to follow directions. If you have a job, and someone gives you the assignment to write a report, and you instead say ‘I’m not sure if writing this report is the best thing to do’, you’re going to end up jobless.” I disagreed with everything she had said (I wouldn’t call “trying to make better decisions” philosophy, I have had an internship in which the person said specifically “If you come up with better ideas for assignments than what I give you, do those instead”, and I don’t plan to work in the type of job where conformity is useful), but instead said this: “I grant that there is some value to conformism and respect to authority in certain environments.” “I’m not teaching conformism.” At this point, I honestly didn’t see any point in continuing this discussion (we had had talks like this for the past few months with zero change), and left. Not sure if I should have left, but I don’t necessarily regret it. I’ve been trying since the start of the year to do independent work on writing I actually enjoy, but every time she shoots me down and tells me not to tell her how to teach.
Yes, there is a reason to actually follow directions in this setting, but I would say that forcing students to write specifically one type of formulaic paper that doesn’t have any other relevance is actively counterproductive to learning the skill of writing well. Furthermore, every other English teacher I’ve had would have gladly accepted the essay I wrote, and not rejected it solely based on it being adjacent to the prompt instead of exactly following the prompt. Throughout high school, I’ve had to optimize for getting a good grade over actually learning, and don’t look forward to doing it more. At this point, the point of getting good grades is to signal to my parents that I’m capable of doing non-school things, because they have made it abundantly clear that putting effort into my schoolwork is the only way that they can be persuaded into allowing me to do independent study of any type, or anything resembling a nontraditional path (or even to apply to fellowships).
[1] This was the prompt:
Macbeth examines the difficulty of ruling: who gets to make important decisions, what should be the course of action to take, and what will be the impact? In a well-written essay, identify two such decisions in Macbeth made by a ruler (Duncan, Macbeth, etc), and through the lens of one of the themes of the play, evaluate how those decisions contribute to the profound personal and societal consequences that follow. Make sure you illustrate causality (cause/effect) in this essay.
I didn’t realize that this was a test-bed for seeing if monetary value would actually work; everything I post on this account from now on will be actual effort posts that I would post normally. I really don’t want to be the type of person who sees the Petrov Day button and automatically tries a bunch of random passwords, in the process ruining it for everyone. I also think that monetary rewards for LW posts might be a good idea, and want to help by operating under the decision theory of whatever would work well in an actual LW environment with financial rewards; this means passing up the (12 eligible comments + however many other comments I make)*2 dollars which would result from me strong upvoting every comment w/ the test account I made yesterday to try and debug something. (Posting this is also me committing myself to NOT doing that.)
I guess this means it’s actually a test we’re supposed to pass. I have a feeling that there’s going to be some confusion over “what game are we really playing”, like with Petrov Day. Oh. It’s to see if we could actually, long-term, reward LW participation with money. Guess I’ll just focus on actually writing posts.
Man, this is the worst time for me to have only skimmed one of those posts.
Are we changing from “payment sent every day at midnight” to “payment sent at end of week”?