LessWrong team member / moderator. I’ve been a LessWrong organizer since 2011, with roughly equal focus on the cultural, practical and intellectual aspects of the community. My first project was creating the Secular Solstice and helping groups across the world run their own version of it. More recently I’ve been interested in improving my own epistemic standards and helping others to do so as well.
Raemon
(my guess is you took more like 15-25 minutes per question? Hard to tell from my notes; you may have finished early, but I don’t recall it being crazy early)
[Question] What are the most interesting / challenging evals (for humans) available?
(This seems like more time than Buck was taking – the goal was to not get any wrong so it wasn’t like people were trying to crank through them in 7 minutes)
The problems I gave were (as numbered in the CSV of diamond problems):
#1 (Physics): 1 person got it right, 3 got it wrong, 1 didn’t answer
#2 (Organic Chemistry): John got it right; I think 3 people didn’t finish
#4 (Electromagnetism): John and one other got it right, 2 got it wrong
#8 (Genetics): 3 got it right, including John
#10 (Astrophysics): 5 people got it right
I at least attempted to filter the problems I gave you for GPQA diamond, although I am not very confident that I succeeded.
(Update: yes, the problems John did were GPQA diamond. I gave 5 problems to a group of 8 people, and gave them two hours to complete however many they thought they could without getting any wrong)
ReSolsticed vol I: “We’re Not Going Quietly”
I like all these questions. “Maybe you should X” is least likely to be helpful but still fine so long as “nah” wraps up the thread quickly and we move on. The first three are usually helpful (at least when filtered for assistants who ask them fairly thoughtfully)
I imagined “FocusMate + TaskRabbit” specifically to address this issue.
Three types of workers I’m imagining here:
People who are reasonably skilled types, but who are youngish and haven’t landed a job yet.
People who actively like doing this sort of work and are good at it
People who have trouble getting/keeping a fulltime job for various reasons (which would land them in the “unreliable” sector), but… it’s FocusMate/TaskRabbit, so they don’t need to be reliable all the time; there just needs to be one of them online who responds to you within a few hours and is at least reasonably competent when they’re sitting down and paying attention.
And then there are reviews (with a UI I somehow design to elicit honest reactions, rather than just slapping on a 0-5 star rating that everyone feels obligated to set to “5” unless something was actively wrong), and assistants have profiles about what they think they’re good at and what others thought they were good at.
(where the expectation is that if you don’t yet have active endorsements or ratings, you will probably charge a low rate)
Meanwhile if you’re actively good and actively reliable, people can “favorite” you and work out deals where you commit to some schedule.
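A minimal sketch of how profiles and reviews like this could be structured, just to make the idea concrete; all names, fields, and the “would book again” scale here are hypothetical, not part of the original proposal:

```typescript
// Hypothetical data model for the assistant-marketplace idea sketched above.
// Names and fields are illustrative only, not a real product spec.

interface AssistantProfile {
  id: string;
  selfReportedSkills: string[];   // what they think they're good at
  endorsedSkills: string[];       // what past clients thought they were good at
  hourlyRate: number;             // expected to start low until endorsed
  favoritedBy: string[];          // client ids who might arrange a committed schedule
}

// Instead of a single 0-5 star field (which tends to collapse to "5"),
// reviews use structured prompts intended to elicit honest reactions.
interface SessionReview {
  assistantId: string;
  clientId: string;
  wouldBookAgain: "eagerly" | "sure" | "probably not";
  whatHelped: string;             // free text: what the assistant actually did well
  whatDidntHelp?: string;         // optional friction notes
}

// Unendorsed assistants are expected to charge a low starter rate.
function effectiveRate(profile: AssistantProfile, starterCap = 20): number {
  return profile.endorsedSkills.length === 0
    ? Math.min(profile.hourlyRate, starterCap)
    : profile.hourlyRate;
}
```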
(Quick note to people DMing me, I’m doing holidays right now and will follow up in a week or so. I won’t necessarily have slots/need for everyone expressing interest)
Can you share more details about how this works (in terms of practical steps) and how it went?
I actually meant to say “x-risk focused individuals” there (not particularly researchers), and yes, I was coming from the impact side of things. (i.e. if you care about x-risk, one of the options available to you is to become a thinking assistant).
Hire (or become) a Thinking Assistant / Body Double
I’d like to hire cognitive assistants and tutors more often. This could (potentially) be you, or people you know. Please let me know if you’re interested or have recommendations.
By “cognitive assistant” I mean a range of things, but the core thing is “sit next to me, and notice when I seem like I’m not doing the optimal thing, and check in with me.” I’m interested in advanced versions who have particular skills (like coding, or Applied Quantitivity, or good writing, or research taste) who can also be tutoring me as we go.
I’d like a large rolodex of such people, both for me, and other people I know who could use help. Let me know if you’re interested.
I was originally thinking “people who live in Berkeley” but upon reflection this could maybe be a remote role.
Yep, endorsed. One thing I would add: the “semi-official” dresscode I’ve been promoting explicitly includes black (for space/darkness), silver (for stars), gold (for the sun), and blue (for the earth).
(Which is pretty much what you have here. I think the blue works best when it’s a minority presence, distributed across people, such that it’s a bit special when you notice it)
The complaints I remember about this post seem mostly to be objecting to how some phrases were distilled into the opening short “guideline” section. When I go back and reread the details, it mostly seems fine. I have suggestions on how to tweak it.
(I vaguely expect this post to get downvotes that are some kind of proxy for vague social conflict with Duncan, and I hope people will actually read what’s written here and vote on the object level. I also encourage more people to write up versions of The Basics of Rationalist Discourse as they see them)
The things I’d want to change are:
1. Make some minor adjustments to the “Hold yourself to the absolute highest standard when directly modeling or assessing others’ internal states, values, and thought processes.” (Mostly, I think the word “absolute” is just overstating it. “Hold yourself to a higher standard” seems fine to me. How much higher-a-standard depends on context)
2. Somehow resolve an actual confusion I have with the ”...and behave as if your interlocutors are also aiming for convergence on truth” clause. I think this is doing important, useful work, but a) it depends on the situation, b) it feels like it’s not quite stating the right thing.
Digging into #2...
Okay, so when I reread the detailed section, I think I basically don’t object to anything. I think the distillation sentence in the opening paragraphs conveys a thing that a) oversimplifies, and b) some people have a particularly triggered reaction to.
The good things this is aiming for that I’m tracking:
Conversations where everyone trusts that the others are converging on truth are way less frictiony than ones where everyone is mistrustful and on edge about it.
Often, even when the folk you’re talking to aren’t aiming for convergence on truth, proactively acting as if they are helps make it more true. Conversational vibes are contagious.
People are prone to see others’ mistakes as more intense than their own mistakes, and if most humans aren’t specifically trying to compensate for this bias, there’s a tendency to spiral into a low-trust conversation unnecessarily (and then have the wasted motion/aggression of a low-trust conversation instead of a medium-or-high one).
I think maybe the thing I want to replace this with is more like “aim for about 1-2 levels more trusting-that-everyone-is-aiming-for-truth than currently feel warranted, to account for your own biases, and to lead by example in having the conversation focus on truth.” But I’m not sure if this is quite right either.
...
This post came a few months before we created our New User Reject Template system. It should have at least occurred to me to use some of the items here as part of the advice we keep easily on hand to give to new users (either as part of a rejection notice, or just “hey, welcome to LW, but it seems like you’re missing some of the culture here.”)
If this post were voted into the Top 50, and a couple points were resolved, I’d feel good making a fork with minor context-setting adjustments and then linking to it as a moderation resource, since I’d feel like The People had a chance to weigh in on it.
The context-setting I’m imagining is not “these are the official norms of LessWrong”, but rather that, if I think a user is making a conversation worse for reasons covered in this post, I’d be more ready to link to it. Since this post came out, we’ve developed better Moderator UI for sending users comments on their comments, and it hadn’t occurred to me until now to use this post as a reference for some of our Stock Replies.
(Note: I currently plan to make it so that, during the Review, anyone can write Reviews on a post even if they’re normally blocked from commenting. Ideally I’d make it so they can also comment on Review comments. I haven’t shipped this feature yet but hopefully will soon)
Previously, I think I had mostly read this through the lens of “what worked for Elizabeth?” rather than actually focusing on which parts of this might be useful to me. I think that’s a tradeoff on the “write to your past self” vs “attempt to generalize” spectrum – generalizing in a useful way is more work.
When I reread it just now, I found “Ways to Identify Fake Ambition” the most useful section (both for the specific advice of “these emotional reactions might correspond to those motivations”, and the meta-level advice of “check for your emotional reactions and see what they seem to be telling you.”)
I’d kinda like to see a post that is just that section, with a bit of fleshing out to help people figure out when/why they should check for fake ambition (and how to relate to it). I think literally a copy-paste version would be pretty good, and I think there’s a more (well, um) ambitious version that does more interviewing with various people and seeing how the advice lands for them. I might incorporate this section more directly into my metastrategy workshops.
Well to be honest in the future there is probably mostly an AI tool that just beams wisdom directly into your brain or something.
I wrote about 1⁄3 of this myself fyi. (It was important to me to get it to a point where it was not just a weaksauce version of itself but where I felt like I at least might basically endorse it and find it poignant as a way of looking at things)
One way I parse this is “the skill of being present (may be) about untangling emotional blocks that prevent you from being present, more than some active action you take.”
It’s not like untangling emotional blocks isn’t tricky!
I don’t have a strong belief that this experience won’t generalize, but, I want to flag the jump between “this worked for me” and an implied “this’ll work for everyone/most-people.” (I expect most people would benefit from hearing this suggestion; I just generally have a yellow flag about some of the phrasings you have here)
Clarification (I’ll add this to the OP):
The ideal I’m looking for is problems that will take a smart researcher (like a 95th-percentile alignment researcher, i.e. there are somewhere between 10-30 people who might count) at least 30 minutes to solve, and that most alignment researchers would maybe have a 50% chance of figuring out in 1-3 hours.
The ideal is that people have to:
a) go through a period of planning, and replanning
b) spend at least some time feeling like the problem is totally opaque and they don’t have traction.
c) have to reach for tools that they don’t normally reach for.
It may be that we just don’t have evals at this level yet, and I might take what I can get, but, it’s what I’m aiming for.
I’m not trying to make an IQ test – my sense from the literature is that you basically can’t raise IQ through training. So many people have tried. This is very weird to me – subjectively it is just really obvious to me that I’m flexibly smarter in many ways than I was in 2011 when I started the rationality project, and that this is due to me having a lot of habits I didn’t use to have. The hypotheses I currently have are:
You just have to be really motivated to do transfer learning, and have a genuinely inspiring / good teacher, and it’s just really hard to replicate this sort of training scientifically
IQ mostly measures “fast intelligence”, because that’s what’s cost-effective to measure in large enough quantities to get a robust sample. i.e. it measures whether you can solve questions in a few minutes, which mostly depends on whether you intuitively get it. It doesn’t measure your ability to figure out how to figure something out when that requires longterm planning, which would allow a lot of planning skills to actually come into play.
Both seem probably at least somewhat true, but the latter one feels like a clearer story for why there would be potential (at least theoretically) in the space I’m exploring – IQ tests take a few hours to take. It would be extremely expensive to do the statistically valid version of the thing I’m aiming at.
My explicit goal here is to train researchers who are capable of doing the kind of work necessary in worlds where Yudkowsky is right about the depth/breadth of alignment difficulty.