I think the driving motivator for seeking out high variance in the groups of people I interact with is an implicit belief that my value system is malleable, plus a strong form of modesty concerning my beliefs about what values I should have. Over time I realized that my value system isn’t really all that malleable, and my intuitions about it are much more reliable indicators than observing a random sample of people; therefore a much better strategy for fulfilling goals set by those values is to associate with people who share them.
If the latter gets too large, then you start getting swarmed with people who want money and prestige but don’t necessarily understand how to contribute, and who are incentivized to degrade the signal of what’s actually important.
During this decade the field of AI in general became one of the most prestigious and high-status academic fields to work in. But as far as I can tell, that hasn’t slowed the rate of progress in advancing AI capability. If anything, it has sped it up—by quite a bit. It’s possible that a lot of newcomers to the field are largely driven by the prospect of status gain and money, and there are quite a few hype-driven “AI” startups that have popped up and seem doomed to fail, but despite this, the pace of the most productive research groups doesn’t seem to be slowing. Maybe the key here is that if you suddenly increase the prestige of a scientific field by a dramatic amount, you are bound to get a lot of nonsense or fraudulent activity, but this might be largely confined to the periphery of serious research circles. And the most serious people working in the field are likely to be helped by the rising tide as well, through increased visibility, funding for their labs, and so on.
It’s also my understanding that the last few years (during the current AI boom) have been some of the most successful (financially and productively) in MIRI’s entire history.
I’m curious whether the rationalsphere/AI risk community has ever experimented with hiring people to work on serious technical problems who aren’t already fully aligned with the values of the community or fully invested in it. It seems like ideological alignment is a major bottleneck to locating and attracting people with the relevant levels of skill and productivity, and there might be some benefit to being open about tradeoffs that favor skill and productivity at the expense of complete commitment to solving AI risk.
(Re-writing this comment from the original to make my point a little more clear).
I think it is probably quite difficult to map someone’s decisions onto a continuum from really bad to really good if you can’t simulate the outcomes of many different possible actions. There’s reason to suspect that the “optimal” outcome in any situation looks vastly better than even very good but slightly sub-optimal decisions, and vice versa for the least optimal outcome.
In this case we observed a few people who took massive risks (by devoting their time and energy to understanding or developing a particular technology which very well may have turned out to be a boondoggle) receive massive rewards from its success, although it could have very well turned out differently, based on what everyone knew at the time. I think the arguments for cryptocurrency becoming successful that existed in the past were very compelling, but they weren’t exactly airtight logical proofs (and still aren’t even now). Not winning hugely because a legitimately large risk wasn’t taken isn’t exactly “losing” (and while buying bitcoins when they were cheap wasn’t a large risk, investing time and energy into becoming knowledgeable enough about crypto to know it was worth taking the chance may have been. A lot of the biggest winners were people who were close to the development of cryptocurrencies).
But even so, a few of these winners are close to the LW community and have invested in its development or some of its projects. Doesn’t that count for something? Can they be considered part of the community too? I see no reason to keep the definition so strict.
Mostly I just want people to stop bringing models about the other person’s motives or intentions into conversations, and if tabooing words or phrases won’t accomplish that, and neither will explicitly enforcing a norm, then I’m fine not going that route. It will most likely involve simply arguing that people should adopt a practice similar to what you mentioned.
Confusion in the sense of one or both parties coming to the table with incorrect models is a root cause, but this is nearly always the default situation. We ostensibly partake in a conversation in order to update our models to more accurate ones and reduce confusion. So while yes, a lack of confusion would make bad conversations less likely, it also just reduces the need for the conversation to begin with.
And here we’re talking about a specific type of conversation that we’ve claimed is a bad thing and should be prevented. For that we need to identify a different root cause besides “confusion,” which is too general a root cause to explain these specific types of conversations.
What I’m claiming as a candidate cause is that there are usually other underlying motives for a conversation besides resolving disagreement. In addition, people are bringing models of the other person’s confusion / motives into the discussion, and that’s what I argue is causing problems and is a practice that should be set aside.
I think the Kensho post did spawn demon threads and that these threads contained the characteristics I mentioned in my original comment.
I’m not really faulting all status games in general, only tactics which force them to become zero-sum. It’s basically unreasonable to ask that humans change their value systems so that status doesn’t play any role, but what we can do is alter the rules slightly so that outcomes we don’t like become improbable. If I’m accused of being uncharitable, I have no choice but to defend myself, because being seen as “an uncharitable person” is not something I want included in anyone’s models of me (even in the case where it’s true). Even in one-on-one conversations there’s no reason to disengage if this claim is made against me, especially when it comes from a person I trust or admire (more likely if it’s a private conversation), because then I care a lot about what they think of me. That’s where the stickiness of demon threads comes from: disengaging results in the loss of something for either party.
There’s a second type of demon thread where participants get dragged deep into dead ends, without a very clear map of where the conversation is heading. But I think these reduce to the usual problems of identifying and resolving confusion, and can’t really be resolved by altering incentives / discussion norms.
If the goal is for conversations to be making epistemic progress, with the caveat that individual people have additional goals as well (such as obtaining or maintaining high status within their peer group), and Demon Threads “aren’t important” in the sense that they help neither of these goals, then it seems the solution would simply be better tricks participants in a discussion can use in order to notice when these are happening or likely to happen. But I think it’s pretty hard to actually measure how much status is up for grabs in a given conversation. I don’t think it’s literally zero—I remember who said what in a conversation and if they did or didn’t have important insights—but it’s definitely possible that different people come in with different weightings of importance of epistemic progress vs. being seen as intelligent or insightful. The key to the stickiness and energy-vacuum nature of the demon threads, I think, is that if social stakes are involved, they are probably zero-sum, or at least seen that way.
I have personally noticed that many of the candidate “Demon” threads contain a lot of specific phrases that sort of give away that social stakes are involved, and that there could be benefits to tabooing some of these phrases. To give some examples:
“You’re being uncharitable.”
“Arguing in bad faith.”
“Sincere / insincere.”
“This sounds hostile” (or other comments about tone or intent).
These phrases are usually but not always negative, as they can be used in a positive sense (e.g. charitable, good faith, etc.), but even then they are more often used to show support for a certain side, cheerleading, and so on. Generally, they have the characteristic of making a claim about or describing your opponent’s motives. How often is it actually necessary or useful to make such claims?
In the vast majority of situations, it is next to impossible to know the true motives of your debate partner or other conversation participants, and even in the best case scenario, poor models will be involved (combined with the fact that the internet tends to make this even more difficult). In addition, an important aspect of status games is that it is necessary to hide the fact that a status game is being played. Being “high-status” means that you are perceived as making an insightful and relevant point at the most opportune time. If someone in a conversation is being perceived as making status moves, that is equivalent to being perceived as low status. That means that the above phrases turn into weapons. They contain no epistemically useful information, and they are only being used to make the interaction zero-sum. Why would someone deliberately choose to make an interaction zero-sum? That’s a harder question, but my guess would be that it is a more aggressive tactic to get someone to back down from their position, or just our innate political instincts assuming the interaction is already zero-sum.
There is no need for any conversation to be zero-sum, necessarily. Even conversations where a participant is shown to be incorrect can lead to new insights, and so status benefits could even be conferred on the “losers” of these conversations. This isn’t denying social reality, it just means that it is generally a bad idea to make assumptions about someone else’s intent during a conversation, especially negative assumptions. I have seen these assumptions lead to a more productive discussion literally zero times.
So additional steps I might want to add:
Notice if you have any assumptions or models about your conversation partner’s intents. If yes—just throw them out. Even positive ones won’t really be useful, negative ones will be actively harmful.
Notice your own intents. It’s not wrong to want to gain some status from the interactions. But if you feel that if your partner wins, you lose, ask yourself why. Taking the conversation private might help, but you might also care about your status in the eyes of your partner, in which case turning the discussion private might not change this. Would a different framing or context allow you both to win?
I’m trying to decide whether or not I understand what “looking” is, and I think it’s possible I do, so I want to try and describe it, and hopefully get corrected if it turns out I’m very wrong.
Basically, there’s sort of a divide between “feeling” and “Feeling” and it’s really not obvious that there should be, since we often make category errors in referring to these things. On the one hand, you might have the subjective feeling of pain, like putting your hand on something extremely hot. Part of that feeling of pain is the very strong sensation on your hand. Another part of the pain is the sense that you should not do that. This Feeling is the part that sucks. This is the part that you don’t want.
It turns out that those two types of subjective experience aren’t one and the same and aren’t inseparable. In the vast majority of situations where you notice that one occurs you also notice the other. However (and it’s a big however), there are some times when the first type appears without the second type. It just so happens that our brains are wired so that you never notice that specific situation. But it occurs frequently enough that if you could notice it, if you could Look, you would immediately discover that it’s always been a feature of your experience. And that’s what I’m guessing Valentine means when he says “just look up, it’s so obvious!” It IS obvious, once you see it, but seeing it for the first time is probably hard.
To describe a situation where I think this is likely to occur, imagine accidentally stubbing your toe, feeling the pain from it, then shortly afterward being told some stunning news about the death of a loved one. In that brief moment where you are stunned by the news and your mind shifts to that new piece of information, it briefly loses the sense of suffering from the pain of the stubbed toe, although the sensation is still there. Once your mind has completed the shift, it may return to feeling the unpleasantness of the pain combined with whatever new feeling it received.
But importantly, it turns out your brain is doing things like the above constantly. I used that example only because the effect would be much more pronounced. But your mind does these awareness shifts so frequently and so quickly that you’re usually unlikely to be aware of the brief moment where there can be a sensation of something before the associated emotional response. Learning to Look is basically learning to detect when these shifts happen and catch them in the act, and that is why meditation is usually prescribed to make this more likely. It’s also about learning that this effect can be controlled to some degree if you have some mastery over your attention.
It also is not really limited to physical sensations. Any kind of thought or state of awareness may have associated positive or negative mental states, and you can detach from these, too, in a similar way.
When phrased like the above, the benefits seem obvious. If you had the choice not to suffer, wouldn’t you take it? The reason I think Valentine may not state that so bluntly is that there is an equally obvious objection: if I could choose to not suffer on demand, what would prevent me from doing harmful things to myself or others? Would I even be aligned with my current goals? And I think that question needs an extremely careful answer.
I think the much less obvious and surprising answer is that you are still aligned with your previous goals, and may even be better equipped to reach them, but I don’t feel like I have the skills to really argue for this point, and will completely understand any skepticism towards this.
It’s very possible I’m describing something either completely different or at least much more mundane than Valentine is. The biggest factor that leads me to believe this is that Kensho seems to be a much more black-and-white, you-either-see-it-or-you-don’t sort of thing, whereas what I’m talking about seems to require a gradual process of noticing, recognizing, and learning to influence.
We need to understand how the black box works inside, to make sure our version’s behavior is not just similar but based on the right reasons.
I think “black-box” here can refer to two different things: things in philosophy or science which we do not fully understand yet, and machine learning models like neural networks that seem to capture their knowledge in ways that are uninterpretable to humans.
We will almost certainly require the use of machine learning or AI to model systems that are beyond our capabilities to understand. This may include physics, complex economic systems, the invention of new technology, or yes, even human values. There is no guarantee that a theory that describes our own values can be written down and understood fully by us.
Have you ruled out any kind of theory which would allow you to know for certain that a “black-box” model is learning what you want it to learn, without understanding everything that it has learned exactly? I might not be able to formally verify that my neural network has learned exactly what I want it to (e.g. by extracting the knowledge out of it and comparing it to what I already know), but maybe I have formal proofs about the algorithm it is using, and so I know its knowledge will be fairly robust under certain conditions. It’s basically the latter we need to be aiming for.
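To give a hedged sketch of what an algorithm-level guarantee (as opposed to an inspection of the learned weights) can look like, here is the classical PAC-style generalization bound for a finite hypothesis class. This is only an illustrative example of the shape such a guarantee takes, not something specific to neural networks or to any particular safety proposal:

$$\Pr\!\left[\,\forall h \in \mathcal{H}:\ \operatorname{err}(h) \;\le\; \widehat{\operatorname{err}}(h) + \sqrt{\frac{\ln|\mathcal{H}| + \ln(1/\delta)}{2m}}\,\right] \;\ge\; 1 - \delta$$

Here $h$ is whatever hypothesis the algorithm selects from the class $\mathcal{H}$, $\widehat{\operatorname{err}}(h)$ is its error on the $m$ training samples, and the bound holds simultaneously for every $h$ with probability at least $1 - \delta$. A statement of this shape constrains any model the algorithm outputs without requiring us to interpret what that model has learned internally.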
A sort of fun game that I’ve noticed myself playing lately is to try and predict the types of objections that people will give to these posts, because I think once you sort of understand the ordinary paranoid / socially modest mindset, they become much easier to predict.
For example, if I hadn’t written this already, I would predict a slight possibility that someone would object to your implication that requiring special characters in passwords is unnecessary, and that all you need is high entropy. I think these types of objections could even contain some pretty good arguments (I have no idea if there actually are good arguments for it, I just think it’s possible there are). But even if there are, it doesn’t matter, because objecting to that particular part of the dialogue is irrelevant to the core point, which is to illustrate a certain mode of thinking.
The reason this kind of objection is likely, in my view, is that it is focused on a specific object-level detail, and to a socially modest person, these kinds of errors are very likely to be noticed and to sort of trigger an allergic reaction. In the modest mindset, making errors in specific details counts as evidence against whatever core argument you’re making that deviates from the mainstream viewpoint. A modest person sees these errors and thinks, “If they are going to argue that they know better than the high-status people, they had at least better be right about pretty much everything else.”
I observed similar objections to some of your chapters in Inadequate Equilibria. For example, some people were opposed to your decision to leave out a lot of object-level details of some of the dialogues you had with people, such as the startup founders. I thought to myself “those object-level details are basically irrelevant, because these examples are just to illustrate a certain type of reasoning that doesn’t depend on the details”, but I also thought to myself “I can imagine certain people thinking I was insane for thinking those details don’t matter!” To a socially modest person, you have to make sure you’ve completely ironed out the details before you challenge the basic assumptions.
I think a similar pattern to the one you describe above is at work here, and I suspect the point of this work is to show how the two might be connected. I think an ordinary paranoid person is making similar mistakes to a socially under-confident person. Neither will try to question their basic assumptions, because since those assumptions underlie almost all of their conclusions, judging them as possibly incorrect is equivalent to saying that the foundational ideas experts set down in textbooks or lectures might be incorrect, which is to make yourself higher-status relative to the experts. Instead, a socially modest / ordinary paranoid person will turn that around on themselves and think “I’m just not applying principle A strongly enough,” which doesn’t challenge the majority-accepted stance on principle A. To be ordinarily paranoid is to obsess over the details and execution. Social modesty is to not directly challenge the fundamental assumptions which are presided over by the high-status. The result is that when a failure is encountered, the assumptions can’t be wrong, so it must have been a flaw in the details and execution.
The point is not to show that ordinary paranoia is wrong, or that challenging fundamental assumptions is necessarily good. Rather it’s to show that the former is basically easy and the latter is basically difficult.
My understanding of A/B testing is that you don’t need an explicit causal model, or a “big theory,” in order to successfully use it; you would mostly be using intuitions gained from experience to test hypotheses like “users like the red page better than the blue page,” which carry no explicit causal information.
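To make that concrete, here is a minimal sketch of what such a test can look like. The page names, visitor counts, and conversion numbers are all hypothetical, and the two-proportion z-test is just one common way of running the comparison, not anything the original post specifies:

```python
from math import sqrt
from statistics import NormalDist

def ab_test(conversions_a, visitors_a, conversions_b, visitors_b):
    """Two-proportion z-test: is variant A's conversion rate different from B's?"""
    p_a = conversions_a / visitors_a
    p_b = conversions_b / visitors_b
    # Pooled rate under the null hypothesis that the two pages convert equally well.
    p_pool = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / visitors_a + 1 / visitors_b))
    z = (p_a - p_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical numbers: "red" page converts 120/1000 visitors, "blue" page 90/1000.
z, p = ab_test(120, 1000, 90, 1000)
print(f"z = {z:.2f}, p = {p:.3f}")  # a small p suggests users really do prefer red
```

Nothing in this procedure encodes a model of why users might prefer one page over the other; it only asks whether the observed difference is bigger than chance would explain.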
Here you argue that intuitions gained from experience count as hypotheses just as much as causal theories do, and not only that, but that they tend to succeed more often than the big theories do. That depends on what you consider to be “success,” I think. I agree that empirically gained intuitions probably have a lower failure rate than causal theories (you won’t do much worse than average), but what Eliezer is mainly arguing is that you won’t do much better than average, either.
And as long as you don’t mind just doing OK on average, that might be fine. But the main thing this book is grappling with is “how do I know when I can do a lot better than average?” And that seems to depend on whether or not you have a good “big theory” available.
I wonder if it would have been as frustrating if he had instead opened with “The following are very loosely based on real conversations I’ve had, with many of the details changed or omitted.” That’s something many writers do and get away with, for the very reason that sometimes you want to show that someone actually thinks what you’re claiming people think, but you don’t actually want to be adversarial to the people involved. Maybe it’s not the fairest to the specific arguments, but the alternative could quite possibly turn out worse, or cause a fairly large derail from the main point of the essay, when you start focusing on the individual details of each argument instead of whatever primary pattern you’re trying to tie them together with.
There’s a fundamental assumption your argument rests on, which is a choice of prior: assume that everyone’s credences in a given proposition form a distribution centered around the correct value an ideal agent would assign to that proposition if they had access to all the information that was available and relevant to it, and had enough time and capacity to process that information to the fullest extent. Your arguments are sound given that this is actually the correct prior, but I see most of your essay as arguing why modesty would be the correct stance given that the assumption is true, and less of it about why that prior is the best one to have in the vast majority of situations.
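Roughly, and this is my own formalization rather than anything stated in the essay: if each person $i$’s credence $c_i$ in proposition X is the ideal credence $p^{*}$ plus independent, zero-mean noise,

$$c_i = p^{*} + \epsilon_i, \qquad \epsilon_i \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, \sigma^2), \qquad \frac{1}{n}\sum_{i=1}^{n} c_i \;\xrightarrow{\;n \to \infty\;}\; p^{*},$$

then the average credence of a large group converges on $p^{*}$, which is what makes deferring to the bigger, older group look like the right move. The question that follows is what happens when that zero-mean-noise assumption fails.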
In practice, the kind of modesty we are actually interested in is the case where we belong to group A, and group A has formed some credence in proposition X through a process that involves argumentation, exchange and acquisition of information, verifying each other’s logical steps, and so on (there’s some modesty going on within group A). After a while of this process going on the group consensus around X has converged on a particular value. But there is also another group, group B, which is much larger than A and has been around for longer too. Group B has converged on a very different credence for X than group A. Should group A update their credence in X to be much closer to B’s? Your argument seems to say, yes, basically they should.
I think I would tend to agree with you if the above information was all I had available to me. For all I know, my group A is no more or less effective than group B at reaching the truth, and is not subject to any different systemic biases that would prevent the correct credences from being reached. Since B has been around for longer, and has more members, possibly with experts, I should consider them to have more likely converged to the correct credence in X than group A has.
However, in the cases that we tend to really care about, the places where we think immodesty might be reasonable, group A usually does have some reason to believe that group B is less effective at converging to the correct value. It could be due to systemic issues, inadequate equilibria, Moloch, whatever you want to call it. In other words, we have some reason to think that throwing out the default prior is ok.
So this is where I think the crux of the argument is: How strongly do we expect the distribution of credences on X for group B to not be centered around the correct value? In general I think the key is that A tends to search for the X that maximizes the above statement, in other words, the X that B is most likely to be wrong about. After A thinks it’s found the best X, then it asks if immodesty is ok in this specific case. When you factor in that search process, it makes it a little more likely to actually find a bias that makes the prior wrong. I think there’s an unstated assumption in your essay that implies that X is chosen randomly with respect to the likelihood of bias.
Expert Iteration From the Inside
I believe that equilibrium has already arrived... and it’s no real surprise, since no preventive measures were ever put in place.
The reason this equilibrium occurs is that there is a social norm that says “upvote if this post is both easy to understand and contains at least one new insight.” If a post contains lots of deep and valuable insights, this increases the likelihood that it is complex, dense, and hard to understand. Hard-to-understand posts often get mistaken for poor writing (or worse, get put in a separate class and compared against academic writing) and face higher scrutiny. Only rarely and with much effort will someone be able to write things that are both easy to understand and contain deep insights. (As an example, consider MIRI’s research papers, which are more likely to contain much more valuable progress toward a specific problem, but also receive little attention, and are often compared against other academic works, where they face an uphill battle to gain wider acceptance.)
The way around this, if you choose to optimize for social approval and prestige, is to write beautifully written posts that explain a relatively simple concept. Generally, it is much easier to be a brilliant writer than someone who uncovers truly original ideas. It’s much easier to use this strategy with our current reward system.
Therefore, what results is basically a lot of amazingly-written articles that very clearly explain a concept you probably could have learned somewhere else.
But we’re in for a real treat with this sequence, since it openly acknowledges that it’s hard to know if you’ve found a genuine insight. It’s going to get really meta...
I think the post could also be interpreted as saying, “when you select for rare levels of super-competence in one trait, you are selecting against competence in most other traits,” or at least, “when you select for strong charisma and leadership ability, you are selecting for below-average management ability.” It’s a little ambiguous how far this is likely to generalize, or just how strongly specific skills are expected to anti-correlate.
I think the central reason it’s possible for an individual to know something better than the solution currently prescribed by mainstream collective wisdom is that the vast number of degrees of freedom in optimizing civilization guarantees that there will always be some potential solutions to problems that simply haven’t received any attention yet. The problem space is simply way, way too large to expect that even relatively easy solutions to certain problems are already known.
While modesty may be appropriate in situations regarding a problem that is widely visible and considered urgent by society, I think even within this class of problems, there are still so many inefficiencies and suboptimalities that, if you go out looking for one, you’re likely to actually find one. The existence of people who actively go looking for these types of problems, like within Effective Altruism, may demonstrate this.
The stock market is a good example of a problem that is relatively narrow in scope and also receiving a huge amount of society’s collective brainpower. But I just don’t think there’s nearly enough brainpower to expect even most of the visible and urgent problems to already have adequate solutions, or to even have solutions proposed.
There may also be dynamics involving a trade-off between how much energy and effort society has to spend to implement a specific solution and how much that would subtract from the effort and energy currently needed to support the other mechanisms of civilization. This dynamic may result in the existence of problems that are easy to notice, perhaps even easy to define a solution for, but in practice immensely complex to implement.
For example, it’s within the power of individual non-experts to understand the basic causes of the Great Recession, and there may have been individuals who predicted it. But it could still have been the case that, actually, it was not feasible for society to simply recognize this and change course quickly enough to avert the disaster.
But rather than society simply saying, once a disaster becomes predictable, “yes, we all know this is a problem, but we really don’t know what to do about it, or if it’s even possible to do anything about it,” the incentive structures are such that it’s easier to spend brainpower coming up with reasons why it’s not really that bad, and perhaps the problem doesn’t even exist in the first place. Therefore the correct answer gets hidden away and the commonly accepted answer is incorrect.
In other words, modesty is most reasonable when the systems that support knowledge accumulation don’t filter out any correct answers.
I think avoiding status games is sort of like trying to reach probabilities of zero or one: technically impossible, but you can get arbitrarily close, to the point where the weight that status shifts carry in everyone’s decision-making becomes almost unmeasurable.
I’m also not sure I would define “not playing the game” as making sure that everyone’s relative status within a group stays the same. That is simply a different status game, just with different objectives. It seems to me that what you suggest doing would simply open up a Pandora’s box of undesirable epistemic issues. Personally, I want the people who consistently produce good ideas and articulate them well to have high status. And if they are doing it better than me, then I want them to have higher status than me. I want higher status for myself too, naturally, but I channel that desire into practicing and maintaining as many of the characteristics that I believe aid the goals of the community as I can. My goal is almost never to preserve egalitarian reputation at the expense of other goals, even among people I respect, since I fear that elevating that goal to a high priority carries the risk of signal-boosting poor ideas and filtering out good ones. Maybe that’s not what you’re actually suggesting needs to be done; maybe your definition doesn’t include things like reputation, but does consider status in the sense of who gets to be socially dominant. I think my crux is that it’s less important to make sure that “mutual respect” and “considered equal in status, to whatever extent status actually means anything” mean the same thing, and more important that the “market” of ideas generated by open discourse maintains a reasonable distribution of reputation.
If you define “rationality” as having good meta-level cognitive processes for carving the future into a narrower set of possibilities in alignment with your goals, then what you’ve described is simply a set of relatively poor heuristics for one specific set of goals, namely the gaining of social status and approval. One can have that particular set of goals and still be a relatively good rationalist. Of course, where do you draw the line between “pseudo” and “actual,” given that we are all utilizing cognitive heuristics to some degree? I see the line as being drawn somewhat arbitrarily.