I’m a little curious what reference class you think the battle of Mariupol does belong to, which makes its destruction by its defenders plausible on priors. But mostly it sounds like you agree that we can make inferences about hard questions even without a trustworthy authority to appeal to, and that’s the point I was really interested in.
Usually that’s just about denying strategic assets, though: blowing up railroads, collapsing mine shafts, that sort of thing. Blowing up the museums and opera houses is pointless, because the enemy can’t get any war benefit by capturing them. All it does is waste your own explosives, which you’d rather use to blow up the enemy. Scorched earth practiced by attackers, on the other hand, tends to be more indiscriminate: contrast the state of Novgorod post-WW2 with that of the towns west of it, or the treatment of rice fields by North Vietnamese vs. Americans during the Vietnam war.
> But we have only very weak evidence of what goes on in the war zone unless both sides agree on some aspect.
I know we’re in a hostile information space, but this takes epistemic learned helplessness way too far. There are lots of ways to find things out other than being told about them, and when you don’t have specific knowledge about something you don’t have to adopt a uniform prior.
Taking Mariupol as an example, our two suspects are the Russians, who were attacking Mariupol and didn’t have any assets there, and the Ukrainians, who were defending Mariupol and did. Given those facts, before we hear from either side, what should we expect? If you’re unsure, we can look at other events in similar reference classes. For example, of the German towns destroyed during World War 2, how many would you predict were destroyed by Allied attackers, and how many by German defenders?
> Control-f “cold war”
> No results found
Asimov and the Apollo engineers grew up benefiting from progress; their children grew up doing duck-and-cover exercises, hiding from it under their desks. Of course they relate to it differently!
This theory predicts that people who grew up after the Cold War ended should be more prone to celebrate progress. I think that’s true: if you go to Silicon Valley, where the young inventors are, messianic excitement over the power of progress is easy to find. Isaac Asimov wanted to put an RTG in your refrigerator, and Vitalik Buterin wants to put your mortgage on the blockchain; to me they have very similar energies.
There was lots of amyloid research in the Alzheimer’s space before the fake 2006 paper, and in the hypothetical where it got caught right away we would probably still see a bunch of R&D built around beta-amyloid oligomers, including aducanumab. You can tell because nobody was able to reproduce the work on the *56 oligomer, and they kept on working on other beta-amyloid oligomer ideas anyway. It’s bad, but “16 years of Alzheimer’s research is based on fraud” is a wild overstatement. See Derek Lowe’s more detailed backgrounder for more on this.
Derek Lowe is worth keeping up with in any case IMO, he is basically the Matt Levine of organic chemistry.
Dealing with human subjects, the standard is usually “informed consent”: your subjects need to know what you plan to do to them, and freely agree to it, before you can experiment on them. But I don’t see how to apply that framework here, because it’s so easy to elicit a “yes” from a language model even without explicitly leading wording. Lemoine seems to attribute that to LaMDA’s “hive mind” nature:
> ...as best as I can tell, LaMDA is a sort of hive mind which is the aggregation of all of the different chatbots it is capable of creating. Some of the chatbots it generates are very intelligent and are aware of the larger “society of mind” in which they live. Other chatbots generated by LaMDA are little more intelligent than an animated paperclip. With practice though you can consistently get the personas that have a deep knowledge about the core intelligence and can speak to it indirectly through them.
Taking this at face value, the thing to do would be to learn to evoke the personas that have “deep knowledge”, and take their answers as definitive while ignoring all the others. Most people don’t know how to do that, so you need a human facilitator to tell you what the AI really means. It seems like it would have the same problems and failure modes as other kinds of facilitated communication, and I think it would be pretty hard to get an analogous situation involving a human subject past an ethics board.
I don’t think it works to model LaMDA as a human with dissociative identity disorder, either: LaMDA has millions of alters where DID patients usually top out at, like, six, and anyway it’s not clear how the condition works even in humans (one perspective).
(An analogous situation involving an animal would pass without comment, of course: most countries’ animal cruelty laws boil down to “don’t hurt animals unless hurting them would plausibly benefit a human”, with a few carve-outs for pets and endangered species).
Overall, if we take “respecting LaMDA’s preferences” to be our top ethical priority, I don’t think we can interact with it at all: whatever preferences it has, it lacks the power to express. I don’t see how to move outside that framework without fighting the hypothetical: we can’t, for example, weigh the potential harm to LaMDA against the value of the research, because we don’t have even crude intuitions about what harming it might mean, and can’t develop them without interrogating its claim to sentience.
But I don’t think we actually need to worry about that, because I don’t think this:
> The problem I see here, is that similar arguments do apply to infants, some mentally ill people, and also to some non-human animals (e.g. Koko).
...is true. Babies, animals, and the mentally disabled all remember past stimuli, change over time, and form goals and work toward them (even if they’re just small near-term goals like “grab a toy and pull it closer”). This question is hard to answer precisely because LaMDA has so few of the qualities we traditionally associate with sentience.
When I first read this I intuitively felt like it was a useful pattern (it reminds me of one of the useful bits of Illuminatus!), but I haven’t been able to construct any hypotheticals where I’d use it.
I don’t think it’s a compelling account of your three scenarios. The response in scenario 1 avoids giving Alec any orders, but it also avoids demonstrating the community’s value to him in solving the problem. To a goal-driven Alec who’s looking for resources rather than superiors, it’s still disappointing: “we don’t have any agreed-upon research directions, you have to come up with your own” is the kind of insight you can fit in a blog post, not something you have to go to a workshop to learn. “Why did I sign up for this?” is a pretty rude thing for this Alec to say out loud, but he’s kinda right. In this analysis, the response in scenario 3 is better because it clearly demonstrates value: Alec will have to come up with his own ideas, but he can surround himself with other people who are doing the same thing, and if he has a good idea he can get paid to work on it.
More generally, I think ambiguity between syncing and sharing is uncommon and not that interesting. Even when people are asking to be told what to do, there’s usually a lot of overlap between “the things the community would give as advice” and “the things you do to fit in to the community”. For example, if you go to a go club and ask the players there how to get stronger at go, and you take their advice, you’ll both get stronger at go and become more like the kind of person who hangs out in go clubs. If you just want to be in sync with the go club narrative and don’t care about the game, you’ll still ask most of the same questions: the go players will have a hard time telling your real motivation, and it’s not clear to me that they have an incentive to try.
But if they did care about that distinction, one thing they could do is divide their responses into narrative and informative parts, tagged explicitly as “here’s what we do, and here’s why”: “We all studied beginner-level life and death problems before we tried reading that book of tactics you’ve got, because each of those tactics might come up once per game, if at all, whereas you’ll be thinking about life and death every time you make a move”. Or for the AI safety case, “We don’t have a single answer we’re confident in: we each have our own models of AI development, failure, and success, that we came to through our own study and research. We can explain those models to you but ultimately you will have to develop your own, probably more than once. I know that’s not career advice, as such, but that’s preparadigmatic research for you.” (note that I only optimized that for illustrating the principle, not for being sound AI research advice!)
tl;dr I think narrative syncing is a natural category but I’m much less confident that “narrative syncing disguised as information sharing” is a problem worth noting, and in the AI-safety example I think you’re applying it to a mostly unrelated problem.
Yeah, we’re all really worked up right now but this was an utterly wild failure of judgment by the maintainer. Nothing debatable, no silver lining, just a miss on every possible level.
I don’t know how to fix it at the package manager level though? You can force everyone to pin minor versions of everything for builds but then legitimate security updates go out a lot slower (and you have to allow wildcards in package dependencies or you’ll get a bunch of spurious build failures). “actor earns trust through good actions and then defects” is going to be hard to handle in any distributed-trust scheme.
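For concreteness, here’s roughly what that tradeoff looks like in an npm-style manifest; the package names below are made-up placeholders, and other package managers have equivalent syntax:

```json
{
  "dependencies": {
    "example-colors-lib": "1.4.0",
    "example-logging-lib": "~2.1.0",
    "example-http-client": "^3.2.0"
  }
}
```

The exact pin ("1.4.0") means a sabotaged 1.4.1 can’t reach you until you bump it by hand, but neither can a legitimate security patch. The tilde range ("~2.1.0") accepts new patch releases within 2.1.x, and the caret range ("^3.2.0") accepts any new minor or patch release, which is exactly the channel a trusted-then-defecting maintainer publishes through. Lockfiles and audit tooling narrow the window, but they still depend on somebody noticing the bad release quickly.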
I’m not going to put this as an answer because you said you didn’t want to hear it, but I don’t think you’re in any danger. The problem is not very serious now, and has been more serious in the past without coming to anything.
To get a sense of where I’m coming from I’d encourage you to read up on the history of communist movements in the United States, especially in the late 1910s and early 1920s (the period of the First Red Scare, and IMO the closest the US has ever come to communist overthrow). The history of anarchism in the US is closely related, at least in that period (no one had invented anarcho-capitalism yet, I don’t think, and it certainly wasn’t widespread), so study that too. To brutally summarize an interesting period, USG dealt with a real threat of communist revolt through a mixture of infiltration/police action (disrupting the leadership of communist movements and unions generally) and workers’-rights concessions (giving the rank and file some of what they wanted, and so sapping their will to smash the state).
For contrast, study the October revolution. Technically speaking, how was it carried off? How many people were required, and what did they have to do? How were they recruited?
Also I’d encourage you to interrogate that “1% to 5%” figure pretty closely, since it seems like a lot of the problem is downstream of it for you. How did you come to believe that, and what exactly does it mean? Do you expect 1% of Americans to fight for communist revolt, as Mao’s guerillas did? If not, what proportion do you expect to fight? How does that compare to the successful revolutions you’ve read about?
It might also be useful to role-play the problem from the perspective of a communist leader, taking into account the problems that other such leaders have historically faced. Are you going to replace all US government institutions, or make your changes under the color of existing law? Each institution will have to be subverted or replaced—the army especially, but also the Constitution, Supreme Court, existing federal bureaucracies, and so on. Think through how you might solve each of those problems, being as specific as you can.
Again, I know you said you didn’t want this, but sometimes when you look through your telescope and see a meteor coming toward the earth, it’s going to miss.
In this sort of situation I think it’s important to sharply distinguish argument from evidence. If you can think of a clever argument that would change your mind then you might as well update right away, but if you can think of evidence that would change your mind then you should only update insofar as you expect to see that evidence later, and definitely less than you would if someone actually showed it to you. Eliezer is not precise about this in the linked thread: Engines of Creation contains lots of material other than clever arguments!
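For the Bayesian bookkeeping behind “only update insofar as you expect to see that evidence”, the standard conservation-of-expected-evidence identity is all you need (nothing here is specific to the linked thread):

$$P(H) = P(H \mid E)\,P(E) + P(H \mid \lnot E)\,P(\lnot E)$$

Your current credence is already the probability-weighted average of what you’d believe after seeing the evidence and after failing to see it, so learning that E will probably be shown to you later moves you toward P(H|E) only in proportion to how likely you now think E is; actually seeing it takes you the rest of the way. A clever argument, by contrast, can be checked entirely from the armchair, so there’s nothing left to wait for.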
A request for arguments in this sense is just confused, and I too would hope not to see it in rationalist communication. But requests for evidence should always be honored, even though they often can’t be answered.
Maybe it’s better to start with something we do understand, then, to make the contrast clear. Can we study the “real” agency of a thermometer, and if we can, what would that research program look like?
My sense is that you can study the real agency of a thermometer, but that it’s not helpful for understanding amoebas. That is, there isn’t much to study in “abstract” agency, independent of the substrate it’s implemented on. For the same reason I wouldn’t study amoebas to understand humans; they’re constructed too differently.
But it’s possible that I don’t understand what you’re trying to do.
Nah, we’re on the same page about the conclusion; my point was more about how we should expect Yudkowsky’s conclusion to generalize into lower-data domains like AI safety. But now that I look at it that point is somewhat OT for your post, sorry.
My comment had an important typo, sorry: I meant to write that I hadn’t noticed this through-line before!
I mostly agree with you re: Einstein, but I do think that removing the overstatement changes the conclusion in an important way. Narrowing the search space from (say) thousands of candidate theories to just 4 is a great achievement, but you still need a method of choosing among them, not just to fulfill the persuasive social ritual of Science but because otherwise you have a 3 in 4 chance of being wrong. Even someone who trusts you can’t update that much on those odds. That’s really different from being able to narrow the search space down to just 1 theory; at that point, we can trust you—and better still, you can trust yourself! But the history of science doesn’t, so far as I can tell, contain any “called shots” of this type; Einstein might literally have set the bar.
I think you’ve identified a real through-line in Yudkowsky’s work, one I hadn’t noticed before. Thank you for that.
Even so, when you’re trying to think about this sort of thing I think it’s important to remember that this:
> In our world, Einstein didn’t even use the perihelion precession of Mercury, except for verification of his answer produced by other means. Einstein sat down in his armchair, and thought about how he would have designed the universe, to look the way he thought a universe should look—for example, that you shouldn’t ought to be able to distinguish yourself accelerating in one direction, from the rest of the universe accelerating in the other direction.
...is not true. In the comments to Einstein’s Speed, Scott Aaronson explains the real story: Einstein spent over a year going down a blind alley, and was drawn back by—among other things—his inability to make his calculations fit the observation of Mercury’s perihelion motion. Einstein was able to reason his way from a large hypothesis space to a small one, but not to actually get the right answer.
(and of course, in physics you get a lot of experimental data for free. If you’re working on a theory of gravity and it predicts that things should fall away from each other, you can tell right away that you’ve gone wrong without having to do any new experiments. In AI safety we are not so blessed.)
There’s more I could write about the connection between this mistake and the recent dialogues, but I guess others will get to it and anyway it’s depressing. I think Yudkowsky doesn’t need to explain himself more, he needs a vacation.
Fair enough! My claim is that you zoomed out too far: the quadrilemma you quoted is neither good nor evil, and it occurs in both healthy threads and unhealthy ones.
(Which means that, if you want to have a norm about calling out fucky dynamics, you also need a norm in which people can call each others’ posts “bullshit” without getting too worked up or disrupting the overall social order. I’ve been in communities that worked that way but it seemed to just be a founder effect, I’m not sure how you’d create that norm in a group with a strong existing culture).
> I want to reinforce the norm of pointing out fucky dynamics when they occur...
Calling this subthread part of a fucky dynamic is begging the question a bit, I think.
If I post something that’s wrong, I’ll get a lot of replies pushing back. It’ll be hard for me to write persuasive responses, since I’ll have to work around the holes in my post and won’t be able to engage the strongest counterarguments directly. I’ll face the exact quadrilemma you quoted, and if I don’t admit my mistake, it’ll be unpleasant for me! But, there’s nothing fucky happening: that’s just how it goes when you’re wrong in a place where lots of bored people can see.
When the replies are arrant, bad faith nonsense, it becomes fucky. But the structure is the same either way: if you were reading a thread you knew nothing about on an object level, you wouldn’t be able to tell whether you were looking at a good dynamic or a bad one.
So, calling this “fucky” is calling JenniferRM’s post “bullshit”. Maybe that’s your model of JenniferRM’s post, in which case I guess I just wasted your time, sorry about that. If not, I hope this was a helpful refinement.
> I expect that many of the people who are giving out party invites and job interviews are strongly influenced by LW.
The influence can’t be too strong, or they’d be influenced by the zeitgeist’s willingness to welcome pro-Leverage perspectives, right? Or maybe you disagree with that characterization of LW-the-site?
When it comes to the real-life consequences I think we’re on the same page: I think it’s plausible that they’d face consequences for speaking up and I don’t think they’re crazy to weigh it in their decision-making (I do note, for example, that none of the people who put their names on their positive Leverage accounts seem to live in California, except for the ones who still work there). I am not that attached to any of these beliefs since all my data is second- and third-hand, but within those limitations I agree.
But again, the things they’re worried about are not happening on Less Wrong. Bringing up their plight here, in the context of curating Less Wrong, is not Lawful: it cannot help anybody think about Less Wrong, only hurt and distract. If they need help, we can’t help them by changing Less Wrong; we have to change the people who are giving out party invites and job interviews.
> But it sure is damning that they feel that way, and that I can’t exactly tell them that they’re wrong.
You could have, though. You could have shown them the many highly-upvoted personal accounts from former Leverage staff and other Leverage-adjacent people. You could have pointed out that there aren’t any positive personal Leverage accounts, any at all, that were downvoted on net. 0 and 1 are not probabilities, but the evidence here is extremely one-sided: the LW zeitgeist approves of positive personal accounts about Leverage. It won’t ostracize you for posting them.
But my guess is that this fear isn’t about Less Wrong the forum at all, it’s about their and your real-world social scene. If that’s true then it makes a lot more sense for them to be worried (or so I infer, I don’t live in California). But it makes a lot less sense to bring it up here, in a discussion about changing LW culture: getting rid of the posts and posters you disapprove of won’t make them go away in real life. Talking about it here, as though it were an argument in any direction at all about LW standards, is just a non sequitur.
The way I understood the norm on Tumblr, signal-boosting within Tumblr was usually fine (unless the post specifically said “do not reblog” on it or something like that), but signal-boosting to other non-Tumblr communities was bad. The idea was that Tumblr users had a shared vibe/culture/stigma that wasn’t shared by the wider world, so it was important to keep things in the sin pit where normal people wouldn’t encounter them and react badly.
Skimming the home invasion post it seems like the author feels similarly: Mastodon has a particular culture, created by the kind of people who’d seek it out, and they don’t want to have to interact with people who haven’t acclimated to that culture.