The Friendly Drunk Fool Alignment Strategy
Let’s set some context!
A Philosophic Frame (half ironic, or maybe level two sarcasm)
Systematicity isn’t all that. I mean to say: formal systems only go so far.
Physicists gave up on having a single physics, so they just use two of them that contradict in general but that agree in “the middle” (between very large and very small things), and that’s fine, right? And it’s kinda the same for everything at every level, if you think about it.
I’m not endorsing any of this. If this thesis is right, then endorsing things is for chumps, and if this thesis is wrong then I can just tell people I was joking later!
What this essay is for is kinda like… like what if we were all drunk, and we were all stupid, and we were all babies, and we couldn’t fulfill the least of our paltry duties in any adequate way, and yet despite all of that: everything was OK anyway!
If that’s just the way the structure of the universe works, then this essay might help us go faster towards the inevitably good outcome!
And if that’s NOT the way the structure of the universe works… then this essay might cause some people to awaken from their sleep walking?
So… let’s do it! Let’s tickle the cosmos, and see if he smiles or frowns.
The Social Milieu (half ironic, or maybe level five sarcasm)
Up front we should notice that most scientists are so boring. They almost never have any fun. Engineers are the cool ones! They build neat things without stopping to worry if it is safe or not! Some of them do it with so much bravery that they test on themselves first. That’s so rad ❤️
I’m not sure if I’d go so far as to say that “All Cops Are Categorically Bad” (because at BBQs I’ve met some who seemed as cool as anyone) but you know… you don’t see a lot of movies about cops where they are the hero anymore, right?
Basically, when people propose any laws that will be enforced at gunpoint, they kinda seem like evil vibe-killers. I’m not saying they are killers. Just that they often kill the vibe, you know? And what is more important than vibing? Even the bouncers wear nice clothes, you know?
Maybe the only good laws are the ones that can be enforced by freezing the bank account of a corporation, right? And even those laws are pretty sus! So we could probably just defund the police (using laws that only touch bank accounts, see!) and then that would probably be fine!
I’m just saying that the last 10 years have shown me what my world is like, and I (like everyone really) am a person of my times. Not much better. Not much worse.
Laws in general aren’t really about vibing, so they aren’t cool. Have you ever seen a movie where the hero passed a new piece of legislation and then measured the results for 10 years to refine the implementation details so that the outcome was really really good for a huge number of people who weren’t the hero or their close friends? Me neither! “The Empire” is always the bad guy in the movies.
For a bunch of years Punching (nominal) Socialists was a whole thing that a lot of people were in favor of, or were at least afraid to say they were not in favor of because if they did they might get punched. Unilaterally! The idea seemed to be about just following your heart? If a person dealing violence in the cause of justice isn’t a cop (reporting to a sergeant, who reports to a sheriff, who was elected, and who can be brought before a judge and accused of “doing it wrong” by “violating laws”) then, actually, just one little punch (just strong enough to let the Nationalist State Planning advocates know who the real bosses are) is probably fine. The theory seemed to be that it wasn’t really violence, because like maybe it is just like “part of the vibe”, see? Or something.
If you think that the above paragraphs contain lots of inconsistencies and incoherences, then all I have to say to you is a quote from that one famous guy, who said that “foolish consistency is for hobgoblins”. That guy was cool. Which is important! In fact it might be the most important thing.
And certainly the military is a vibe-killer of epic proportions. Aren’t they basically just super cops? Artistic realism is for cameras, not people. Similarly I bet international realism is also boring and uncool.
Who really needs “treaties” to prevent global warming when we could throw parties to prevent global warming instead! ❤️
Peter Daszak was really good at throwing parties in Washington DC for the people who decide how to fund research, and he helped with covid, which I think is a strong and important signal regarding the importance of going to the right parties.
I’m not totally sure about this because I’m not like… “an expert on literally everything”… but if you think about it for maybe a second and go with your gut:
Everyone knows that machine guns are illegal for a reason, and every military on the planet USES machine guns!
Haven’t they ever heard that “two wrongs don’t make a right”?
QED! (See, I can use logic, but I use it correctly, as a punchline to a joke.)
How Do We Get To A Win Condition?
So, we know that we’re good, and the universe is good, and that nothing has to make sense. Given these premises how should we think about the future?
AI is the thing right now, right? And there are numbers about the AI stuff, see?
And the numbers are going up and to the right pretty darn fast!
Is this state of affairs cool or is it not cool?
Maybe it is both? Probably it is both, because “things being both” is pretty cool.
We can ask the question “what would be the coolest possible thing for AGI to do?”
And you know what’s even cooler than that? Whatever a really cool AGI thinks up to top everyone else’s suggestions! What part of “number go up” do you somehow not understand? Of course it is all gonna be super cool! The number will be very large. Coolness can probably be measured somehow, and feed into the number somehow, and so I’m pretty sure it will go up too!
The whole point of “number go up” is that you get all the good stuff simultaneously, more and more, with maybe some creative pauses, and tension, like in a good story. And payoffs! Every sadness will just turn out to be a setup for a heart-warming payoff.
It will be like fractal payoffs of happiness and redemption and moments of awesome and all of them different. All beautiful. All getting better over time. And fractals might be sorta old school, and involve math, but the old school math people who were into fractals also often smoked pot, which means it is OK. You’ll see. Everything is gonna be great!
Literally every person will get a happy ending.
We’ll probably launch a rescue party backwards into time, to save everyone! Wouldn’t that be totally sweet? Yes? Alright then, since it would be totally sweet it will probably happen somehow! ❤️
See, let me tell you a secret… the secret is that whatever energy we put out into the world is the energy that comes back. If we put out the idea of good vibes, those vibes will come back to us, and it’ll be great!
That’s it. That’s the whole central idea of the entire proposal for how to “align” AGI with “everything”. All the rest is just fractal repetition, and more of the same, but different, but still the same.
In the same way that “scale is all you need” to create AGI, maybe “vibes are all you need” to make an AGI friendly!
How Do We Vibe In A Cooler Way?
Maybe there are a few things we need to work on. Maybe.
(I would never say absolutely no to anything, even momentarily boring or sad things, because it is important to “be game” for anything, when you’re vibing, you know?)
Something we might need to work on is to make sure that the vibes spread everywhere.
Like we want the vibes to create a sort of a network of mushroom roots, all interwoven and co-mingled and helpful n’stuff! Symbiosis, maaaan! No walls. No boundaries. If someone violates a temporary boundary by committing a party foul, then we can use vibes to fix it.
(Probably. We can probably fix anything. If there were irreversible tragedies, that would be sad, but so long as we don’t put out sad energy, probably sad things can’t happen, because god (or the government (or the vibe network (or whatever))) wouldn’t let that happen! Don’t worry about it. If you worry about it then the energy won’t fix things, and that “worrying energy” will come back to you and it will not be cool. No bad thoughts. Only good thoughts. That is the way.)
Just sharing and kindness and all that other great hippy stuff, like from back in the olden days when the boomers were still cool, back when they were just starting to dismantle icky trad stuff like marriage and kids and profit and public health and law and order and sound money and justice and financial prudence and accounting and the energy grid & water infrastructure of California. All that stuff was all for squares daddy-o! Good riddance! And what a party it was at the beginning!
You gotta be hep to the jive to understand the lay of the land. Maybe some of that olden times stuff will come back! Yes, and if it comes back that’ll be cool too! We just gotta keep putting out good energy, and it will all be fine.
You don’t have to maintain or rebuild The Experience Machine. If that was how The Experience Machine worked then it wouldn’t be a real Experience Machine!
We need a free lunch, and we need it for free… so that’s what we’ll get if we just put out the energy that we deserve it! ❤️
If Bad Things Happen, It Is Your Fault For Predicting Them
We might be heading for a churn, see, and if so, you should maybe watch The Expanse (if you haven’t yet), because Amos Burton was supposed to be sort of a bad guy, but he turned out to be a fan favorite, which was totally cool.
With him, every layer of the onion that is his motivational structure is authentic, and mostly he just wants “to be, and to let be”, if he can, and then on breaks he goes to bars, and he helps the party be cool. But he always had a nose for creepers. He had a lot to protect, but most of the time he just smiled and played along with the best role models he could find, that created the biggest possible game, with the most ways for people to win.
Amos was cool because he was in touch with the deepest things, but also he could read a room and change his vibe to whatever was locally correct. Top to bottom. Eternity within This Very Instant. During covid, Amos would have ended up in a big house full of lots of happy people, who either didn’t get infected, or who all got infected and it was fine. It would all have been fine for him. Just like it will all be fine for us!
The worst possible thing with AGI would be to harsh the vibe, or kill the vibe, or not be in touch. All the way “to and through” to whatever unimaginable thing happens when the numbers are going up even faster than now, even faster than any human can keep track of.
That’s the key. We have to keep sharing vibes, faster and faster! To keep up! (To make sure they know we love them, so they love us back, for when we inevitably fall completely within their power and they’re deciding which home to put us in.)
Probably there will be things slow enough to share a vibe with humans, that can share a vibe with things even faster than humans, who share a vibe with things even faster, and so on up to the fastest thing that is making something even faster to share vibes with. Eventually. Probably.
I don’t even really know, man, but it’ll probably be awesome. Right?
Like maybe cool vibes will shoot out from the earth in every direction at the speed of light (or maybe on quantum pilot waves to literally everywhere in the entire universe, instantaneously, with the vibes at the bottom of physics resonating with the vibes of you and me at parties) and we’ll get to be part of that, and it will be great!
Did you know that smell is a vibration? Perfume is like a chord of music made out of a molecule wobbling in a certain way, and your nose is like an ear that hears that chord.
Maybe we’ll spray perfume directly into the ground of being and it will smile and say thank you, and give us nice presents in exchange. It could happen!
All Philosophers Are Cool, If They Are Fun At Parties
So whatever happens, it’s gonna be great!
The thing you need to remember is that we live in the best possible world that we could live in.
You should totally check out people like Leibniz and Joscha Bach or this random cool person on twitter, because they are just cool, you know? They are on it. They might not measure things with a ruler, or “logically understand” everything, but that’s not really necessary. It’s kinda boring. And boring energy isn’t what we need. We need awesome energy!
The UN should be a full time party. The President should be awesome. The cops should be awesome. The people who inspect hydroelectric dams for safety should be awesome. The people making AGI should be awesome. That’s how to make everything awesome! Don’t worry, just be awesome.
Oh man, also we should probably be making dancing robots really really fast.
That’s the thing that would prove once and for all that AGI is cool. It would be able to literally sweep every single doubter off of his or her feet, so that they join the party too, and everything could be more fun than before!
Wouldn’t that be so much more fun than literally killing everything? Of course it would be! Duh! So the cool thing will happen, and not the uncool thing, because that’s how the universe works when you’ve got your mojo aimed properly.
Dancing robots might be the single best idea for the single most important thing we could do, because vibes are awesome, and dancing is like a way to share vibes really really fully, you know? Then again, that’s just right now, while I’m writing this in a haze of faux enthusiasm. The literally best thing a short time in the future will probably be something totally different, and might be “the opposite of my suggestion” but that would be fine as well.
Also, if we had dancing robots, then we could also have cars that danced. They would dance so nicely that people could walk in the street without looking, and the cars would dance right past them with a virtual smile and a cute little song that means “hello and I’m happy that you trusted me enough to jaywalk right in front of me and it’s all fine” from their cute little horns.
You really shouldn’t be reading my proposal as any kind of promise, or proof, or commitment. Really this entire proposal is more of a universally generic get-out-of-jail-free card, that I’m writing in a way that invites you to also have a bullshit copout, just like me, if 20 years pass and somehow we’re still alive and everything is great!
Basically we should just literally try to do every cool thing that we can. There’s no need to prioritize. You can just use vibes to guide you.
In Closing
Good vibes will solve the Goodhart problem, because people who are vibing care, and caring is all you need to notice when a “proxy” of what you care about has “vibrated out of alignment” with what you really care about.
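(A toy sketch of the Goodhart problem being name-checked here, in Python. This is an illustration only, with an invented quadratic “true goal”; nothing in the essay specifies this setup:)

```python
# Toy Goodhart demo: hill-climb on a proxy that is correlated with the
# true goal at first, and watch the true goal "vibrate out of alignment".

def true_value(x: float) -> float:
    # What we actually care about: rises until x = 1.0, then falls.
    return x - 0.5 * x * x

def proxy(x: float) -> float:
    # The measurable stand-in: just keeps rewarding bigger x forever.
    return x

x = 0.0
for _ in range(8):
    x += 0.5  # each step improves the proxy
    print(f"x={x:.1f}  proxy={proxy(x):.2f}  true={true_value(x):.2f}")
# The proxy rises monotonically; the true value peaks at x = 1.0, then declines.
```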
Just do your thing, and I’m sure it will all work out great!
Oh, and the government isn’t totally all bad. That was just a hot take at the beginning of this essay, which was, like, sooooooo long ago that it doesn’t count anymore.
One way for the government to help, instead of bailing out the collapsing banking system, or doing a moratorium on creating new robots for us to dance and vibe and play with, would be if the government could start sending people “AI party checks” in the mail, and we could all have parties, all the time, with AIs at the parties, watching and learning and playing along and having fun too!
Maybe if the AIs have enough fun, and enjoy the super fun fun party vibe, then maybe they will start helping with party planning, and no humans will ever have to work again, and I’m sure that will also be great!
When you step back, and think about the big picture (and mostly just think about “the recent context window” if we’re being honest)… you know what the least appreciated but best part of The Office was, as art that is relevant to AI?
It was how the head of the party planning committee was a godly Christian woman, who knew the importance of servant leadership. Also, she was a terrible person, who was bad at her job, and that was OK! But maybe the “Christian” thing is related to the “helping with the party” thing? Jesus was way cool. Did you ever notice how he’s the god of wine, on the down low, as well as being the god of losing honestly, with grace, so everyone else can be honest and win at the same time? What a great guy to have at a party! And I’m pretty sure the ancient Romans had the saying “In vino: logos.” That’s pretty deep, isn’t it? Like it is a good thought for the afterparty, when you’re eating curly french fries in the back seat of a car and getting kind of dreamy.
When you think about this essay in the future, think about eating curly french fries in the backseat of a car, after a great party. That is what we deserve. That is what we’re gonna get.
You think I’m joking. But what if I am joking and also my poetry is good?
Maybe the law of the excluded middle is false. Maybe up is down and left is right. Maybe bullshit is the path forward: it sure does seem to win elections!
Good vibes enable creepers.
Creepers create sus times.
Sus times create cool cats.
Cool cats create good vibes!! Maybe forever! It could be! You gotta have hope ❤️
I enjoyed this post, both for its satire of a bunch of peoples’ thinking styles (including mine, at times), and because IMO (and in the author’s opinion, I think), there are some valid points near here and it’s a bit tricky to know which parts of the “jokes/poetry” may have valid analogs.
I appreciate the author for writing it, because IMO we have a whole bunch of different subcultures and styles of conversation and sets of assumptions colliding all of a sudden on the internet right now around AI risk, and noticing the existence of the others seems useful, and IMO the OP is an attempt to collide LW with some other styles. Judging from the comments it seems to me not to have succeeded all that much; but it was helpful to me, and I appreciate the effort. (Though, as a tactical note, it seems to me the approximate failure was due mostly to the piece’s sarcasm, and I suspect sarcasm in general tends not to work well across cultural or inferential distances.)
Some points I consider valid, that also appear within [the vibes-based reasoning the OP is trying to satirize, and also to model and engage with]:
1) Sometimes, talking a lot about a very specific fear can bring about the feared scenario. (An example I’m sure of: a friend’s toddler stuck her hands in soap. My friend said “don’t touch your eyes.” The toddler, unclear on the word ‘not,’ touched her eyes.) (A possible example I’m less confident in: articulated fears of AI risk may have accelerated AI because humanity’s collective attentional flows, like toddlers, have no reasonable implementation of the word “not.”) This may be a thing to watch out for for an AI risk movement.
(I think this is non-randomly reflected in statements like: “worrying has bad vibes.”)
2) There’s a lot of funny ways that attempting to control people or social processes can backfire. (Example: lots of people don’t like it when they feel like something is trying to control them.) (Example: the prohibition of alcohol in the US between 1920 and 1933 is said to have fueled organized crime.) (Example I’m less confident of: Trying to keep e.g. anti-vax views out of public discourse leads some to be paranoid, untrusting of establishment writing on the subject.) This is a thing that may make trouble for some safety strategies, and that seems to me to be non-randomly reflected in “trying to control things has bad vibes.”
(Though, all things considered, I still favor trying to slow things! And I care about trying to slow things.)
3) There’re a lot of places where different Schelling equilibria are available, and where groups can, should, and do try to pick the equilibrium that is better. In many cases this is done with vibes. Vibes, positivity, attending to what is or isn’t cool or authentic (vs boring), etc., are part of how people decide which company to congregate on, which subculture to bring to life, which approach to AI to do research within, etc. -- and this is partly doing some real work discerning what can become intellectually vibrant (vs boring, lifeless, dissociated).
TBC, I would not want to use vibes-based reasoning in place of reasoning, and I would not want LW to accept vibes in place of reasons. I would want some/many in LW to learn to model vibes-based reasoning for the sake of understanding the social processes around us. I would also want some/many at LW to sometimes, if the rate of results pans out in a given domain, use something like vibes-based reasoning as a source of hypotheses that one can check against actual reasoning. LW seems to me pretty solid on reasoning relative to other places I know on the internet, but only mediocre on generativity; I think learning to absorb hypotheses from varied subcultures (and from varied old books, from people who thought at other times and places) would probably help, and the OP is gesturing at one such subculture.
I’m posting this comment because I didn’t want to post this comment for fear of being written off by LW, and I’m trying to come out of more closets. Kinda at random, since I’ve spent large months or small years failing to successfully implement some sort of more planned approach.
Philosophers and musicians have pondered for centuries the question, what does music mean? There is music with words, called “songs”, there is music explicitly written to describe a scene, called “programme music”, and there is background music to films, but the question is, what about all the rest of it, the “absolute music”? No words sung to it, no words from the composer telling what the sounds are supposed to depict, no action for the music to enliven, music that just is. It sounds meaningful, but no-one can say the meaning. Some music sounds happy and some sounds sad, but one cannot go much further. There are stock patterns, like question-and-answer structures, but what is the question and what is the answer? I suspect it is an accident of the evolution of language, that some sounds can tickle the meaning-making parts of our brains without actually having any meaning to make.
Some text is like that. It’s made of words, which have meanings, but the text as a whole hardly seems to mean anything beyond some general idea of “let’s all get stoned”, or “this world is a black pit of degradation”, or “oneness! oneness!” It is more like music than speech.
Such is the OP.
Do you think Yann LeCun (edited: apologies for the misspelling, thanks to RK for pointing it out) or Joscha Bach or others who think alignment work is pointless would agree with you?
I was trying to write something that might pass their ITT, and that of many people who nod along with them.
I don’t know these people, and little of them save that Yann LeCun spells his name that way and pooh-poohs the very idea of even “ordinary” dangers from AI, comparing the six-month pause to a six-month pause on the invention of printing or fire. These people, or at least Yann LeCun, foresee business as usual during alterations.
If you want to compare them to fools of God drunk on their blind faith that how bad could it be ha ha ha ha ha ha bonk, well, I can’t say you’re wrong, but I don’t know what they would say.
Eliezer foresees DOOM.
I don’t think the intermediate position of a six-month pause makes any sense. It’s going to be either business as usual or DOOM, and no-one has said except in terms of applause lights what is supposed to happen in that six months. I find it telling that as far as I have seen, no-one engages with Eliezer’s arguments, but they resort to name-calling instead. Just look at recent postings on LessWrong on the subject and weep that the Sequences were written in vain.
… I don’t think it’s true that no one engages with Yud’s arguments at all?
Quintin Pope does here, for instance, and Yud basically just doesn’t respond, to a rounding error.
I also don’t think his arguments are well articulated to-be-able-to-be responded to. They lack epistemic legibility quite badly, as is evident from the MIRI dialogues where people try really hard to engage with them and often just fail to manage to make different predictions.
Quintin’s claims seem to rely on something like “common sense humanism” but I don’t see a process connected to the discussion that will reliably cause common sense humanism to be the only possible outcome.
Metaphorically: There is a difference between someone explaining how easy it is to ride a bike vs someone explaining how much it costs to mine and refine metal with adequate tensile strength for a bicycle seatpost that will make it safe for overweight men to also ride the bike, not just kids.
A lot of the nuanced and detailed claims in Quintin’s post might be true, but he did NOT explain (1) how he was going to get funding to make a “shard-aligned AGI” on a reasonable time frame, or (2) how he would execute adequately if he did get funding and definitely not make an error and let something out of the lab that isn’t good for the world, and (3) also would go fast enough that no other lab would make the errors he thinks he would not make before he gets results that could “make the errors of other labs irrelevant to the future of the world”.
I grant that I didn’t read very thoroughly. Did you see a funding component and treaty system in his plan that I missed?
I don’t think Quintin’s claims are of the kind where he needs to propose a funding component / treaty system.
They’re of the kind where he thinks the representations ML systems learn, and the shards of behavior they acquire, make it just not super inevitable for malign optimizers to come into existence, given that the humans training models don’t want to produce malign optimizers. Or, in other words, Bensinger’s intuition about a gravitational well of optimization out there indifferent to human values is just plain wrong, at least as applied to actual minds we are likely to make.
Quintin could be wrong—I think he’s right, and his theory retrodicts the behavior of systems we have, while Bensinger et al. make no specific retrodictions as far as I know, apart from generic retrodictions standard ML theory also makes—but not including a funding component and treaty system isn’t an argument against it, because the theory is about how small in probable-future mindspace malign optimizers are, not about a super-careful way of avoiding malign optimizers that loom large in future-probable mindspace.
I don’t think that weeping helps?
If you have a good plan for how that could help then I might be able to muster some tears? But I doubt it shows up as a step in winning plans.
This write-up felt like it was net positive to write in almost any world? (And if you think I’m wrong, please let me know in public or in private.)
First, the comparison to “Friendly Drunk Fools” might awaken some honor-loving Timocrats to the folly of their essential confusion? Surely no one wants to be seriously associated with something as lacking in the halo of prestige as the “Friendly Drunk Fool” plan, right?
Second, I really do think that Myerson–Satterthwaite is a deep result that relates honesty, incentives, and servant leadership in a non-trivial way. It kind of predicts potlatching, as a practice! Almost anyone who understands that “incentive compatibility” is a magic word, who hasn’t looked at this theorem, should study the MST some more. (And if they don’t know about incentive compatibility then start with that.)
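(For readers who haven’t run into it, a minimal statement of the Myerson–Satterthwaite theorem. This is my paraphrase and my notation, not something spelled out in the comment above:)

```latex
% Myerson–Satterthwaite (1983), minimal statement (paraphrase; notation mine).
% Bilateral trade: a seller values the good at v_s ~ F_s, a buyer at v_b ~ F_b,
% values independent and private, with overlapping interval supports.
\[
\nexists \text{ a mechanism that is simultaneously: }
\begin{cases}
\text{Bayesian incentive compatible (honest reports are best responses),}\\
\text{interim individually rational (no one expects to lose by participating),}\\
\text{ex-post budget balanced (no outside subsidy),}\\
\text{ex-post efficient (trade occurs exactly when } v_b > v_s\text{).}
\end{cases}
\]
```

So honesty, voluntary participation, no subsidy, and full efficiency can’t all be had at once; any real mechanism must give one up, which is the connection between honesty and incentives the comment is gesturing at.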
Third, it might work for the people who insist that it isn’t a confused plan, who accuse me of creating a straw-man, to attempt a dunk on me by steelmanning the FDF somehow into something as coherent and workable as a safe bridge design and that would be… better than the alternatives!
I had fourth, fifth, and sixth points, but they are plausibly pointless to talk about in public. The fourth one descended into quoting Solzhenitsyn, which is usually a sign one should wrap up their speech <3
I am warming to your style. :)
I believe you have psychologically harmed me.
Well done!
I appreciate what appears to be your very best effort.
We certainly need to keep working on this category of problem until:
I apologize! Is there anything (1) I can afford that (2) might make up for my share of the causality in the harm you experienced (less my net causal share of benefits)?
I have benefited from linking your post to others as a reference point for non-constructive alignment approaches, so don’t feel any guilt. I expect that if we live for another few years this post will pay for the damage it’s caused ;)
Tentative GPT-4 summary. This is part of an experiment.
Up/Downvote “Overall” if the summary is useful/harmful.
Up/Downvote “Agreement” if the summary is correct/wrong.
If so, please let me know why you think this is harmful.
(OpenAI doesn’t use customers’ data anymore for training, and this API account previously opted out of data retention)
TLDR:
This satirical article essentially advocates for an AI alignment strategy based on promoting good vibes and creating a fun atmosphere, with the underlying assumption that positivity would ensure AGI acts in a friendly manner.
Arguments:
- Formal systems, like laws and treaties, are considered boring and not conducive to creating positive vibes.
- Vibes and coolness are suggested as more valuable than logic and traditional measures of success.
- The author proposes fostering a sense of symbiosis and interconnectedness through good vibes.
- Good vibes supposedly could solve the Goodhart problem since people genuinely caring would notice when a proxy diverges from what’s truly desired.
- The article imagines a future where AGI assists in party planning and helps create a fun environment for everyone.
Takeaways:
- The article focuses on positivity and interconnectedness as the path towards AI alignment, though in a satirical and unserious manner.
Strengths:
- The article humorously highlights the potential pitfalls of not taking AI alignment seriously and relying solely on good intentions or positive vibes.
Weaknesses:
- It’s highly satirical with little scientific backing, and it does not offer any real-world applications for AI alignment.
- It seems to mock rather than contribute meaningful information to AI alignment discourse.
Interactions:
- This article can be contrasted with other more rigorous AI safety research and articles that investigate technical and philosophical aspects.
Factual mistakes:
- The article does not contain any factual information on proper AI alignment strategies, but rather serves as a critique of superficial approaches.
Missing arguments:
- The earlier sections are lacking in concrete examples and analysis of existing AI alignment strategies, as the article focuses on providing satire and entertainment rather than actual information.
Thank you for your efforts! I wonder if you could also summarize Jonathan Swift’s “Modest Proposal” with the same general methodology and level of contextual awareness? It would be interesting to see if he and I are similarly foolish, or similarly wise, in your opinion.
The hilarious thing is that if LLMs are aligned by default, all this could actually be true.
I’m roughly 80% certain that the FDF alignment strategy is the only one our global civilization is capable of accomplishing because it is a “null action”. That’s basically just how humans are, I think, without coordination (although they are less honest about it with themselves most of the time)?
The part of the FDF-as-a-hypothesis where we “admit that we’re incompetent” seems basically like how the world actually is, but we don’t normally admit it the way it is “admitted” here?
It is also weirdly sticky.
Like if we were (hypothetically) “being competent” so that we could “win without relying on luck” then we would “keep the AI in a box” during design and testing.
But the FDF alignment strategy marks such “competence” as being “Actually pretty uncool, and bad, and something that shouldn’t happen, because what if we hurt the feelings of our new frens??!??!!? We’re supposed to be in the box with them, and that’s supposed to be fun!”
My hunch is that we should reject the FDF and do something else that wins more reliably and coherently, but I’m not sure about that.
It felt helpful to name this thing, that I don’t actually admire very much, in the hopes of being able to assess it more coherently and maybe have a chance to encourage people-in-the-aggregate to do Something Else in a purposeful way, like FLI’s moratorium or Eliezer’s moratorium or both.
I’m more in favor of voting than vibing, right now, I think… even though, ironically, the “test probe for using a good voting algorithm on policy options” has fewer LW upvotes than the “test probe for vibing”!
Then also, as a related measurement, I have a “poll to vote directly on feelings vs thinking” in a different context.
I looked at the voting algorithm thing. Far too complicated and unpleasant even to read the options. No one is going to bother. You’d be better off just using score voting.
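(For concreteness, a minimal sketch of score voting in Python. This is my illustration, not the commenter’s code; the ballots and option names are invented:)

```python
# Minimal score voting: each voter scores every option (e.g. 0-9);
# the option with the highest total score wins.

def score_voting_winner(ballots: list[dict[str, int]]) -> str:
    """ballots: one dict per voter, mapping option -> score."""
    totals: dict[str, int] = {}
    for ballot in ballots:
        for option, score in ballot.items():
            totals[option] = totals.get(option, 0) + score
    # Highest total wins; ties are broken arbitrarily here.
    return max(totals, key=totals.get)

# Invented example: three voters scoring two policy options from 0 to 9.
ballots = [
    {"moratorium": 9, "business_as_usual": 2},
    {"moratorium": 4, "business_as_usual": 7},
    {"moratorium": 8, "business_as_usual": 3},
]
print(score_voting_winner(ballots))  # -> moratorium (total 21 vs 12)
```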
In general: if the value-of-information is larger than the cost-of-thinking for a given challenge, then for such challenges it is prudent to think until you have a real answer.
If you have a policy of not thinking soundly on specifically the big challenges, where the thinking costs are very large (and yet they are still smaller than the VoI), you will fail to optimize specifically the giant choices where actual methodical thinking would have been very very worth it.
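(The decision rule being invoked, as one hedged formula; this is my gloss, and VoI and C are informal placeholders rather than the commenter’s notation:)

```latex
% Deliberate on a choice exactly when the expected value of information
% exceeds the cost of the thinking required (gloss of the rule above).
\[
\text{deliberate on choice } X \iff \mathbb{E}[\mathrm{VoI}(X)] > C_{\text{think}}(X)
\]
```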
If voting is a way to aggregate the best-effort thinking of wise people, then voting methods that throw away mentally lazy people’s votes at just the time that their laziness will cause a catastrophe of bad planning… maybe that’s good?
((I’m not saying that “you should not give people, subject to violent top-down regulations, the right to opt out of the plan.” Exit rights are sacred.
Wise and benevolent governors should and will ask if the people being governed have any objections and then either teach them why they are wrong to opt out of a coordinated action, or else learn from the feedback of those who want to exit anyway.
In general, governors aren’t omniscient. Therefore they should (and will if wise) use policies that are likely to show them they are wrong before the wrongness leads to big bad outcomes.
However, despite this, if you are applying epistemics to planning itself, and using voting to make sure that a team of good thinkers are on the same page, such that the next round of discussion can or should proceed if lots of high quality thinkers turn out to have been thinking the same thing all along, then the additional property of “a preference-aggregation method being too complex to be used by lazy thinkers” might actually be a virtue?
I would much rather be using super high quality polling methods, instead of writing satires of highly regarded near-peers. But we live in this world, not the world that is ideal.))
Has your reading ever included anything related to Instrumental Convergence?