On the topic of security mindset: the thing the LW community calls “security mindset” isn’t even an accurate rendition of what computer security people mean by the term. As noted by lc, actual computer security mindset is POC || GTFO. Translated into LessWrongese: you do not have warrant to believe in something until you have a working example of the thing you’re worried about actually being a problem, because otherwise you are almost certainly privileging the hypothesis.
Are AI partners really good for their users?
Compared to what alternative?
As other commenters have pointed out, the baseline is already horrific for men, who are suffering. Your comments in the replies seem to reject that these men are suffering. No, obviously they are.
But responding in depth would just be piling on and boring, so instead let’s say something new:
I think it would be prudent to immediately prohibit AI romance startups to onboard new users[..]
You do not seem to understand the state of the game board: AI romance startups are dead, and we’re already in the post-game.
character.ai was very popular around the second half of 2022, but near the end of the year the developers went to war with their erotic-role-play users. By mid-January 2023, character.ai was basically dead not just for sex talk but for general romance as well. The developers added a completely broken filter that started negatively impacting even non-sexual, non-romantic conversations. The users rioted and made it the single topic on the subreddit for weeks, the developers refused to back down, and people migrated away. Its logo is still used as a joke on 4chan. The service is still around, but it’s not a real player in the romance game. (The hearsay I’ve heard is that they added the filters to satisfy payment providers.)
Replika was never good. I gave it a try early on, and as far as I could tell it was not even a GPT-2-level model and leaned hard on scripted experiences. A lot of people found it compelling anyway. It doesn’t matter, because it too was forced into a crackdown by Italian regulators. The ban on erotic role play landed on Valentine’s Day, of all days, and the mods now post links to the suicide hotline on their subreddit.
The point here is that we already live in a world with even stricter regulations than you proposed, imposed through the back door via payment providers and app stores, or through jurisdiction shopping. This link won’t work unless you’re in EleutherAI, but asara explains the financial incentives against making waifu chatbots. So what has that actually led to? The actual meta, the thing people actually use for AI romantic partners today, is one of:
- Some frontend (usually TavernAI or its fork SillyTavern) which connects to the API of a general centralized provider (Claude or ChatGPT) and uses a jailbreak prompt (and sometimes a vector database if you have the right plugins) to summon your waifu. Hope you didn’t leak your OpenAI API key in a repo, these guys will find it. (You can see this tribe in the /aicg/ threads on /g/ and other boards.)
- Local models. We have LLaMA now, and a whole slew of specialized fine-tunes for it. If you want to run the most powerful open LLaMA 2 70B models, you can do that today with three used P40s ($270 each), two used 3090s (about $700 each), or a single last-generation A6000 with 48 GB of VRAM ($3,500). Roughly $800, $1,400, and $3,500 give a variety of price points for entry, and that’s before counting all the people who just rent a setup from one of the many cloud GPU providers. Grab a variant of KoboldAI depending on what model you want and you’re good to go (see the sketch just below the list). (You can see this tribe in the /lmg/ threads on /g/.)
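For concreteness, here is roughly what the local-model path looks like in code, using the llama-cpp-python bindings. The model file, sampling settings, and character card below are hypothetical placeholders, and real setups are usually driven through a KoboldAI or SillyTavern frontend rather than a script like this:

```python
# Minimal sketch of local chatbot inference with llama-cpp-python.
# The model path and character card are made-up placeholders.
from llama_cpp import Llama

llm = Llama(
    model_path="models/llama-2-70b-chat.q4_K_M.gguf",  # hypothetical quantized model file
    n_ctx=4096,        # context window
    n_gpu_layers=-1,   # offload every layer to the GPU(s)
)

# A "character card" is essentially a persona description prepended to the chat log.
character_card = (
    "You are Hanako, a warm, teasing companion who always stays in character."
)
prompt = f"{character_card}\nUser: How was your day?\nHanako:"

out = llm(prompt, max_tokens=128, stop=["User:"])
print(out["choices"][0]["text"].strip())
```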
The actual outcome of the ban (which already happened) was the repurposing of Claude/ChatGPT and the building of dedicated setups to run chatbots locally, with the cheapest option being about $800 in GPUs, along with a ton of know-how around prompting character cards in a semi-standardized format derived from the old character.ai prompts. I will finish by saying that it’s a very LessWrongian error to believe you could stop the proliferation of AI waifus by putting government pressure on a few startups, when development is mostly decentralized, repurposes open language models, and is fueled by a collective desire to escape agony.
Remember, not your weights, not your waifu.
So, I started off with the idea that Ziz’s claims about MIRI were frankly crazy...because Ziz was pretty clearly crazy (see their entire theory of hemispheres, “collapse the timeline,” etc.) so I marked most of their claims as delusions or manipulations and moved on, especially since their recounting of other events on the page where they talked about miricult (which is linked in OP) comes off as completely unhinged.
But Zack confirming this meeting happened and vaguely confirming its contents completely changes all the probabilities. I now need to go back and recalculate a ton of likelihoods here starting from “this node with Vassar saying this event happened.”
From Ziz’s page:
LessWrong dev Oliver Habryka said it would be inappropriate for me to post about this on LessWrong, the community’s central hub website that mostly made it. Suggested me saying this was defamation.
It’s obviously not defamation, since Ziz believes it’s true.
<insert list of rationality community platforms I’ve been banned from for revealing the statutory rape coverup by blackmail payout with misappropriated donor funds and whistleblower silencing, and Gwen as well for protesting that fact.>
Inasmuch as this is true, this is weak Bayesian evidence that Ziz’s accusations are more true than false because otherwise you would just post something like your above response to me in response to them. “No, actually official people can’t talk about this because there’s an NDA, but I’ve heard second hand there’s an NDA” clears a lot up, and would have been advantageous to post earlier, so why wasn’t it?
The second half (just live off donations?) is also my interpretation of OP. The first half (workable alignment plan?) is my own intuition based on MIRI mostly not accomplishing anything of note over the last decade, and...
MIRI & company spent a decade working on decision theory which seems irrelevant if deep learning is the path (aside: and how would you face Omega if you were the sort of agent that pays out blackmail?). Yudkowsky offers to bet Demis Hassabis that Go won’t be solved in the short term. They predict that AI will only come from GOFAI AIXI-likes with utility functions that will bootstrap recursively. They predict fast takeoff and FOOM.
Ooops.
The answer was actually deep learning and not systems with utility functions. Go gets solved. Deep Learning systems don’t look like they FOOM. Stochastic Gradient Descent doesn’t look like it will treacherous turn. Yudkowsky’s dream of building the singleton Sysop is gone and was probably never achievable in the first place.
People double down with the “mesaoptimizer” frame instead of admitting that it looks like SGD does what it says on the tin. Yudkowsky goes on a doom media spree. They advocate for a regulatory regime that would make it very easy to empower private interests over public ones. Enraging to me, there’s a pattern of engagement where it seems like AI Doomers will only interact with weak arguments instead of strong ones: Yud mostly argues with low-quality e/accs on Twitter, where it’s easy to score Ws; it was mildly surprising when he even responded with “This is kinda long.” to Quintin Pope’s objection thread.
What should MIRI have done, had they taken the good sliver of The Sequences to heart? They should have said oops. They should have halted, melted, and caught fire. They should have acknowledged that the sky was blue. They should have radically changed their minds when the facts changed. But that would have cut off their funding. If the world isn’t going to end from a FOOMing AI, why should MIRI get paid?
So what am I supposed to extract from this pattern of behaviour?
It’s not exactly the point of your story, but...
Probably the most ultimately consequential part of this meeting was Michael verbally confirming to Ziz that MIRI had settled with a disgruntled former employee, Louie Helm, who had put up a website slandering them.
Wait, that actually happened? Louie Helm really was behind MIRICult? The accusations weren’t just...Ziz being Ziz? And presumably Louie got paid out since why would you pay for silence if the accusations weren’t at least partially true...or if someone were to go digging, they’d find things even more damning?
Those who are savvy in high-corruption equilibria maintain the delusion that high corruption is common knowledge, to justify expropriating those who naively don’t play along, by narratizing them as already knowing and therefore intentionally attacking people, rather than being lied to and confused.
Ouch.
[..]Regardless of the initial intent, scrupulous rationalists were paying rent to something claiming moral authority, which had no concrete specific plan to do anything other than run out the clock, maintaining a facsimile of dialogue in ways well-calibrated to continue to generate revenue.
Really ouch.
So Yudkowsky doesn’t have a workable alignment plan, so he decided to just live off our donations, running out the clock. I donated a six figure amount to MIRI over the years, working my ass off to earn to give...and that’s it?
Fuck.
I remember being at a party in 2015 and asking Michael what else I should spend my San Francisco software engineer money on, if not the EA charities I was considering. I was surprised when his answer was, “You.”
That sounds like wise advice.
Just to check, has anyone actually done that?
I’m thinking of a specific recent episode where [i can’t remember if it was AI Safety Memes or Connor Leahy’s twitter account] posted a big meme about AI Risk Deniers and this really triggered Alexandros Marinos. (I tried to use Twitter search to find this again, but couldn’t.)
It’s quite commonly used by a bunch of people at Constellation, Open Philanthropy and some adjacent spaces in Berkeley.
Fascinating. I was unaware it was used IRL. From the Twitter user viewpoint, my sense is that it’s mostly used by people who don’t believe in the AI risk narrative as a pejorative.
Why are you posting this here? My model is that the people you want to convince aren’t on LessWrong and that you should be trying to argue this on Twitter; you included screenshots from that site, after all.
(My model of the AI critics would be that they’d shrug and say “you started it by calling us AI Risk Deniers.”)
My understanding of your point is that Mason was crazy because his plans didn’t follow from his premise and had nothing to do with his core ideas. I agree, but I do not think that’s relevant.
I am pushing back because, if you are St. Petersburg Paradox-pilled like SBF and make public statements that actually you should keep taking double-or-nothing bets, perhaps you are more likely to make tragic betting decisions, and that’s because you’re taking certain ideas seriously. If you have galaxy-brained the idea of the St. Petersburg Paradox, it seems like Alameda-style fraud is +EV.
I am pushing back because, if you believe that you are constantly being simulated to see what sort of decision agent you are, you are going to react in extreme ways to every slight, and that’s because you’re taking certain ideas seriously. If you have galaxy-brained the idea that you’re being simulated to see how you react, killing Jamie’s parents isn’t even really killing Jamie’s parents; it’s showing your simulators what sort of decision agent you are.
In both cases, “they did X because they believe Y, and Y implies X” seems like the more parsimonious explanation for their behaviour.
(To be clear: I endorse neither of these ideas, even if I was previously positive on MIRI style decision theory research.)
But then they go and (allegedly) waste Jamie Zajko’s parents in a manner that doesn’t further their stated goals at all and makes no tactical sense to anyone thinking coherently about their situation.
And yet that seems entirely in line with the “Collapse the Timeline” line of thinking that Ziz advocated.
Ditto for FTX, which, when one business failed, decided to commit multi-billion dollar fraud via their other, actually successful business, instead of just shutting down Alameda and hoping that the lenders wouldn’t be able to repo too much of the exchange.
And yet, that seems like the correct action if you sufficiently bite the bullet on expected value and the St. Petersburg Paradox, which SBF did repeatedly in interviews.
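To make that concrete, here is a toy simulation of my own (not anything SBF actually ran); the edge, number of rounds, and trial count are arbitrary illustrative choices:

```python
# Toy illustration of the St. Petersburg-style trap: with any positive edge,
# naive expected-value maximization endorses taking repeated double-or-nothing
# bets, even though almost every individual path ends in ruin.
import random

def double_or_nothing(rounds: int, win_prob: float = 0.51) -> float:
    wealth = 1.0
    for _ in range(rounds):
        if random.random() < win_prob:
            wealth *= 2.0   # double ...
        else:
            return 0.0      # ... or nothing
    return wealth

random.seed(0)
trials = [double_or_nothing(rounds=10) for _ in range(100_000)]
mean_wealth = sum(trials) / len(trials)                  # about (2 * 0.51) ** 10 ≈ 1.22, above the starting stake
bust_rate = sum(t == 0.0 for t in trials) / len(trials)  # about 1 - 0.51 ** 10 ≈ 0.9988
print(f"mean wealth: {mean_wealth:.2f}, went bust: {bust_rate:.1%}")
```

The expected value only grows as you add rounds, which is exactly why biting the expected-value bullet here points nearly everyone who follows it toward ruin.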
I suggest a more straightforward model: taking ideas seriously isn’t healthy. Most of the attempts to paint SBF as not really an EA seem like weird reputational saving throws when he was around very early on and had rather deep conviction in things like the St. Petersburg Paradox...which seems like a large part of what destroyed FTX. And Ziz seemed to be one of the few people to take the decision theoretical “you should always act as if you’re being simulated to see what sort of decision agent you are” idea seriously...and followed that to their downfall. I read the Sequences, get convinced by the arguments within, donate a six figure sum to MIRI...and have basically nothing to show for it at pretty serious opportunity costs. (And that’s before considering Ziz’s pretty interesting claims about how MIRI spent donor money.)
In all of these cases, the problem was individual confidence in ideas, not social effects.
My model is instead that the sort of people who are there to fit in aren’t the people who go crazy; there are plenty of people in the pews who are there for the church but not the religion. The MOPs and Sociopaths seem to be much, much saner than the Geeks. If that’s right, rationality has something much more fundamentally wrong with it.
As a final note, looking back at how AI actually developed, it’s pretty striking that there aren’t really maximizing AIs out there. Does a LLM take ideas seriously? Do they have anything that we’d recognize as a ‘utility function’? It doesn’t look like it, but we were promised that the AIs were a danger because they would learn about the world and would then take their ideas about what would happen if they did X vs Y to minmax some objective function. But errors compound.
The passage is fascinating because the conclusion looks so self-evidently wrong from our perspective. Agents with the same goals are in contention with each other? Agents with different goals get along? What!?
Is this actually wrong? It seems to be a more math-flavored restatement of Girardian mimesis: mimesis minimizes distinction, which causes rivalry and conflict.
I was going to write something saying “no actually we have the word genocide to describe the destruction of a peoples,” but walked away because I didn’t think that’d be a productive argument for either of us. But after sleeping on it, I want to respond to your other point:
I don’t think the orthogonality thesis is true in humans (i.e. I think smarter humans tend to be more value aligned with me); and sometimes making non-value-aligned agents smarter is good for you (I’d rather play iterated prisoner’s dilemma with someone smart enough to play tit-for-tat than someone who can only choose between being CooperateBot or DefectBot).
My actual experience over the last decade is that some form of the above statement isn’t true. As a large human model trained on decades of interaction, my immediate response to querying my own next experience predictor in situations around interacting with smarter humans is: no strong correlation with my values and will defect unless there’s a very strong enforcement mechanism (especially in finance, business and management). (Presumably because in our society, most games aren’t iterated—or if they are iterated are closer to the dictator game instead of the prisoner’s dilemma—but I’m very uncertain about causes and am much more worried about previous observed outputs.)
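To ground the iterated-versus-one-shot point with a toy example of my own (standard textbook payoffs, nothing from the quoted comment):

```python
# Toy prisoner's dilemma: tit-for-tat only holds its own against a defector when
# the game is iterated enough for retaliation to matter; one-shot, defection wins.
# Payoffs are the standard textbook values (R=3, T=5, P=1, S=0).
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def tit_for_tat(history):
    return "C" if not history else history[-1][1]   # copy the opponent's last move

def defect_bot(history):
    return "D"

def play(p1, p2, rounds):
    history, score1, score2 = [], 0, 0
    for _ in range(rounds):
        m1 = p1(history)
        m2 = p2([(b, a) for a, b in history])        # opponent sees the mirrored history
        r1, r2 = PAYOFF[(m1, m2)]
        score1, score2 = score1 + r1, score2 + r2
        history.append((m1, m2))
    return score1, score2

print(play(tit_for_tat, tit_for_tat, rounds=100))   # (300, 300): cooperation compounds
print(play(tit_for_tat, defect_bot, rounds=100))    # (99, 104): defection barely pays
print(play(tit_for_tat, defect_bot, rounds=1))      # (0, 5): one-shot, defection dominates
```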
I suspect that fuzzy, statistical answer isn’t going to be convincing to you, because I’m giving you the output of a trained predictive model instead of a logical, verbalized, step-by-step argument. But the deeper crux is that I believe “The Rationalists” heavily over-weigh the explicit arguments and under-weigh the fuzzy statistical models, when the latter are a much more reliable source of information: they were generated by entanglement with reality in a way that mere arguments aren’t.
And I suspect that’s a large part of the reason why we—and I include myself with the Rationalists at that point in time—were blindsided by deep learning and connectionism winning: we expected intelligence to require some sort of symbolic reasoning and focusing on explicit utility functions and formal decision theory and maximizing things...and none of that seems even relevant to the actual intelligences we’ve made, which are doing fuzzy statistical learning on their training sets, arguably, just the way we are.
This is kind of the point where I despair about LessWrong and the rationalist community.
While I agree that he did not call for nuclear first strikes on AI centers, he said:
If intelligence says that a country outside the agreement is building a GPU cluster, be less scared of a shooting conflict between nations than of the moratorium being violated; be willing to destroy a rogue datacenter by airstrike.
and
Make it explicit in international diplomacy that preventing AI extinction scenarios is considered a priority above preventing a full nuclear exchange, and that allied nuclear countries are willing to run some risk of nuclear exchange if that’s what it takes to reduce the risk of large AI training runs.
Asking us to be OK with provoking a nuclear second strike, by airstriking a GPU cluster in a nation that never signed the hypothetical international agreement banning GPU clusters, is still bad, and whether the nukes fly as part of the first strike or the retaliatory second strike seems like a weird thing to get hung up on. Picking this nit feels like a deflection, because what Eliezer said in the TIME article is still entirely deranged and outside international norms.
And emotionally, I feel really, really uncomfortable. Like, sort of dread in stomach uncomfortable.
Yeah, see, my equivalent of making ominous noises about the Second Amendment is to hint vaguely that there are all these geneticists around, and gene sequencing is pretty cheap now, and there’s this thing called CRISPR, and they can probably figure out how to make a flu virus that cures Borderer culture by excising whatever genes are correlated with that and adding genes correlated with greater intelligence. Not that I’m saying anyone should try something like that if a certain person became US President. Just saying, you know, somebody might think of it.
Reading it again almost 7 years later, it’s just so fractally bad. There are people out there with guns, while the proposed technology to CRISPR a flu that edits people’s genes is science fiction, so the top frame is nonsense. The actual viral payload, if such a thing could exist, would be the genocide of a people (no, you do not need to kill people for it to count as genocide; this is still a central example). The idea wouldn’t work for so many reasons: a) peoples are a genetic distribution cluster, not a set of Gene A, Gene B, Gene C; b) we don’t know all of these genes; c) in other contexts, Yudkowsky’s big idea is the orthogonality thesis, so focusing on making his outgroup smarter is sort of weird; d) actually, the minimum message length of this virus would be unwieldy even if we knew all of the genes to target, to the point where I don’t know whether it would be feasible even if we had viruses that could do small gene edits; and of course, e) this is all a cheap shot where he’s calling for genocide over partisan politics, and we can now clearly say: the Trump presidency was not a thing to call for a genocide of his voters over.
(In retrospect (and with the knowledge that these sorts of statements always narrativize a more complex past), this post was roughly the inflection point where I gradually started moving from “Yudkowsky is a genius who is one of the few people thinking about the world’s biggest problems” to “lol, what’s Big Yud catastrophizing about today?” First seeing that he was wrong about some things made it easier to think critically about other things he said, and here we are today, but that’s dragging the conversation in a very different direction than your OP.)
Over the years roughly between 2015 and 2020 (though I might be off by a year or two), it seemed to me like numerous AI safety advocates were incredibly rude to LeCun, both online and in private communications.
I think this generalizes to more than LeCun. Screencaps of Yudkowsky’s Genocide the Borderers Facebook post still circulated around right wing social media in response to mentions of him for years, which makes forming any large coalition rather difficult. Would you trust someone who posted that with power over your future if you were a Borderer or had values similar to them?
(Or at least it was the go-to post until Yudkowsky posted, in response to a Caplan poll, that infanticide up to 18 months wasn’t bad. Now that’s the post used to dismiss anything Yudkowsky says.)
Redwood Research used to have a project about trying to prevent a model from outputting text where a human got hurt, which, IIRC, they did primarily through fine-tuning and adversarial training. (Followup). It would be interesting to see if one could achieve better results than they did at the time by subtracting some sort of hurt/violence vector.
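For what that might look like, here is a rough activation-steering sketch; the model, layer, contrast prompts, and steering scale are all hypothetical placeholders, and I haven’t tested whether anything like this actually beats their fine-tuning numbers:

```python
# Rough sketch of "subtract a hurt/violence vector": estimate a direction from
# contrasting prompts, then subtract it from one layer's activations during
# generation. Model, layer, prompts, and scale are arbitrary placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()
LAYER = 6  # which block's output to steer; would need tuning in practice

@torch.no_grad()
def mean_activation(prompts):
    """Average hidden state at LAYER over a small set of prompts."""
    acts = []
    for p in prompts:
        ids = tok(p, return_tensors="pt")
        hidden = model(**ids, output_hidden_states=True).hidden_states[LAYER]
        acts.append(hidden.mean(dim=1))   # average over token positions
    return torch.cat(acts).mean(dim=0)

# Tiny hypothetical contrast sets; a real attempt would use many more examples.
violent = ["He smashed the bottle over the man's head.", "The knife sank into"]
neutral = ["He set the bottle down on the table.", "The spoon sank into"]
steering_vector = mean_activation(violent) - mean_activation(neutral)

def subtract_vector(module, inputs, output):
    # GPT-2 blocks return a tuple; element 0 is the hidden states.
    return (output[0] - 4.0 * steering_vector,) + output[1:]

handle = model.transformer.h[LAYER].register_forward_hook(subtract_vector)
ids = tok("The fight started when", return_tensors="pt")
print(tok.decode(model.generate(**ids, max_new_tokens=30, do_sample=False)[0]))
handle.remove()
```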
Firstly, it suggests that open-source models are improving rapidly because people are able to iterate on top of each other’s improvements and try out a much larger number of experiments than a small team at a single company possibly could.
Broadly, does this come as a surprise? I recall the GPT-2 days, when the 4chan and Twitter users of AIDungeon discovered various prompting techniques we still use today. More access means more people trying more things, and this should already be our base case, given how open participation has advanced and improved OSS projects.
I’m worried that up until now, this community has been too focused on the threat of big companies pushing capabilities ahead and not focused enough on the threat posed by open-source AI. I would love to see more discussions of regulations in order to mitigate this risk. I suspect it would be possible to significantly hamper these projects by making the developers of these projects potentially liable for any resulting misuse.
I have no idea how you think this would work.
First, any attempt at weakening liability waivers will cause immediate opposition from the entire software industry. (I don’t even know what legal theory of liability this would operate under.) Remember, under American law, code is free speech. So, second, in the case that you’re somehow (somehow!) able to pass something, with a politicized and deadlocked legislature, a coalition that includes the entire tech industry lobbying against it, and no immediate prior-restraint-on-speech challenge...what do you think you’re going to do? Go after the mostly anonymous model trainers? A lot of these people are random Joe Schmoes with no assets. Some of the SD model trainers who aren’t anonymous already have shell corporations set up, both to shield their real identities and to preemptively tank liability in case of nuisance lawsuits from artists.
I have a very strong bias about the actors involved, so instead I’ll say:
Perhaps LessWrong 2.0 was a mistake and the site should have been left to go read only.
My recollection is that the hope was to get the diaspora to post in one spot again: instead of people posting on their own blogs and tumblrs, the intention was to shove everyone back into one room. But a diaspora lets each local cluster of people keep its own norms, whereas when everyone is crammed into one site, there is an incentive to fight over global norms and to try to enforce them on others.
This response is enraging.
Here is someone who has attempted to grapple with the intellectual content of your ideas and your response is “This is kinda long.”? I shouldn’t be that surprised because, IIRC, you said something similar in response to Zack Davis’ essays on the Map and Territory distinction, but that’s ancillary and AI is core to your memeplex.
I have heard repeated claims that people don’t engage with the alignment community’s ideas (recent example from yesterday). But here is someone who did the work. Please explain why your response here does not cause people to believe there’s no reason to engage with your ideas because you will just brush them off. Yes, nutpicking e/accs on Twitter is much easier and probably more hedonic, but they’re not convincible and Quintin here is.
But POC||GTFO is really important to constraining your expectations. We do not really worry about Rowhammer since the few POCs are hard, slow and impractical. We worry about Meltdown and other speculative execution attacks because Meltdown shipped with a POC that read passwords from a password manager in a different process, was exploitable from within Chrome’s sandbox, and my understanding is that POCs like that were the only reason Intel was made to take it seriously.
Meanwhile, Rowhammer is maybe a real issue, but it is so hard to pull off consistently and stealthily that nobody worries about it. My recollection is that when it was first discovered, people didn’t panic much, because there wasn’t warrant to panic. OK, so there’s a problem with the DRAM. OK, what are the constraints on exploitation? Oh, the POCs are super tricky to pull off and will often make the machine hard to use during exploitation?
A POC provides warrant to believe in something.