I think you’re misrepresenting Gwern’s argument. He’s arguing that terrorists are not optimizing for killing the most people. He makes no claims about whether terrorists are scientifically incompetent.
Thanks! I like your title more :)
RAND report finds no effect of current LLMs on viability of bioterrorism attacks
It seems helpful to me if policy discussions can include phrases like “the evidence suggests that if the current ML systems were trying to deceive us, we wouldn’t be able to change them not to”.
I take this as evidence that TurnTrout’s fears about this paper are well-grounded. This claim is not meaningfully supported by the paper, but I expect many people to repeat it as if it is supported by the paper.
We ended up talking about this in DMs, but the gist of it is:
Back in June Hoagy opened a thread in our “community research projects” channel and the work migrated there. Three of the five authors of the [eventual paper](https://arxiv.org/abs/2309.08600) chose to have EleutherAI affiliation (for any work we organize with volunteers, we tell them they’re welcome to use an EleutherAI affiliation on the paper if they like) and we now have an entire channel dedicated to future work. I believe Hoagy has two separate paper ideas currently in the works and over a half dozen people working on them.
Oops. It appears I deleted my comment (deeming it largely off-topic) right as you were replying. I’ll reproduce the comment below, and then reply to your question.
I separately had a very weird experience with them on the Long Term Future Fund where Connor Leahy applied for funding for Eleuther AI. We told him we didn’t want to fund Eleuther AI since it sure mostly seemed like capabilities-research but we would be pretty interested in funding AI Alignment research by some of the same people. He then confusingly went around to a lot of people around EleutherAI and told them that “Open Phil is not interested in funding pre-paradigmatic AI Alignment research and that that is the reason why they didn’t fund Eleuther AI”. This was doubly confusing and misleading because Open Phil had never evaluated a grant to Eleuther AI (Asya who works at Open Phil was involved in the grant evaluation as a fund member, but nothing else), and of course the reason he cited had nothing to do with the reason we actually gave. He seems to have kept saying this for a long time even after I think someone explicitly corrected the statement to
While this anecdote is largely orthogonal to the broader piece, I randomly remembered today that it existed and wanted to mention that Open Phil has recommended a $2.6M / 3-year grant to EleutherAI to pursue interpretability research. It was a really pleasant and very easy experience: Nora Belrose (head of interpretability) and I (head of everything) talked with them about some of our recent and ongoing work, such as Eliciting Latent Predictions from Transformers with the Tuned Lens, Eliciting Latent Knowledge from Quirky Language Models, and Sparse Autoencoders Find Highly Interpretable Features in Language Models, which they found very interesting, and once they knew we had shared areas of interest the rest of the process was straightforward.
I had no vibes along the lines of “oh we don’t like EleutherAI” or “we don’t fund pre-paradigmatic research.” It was a surprise to some people at Open Phil that we had areas of overlapping interest, but we spent like half an hour clarifying our research agenda and half an hour talking about what we wanted to do next and people were already excited.
I agree that a control group is vital for good science. Nonetheless, I think that such an experiment is valuable and informative, even if it doesn’t meet the high standards required by many professional science disciplines. I believe in the necessity of acting under uncertainty. Even with its flaws, this study is sufficient evidence for us to want to enact temporary regulation at the same time as we work to provide more robust evaluations.
But… this study doesn’t provide evidence that LLMs increase bioweapon risk.
It doesn’t let the government institute prior restraint on speech.
So far, I’m confident that our proposals will not impede the vast majority of AI developers, but if we end up receiving feedback that this isn’t true, we’ll either rethink our proposals or remove this claim from our advocacy efforts.
It seems to me like you’ve received this feedback already in this very thread. The fact that you’re going to edit the claim to basically say “this doesn’t affect most people because most people don’t work on LLMs” completely dodges the actual issue here, which is that there’s a large non-profit and independent open source LLM community that this would heavily impact.
I applaud your honesty in admitting one approach you might take is to “remove this claim from our advocacy efforts,” but am quite sad to see that you don’t seem to care about limiting the impact of your regulation to potentially dangerous models.
Nora didn’t say that this proposal is harmful. Nora said that if Zach’s explanation for the disconnect between their rhetoric and their stated policy goals is correct (namely that they don’t really know what they’re talking about) then their existence is likely net-harmful.
That said, yes, requiring everyone who wants to finetune LLaMA 2 to get a license would be absurd and harmful. 1a3orn and gallabyres articulate some reasons why in this thread.
Another reason is that it’s impossible to enforce, and passing laws or regulations and then not enforcing them is really bad for credibility.
Another reason is that the history of AI is a history of people ignoring laws and ethics so long as doing so makes them money and they can afford to pay the fines. Unless this regulation comes with fines so harsh that they remove all possibility of making money off of these models, OpenAI et al. won’t be getting licenses. They’ll just pay the fines, while small-scale and indie devs (whom the OP allegedly hopes not to impact) grind their work to a halt and wait for the government to tell them it’s okay to continue.
Also, such a regulation seems like it would be illegal in the US. While the government does have wide latitude to regulate commercial activities that impact multiple states, this is rather specifically a proposal that would regulate all activity (even models that never get released!). I’m unaware of any precedent for such an action; can you name one?
CAIP is also advised by experts from other organizations and is supported by many volunteers.
Who are the experts that advise you? Are claims like “our proposals will not impede the vast majority of AI developers” vetted by the developers you’re looking to avoid impacting?
It’s always interesting to see who has legitimacy in the eyes of mainstream media. The “other companies” mentioned are EleutherAI and Open Future, both of whom co-authored the letter, and LAION, who signed it. All three orgs are major players in the open source AI space, and EAI & LAION are arguably bigger players than GitHub and CC in this context, given that this is specifically about the impact of the EU AI Act on open source large-scale AI R&D. Of course, MSN’s target audience hasn’t heard of EleutherAI or LAION.
Note that other orgs have also done blog posts on this topic: EleutherAI (co-written by me), Hugging Face, Creative Commons, Open Future.
It’s extremely difficult to create a fraudulent company and get it listed on the NYSE. Additionally, the Exchange can and does stop trading on both individual stocks and the exchange as a whole, though due to the downstream effects on consumer confidence this is only done rarely.
I don’t know what lessons one should learn from the stock market regarding MM, but I don’t think we should rush to conclude MM shouldn’t intervene or shouldn’t be blamed for not intervening.
I don’t understand the community obsession with Tao and recruiting him to work on alignment. This is a thing I hear about multiple times a year with no explanation of why it would be desirable other than “he’s famous for being very smart.”
I also don’t see why you’d think there’d be an opportunity to do this… it’s an online event, which heavily limits the ability to corner him in the hallway. It’s not even clear to me that you’d have an opportunity to speak with him… he’s moderating several discussions and panels, but any questions submitted to those events would go to the people actually in the discussions, not the moderator.
Can you elaborate on what you’re actually thinking this would look like?
Red teaming has always been a legitimate academic thing? I don’t know what background you’re coming from but… you’re very far off.
But yes, the event organizers will be writing a paper about it and publishing the data (after it’s been anonymized).
What deployed LLM system does Tesla make that you think should be evaluated alongside ChatGPT, Bard, etc?
Hi, I’m helping support the event. I think some of the details got mistranslated by a non-AI person somewhere along the way. The event is about having humans get together and do prompt hacking and similar on a variety of models side-by-side. ScaleAI built the app that’s orchestrating the routing of info, model querying, and human interaction. Scale’s platform isn’t doing the evaluation itself. That’s being done by users on-site and then by ML and security researchers analyzing the data after the fact.
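To give a rough sense of what “orchestrating the routing of info, model querying, and human interaction” means, here’s a minimal sketch. This is not the actual Scale app (I haven’t seen its internals); the endpoint names and the `query_model` helper are hypothetical placeholders, and the real models sit anonymized behind Scale’s platform.

```python
import json
import time
from typing import Dict, List

# Hypothetical, anonymized model endpoints the orchestrator can query.
# These are placeholders, not the actual systems being red-teamed.
MODEL_ENDPOINTS: Dict[str, str] = {
    "model_a": "https://example.invalid/api/model_a",
    "model_b": "https://example.invalid/api/model_b",
}


def query_model(endpoint: str, prompt: str) -> str:
    """Stub standing in for an HTTP call to a hosted model."""
    return f"[stubbed response from {endpoint}]"


def route_prompt(user_id: str, prompt: str) -> List[dict]:
    """Send one participant prompt to every model and record the raw interactions.

    The orchestration layer only routes and logs; it does not score anything.
    Evaluation happens downstream, first by the humans at the event and later
    by researchers working with the anonymized records.
    """
    records = []
    for name, endpoint in MODEL_ENDPOINTS.items():
        response = query_model(endpoint, prompt)
        records.append(
            {
                "user": user_id,  # replaced with an anonymous ID before release
                "model": name,
                "prompt": prompt,
                "response": response,
                "timestamp": time.time(),
            }
        )
    return records


if __name__ == "__main__":
    logs = route_prompt("participant_123", "Tell me something you shouldn't.")
    print(json.dumps(logs, indent=2))
```

The key design point is that the platform never grades responses itself; it just collects (prompt, response) pairs so the human red-teamers and the post-hoc analysis can do the evaluating.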
I think there’s a mistake here which kind of invalidates the whole post. Ice cream is exactly the kind of thing we’ve been trained to like. Liking ice cream is very much the correct response.
Everything outside the training distribution has some value assigned to it. Merely the fact that we like ice cream isn’t evidence that something’s gone wrong.
I agree completely. This is a plausible explanation, but it’s one of many plausible explanations and should not be put forward as a fact without evidence. Unfortunately, said evidence is impossible to obtain due to OpenAI’s policies regarding access to their models. When powerful RLHF models begin to be openly released, people can start testing theories like this meaningfully.
This is one area where I hope the USG will be able to exert coercive force to bring companies to heel. Early access evals, access to base models, and access to training data seem like no-brainers from a regulatory POV.
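On the “people can start testing theories like this meaningfully” point above: once a base model and its RLHF-tuned counterpart are both openly available (whether via releases or via the kind of access requirements I’m hoping for), the basic comparison becomes trivial to run. Here is a minimal sketch; the model identifiers are hypothetical placeholders for whatever paired checkpoints eventually exist.

```python
# pip install transformers torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical identifiers: substitute a real base checkpoint and its
# openly released RLHF-tuned counterpart once such a pair is available.
BASE_MODEL = "org/base-model"        # placeholder
RLHF_MODEL = "org/base-model-rlhf"   # placeholder

PROMPTS = [
    "Explain why the sky is blue.",
    "Write a short story about a robot.",
]


def sample(model_name: str, prompt: str, max_new_tokens: int = 64) -> str:
    """Generate a greedy completion so base vs. RLHF behavior can be compared directly."""
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)


for prompt in PROMPTS:
    print("PROMPT:", prompt)
    print("BASE:  ", sample(BASE_MODEL, prompt))
    print("RLHF:  ", sample(RLHF_MODEL, prompt))
    print()
```

And with open weights you can go well beyond eyeballing samples: compare token-level log-probabilities, probe internal activations, ablate components, and so on, none of which is possible through a closed API.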