gwern
An additional example: for the past several years, maybe 5+, you have been unable to copy-paste the list of authors on every Elsevier research paper’s HTML version in Firefox. It works fine in Chrome and Safari AFAIK (not to mention on every other academic website I can think of). Since I do this frequently, it has been a problem on an almost literally daily basis.
After 6 months of back and forth and quite a lot of BS from the Elsevier end, they have finally fixed this today.
There are people who are anxious, or who have imposter syndrome, or who make defensive excuses, in every field; but this makes it sound like the situation in violin-playing is far beyond not just the rest of classical music but any other field I can think of. What makes violin this extreme?
Altman’s failed attempt to transition to chairman/advisor to YC
Of some relevance in this context is that Altman has apparently for years been claiming to be YC Chairman (including in filings to the SEC): https://www.bizjournals.com/sanfrancisco/inno/stories/news/2024/04/15/sam-altman-y-combinator-board-chair.html
This doesn’t really seem like a meaningful question. Of course “AI” will be “scaffolded”. But what is the “AI”? It’s not a natural kind. It’s just where you draw the boundaries for convenience.
An “AI” which “reaches out to a more powerful AI” is not meaningful—one could say the same thing of your brain! Or a Mixture-of-Experts model, or speculative decoding (both already in widespread use). Some tasks are harder than others, and different amounts of computation get brought to bear by the system as a whole, and that’s just part of the learned algorithms it embodies and where the smarts come from. Or one could say it of your computer: different things take different paths through your “computer”, ping-ponging through a bunch of chips and parts of chips as appropriate.
Do you muse about living in a world where, for ‘scarcity of compute’ reasons, your computer is a ‘scaffolded computer world’ in which highly intelligent chips essentially delegate tasks to weaker chips, so long as they know that the weaker (maybe highly specialized ASIC) chip is capable of reliably doing that task...? No. You don’t care about that. That’s just details of internal architecture which you treat as a black box.
(And that argument doesn’t protect humans for the same reason it didn’t protect, say, chimpanzees or Neanderthals or horses. Comparative advantage is extremely fragile.)
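To make the delegation point concrete, here is a minimal sketch of a two-tier cascade (everything in it is hypothetical: `cheap_model`, `strong_model`, and the 0.9 threshold are illustrative stand-ins, not any real API):

```python
# A minimal, hypothetical sketch of a model cascade: a weak model answers
# when it is confident, and otherwise "reaches out to a more powerful AI".
# Both models are assumed to return an (answer, self-rated confidence) pair.

def answer(query, cheap_model, strong_model, threshold=0.9):
    """Route a query through a two-tier cascade of models."""
    draft, confidence = cheap_model(query)
    if confidence >= threshold:
        return draft                  # the weak model was reliable enough
    return strong_model(query)[0]     # escalate the hard case
```

Whether you call this ‘one AI’, ‘two AIs’, or ‘an AI with scaffolding’ changes nothing about its behavior; the boundary is drawn wherever it is convenient, exactly as with MoE routing or speculative decoding.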
It’s not an infinite set and returns diminish, but that’s true of regular capabilities too, no? And you can totally imagine new kinds of dangerous capabilities; every time LLMs gain a new modality or data source, they get a new set of vulnerabilities/dangers. For example, once Sydney went live, you had a whole new kind of dangerous capability in terms of persisting knowledge/attitudes across episodes of unrelated users by generating transcripts which would become available by Bing Search. This would have been difficult to test before, and no one experimented with it AFAIK. But after seeing indirect prompt injections in the wild and possible amplification of the Sydney personae, now suddenly people start caring about this once-theoretical possibility and might start evaluating it. (This is also a reason why returns don’t diminish as much, because benchmarks ‘rot’: quite aside from intrinsic temporal drift and ceiling issues and new areas of dangers opening up, there’s leakage, which is just as relevant to dangerous capabilities as regular capabilities—OK, you RLHFed your model to not provide methamphetamine recipes and this has become a standard for release, but it only works on meth recipes and not other recipes because no one in your org actually cares and did only the minimum RLHF to pass the eval and provide the execs with excuses, and you used off-the-shelf preference datasets designed to do the minimum, like Facebook releasing Llama… Even if it’s not leakage in the most literal sense of memorizing the exact wording of a question, there’s still ‘meta-leakage’ of overfitting to that sort of question.)
There’s still lots and lots of demand for regular capability evaluation, as we keep discovering new issues or LLMs keep tearing through benchmarks and rendering them moot, and the cost of creating a meaningful dataset like GPQA keeps skyrocketing (like ~$1,000/item) compared to the old days when you could casually Turk your way to questions LLMs would fail (like <$1/item). Why think that the dangerous subset would be any different? You think someone is going to come out with a dangerous-capabilities eval in the next year and then that’s it, it’s done, we’ve solved dangerous-capabilities eval, Mission Accomplished?
http://www.laputan.org/mud/mud.html is a great writeup on big balls of mud.
I asked students if they would want to press a magic button that would permanently delete all social media and messaging apps from the phones of their friend groups if nobody knew it was them. I got only a couple takers. There was more (but far from majority) enthusiasm for deleting all such apps from the whole world. I suspect rates would have been higher if I had asked this as an anonymous written question, but probably not much higher.
An anonymous response, or better yet, a list experiment or Bayesian truth serum question, would certainly be worth doing.
It’s also striking that you get anywhere close to a majority for that question. (And it is consistent with Haidt and the other willingness-to-pay experiment about the harms of social media being a systemic problem, where you are unable to opt out on an individual, or even very narrow friend-group, basis, because that still leaves the rest of society using it.) Imagine asking that question about, say, indoor plumbing or electricity or vaccines? I suspect even much more controversial technologies like cars or video games would still get many fewer votes for deletion at any scale.
Subconscious sounds like a different codebase and a different group, one which is not paying any money to Roam; so I would define it as ‘not the Roam community’, and as illustrating the problem for anyone who wants to do a startup on ‘Roam 2’.
Hilton has posted on Twitter that he is no longer bound: https://x.com/JacobHHilton/status/1794090554730639591
I would not consider her claims worth including in a list of top items for people looking for an overview, as they are hard to verify or dubious (her comments are generally bad enough to earn flagging on their own), aside from possibly the inheritance one—as that should be objectively verifiable, at least in theory, and lines up better with the other items.
I don’t really think that this is super important for “fragility of value”-type concerns, but it probably is important for people who think we will easily be able to understand the features/internals of LLMs.
I’m not surprised if the features aren’t 100% clean, because this is after all a preliminary research prototype of a small approximation of a medium-sized version of a still sub-AGI LLM.
But I am a little more concerned that this is the first I’ve seen anyone notice that the cherrypicked, single, chosen example of what is apparently a straightforward, familiar, concrete (literally) concept, which people have been playing with interactively for days, is clearly dirty and not actually a ‘Golden Gate Bridge feature’. This suggests it is not hard to fool a lot of people with an ‘interpretable feature’ which is still quite far from the human concept. And if you believe that it’s not super important for fragility-of-value because it’d have feasible fixes if noticed, how do you know anyone will notice?
Yes, I did read the paper. And those are still extremely small effects and almost no predictability, no matter how you graph them (note the axis truncation and sparsity), even before you get into the question of “what is the causal status of any of these claims, and why are we assuming that interest groups in support precede rather than follow success?”
Collecting evidence that points in only one direction just sets off huge warning lights 🚨🚨🚨🚨 I can’t quiet.
Yes, it should. And that’s why people are currently digging so hard in the other direction, as they begin to appreciate to what extent they have previously had evidence that only pointed in one direction and badly misinterpreted things like, say, Paul Graham’s tweets or YC blog post edits or ex-OAer statements.
compared to the opinion of economic elites
Note that’s also a way of saying the economic elites had little effect too, in the “Twice nothing is still nothing” sort of way: https://80000hours.org/podcast/episodes/spencer-greenberg-stopping-valueless-papers/#importance-hacking-001823
...But one example that comes to mind is this paper, “Testing theories of American politics: Elites, interest groups, and average citizens.” The basic idea of the paper was they were trying to see what actually predicts what ends up happening in society, what policies get passed. Is it the view of the elites? Is it the view of interest groups? Or is it the view of what average citizens want?
And they have a kind of shocking conclusion. Here are the coefficients that they report: Preference of average citizens, how much they matter, is 0.03. Preference of economic elites, 0.76. Oh, my gosh, that’s so much bigger, right? Alignment of interest groups, like what the interest groups think, 0.56. So almost as strong as the economic elites. So it’s kind of a shocking result. It’s like, “Oh my gosh, society is just determined by what economic elites and interest groups think, and not at all by average citizens,” right?
Rob Wiblin: I remember this paper super well, because it was covered like wall-to-wall in the media at some point. And I remember, you know, it was all over Reddit and Hacker News. It was a bit of a sensation.
Spencer Greenberg: Yeah. So this often happens to me when I’m reading papers. I’m like, “Oh, wow, that’s fascinating.” And then I come to like a table in Appendix 7 or whatever, and I’m like, “What the hell?”
And so in this case, the particular line that really throws me for a loop is the R² number. The R² measures the percentage of variance that’s explained by the model. So this is a model where they’re trying to predict what policies get passed using the preferences of average citizens, economic elites, and interest groups. Take it all together into one model. Drum roll: what’s the R²? 0.07. They’re able to explain 7% of the variance of what happens using this information.
Rob Wiblin: OK, so they were trying to explain what policies got passed and they had opinion polls for elites, for interest groups, and for ordinary people. And they could only explain 7% of the variation in what policies got up? Which is negligible.
Spencer Greenberg: So my takeaway is that they failed to explain why policies get passed. That’s the result. We have no idea why policies are getting passed.
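For anyone puzzled by how coefficients as large as 0.76 can coexist with an R² of 0.07, recall the standard definition (textbook material, not anything specific to the paper):

```latex
% Coefficient of determination: the fraction of variance in the outcomes
% y_i accounted for by the model's predictions \hat{y}_i.
R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}
```

An R² of 0.07 means the residuals retain 93% of the variance: whatever the relative sizes of the coefficients (0.03 vs. 0.76 vs. 0.56), the model as a whole leaves almost all policy outcomes unpredicted, which is the ‘twice nothing is still nothing’ point above.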
Another issue: if MATH is contaminated, you’d think GSM8k would be contaminated too, but Scale just made GSM1k, and on it GPT/Claude are minimally overfit (although in both of these papers, the Chinese & Mistral models usually appear considerably more overfit than GPT/Claude). Note that Scale made extensive efforts to equalize the difficulty and similarity of GSM1k with GSM8k, which this Consequent AI paper on MATH did not, and discussed the methodological issues which complicate re-benchmarking.
I suspect there is much more to this thread, and it may tie back to Superalignment & broken promises about compute-quotas.
The Superalignment compute-quota flashpoint is now confirmed. Aside from Jan Leike explicitly calling out compute-quota shortages post-coup (which strictly speaking doesn’t confirm shortages pre-coup), Fortune is now reporting that this was a serious & longstanding issue:
...According to a half-dozen sources familiar with the functioning of OpenAI’s Superalignment team, OpenAI never fulfilled its commitment to provide the team with 20% of its computing power.
Instead, according to the sources, the team repeatedly saw its requests for access to graphics processing units, the specialized computer chips needed to train and run AI applications, turned down by OpenAI’s leadership, even though the team’s total compute budget never came close to the promised 20% threshold.
The revelations call into question how serious OpenAI ever was about honoring its public pledge, and whether other public commitments the company makes should be trusted. OpenAI did not respond to requests to comment for this story.
...It was a task so important that the company said in its announcement that it would commit “20% of the compute we’ve secured to date over the next four years” to the effort. But a half-dozen sources familiar with the Superalignment team’s work said that the group was never allocated this compute. Instead, it received far less in the company’s regular compute allocation budget, which is reassessed quarterly.
One source familiar with the Superalignment team’s work said that there were never any clear metrics around exactly how the 20% amount was to be calculated, leaving it subject to wide interpretation. For instance, the source said the team was never told whether the promise meant “20% each year for four years” or “5% a year for four years” or some variable amount that could wind up being “1% or 2% for the first three years, and then the bulk of the commitment in the fourth year.” In any case, all the sources Fortune spoke to for this story confirmed that the Superalignment team was never given anything close to 20% of OpenAI’s secured compute as of July 2023.
OpenAI researchers can also make requests for what is known as “flex” compute—access to additional GPU capacity beyond what has been budgeted—to deal with new projects between the quarterly budgeting meetings. But flex requests from the Superalignment team were routinely rejected by higher-ups, these sources said.
Bob McGrew, OpenAI’s vice president of research, was the executive who informed the team that these requests were being declined, the sources said, but others at the company, including chief technology officer Mira Murati, were involved in making the decisions. Neither McGrew nor Murati responded to requests to comment for this story.
While the team did carry out some research—it released a paper detailing its experiments in successfully getting a less powerful AI model to control a more powerful one in December 2023—the lack of compute stymied the team’s more ambitious ideas, the source said. After resigning, Leike on Friday published a series of posts on Twitter in which he criticized his former employer, saying “safety culture and processes have taken a backseat to shiny products.” He also said that “over the past few months my team has been sailing against the wind. Sometimes we were struggling for compute and it was getting harder and harder to get this crucial research done.”
5 sources familiar with the Superalignment team’s work backed up Leike’s account, saying that the problems with accessing compute worsened in the wake of the pre-Thanksgiving showdown between Altman and the board of the OpenAI nonprofit foundation.
...One source disputed the way the other sources Fortune spoke to characterized the compute problems the Superalignment team faced, saying they predated Sutskever’s participation in the failed coup, plaguing the group from the get-go.
While there have been some reports that Sutskever was continuing to co-lead the Superalignment team remotely, sources familiar with the team’s work said this was not the case and that Sutskever had no access to the team’s work and played no role in directing the team after Thanksgiving. With Sutskever gone, the Superalignment team lost the only person on the team who had enough political capital within the organization to successfully argue for its compute allocation, the sources said.
...The people who spoke to Fortune did so anonymously, either because they said they feared losing their jobs, or because they feared losing vested equity in the company, or both. Employees who have left OpenAI have been forced to sign separation agreements that include a strict non-disparagement clause that says the company can claw back their vested equity if they criticize the company publicly, or if they even acknowledge the clause’s existence. And employees have been told that anyone who refuses to sign the separation agreement will forfeit their equity as well.
It would be bad, I agree. (An NDA about what he worked on at OA, sure; but being required to never say anything bad about OA forever, as a regulator who will be running evaluations etc...?) Fortunately, this is one of those rare situations where it is probably enough for Paul to simply say his OA NDA does not cover that: then either it doesn’t and can’t be a problem, or he has violated the NDA’s gag order by talking about it, and when OA then fails to sue him to enforce it, the NDA becomes moot.
Sam Altman has apparently provided a statement to NPR apropos of https://www.npr.org/2024/05/20/1252495087/openai-pulls-ai-voice-that-was-compared-to-scarlett-johansson-in-the-movie-her, quoted on Twitter by the NPR journalist (second):
...In response, Sam Altman has now issued a statement saying “Sky is not Scarlett Johansson’s, and it was never intended to resemble hers.”
“We cast the voice actor behind Sky’s voice before any outreach to Ms. Johansson. Out of respect for Ms. Johansson, we have paused using Sky’s voice in our products. We are sorry to Ms. Johansson that we didn’t communicate better,” Sam Altman wrote.
To clarify a few things about the letters from ScarJo’s lawyers: They weren’t cease and desist notices. It’s not initiating a lawsuit. The letters sought clarity about what exactly was fed to the model to produce the distinctive “Sky” voice.
(Yeah, the Altman statement seems to be emailed to journalists, despite being short. Not sure why he’s not just tweeting like previously.)
I think I am missing context here. Why is that distinction between facts localized in attention layers and facts localized in MLP layers so earth-shaking that Eliezer should have been shocked and awed by a quick guess during conversation being wrong, and so revealing an anecdote that you feel it is the capstone of your comment, crystallizing everything wrong about Eliezer into a story?