What I find incredible is that contributing to the development of existentially dangerous systems is viewed as morally acceptable within communities that, on paper, accept that AGI is a threat.
Both OpenAI and Anthropic are incredibly influential among AI safety researchers, despite being key players in bringing the advent of TAI ever closer.
Both organisations benefit from lexical confusion over the word “safety”.
The average person concerned with existential risk from AGI might assume “safety” means working to reduce the likelihood that we all die. They would be disheartened to learn that many “AI Safety” researchers are instead focused on making sure contemporary LLMs behave appropriately. Such “safety” research simply makes the contemporary technology more viable and profitable, driving investment and reducing timelines. There is, to my knowledge, no published research demonstrating that these techniques will extend to controlling AGI in a useful way.*
OpenAI’s “Superalignment” plan is a more ambitious safety play. Their plan to “solve” alignment involves building a human-level general intelligence within 4 years and then using it to automate alignment research.
But there are two obvious problems:
- A human-level general intelligence is already most of the way toward a superhuman general intelligence (simply give it more compute). Cynically, Superalignment is a promise that OpenAI’s brightest safety researchers will be trying their hardest to bring about an AGI within 4 years.
- Even if Superalignment succeeds, we are in the position of trusting that a for-profit, private entity will use its human-level AI researchers only to research safety, instead of making the incredibly obvious play of having the virtual researchers work out how to build the next generation of better, smarter automated researchers.
To conclude, if it looks like a duck, swims like a duck and quacks like a duck, it’s a capabilities researcher.
*This point could (and probably should) be a post in itself. Why wouldn’t techniques that work on contemporary AI systems extend to AGI?
Pretend for a moment that you and I are silicon-based aliens who have recently discovered that carbon-based lifeforms exist and can be used to run calculations. Scientists have postulated that by creating complex enough carbon structures we could invent “thinking animals”. We anticipate that these strange creatures will be built in the near future and that they might be difficult to control.
As we can’t build thinking animals today, we are stuck studying single-celled carbon organisms. A technique has just been discovered in which we can use a compound called “sugar” to influence the direction in which these simple organisms move.
Is it reasonable to then conclude that you will be able to predict and control the behaviour of a much more complex, multicellular creature called a “human” by spreading sugar out on the ground?
On the OpenPhil / OpenAI Partnership
Epistemic Note:
The implications of this argument being true are quite substantial, and I do not have any knowledge of the internal workings of Open Phil.
(Both title and this note have been edited, cheers to Ben Pace for very constructive feedback.)
Premise 1:
It is becoming increasingly clear that OpenAI is not appropriately prioritizing safety over advancing capabilities research.
Premise 2:
This was the default outcome.
Instances in history in which private companies (or any individual humans) have intentionally turned down huge profits and power are the exception, not the rule.
Edit: To clarify, you need to be skeptical of seemingly altruistic statements and commitments made by humans when there are exceptionally lucrative incentives to break those commitments later (and limited ways to enforce the original commitment).
Premise 3:
Without repercussions for terrible decisions, decision makers have no skin in the game.
Conclusion:
Anyone and everyone involved in Open Phil’s recommendation that a $30 million grant be given to OpenAI in 2017 shouldn’t be allowed anywhere near AI Safety decision-making in the future.
To go one step further, potentially any and every major decision they have played a part in needs to be reevaluated by objective third parties.
This must include Holden Karnofsky and Paul Christiano, both of whom were closely involved.
To quote OpenPhil:
“OpenAI researchers Dario Amodei and Paul Christiano are both technical advisors to Open Philanthropy and live in the same house as Holden. In addition, Holden is engaged to Dario’s sister Daniela.”