Do you know if there have been any concrete implications (i.e. someone giving Daniel a substantial amount of money) from the discussion?
Stephen Fowler
I think this is an important discussion to have but I suspect this post might not convince people who don’t already share similar beliefs.
1. I think the title is going to throw people off.
I think what you’re actually saying is “stop the current strain of research focused on improving and understanding contemporary systems, which has become synonymous with the term AI safety”, but many readers might interpret this as if you’re saying “stop research that is aimed at reducing existential risks from AI”. It might be best to reword it as “stopping prosaic AI safety research”.
In fairness, the first, narrower definition of AI Safety certainly describes a majority of work done under that banner. It seems to be where most of the funding is going, it describes the work done at industrial labs, and it is what educational resources (like the AI Safety Fundamentals course) focus on.
2. I’ve had a limited number of experiences informally having discussions with researchers on similar ideas (not necessarily arguing for stopping AI safety research entirely though). My experience is that people either agree immediately or do not really appreciate the significance of concerns about AI safety research largely being on the wrong track. Convincing people in the second category seems to be rather difficult.
To summarize what I’m trying to convey:
I think this is a crucial discussion to have and it would be beneficial to the community to write this up into a longer post if you have the time.
Thank you, this explains my error. I’ve retracted that part of my response.
(I’m the OP)
I’m not trying to say “it’s bad to give large sums of money to any group because humans have a tendency to seek power.”
I’m saying “you should be exceptionally cautious about giving large sums of money to a group of humans with the stated goal of constructing an AGI.”
You need to weigh any reassurances they give you against two observations:
The commonly observed pattern of individual humans or organisations seeking power (and/or wealth) at the expense of the wider community.
The strong likelihood that there will be an opportunity for organisations pushing ahead with AI research to obtain incredible wealth or power.
So, it isn’t “humans seek power therefore giving any group of humans money is bad”. It’s “humans seek power” and, in the specific case of AI companies, there may be incredibly strong rewards for groups that behave in a self-interested way.
The general idea I’m working off is that you need to be skeptical of seemingly altruistic statements and commitments made by humans when there are exceptionally lucrative incentives to break these commitments at a later point in time (and limited ways to enforce the original commitment).
“In particular, it emphasized the importance of distributing AI broadly; our current view is that this may turn out to be a promising strategy for reducing potential risks”
Yes, I’m interpreting the phrase “may turn out” to be treating the idea with more seriousness than it deserves.
Rereading the paragraph, it seems reasonable to interpret it as politely downplaying it, in which case my statement about Open Phil taking the idea seriously is incorrect.
This does not feel super cruxy as the power incentive still remains.
“This grant was obviously ex ante bad. In fact, it’s so obvious that it was ex ante bad that we should strongly update against everyone involved in making it.”
This is an accurate summary.

“arguing about the impact of grants requires much more thoroughness than you’re using here”

We might not agree on the level of effort required for a quick take. I do not currently have the time available to expand this into a full write-up on the EA forum but am still interested in discussing this with the community.

“you’re making a provocative claim but not really spelling out why you believe the premises.”
I think this is a fair criticism and something I hope I can improve on.
I feel frustrated that your initial comment (which is now the top reply) implies I either hadn’t read the 1700 word grant justification that is at the core of my argument, or was intentionally misrepresenting it to make my point. This seems to be an extremely uncharitable interpretation of my initial post. (Edit: I am retracting this statement and now understand Buck’s comment was meaningful context. Apologies to Buck, and see commentary by Ryan Greenblatt below.)

Your reply has been quite meta, which makes it difficult to convince you on specific points.
Your argument on betting markets has updated me slightly towards your position, but I am not particularly convinced. My understanding is that Open Phil and OpenAI had a close relationship, and hence Open Phil had substantially more information to work with than the average Manifold punter.
So the case for the grant wasn’t “we think it’s good to make OAI go faster/better”.
I agree. My intended meaning is not that the grant is bad because its purpose was to accelerate capabilities. I apologize that the original post was ambiguous.
Rather, the grant was bad for numerous reasons, including but not limited to:

It appears to have had an underwhelming governance impact (as demonstrated by the board being unable to remove Sam).
It enabled OpenAI to “safety-wash” their product (although how important this has been is unclear to me.)
From what I’ve seen at conferences and job boards, it seems reasonable to assert that the relationship between Open Phil and OpenAI has led people to work at OpenAI.
Less important, but the grant justification appears to take seriously the idea that making AGI open source is compatible with safety. I might be missing some key insight, but it seems trivially obvious that this is a terrible idea even if you’re only concerned with human misuse and not misalignment.
Finally, it’s giving money directly to an organisation with the stated goal of producing an AGI. There is substantial negative EV if the grant sped up timelines.
This last claim seems very important. I have not been able to find data that would let me confidently estimate OpenAI’s value at the time the grant was given. However, Wikipedia mentions that “In 2017 OpenAI spent $7.9 million, or a quarter of its functional expenses, on cloud computing alone.” This certainly makes it seem that the grant provided OpenAI with a significant amount of capital, enough to have increased its research output.
Keep in mind, the grant needs to have generated $30 million in EV just to break even. I’m now going to suggest some other uses for the money, but keep in mind these are just rough estimates and I haven’t adjusted for inflation. I’m not claiming these are the best uses of $30 million.
The money could have funded an organisation the size of MIRI for roughly a decade (basing my estimate on MIRI’s 2017 fundraiser, using 2020 numbers gives an estimate of ~4 years).
Imagine the shift in public awareness if there had been an AI safety Superbowl ad for 3-5 years.
Or it could have saved the lives of ~1300 children.
This analysis is obviously much worse if in fact the grant was negative EV.
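The back-of-envelope comparisons above can be made explicit. The annual-budget and ad-cost figures below are my own assumed price points (not from the original estimates, and not inflation-adjusted), so treat this as a rough sketch:

```python
GRANT = 30_000_000  # USD, Open Phil's 2017 grant to OpenAI

# Assumed rough price points (hypothetical, not inflation-adjusted):
MIRI_ANNUAL_BUDGET = 3_000_000  # approx. annual budget implied by "roughly a decade"
SUPERBOWL_AD_COST = 6_000_000   # approx. cost of a 30-second Super Bowl spot

years_of_miri = GRANT / MIRI_ANNUAL_BUDGET  # ~10 years of MIRI-scale funding
superbowl_ads = GRANT / SUPERBOWL_AD_COST   # ~5 annual Super Bowl ads
cost_per_life = GRANT / 1300                # ~23,000 USD/life, implied by "~1300 children"
```

Even small changes to the assumed prices leave the qualitative conclusion intact: the grant's break-even bar is the best alternative use of the same capital.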
That’s a good point. You have pushed me towards thinking that this is an unreasonable statement and “predicted this problem at the time” is better.
On the OpenPhil / OpenAI Partnership
Epistemic Note:
The implications of this argument being true are quite substantial, and I do not have any knowledge of the internal workings of Open Phil.

(Both title and this note have been edited; cheers to Ben Pace for very constructive feedback.)
Premise 1:
It is becoming increasingly clear that OpenAI is not appropriately prioritizing safety over advancing capabilities research.

Premise 2:
This was the default outcome. Instances in history in which private companies (or any individual humans) have intentionally turned down huge profits and power are the exception, not the rule.
Edit: To clarify, you need to be skeptical of seemingly altruistic statements and commitments made by humans when there are exceptionally lucrative incentives to break these commitments at a later point in time (and limited ways to enforce the original commitment).
Premise 3:
Without repercussions for terrible decisions, decision makers have no skin in the game.
Conclusion:
Anyone and everyone involved with Open Phil recommending a grant of $30 million be given to OpenAI in 2017 shouldn’t be allowed anywhere near AI Safety decision making in the future.
To go one step further, potentially any and every major decision they have played a part in needs to be reevaluated by objective third parties.
This must include Holden Karnofsky and Paul Christiano, both of whom were closely involved.

To quote Open Phil:
”OpenAI researchers Dario Amodei and Paul Christiano are both technical advisors to Open Philanthropy and live in the same house as Holden. In addition, Holden is engaged to Dario’s sister Daniela.”
This is your second post and you’re still being vague about the method. I’m updating strongly towards this being a hoax and I’m surprised people are taking you seriously.
Edit: I’ll offer you a 50 USD even money bet that your method won’t replicate when tested by a 3rd party with more subjects and a proper control group.
You are given a string s corresponding to the Instructions for the construction of an AGI which has been correctly aligned with the goal of converting as much of the universe into diamonds as possible.
What is the conditional Kolmogorov complexity of the string s’ which produces an AGI aligned with “human values” or any other suitable alignment target?
To convert an abstract string to a physical object, the “Instructions” are read by a Finite State Automaton, with the state of the automaton at each step dictating the behavior of a robotic arm (with appropriate mobility and precision) with access to a large collection of physical materials.
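For concreteness, the quantity being asked about can be written in the standard notation for conditional Kolmogorov complexity (this formalization is my gloss on the question, with U a fixed universal Turing machine):

```latex
K(s' \mid s) = \min \{\, |p| \;:\; U(p, s) = s' \,\}
```

That is, the length of the shortest program that outputs the human-values-aligned instructions s’ when given the diamond-aligned instructions s as auxiliary input.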
Tangential.
Is part of the motivation behind this question to think about the level of control that a super-intelligence could have on a complex system if it was only able to influence a small part of that system?
I was not precise enough in my language and agree with you highlighting that what “alignment” means for LLMs is a bit vague. While people felt Sydney Bing was cool, if it had not been possible to rein it in, it would have been very difficult for Microsoft to gain any market share. An LLM that doesn’t do what it’s asked or regularly expresses toxic opinions is ultimately bad for business.
In the above paragraph, understand “aligned” in the concrete sense of “behaves in a way that is aligned with its parent company’s profit motive”, rather than “acting in line with humanity’s CEV”. To rephrase the point I was making above, I feel much (a majority, even) of today’s alignment research is focused on the first definition of alignment, whilst neglecting the second.
A concerning amount of alignment research is focused on fixing misalignment in contemporary models, with limited justification for why we should expect these techniques to extend to more powerful future systems.
By improving the performance of today’s models, this research makes investing in AI capabilities more attractive, increasing existential risk.
Imagine an alternative history in which GPT-3 had been wildly unaligned. It would not have posed an existential risk to humanity but it would have made putting money into AI companies substantially less attractive to investors.
Nice post.
”Membranes are one way that embedded agents can try to de-embed themselves from their environment.”
I would like to hear more elaboration on “de-embedding”. For agents which are embedded in and interact directly with the physical world, I’m not sure that a process of de-embedding is well defined.
There are fundamental thermodynamic properties of agents that are relevant here. Discussion of agent membranes could also include an analysis of how the environment and agent do work on each other via the membrane, and how the agent dissipates waste heat and excess entropy to the environment.
“Day by day, however, the machines are gaining ground upon us; day by day we are becoming more subservient to them; more men are daily bound down as slaves to tend them, more men are daily devoting the energies of their whole lives to the development of mechanical life. The upshot is simply a question of time, but that the time will come when the machines will hold the real supremacy over the world and its inhabitants is what no person of a truly philosophic mind can for a moment question.”
— Samuel Butler, DARWIN AMONG THE MACHINES, 1863
An additional distinction between contemporary and future alignment challenges is that the latter concerns the control of physically deployed, self-aware systems.
Alex Altair has previously highlighted that they will (microscopically) obey time reversal symmetry[1], unlike the information processing of a classical computer program. This recent paper published in Entropy[2] touches on the idea that a physical learning machine (the “brain” of a causal agent) is an “open irreversible dynamical system” (pp. 12-13).

1. ^ Altair A. “Consider using reversible automata for alignment research” 2022

2. ^ Milburn GJ, Shrapnel S, Evans PW. “Physical Grounds for Causal Perspectivalism” Entropy. 2023; 25(8):1190. https://doi.org/10.3390/e25081190
Feedback wanted!
What are your thoughts on the following research question:
”What nontrivial physical laws or principles exist governing the behavior of agentic systems.”
(Very open to feedback along the lines of “hey that’s not really a research question”)
At the risk of missing something obvious: in any distributed quantum circuit without a measurement step, it is not possible for Kevin and Charlie to learn anything about the plaintext, per the no-cloning theorem.
Eavesdropping in the middle of the circuit should lead to measurable statistical anomalies due to projecting the state onto the measurement basis.
(I’ll add a caveat that I am talking about theoretical quantum circuits and ignoring any nuances that emerge from their physical implementations.)
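The statistical anomaly I mean can be illustrated with a minimal amplitude calculation. This is a toy intercept-resend sketch of my own in plain Python (not from the original discussion), assuming the eavesdropper measures a |+⟩ state in the computational basis before a final X-basis measurement:

```python
import math

# Single-qubit states as 2-component real amplitude vectors [a0, a1].
plus = [1 / math.sqrt(2), 1 / math.sqrt(2)]  # |+> = (|0> + |1>)/sqrt(2)
zero, one = [1.0, 0.0], [0.0, 1.0]           # Z-basis states

def prob_plus(state):
    """Probability of outcome '+' in an X-basis measurement: |<+|psi>|^2."""
    amp = (state[0] + state[1]) / math.sqrt(2)
    return amp ** 2

# No eavesdropper: |+> passes through untouched, '+' occurs with probability ~1.
p_clean = prob_plus(plus)

# Intercept-resend eavesdropper measures in the Z basis, collapsing |+>
# to |0> or |1> with probability 1/2 each; '+' now occurs with probability ~0.5.
p_eavesdropped = 0.5 * prob_plus(zero) + 0.5 * prob_plus(one)
```

So an honest-run outcome that should be deterministic becomes a coin flip after the projection, which is the detectable anomaly.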
Edit:
On posting, I think I realize my error.
We need Kevin and Charlie to not have knowledge of the specific gates that they are implementing as well.