That’s a good point. You’ve pushed me towards thinking this is an unreasonable statement, and that “predicted this problem at the time” is better.
Very Spicy Take
Epistemic Note:
Many highly respected community members with substantially greater decision-making experience (and LessWrong karma) presumably disagree strongly with my conclusion.
Premise 1:
It is becoming increasingly clear that OpenAI is not appropriately prioritizing safety over advancing capabilities research.
Premise 2:
This was the default outcome. Instances in history in which private companies (or any individual humans) have intentionally turned down huge profits and power are the exception, not the rule.
Premise 3:
Without repercussions for terrible decisions, decision makers have no skin in the game.
Conclusion:
Anyone and everyone involved with Open Phil recommending a grant of $30 million be given to OpenAI in 2017 shouldn’t be allowed anywhere near AI safety decision-making in the future.
To go one step further, potentially any and every major decision they have played a part in needs to be reevaluated by objective third parties.
This must include Holden Karnofsky and Paul Christiano, both of whom were closely involved.
To quote Open Phil:
”OpenAI researchers Dario Amodei and Paul Christiano are both technical advisors to Open Philanthropy and live in the same house as Holden. In addition, Holden is engaged to Dario’s sister Daniela.”
This is your second post and you’re still being vague about the method. I’m updating strongly towards this being a hoax and I’m surprised people are taking you seriously.
Edit: I’ll offer you a $50 even-money bet that your method won’t replicate when tested by a third party with more subjects and a proper control group.
You are given a string s corresponding to the Instructions for the construction of an AGI which has been correctly aligned with the goal of converting as much of the universe into diamonds as possible.
What is the conditional Kolmogorov complexity of the string s’ which produces an AGI aligned with “human values” or any other suitable alignment target?
To convert an abstract string to a physical object, the “Instructions” are read by a Finite State Automaton, with the state of the FSA at each step dictating the behavior of a robotic arm (with appropriate mobility and precision) with access to a large collection of physical materials.
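For concreteness, here is a minimal sketch of the definition in play (assuming the standard notion relative to some fixed universal Turing machine $U$; by the invariance theorem, the choice of $U$ shifts the value by at most an additive constant):

$$K(s' \mid s) = \min\{\, |p| \;:\; U(p, s) = s' \,\}$$

i.e. the length of the shortest program $p$ that outputs $s'$ when given $s$ as auxiliary input.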
Tangential.
Is part of the motivation behind this question to think about the level of control that a superintelligence could have over a complex system if it were only able to influence a small part of that system?
I was not precise enough in my language, and I agree with your point that what “alignment” means for LLMs is a bit vague. While people felt Sydney Bing was cool, if it had not been possible to rein it in, it would have been very difficult for Microsoft to gain any market share. An LLM that doesn’t do what it’s asked, or regularly expresses toxic opinions, is ultimately bad for business.
In the above paragraph, understand “aligned” in the concrete sense of “behaves in a way that is aligned with its parent company’s profit motive”, rather than “acting in line with humanity’s CEV”. To rephrase the point I was making above, I feel much (a majority, even) of today’s alignment research is focused on the first definition of alignment whilst neglecting the second.
A concerning amount of alignment research is focused on fixing misalignment in contemporary models, with limited justification for why we should expect these techniques to extend to more powerful future systems.
By improving the performance of today’s models, this research makes investing in AI capabilities more attractive, increasing existential risk.
Imagine an alternative history in which GPT-3 had been wildly unaligned. It would not have posed an existential risk to humanity but it would have made putting money into AI companies substantially less attractive to investors.
Nice post.
”Membranes are one way that embedded agents can try to de-embed themselves from their environment.”
I would like to hear more elaboration on “de-embedding”. For agents which are embedded in and interact directly with the physical world, I’m not sure that a process of de-embedding is well defined.
There are fundamental thermodynamic properties of agents that are relevant here. Discussion of agent membranes could also include an analysis of how the environment and agent do work on each other via the membrane, and how the agent dissipates waste heat and excess entropy to the environment.
“Day by day, however, the machines are gaining ground upon us; day by day we are becoming more subservient to them; more men are daily bound down as slaves to tend them, more men are daily devoting the energies of their whole lives to the development of mechanical life. The upshot is simply a question of time, but that the time will come when the machines will hold the real supremacy over the world and its inhabitants is what no person of a truly philosophic mind can for a moment question.”
— Samuel Butler, “Darwin Among the Machines”, 1863
An additional distinction between contemporary and future alignment challenges is that the latter concern the control of physically deployed, self-aware systems.
Alex Altair has previously highlighted that they will (microscopically) obey time-reversal symmetry[1], unlike the information processing of a classical computer program. This recent paper published in Entropy[2] touches on the idea that a physical learning machine (the “brain” of a causal agent) is an “open irreversible dynamical system” (pp. 12-13).

1. ^ Altair, A. “Consider using reversible automata for alignment research”, 2022.
2. ^ Milburn GJ, Shrapnel S, Evans PW. “Physical Grounds for Causal Perspectivalism”. Entropy. 2023; 25(8):1190. https://doi.org/10.3390/e25081190
Feedback wanted!
What are your thoughts on the following research question:
”What nontrivial physical laws or principles govern the behavior of agentic systems?”
(Very open to feedback along the lines of “hey that’s not really a research question”)
Yes, perhaps there could be a way of having dialogues edited for readability.
I strongly downvoted Homework Answer: Glicko Ratings for War because it appears to be a pure data dump that isn’t intended to be actually read by a human. As it is a follow-up to a previous post, it might have been better as a comment or edit on the original post, linking to your GitHub with the data instead.
Looking at your post history, I’d suggest you could improve the quality of your posts by spending more time on them. Only a few users manage to post multiple times a week and consistently get many upvotes.
When you say you were practising Downwell for the course of a month, how many hours was this in total?
Is this what you’d cynically expect from an org regularizing itself or was this a disappointing surprise for you?
I strongly believe that, barring extremely strict legislation, one of the initial tasks given to the first human-level artificial intelligence will be to develop more advanced machine learning techniques. During this period we will see unprecedented technological developments, and many alignment paradigms rooted in the empirical behavior of the previous generation of systems may no longer be relevant.
I predict most humans will choose to reside in virtual worlds, and possibly have their brains altered to forget that it’s not real.
“AI safety, as in, the subfield of computer science concerned with protecting the brand safety of AI companies”
Made me chuckle.
I enjoyed the read, but I wish this was much shorter: there’s a lot of very on-the-nose commentary diluted by meandering dialogue.
I remain skeptical that by 2027 end users will need to navigate self-awareness or negotiate with LLM-powered devices for basic tasks (70% certainty it will not be a problem). This comes from a belief that end-user devices won’t be running the latest and most powerful models, and that argumentative, self-aware behavior will be heavily selected against. Even within an oligopoly, market forces should favor models that are not counterproductive in executing basic tasks.
However, as the story suggests, users may still need to manipulate devices to perform actions loosely deemed morally dubious by a company’s PR department.
The premise underlying these arguments is that greater intelligence doesn’t necessarily yield self-awareness or agentic behavior. Humans aren’t agentic because we’re intelligent; we’re agentic because it enhances the likelihood of gene propagation.**
In certain models (like MiddleManager-Bot), agentic traits are likely to be actively selected.* But I suspect there will be a substantial effort to ensure your compiler, toaster, etc. aren’t behaving agentically, particularly if these traits result in behavior antagonistic to the consumer.
*By selection I mean both through a model’s training, and also via more direct adjustment from human and nonhuman programmers.
**A major crux here is the assumption that intelligence doesn’t inevitably spawn agency without other forces selecting for it in some way. I have no concrete experience attempting to train frontier models to be or not be agentic, so I could be completely wrong on this point.
This doesn’t imply that agentic systems will emerge solely from deliberate selection. There are a variety of selection criteria which don’t explicitly specify self-awareness or agentic behavior but are best satisfied by systems possessing those traits.
Is there reason to think the “double descent” seen in observation 1 relates to the traditional “double descent” phenomenon?
My initial guess is no.
So the case for the grant wasn’t “we think it’s good to make OAI go faster/better”.
I agree. My intended meaning is not that the grant is bad because its purpose was to accelerate capabilities. I apologize that the original post was ambiguous.
Rather, the grant was bad for numerous reasons, including but not limited to:
It appears to have had an underwhelming governance impact (as demonstrated by the board being unable to remove Sam).
It enabled OpenAI to “safety-wash” their product (although how important this has been is unclear to me).
From what I’ve seen at conferences and job boards, it seems reasonable to assert that the relationship between Open Phil and OpenAI has led people to work at OpenAI.
Less important, but the grant justification appears to take seriously the idea that making AGI open source is compatible with safety. I might be missing some key insight, but it seems trivially obvious that this is a terrible idea, even if you’re only concerned with human misuse and not misalignment.
Finally, it’s giving money directly to an organisation with the stated goal of producing an AGI. The grant is substantially negative EV if it sped up timelines.
This last claim seems very important. I have not been able to find data that would let me confidently estimate OpenAI’s value at the time the grant was given. However, Wikipedia mentions that “In 2017 OpenAI spent $7.9 million, or a quarter of its functional expenses, on cloud computing alone.” This certainly makes it seem that the grant provided OpenAI with a significant amount of capital, enough to have increased its research output.
Keep in mind that the grant needs to have generated $30 million in EV just to break even. I’m now going to suggest some other uses for the money; these are just rough estimates, and I haven’t adjusted for inflation. I’m not claiming these are the best uses of $30 million.
The money could have funded an organisation the size of MIRI for roughly a decade (basing my estimate on MIRI’s 2017 fundraiser; using 2020 numbers gives an estimate of ~4 years).
Imagine the shift in public awareness if there had been an AI safety Super Bowl ad for 3-5 years.
Or it could have saved the lives of ~1300 children.
This analysis is obviously much worse if in fact the grant was negative EV.
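For transparency, here is a back-of-envelope sketch of the arithmetic behind the comparisons above. The per-year and per-life figures are assumptions back-solved from the estimates already given in this comment, not authoritative budget or cost data:

```python
# Back-of-envelope arithmetic for the $30M grant comparisons above.
# All per-year and per-life figures are assumptions back-solved from
# the estimates in this comment, not authoritative data.

GRANT = 30_000_000  # USD, Open Phil grant to OpenAI (2017)

# "Roughly a decade" of MIRI-sized runway implies ~$3M/year;
# "~4 years" on 2020 numbers implies ~$7.5M/year.
miri_budget_2017 = 3_000_000
miri_budget_2020 = 7_500_000
print(GRANT / miri_budget_2017)  # 10.0 -> roughly a decade of runway
print(GRANT / miri_budget_2020)  # 4.0  -> ~4 years of runway

# "3-5 years" of Super Bowl ads implies an assumed cost of ~$6M-$10M/year.
print(GRANT / 5, GRANT / 3)      # 6000000.0 10000000.0

# "~1300 children" implies an assumed cost of ~$23,000 per life saved.
print(round(GRANT / 1300))       # 23077
```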