Senior Researcher, Convergence Analysis.
Visiting Professor, Texas A&M University
Thank you for this comment!
I think your point that “The problem here is that fine-tuning easily strips any safety changes and easily adds all kinds of dangerous things (as long as capability is there).” is spot on. It maps to my intuitions about the weaknesses of fine-tuning and is one of the strongest points in favor of the view that open-sourcing foundation models carries significant risks.
I appreciate your suggestions for other methods of auditing that could possibly work, such as a model being run within a protected framework and open-sourcing encrypted weights. I think these allow for something like risk mitigation for partial open-sourcing, but they would be less feasible for fully open-sourced models, where weights represented by plain tensors would be more likely to be available.
Your comment is helpful and gave me some additional ideas to consider. Thanks!
One thing I would add is that the idea I had in mind for auditing was more of a broader process than a specific tool. The paper I mention to support this idea of a healthy ecosystem for auditing foundation models is “Closing the AI Accountability Gap: Defining an End-to-End Framework for Internal Algorithmic Auditing.” Here the authors point to an auditing process that would guide a decision of whether or not to release a specific model and the types of decision points, stakeholders, and review process that might aid in making this decision. At the most abstract level the process includes scoping, mapping, artifact collection, testing, reflection, and post-audit decisions of whether or not to release the model.
Thanks for the comment!
I think your observation that biological evolution is a slow, blind, and undirected process is fair. We try to make this point explicit in our section on natural selection (as a main evolutionary selection pressure for biological evolution) where we say “The natural processes for succeeding or failing in survival and reproduction – natural and sexual selection – are both blind and slow.”
For our contribution here we are not trying to dispute this. Instead we’re seeking to find analogies to the ways in which machine evolution, which we define as “the process by which machines change over successive generations,” may have some underlying similar mechanisms that we can apply to understand how machines change over successive generations.
To your point that, “Machine learning algorithms, which are the relevant machines here, aren’t progressing in this pattern of dumb experiments which occasionally get lucky,” I agree. To understand this process better and as distinct from biological evolution and natural selection, we propose the notion of artificial selection. The idea of artificial selection is that machines are responding in part to natural selection pressures but that the evolutionary pressures are different here, which is why we give them a different name. We describe artificial selection in a way that I think corresponds closely to your concern. We say:
“For an analogy to natural selection we have chosen the term artificial selection which is driven in large part by human culture, human artifacts, and individual humans.… Artificial selection also highlights the ways in which this selection pressure applies more generally to human artifacts. Human intention and human design have shifted the pace of evolution of artifacts, including machines, rocketing forward by comparison to biological evolution.”
All of this to say, I agree that the comparison is pretty inexact. We were not going for an exact comparison. We were attempting to make it clear that machines and machine learning are influenced by a very different evolutionary selection process, which should lead to different expectations about the process by which machines change over successive generations. Our hope was not for the analogy to be exact to biological evolution, but rather to use components of biological evolution such as natural selection, inheritance, mutation, and recombination as familiar biological processes to explore potential parallels to machine evolution.
This is great! Thanks for sharing. I hope you continue to do these.
This discussion considers a relatively “flat”, dynamic organization of systems. The open-agency model[13] considers flexible yet relatively stable patterns of delegation that more closely correspond to current developments.
I have a question here that I’m curious about:
I wonder if you have any additional thoughts about the “structure” of the open agencies that you imagine here. Flexible and relatively stable patterns of delegation seem to be important dimensions. You mention here that the discussion focuses on “flat” organization of systems, but I’m wondering if we might expect more “hierarchical” relationships if we incorporate things like proposer/critic models as part of the role architecture.
We want work flows that divide tasks and roles because of the inherent structure of problems, and because we want legible solutions. Simple architectures and broad training facilitate applying structured roles and workflows to complex tasks. If the models themselves can propose the structures (think of chain-of-thought prompting), so much the better. Planning a workflow is an aspect of the workflow itself.
I think this has particular promise, and it’s an area I would be excited to explore further. As I mentioned in a previous comment on your The Open Agency Model piece, I think this is a rich area of exploration for the different role architectures, roles, and tasks that would need to be organized to ensure both alignment and capabilities. As I mentioned there, I think there are specific areas of study that may contribute meaningfully to how we might do that. However, these fields have their own limitations, and the analogy to human agents fulfilling these role architectures (organizations in the traditional human coordination sense) is not perfect. And on this note, I’m quite interested to see the capabilities of LLMs in creating structured roles and workflows for complex tasks that other LLMs could then be simulated to fulfill.
Thanks for this post, and really, this series of posts. I had not been following along, so I started with “‘Reframing Superintelligence’ + LLMs + 4 years” and worked my way back to here.
I found your initial Reframing Superintelligence report very compelling back when I first came across it, and still do. I also appreciate your update post referenced above.
The thought I’d like to offer here is that it strikes me that your ideas here are somewhat similar to what both Max Weber and Herbert Simon proposed we should do with human agents. After reading your Reframing Superintelligence report, I wrote a post here noting that it led me to think more about the idea that human “bureaucrats” have specific roles they play that are directed at a somewhat stable set of tasks. To me, this is a similar idea to what you’re suggesting here with the Open Agency model.
Here’s that post, it’s from 2021: Controlling Intelligent Agents The Only Way We Know How: Ideal Bureaucratic Structure (IBS).
In that post I also note some of Andrew Critch’s work that I think is somewhat in this direction as well. In particular, I think this piece may contribute to these ideas here as well: What Multipolar Failure Looks Like, and Robust Agent-Agnostic Processes (RAAPs).
All of this to say, I think there may be some lessons here for your Open Agency Model that build from studies of public agencies, organization studies, public administration, and governance. One of the key questions across these fields is how to align human agents to perform roles and bounded tasks in alignment with the general goals of an agency.
There are of course limitations to the human agent analogy, but given LLMs’ agent-simulating capacities, defining roles and task structures within agencies for an agent to accomplish may benefit from what we’ve learned about managing this task with human agents.
For this task, I think Weber’s notion of creating a “Beamte” to fulfill specialized roles within the bureaucracy is a nice starting point for how to prompt or craft bounded agents that might fulfill specific roles as part of an open agency. And to highlight these specific elements, I include them below as a direct quote from the Controlling Intelligent Agents The Only Way We Know How: Ideal Bureaucratic Structure (IBS) piece:
“Weber provides 6 specific features of the IBS (he calls it the Modern Bureaucracy) including:
The principle of fixed competencies
The principle of hierarchically organized positions
Actions and rules are written and recorded
In-depth specialist training needed for agents undertaking their position
The position is full time and occupies all the professional energy of the agent in that position.
The duties of the position are based on general learnable rules and regulation, which are more or less firm and more or less comprehensive
Weber goes on to argue that a particular type of agent, a beamte, is needed to fulfill the various positions specialization demands for processing information and executing actions. So what does the position or role of the beamte demand?
The position is seen as a calling and a profession
The beamte (the agent) aims to gain and enjoy a high appreciation by people in power
The beamte is nominated by a higher authority
The beamte is a lifetime position
The beamte receives a regular remuneration
The beamte are organized into a professional track.”
Anyways, this is just a potential starting point for ideas around how to create an open agency of role architectures that might be populated by LLM simulations to accomplish concrete tasks.
Thanks for this post. As I mentioned to both of you, it feels a little bit like we have been ships passing one another in the night. I really like your idea here of loops and the importance of keeping humans within these loops, particularly at key nodes in the loop or system, to keep Moloch at bay.
I have a couple scattered points for you to consider:
In my work in this direction, I’ve tried to distinguish between roles and tasks. You do something similar here, which I like. To me, the question often should be about what specific tasks should be automated as opposed to what roles. As you suggest, people within specific roles bring their humanity with them to the role. (See: “Artificial Intelligence, Discretion, and Bureaucracy”)
One term I’ve used to help think about this within the context of organizations is the notion of discretion. This is the way in which individuals use their decision-making capacity within a defined role. It is this discretion that often allows individuals holding those roles to shape their decision making in a humane and contextualized way. (See: “Artificial discretion as a tool of governance: a framework for understanding the impact of artificial intelligence on public administration”)
Elsewhere, coauthors and I have used the term administrative evil to examine the ways in which substituting machine decision making for human decision making dehumanizes the decision-making process, exacerbating the risk of administrative evil being perpetuated by an organization. (See: “Artificial Intelligence and Administrative Evil”)
One other line of work has looked at how the introduction of algorithms or machine intelligence within the loop changes the shape of the loop, potentially in unexpected ways, leading to changes in inputs to decision making throughout the loop. That is, machine evolution influences organization (loop) evolution. (See: “Machine Intelligence, Bureaucracy, and Human Control” & “Artificial Intelligence, bureaucratic form, and discretion in public service”)
I like the inclusion of the work on Cyborgism. It seems to me that in some ways we’ve already become Cyborgs to match the complexity of the loops in which we work and play together, as they’ve already evolved in response to machine evolution. In theory at least, it does seem that a Cyborg approach could help overcome some of the challenges presented by Moloch and failed attempts at coordination.
Finally, your focus on loops reminded me of “Gödel, Escher, Bach” and Hofstadter’s focus there and in his “I Am a Strange Loop.” I like how you apply the notion to human organizations here. It would be interesting to think about different types of persistent loops as a way of describing different organizational structures, goals, resources, etc.
I’m hoping we can discuss together sometime soon. I think we have a lot of interest overlap here.
Thanks for this post! Hope the comments are helpful.
I was interested in seeing what the co-writing process would create. I also wanted to tell a story about technology in a different way, which I hope complements the other stories in this part of the sequence. I also just think it’s fun to retell a story that was originally told from the point of view of future intelligent machines back in 1968, and then to use a modern intelligent machine to write that story. I think it makes a few additional points about how stable our fears have been, how much the technology has changed, and the plausibility of the story itself.
I love that response! I’ll be interested to see how quickly it strikes others. All the actual text that appears within the story is generated by ChatGPT with the 4.0 model. Basically, I asked ChatGPT to co-write a brief story. I had it pause throughout and ask for feedback in revisions. Then, at the end of the story it generated with my feedback along the way, I asked it to fill in some more details and examples, which it did. I asked for minor changes in these in style and specific type as well.
I’d be happy to directly send you screenshots of the chat as well.
Thanks for reading!
Thanks for the response! I appreciate the clarification on both point 1 and 2 above. I think they’re fair criticisms. Thanks for pointing them out.
Thank you for providing a nice overview of our Frontier AI Regulation: Managing Emerging Risks to Public Safety that was just released!
I appreciate your feedback, both the positive and critical parts. I’m also glad you think the paper should exist and that it is mostly a good step. And, I think your criticism is fair. Let me also note that I do not speak for the authorship team. We are quite a diverse group from academia, labs, industry, nonprofits, etc. It was no easy task to find common ground across everyone involved.
I think the AI Governance space is difficult in part because different political actors have different goals, even when sharing significant overlap in interests. As I saw it, the goal of this paper was to bring together a wide group of interested individuals and organizations to see if we could come to points of agreement on useful immediate next governance steps. In this way, we weren’t seeking “ambitious” new policy tools; we were seeking areas of agreement across the diverse stakeholders currently driving change in the AI development space. I think this is a significantly different goal than that of the Model Evaluation for Extreme Risks paper that you mention, which I agree is another important entry in this space. Additionally, one of the big differences, I think, between our effort and the model evaluation paper is that we are more focused on what governments in particular should consider doing from their available toolkits, whereas it seems to me that the model evaluation paper is more about what companies and labs themselves should do.
A couple of other thoughts:
I don’t think it’s completely accurate that “It doesn’t suggest government oversight of training runs or compute.” As part of the suggestion around licensing we mention that the AI development process may require oversight by an agency. But, in fairness, it’s not a point that we emphasize.
I think the following is a little unfair. You say: “This is overdeterminedly insufficient for safety. “Not complying with mandated standards and ignoring repeated explicit instructions from a regulator” should not be allowed to happen, because it might kill everyone. A single instance of noncompliance should not be allowed to happen, and requires something like oversight of training runs to prevent. Not to mention that denying market access or threatening prosecution are inadequate. Not to mention that naming-and-shaming and fining companies are totally inadequate. This passage totally fails to treat AI as a major risk. I know the authors are pretty worried about x-risk; I notice I’m confused.” Let me explain below.
I’m not sure there’s such a thing as “perfect compliance.” I know of no way to ensure that “a single instance of noncompliance should not be allowed to happen.” And, I don’t think that’s necessary for current models or even very near term future models. I think the idea here is that we set up a standard regulatory process in advance of AI models that might be capable enough to kill everyone and shape the development of the next sets of frontier models. I do think there’s certainly a criticism here that naming and shaming, for example, is not a sufficiently punitive tool, but it may have more impact on leading AI labs than one might assume.
I hope this helps clear up some of your confusion here. To recap: I think your criticism that the tools are not ambitious is fair. I don’t think that was our goal. I saw this project as a way of providing tools for which there is broad agreement and that, given the current state of AI models, we believe would help steer AI development and deployment in a better direction. I do think that another reading of this paper is that it’s quite significant that this group agreed on the recommendations that are made. I consider it progress in the discussion of how to effectively govern increasingly powerful AI models, but it’s not the last word either. :)
Thanks again for sharing and for providing your feedback on these very important questions of governance.
As you likely know by now, I think the argument that “Technological Progress = Human Progress” is clearly more complicated than is sometimes assumed. AI is very much already embedded in society and the existing infrastructure makes further deployment even easier. As you say, “more capability dropped into parts of a society isn’t necessarily a good thing.”
One of my favorite quotes on the relationship between technological advancement and human advancement is from Aldous Huxley below:
“Today, after two world wars and three major revolutions, we know that there is no necessary correlation between advanced technology and advanced morality. Many primitives, whose control over their environment is rudimentary, contrive nonetheless to be happy, virtuous, and, within limits, creative. Conversely, the members of civilized societies, possessed of the technological resources to exercise considerable control over their environment, are often conspicuously unhappy, maladjusted, and uncreative; and though private morals are tolerably good, collective behavior is savage to the point of fiendishness. In the field of international relations the most conspicuous difference between men of the twentieth century and the ancient Assyrians is that the former have more efficient methods of committing atrocities and are able to destroy, tyrannize, and enslave on a larger scale.
The truth is that all an increase in man’s ability to control his environment can do for him is merely to modify the situation in which, by other than technological means, individuals and groups attempt to make specifically human progress in creativeness, morality, and happiness. Thus the city-dwelling factory worker may belong, biologically speaking, to a more progressive group than does the peasant; but it does not follow that he will find it any easier to be happy, good, and creative. The peasant is confronted by one set of obstacles and handicaps; the industrial worker, by another set. Technological progress does not abolish obstacles; it merely changes their nature. And this is true even in cases where technological progress directly affects the lives and persons of individuals.”
— The Divine Within: Selected Writings on Enlightenment by Aldous Huxley, Huston Smith https://a.co/a0BFqOM
Thanks for the comment, David! It also caused me to go back and read this post again, which sparked quite a few old flames in the brain.
I agree that a collection of different approaches to ensuring AI alignment would be interesting! This is something that I’m hoping (now planning!) to capture in part with my exploration of scenario modeling that’s coming down the pipe. But a brief overview of the different analytical approaches to AI alignment would be helpful (if it doesn’t already exist in an updated form that I’m unaware of).
I agree with your insight that Weber’s description here can be generalized to moral and judicial systems for society. I suspect if we went looking into Weber’s writing we might find similar analogies here as well.
I agree with your comment on the limitations of hierarchy for human bureaucracies. Fixed competencies and hierarchical flows benefit from bottom up information flows and agile adaptation. However, I think this reinforces my point about machine beamte and AGI controlled through this method. For the same sorts of benefits of agility and modification by human organizations, you might think that we would want to restrict these things for machine agents to deliberately sacrifice benefits from adaptation in favor of aligned interests and controllability.
Thanks for the feedback! I can imagine some more posts in this direction in the future.
Thank you for this post! As I may have mentioned to you both, I had not followed this line of research until the two of you brought it to my attention. I think the post does an excellent job describing the trade offs around interpretability research and why we likely want to push it in certain, less risky directions. In this way, I think the post is a success in that it is accessible and lays out easy to follow reasoning, sources, and examples. Well done!
I have a couple of thoughts on the specific content as well where I think my intuitions converge or diverge somewhat:
I think your intuition that focusing on human interpretability of AI is more helpful to safety than focusing on AI interpretability of AI is correct, and it seems to me that AI interpretability of AI is a pretty clear pathway to automating AI R&D, which seems fraught with risk. It seems that this “machine-to-machine translation” is already well underway. It may also be the case that this is something of an inevitable path forward at this point.
I’m happy to be persuaded otherwise, but I think the symbol grounding case is less ambiguous than you seem to suppose. It’s also not completely clear to me what you mean by symbol grounding, but if it roughly means something like “grounding current neural net systems with symbols that clearly represent specific concepts” or “representing fuzzy knowledge graphs within a neural net with more clearly identifiable symbols,” this seems to me to weigh more heavily on the side of a positive thing. This seems like a significant increase in controllability over leading current approaches that, without these forms of interpretability, are minds unknown. I do take your point that this complementary symbolic approach may help current systems break through some hard limit on capabilities, but it seems to me that if these capabilities are increased by the inclusion of symbolic grounding, a model with these increased capabilities may still be more controllable than a less capable version of the model that doesn’t include it.
I don’t think I agree that any marginal slowdown in capabilities, de facto, helps with alignment and safety research. (I don’t think this is your claim, but it is the claim of the anonymous expert from the 80k study.) It seems to me to be much more about what type of capabilities we are talking about. For example, I think on the margin a slightly more capable multi-modal LLM that is more capable in the direction of making itself more interpretable to humans would likely be a net positive for the safety of those systems.
Thanks again for this post! I found it very useful and thought-provoking. I also find myself now concerned with the direction of interpretability research as well. I do hope that people in this area will follow your advice and choose their research topics carefully and certainly focus on methods that improve human interpretability of AI.
Thank you for this post! As I mentioned to both of you, I like your approach here. In particular, I appreciate the attempt to provide some description of how we might optimize for something we actually want, something like wisdom.
I have a few assorted thoughts for you to consider:
I would be interested in additional discussion around the inherent boundedness of agents that act in the world. I think self-consistency and inter-factor consistency have some fundamental limits that could be worth exploring within this framework. For example, might different types of boundedness systematically undermine wisdom in ways that we can predict or try and account for? You point out that these forms of consistency are continuous, which I think is a useful step in this direction.
I’m wondering about feedback mechanisms for a wise agent in this context. For example, it would be interesting to know a little more about how a wise agent incorporates feedback into its model from the consequences of its own actions. I would be interested to see more in this direction in any future posts.
It strikes me that this post titled “Agency from a Causal Perspective” (https://www.alignmentforum.org/posts/Qi77Tu3ehdacAbBBe/agency-from-a-causal-perspective) might be of some particular interest to your approach here.
Excellent post here! I hope the comments are helpful!
Thank you for the comment and for reading the sequence! I posted Chapter 7 Welcome to Analogia! (https://www.lesswrong.com/posts/PKeAzkKnbuwQeuGtJ/welcome-to-analogia-chapter-7) yesterday and updated the main sequence page just now to reflect that. I think this post starts to shed some light on ways of navigating this world of aligning humans to the interests of algorithms, but I doubt it will fully satisfy your desire for a call to action.
I think there are both macro policies and micro choices that can help.
At the macro level, there is an over accumulation of power and property by non-human intelligences (machine intelligences, large organizations, and mass market production). The best guiding remedy here that I’ve found comes from Huxley. The idea is pretty straightforward in theory: spread the power and property around and in the direction away from these non-human intelligences and towards as many humans as possible. This seems to be the only reasonable cure to the organized lovelessness and its consequence of massive dehumanization.
At the micro level, there is some practical advice in Chapter 7 that also originates with Huxley. The suggestion here is that to avoid being an algorithmically aligned human, choose to live filled with love, intelligence, and freedom as your guide posts. Pragmatically, one must live in the present, here and now, to experience those things fully.
I hope this helps, but I’m not sure it will.
The final thing I’d add at this point is that I think there’s something to reshaping our technological narratives around machine intelligence away from its current extractive and competitive logics and directed more generally towards humanizing and cooperative logics. The Erewhonians from Chapter 7 (found in Samuel Butler’s Erewhon) have a more extreme remedy: stop technological evolution, turn it backwards. But short of global revolution, this seems like proposing that natural evolution should stop.
I’ll be editing these 7 chapters, adding a new introduction and conclusion, and publishing Part I as a standalone book later this year. And as part of that process, I intend to spend more time continuing to think about this.
Thanks again for reading and for the comment!
There is a growing academic field of “governance” that would variously be described as a branch of political science, public administration, or policy studies. It is a relatively small field, but it has several academic journals that fit the description of the literature you’re looking for. The best of these journals, in my opinion, is Perspectives on Public Management & Governance (although it focuses on public governance structures to the point of ignoring corporate governance structures).
In addition to this, there is a 50 chapter OUP AI Governance Handbook that I’ve co-edited with leading scholars from Economics, Political Science, International Affairs, and other fields of social science that are interested in these exact ideal governance questions as you describe them. 10 of the chapters are currently available, but I also have complete copies of essentially every chapter that I would be happy to share directly with you or anyone else that comments here and is interested. Here’s the Table of Contents. I’m certainly biased, but I think this book contains the cutting edge dialogue around both how ideal governance may be applied to controlling AI and how the development of increasingly powerful AI presents new opportunities and challenges for ideal governance.
I have contributed to these questions both by trying to understand what the elements of ideal governance structures and processes might be for social insurance programs, AI systems, and space settlement, and by trying to understand the concerns raised by integrating autonomous and intelligent decision making systems into our current governance structures and processes.
I think there are some helpful insights into how to make governance adaptive (reset/jubilee as you described it) and for defining the elements of the hierarchy (various levels) of the governance structure. The governance literature looks at micro/meso/macro levels of governance structures to help illustrate how some governance elements are best described and understood at different levels of emergence or description. Another useful construct from governance scholars is that of discretion, or the breadth of the choice set given to an agent carrying out the required decision making of the various governance entities. This is where much of my own interest lies, which you can see in work I have with colleagues on topics including discretion, evolution of bureaucratic form, artificial discretion, administrative evil, and artificial bureaucrats. This work builds on the notion of boundedly rational actors and how they execute decisions in response to constitutional rules and institutional structures. Here & here in the AI Governance Handbook, with colleagues we look at how Herbert Simon and Max Weber’s classic answers to these ideal governance questions hold in a world with machine intelligence, and we examine what new governance tools, structures, and processes may be needed now. I’ve also done some very initial work here on the LessWrong forum looking at how Weber’s ideal bureaucratic structure might be helpful for considering how to control intelligent machine agents.
In brief recap, there is a relatively small interdisciplinary field/community of scholars looking at these questions. It is a community that has done some brainstorming, some empirical work, and used some economics-style thinking to address some of these ideal governance questions. There are also some classic works that touch on these topics by thinkers such as Max Weber, Herbert Simon, and Elinor Ostrom.
I hope this is helpful. I’m sure I’ve focused too much on my own work here, but I hope the Handbook in particular gives you some sense of some of the work out there. I would be happy to connect you with other writers and thinkers who I believe are taking these questions of ideal governance seriously. I find these to be among the most interesting and important questions for our moment in time.
Thank you! I’m looking forward to the process of writing it, synthesizing my own thoughts, and sharing them here. I’ll also be hoping to receive your insightful feedback, comments, and discussion along the way!
Thank you for this post, Kyoung-Cheol. I like how you have used DeepMind’s recent work to motivate the discussion of “authority as a consequence of hierarchy” and the idea that “processing information to handle complexity requires speciality which implies hierarchy.”
I think there is some interesting work on this forum that captures these same types of ideas, sometimes with similar language, and sometimes with slightly different language.
In particular, you may find the recent post from Andrew Critch on “Power dynamics as a blind spot or blurry spot in our collective world-modeling, especially around AI” to be sympathetic to core pieces of your argument here.
It also looks like Kaj Sotala is having some similar thoughts on adjustments to game theory approaches that I think you would find interesting.
I wanted to share with you an idea that remains incomplete, but I think there is an interesting connection between Kaj Sotala’s discussion of non-agent and multi-agent models of the mind and Andrew Critch’s robust agent-agnostic processes that connects with your ideas here and the general points I make in the IBS post.
Okay, finally, I had been looking for the most succinct quote from Herbert Simon’s description of complexity and I found it. At some point, I plan to elaborate more on how this connects to control challenges more generally as well, but I’d say that we would both likely agree with Simon’s central claim in the final chapter of The Sciences of the Artificial:
“Thus my central theme is that complexity frequently takes the form of hierarchy and that hierarchic systems have some common properties independent of their specific content. Hierarchy, I shall argue, is one of the central structural schemes that the architect of complexity uses.”
Glad you decided to join the conversation here. There are lots of fascinating conversations that are directly related to a lot of the topics we discuss together.
Thanks for this comment. I agree there is some ambiguity here on the types of risks that are being considered with respect to the question of open-sourcing foundation models. I believe the report favors the term “extreme risks,” which is defined as “risk of significant physical harm or disruption to key societal functions.” I believe they avoid the terms “extinction risk” and “existential risk,” but are implying something not too different with their choice of extreme risks.
For me, I pose the question above as:
What I’m looking for is something like “total risk” versus “total benefit.” In other words, if we take all the risks together, just how large are they in this context? In part I’m not sure if the more extreme risks really come from open sourcing the models or simply from the development and deployment of increasingly capable foundation models.
I hope this helps clarify!