CEO at Conjecture.
I don’t know how to save the world, but dammit I’m gonna try.
Thanks!
As I have said many, many times before, Conjecture is not a deep shift in my beliefs about open sourcing. It is not, and has never been, the position of EleutherAI (at least while I was head) that everything should be released in all scenarios, but rather that some specific things (such as LLMs of the size and strength we release) should be released in some specific situations for some specific reasons. EleutherAI has not released, and would not release, models or capabilities that would push the capabilities frontier (and while I am no longer in charge, I strongly expect that legacy to continue), and there are a number of things we did discover that we decided to delay or not release at all for precisely such infohazard reasons. Conjecture, of course, is even stricter and has opsec that wouldn’t be possible in a volunteer-driven open source community.
Additionally, Carper is not part of EleutherAI and should be considered a genealogical descendant of, but independent from, EAI.
Thanks for this! These are great questions! We have been collecting questions from the community and plan to write a follow-up post addressing them in the next couple of weeks.
I initially liked this post a lot, then saw a lot of pushback in the comments, mostly of the (very valid!) form of “we actually build reliable things out of unreliable things, particularly with computers, all the time”. I think this is a fair criticism of the post (and of the choice of examples/metaphors therein), but I think it may be missing (one of) the core message(s) the post is trying to deliver.
I wanna give an interpretation/steelman of what I think John is trying to convey here (which I don’t know whether he would endorse or not):
“There are important assumptions that need to hold for the usual kind of systems security design to work (e.g. that failures are uncorrelated). Some of these assumptions will (likely) not apply with AGI. Therefore, extrapolating this kind of thinking to this domain is Bad™️.” (“Epistemological vigilance is critical”)
So maybe rather than saying “trying to build robust things out of brittle things is a bad idea”, it’s more like “we can build robust things out of certain brittle things, e.g. computers, but Godzilla is not a computer, and so you should only extrapolate from computers to Godzilla if you’re really, really sure you know what you’re doing.”
I think this is something better discussed in private. Could you DM me? Thanks!
This is a genuinely difficult and interesting question that I want to provide a good answer for, but that might take me some time to write up; I’ll get back to you at a later date.
Yes, we do expect this to be the case. Unfortunately, I think explaining in detail why we think this may be infohazardous. Or at least, I am sufficiently unsure about how infohazardous it is that I would first like to think about it for longer and run it through our internal infohazard review before sharing more. Sorry!
Answered here.
Redwood is doing great research, and we are fairly aligned with their approach. In particular, we agree that hands-on experience building alignment approaches could have high impact, even if AGI ends up having an architecture unlike modern neural networks (which we don’t believe will be the case). While Conjecture and Redwood both have a strong focus on prosaic alignment with modern ML models, our research agenda has higher variance, in that we additionally focus on conceptual and meta-level research. We’re also training our own (large) models, whereas (we believe) Redwood is just using pretrained, publicly available models. We do this for three reasons:
Having total control over the models we use can give us more insights into the phenomena we study, such as training models at a range of sizes to study scaling properties of alignment techniques.
Some properties we want to study may only appear in close-to-SOTA models—most of which are private.
We are trying to make products, and close-to-SOTA models help us do that better. Though as we note in our post, we plan to avoid work that pushes the capabilities frontier.
We’re also for-profit, while Redwood is a nonprofit, and we’re located in London! Not everyone lives out in the Bay :)
For the record, having any person or organization in this position would be a tremendous win. Interpretable aligned AGI?! We are talking about a top 0.1% scenario here! Like, the difference between egotistical Connor and altruistic Connor with an aligned AGI in his hands is much, much smaller than the difference between Connor with an aligned AGI and anyone, any organization, or any scenario, with a misaligned AGI.
But let’s assume this.
Unfortunately, there is no actual functioning, reliable mechanism by which humans can guarantee their alignment to each other. If there were something I could do that would irreversibly bind me to my commitment to the best interests of mankind in a publicly verifiable way, I would do it in a heartbeat. But there isn’t, and most attempts at such are security theater.
What I can do is point to my history of acting in ways that, I hope, show my consistent commitment to doing what is best for the long-term future (even if, of course, some people with different models of what is “best for the long-term future” will have legitimate disagreements with my past choices), and pledge to remain in control of Conjecture and shape its goals and actions appropriately.
On a meta-level, I think the best guarantee I can give is simply that not acting in humanity’s best interest is, in my model, Stupid. And my personal guiding philosophy in life is “Don’t Be Stupid”. Human values are complex and fragile, and while many humans disagree about many details of how they think the world should be, there are many core values that we all share, and not fighting with everything we’ve got to protect these values (or dying with dignity in the process) is Stupid.
Probably. It is likely that we will publish a lot of our interpretability work and tools, but we can’t commit to that because, unlike some others, we think it’s almost guaranteed that some interpretability work will lead to very infohazardous outcomes (for example, revealing obvious ways in which architectures could be trained more efficiently), and as such we need to consider each result on a case-by-case basis. However, if we deem them safe, we would definitely like to share as many of our tools and insights as possible.
We would love to collaborate with anyone (from academia or elsewhere) wherever it makes sense to do so, but we honestly just do not care very much about formal academic publication or citation metrics or whatever. If we see opportunities to collaborate with academia that we think will lead to interesting alignment work getting done, excellent!
Looks good to me, thank you Loppukilpailija!