I think it makes a huge difference that most cybersecurity disasters only cost money (or damage a company’s reputation and leak customers’ confidential information), while a biosecurity disaster can kill a lot of people. This post seems to ignore this?
Herb Ingram
Besides thinking it fascinating and perhaps groundbreaking, I don’t really have original insights to offer. The most interesting democracies on the planet in my opinion are Switzerland and Taiwan. Switzerland shows what a long and sustained cultural development can do. Taiwan shows the potential for reform from within and innovation.
There’s a lot of material to read, in particular the events after the sunflower movement in Taiwan. Keeping links within lesswrong: https://www.lesswrong.com/posts/5jW3hzvX5Q5X4ZXyd/link-digital-democracy-is-within-reach and https://www.lesswrong.com/posts/x6hpkYyzMG6Bf8T3W/swiss-political-system-more-than-you-ever-wanted-to-know-i
What’s missing in this discussion is why one is talking to the “bad faith” actor in the first place.
If you’re trying to get some information and the “bad faith” actor is trying to deceive you, you walk away. That is, unless you’re sure that you’re much smarter or have some other information advantage that allows you to get new useful information regardless. The latter case is extremely rare.
If you’re trying to convince the “bad faith” actor, you either walk away or transform the discussion into a negotiation (it arguably was a negotiation in the first place). The post is relevant for this case. In such situations, people often pretend to be having an object level discussion although all parties know it’s a negotiation. This is interesting.
Even more interesting, politics: you’re trying to convince an amateur audience that you’re right and someone else is wrong. The other party will almost always act “in bad faith” because otherwise the discussion would be taking place without an audience. You can walk away while accusing the other party of bad faith, but the audience can’t really tell whether you were “just about to lose the argument” or were arguing “less in bad faith than the other party”, perhaps because the other party is losing the argument. Crucially, given that both parties are compelled to argue in bad faith, the audience is to some extent justified in not being moved by any object-level arguments, since they mostly cannot check if they’re valid. They keep the opinions they have been holding and the opinions of people they trust.
In this case, it might be worth it to move from the above situation, where the object level being discussed isn’t the real object-level issue, as in the bird example, to one where a negotiation is taking place that is transparent to the audience. However, this is only possible if there is a competent fourth party arbitrating, as the competing parties really cannot give up the advantage of “bad faith”. That’s quite rare.
An upside: If the audience is actually interested in the truth, however, and if it can overcome the tribal drive to flock to “their side”, it can maybe force the arguing parties to focus on the real issue and make object-level arguments in such a way that the audience can become competent enough to judge the arguments. Doing this is a huge investment of time and resources. It may be helped by all parties acknowledging the “bad faith” aspect of the situation and enforcing social norms that address it. This is what “debate culture” is supposed to do but, as far as I know, never really has.
My takeaway: don’t be too proud of your debate culture where everyone is “arguing in good faith”, if it’s just about learning about the world. This is great, of course, but doesn’t really solve the important problems.
Instead, try to come up with a debate culture (debate systems?) that can actually transform a besides-the-point bad-faith apparent disagreement into a negotiation where the parties involved can afford to make their true positions explicitly known. This is very hard but we shouldn’t give up. For example, some of the software used to modernize democracy in Taiwan seems like an interesting direction to explore.
I think any outreach must start with understanding where the audience is coming from. The people most likely to make the considerable investment of “doing outreach” are in danger of being too convinced of their position and thinking it obvious; “how can people not see this?”.
If you want to have a meaningful conversation with someone and interest them in a topic, you need to listen to their perspective, even if it sounds completely false and missing the point, and be able to empathize without getting frustrated. For most people to listen and consider any object level arguments about a topic they don’t care about, there must first be a relationship of mutual respect, trust and understanding. Getting people to consider some new ideas, rather than convincing them of some cause, is already a very worthy achievement.
Indeed, systems controlling the domestic narrative may become sophisticated enough that censorship plays no big role. No regime is more powerful and enduring than one which really knows what poses a danger to it and what doesn’t, one which can afford to use violence, coercion and censorship in the most targeted and efficient way. What a small elite used to do to a large society becomes something that the society does to itself. However, this is hard and I assume will remain out of reach for some time. We’ll see what develops faster: sophistication of societal control and the systems through which it is achieved, or technology for censorship and surveillance. I’d expect at least a “transition period” of censorship technology spreading around the world as all societies that successfully use it become sophisticated enough to no longer really need it.
What seems more certain is that AI will be very useful for influencing societies in other countries, where the sophisticated domestically optimal means aren’t possible to deploy. This goes very well with exporting such technology.
Uncharitably, “Trust the Science” is a talking point in debates that have some component which one portrays as “fact-based” and which one wants to make an “argument” about based on the authority of some “experts”. In this context, “trust the science” means “believe what I say”.
Charitably, it means trusting that thinking honestly about some topic, seeking truth and making careful observations and measurements actually leads to knowledge, that knowledge is intelligibly attainable. This isn’t obvious, which is why there’s something there to be trusted. It means trusting that the knowledge gained this way can be useful, that it’s worth at least hearing out people who seem or claim to have it, that it’s worth stopping for a moment to honestly question one’s own motivations and priors, the origins of one’s beliefs, and to ponder the possible consequences in case of failure, whenever one willingly disbelieves or dismisses that knowledge. In this context, “trust the science” means “talk to me and we’ll figure it out”.
There’s a big difference between philosophy and thinking about unlikely scenarios in the future that are very different from our world. In fact, those two things have little overlap. Although it’s not always clear, (I think) this discussion isn’t about aesthetics, or about philosophy; it’s about scenarios that are fairly simple to judge but have so many possible variations, and are so difficult to predict, that it seems pointless to even try. This feeling of futility is the parallel with philosophy, much of which just digests and distills questions into more questions, never giving an answer, until a question is no longer philosophy and can be answered by someone else.
The discussion is about whether or not human civilization will destroy itself through negligence and inability to cooperate. This risk may be real or imagined. You may care about future humans or not. But none of that makes this philosophy or aesthetics. The questions are very concrete, not general, and they’re fairly objective (people agree a lot more on whether civilization is good than on what beauty is).
-
I really don’t know what you’re saying. To attack an obvious straw man and thus give you at least some starting point for explaining further: Generally, I’d be extremely sceptical of any claim about some tiny coherent group of people understanding something important better than 99% of humans on earth. To put it polemically, for most such claims, either it’s not really important (maybe we don’t really know if it is?), it won’t stay that way for long, or you’re advertising for a cult. The phrase “truly awakened” doesn’t bode well here… Feel free to explain what you actually meant rather than responding to this.
-
Assuming these “ideologies” you speak of really exist in a coherent fashion, I’d try to summarize “Accelerationist ideology” as saying “technological advancement (including AI) will accelerate a lot, change the world in unimaginable ways and be great; let’s do that as quickly as possible”, and “AI safety (LW version)” as saying “it might go wrong and be catastrophic/unrecoverable; let’s be very careful”. If anything, these ideas as ideologies are yet to get out into the world and might never have any meaningful impact at all. They might not even work on their own as ideologies (maybe we mean different things by that word).
So why are the origins interesting? What do you hope to learn from them? What does it matter if one of those is an “outgrowth” of one thing more than some other? It’s very hard for me to evaluate something like how “shallow” they are. It’s not like there’s some single manifesto or something. I don’t see how that’s a fruitful direction to think about.
No offense, this reads to me as if it was deliberately obfuscated or AI-generated (I’m sure you didn’t do either of these, this is a comment on writing style). I don’t understand what you’re saying. Is it “LW should focus on topics that academia neglects”?
I also didn’t understand at all what the part starting with “social justice” is meant to tell me or has to do with the topic.
There has been some talk recently about long “filler-like” input (e.g. “a a a a a [...]”) somewhat derailing GPT3&4, e.g. leading them to output what seems like random parts of its training data. Maybe this effect is worth mentioning and thinking about when trying to use filler input for other purposes.
just in case it turns out he’s heir to a giant fortune or something.
That seems like a highly dubious explanation to me. I guess the woman’s honest account (or what you’d get by examining her state of mind) would say that she does it as a matter of habit, aiming to be nice and conform to social conventions.
If that’s true, the question becomes where the convention comes from and what maintains it despite the naively plausible benefits one might hope to gain by breaking it. I don’t claim to understand this (that would hint at understanding a lot of human culture at a basic level). However, I strongly suspect the origins of such behavior (and what maintains it) to be social. I.e., a good explanation of why the woman has come to act this way involves more than two people. That might involve some sort of strategic deception, but consider that most people in fact want to be lied to in such situations. An explanation must go a lot deeper than that kind of strategic deception.
While I completely agree in the abstract, I think there’s a very strong tendency for systems-of-thought, such as propagated on this site, to become cult-like. There’s a reason why people outside the bubble criticize LW for building a cult. They see small signs of it happening and also know/feel the general tendency for it, which always exists in such a context and needs to be counteracted.
As you point out, the concrete ways of thinking propagated here aren’t necessarily the best for all situations and it’s another very deep can of worms to be able to tell which situations are which. Also, it attracts people (such as myself to some degree) who enjoy armchair philosophizing without actually ever trying to do anything useful with that. Akrasia is one thing, not even expecting to do anything useful with some knowledge and pursuing it as a kind of entertainment is another still.
So there are two ways to frame the message: one is saying that “rationality is about winning”, which is a definition that’s very hard to attack but also vague in its immediate and indisputable consequences for how one should think, and also makes it hard to tell if “one is doing it right”.
The other way is to impose some more concrete principles and risk them becoming simplified, ritualized, abused and distorted to a point where they might do net harm. This way also makes it impossible to develop the epistemology further. You pick some meta-level and propose rules for thinking at that level which people eventually and inevitably propagate and defend with the fervor of religious belief. It becomes impossible to improve the epistemology at that point.
The meme (“rationality”) has to be about something in order to spread and also needs some minimum amount of coherence. “It’s about winning” seems to do this job quite well and not too well.
Unfortunately for this scheme, I would expect rendering time for AI videos to eventually be faster than real time. So, as the post implies, even if we had a reasonably good way to prove posteriority, this may not suffice to certify videos as “non-AI” for long.
On the other hand, as long as rendering AI videos is slower than real time, proof of priority alone might go a long way. You can often argue that prior to some point in time you couldn’t reasonably have known what kind of video you should fake.
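To make "proof of priority" concrete: one standard way to get it is a hash commitment, where you publish a digest of the file before some time T and reveal the file later. This is a minimal sketch of my own (the function name and chunk size are arbitrary choices, not from the original discussion):

```python
import hashlib

def commitment(path: str) -> str:
    """SHA-256 digest of a video file.

    Publishing this digest somewhere hard to backdate (a newspaper,
    a public timestamping service, a blockchain) before time T later
    proves the exact file existed before T, without revealing it.
    """
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Hash in 1 MiB chunks so large video files don't need to fit in memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()
```

Note this only proves priority, not authenticity: it shows the video existed before T, which helps exactly in the regime described above, where faking it before T would have been implausible.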
The “analog requirement” reminds me of physical unclonable functions, which might have some cross-pollination with this issue. I couldn’t think of a way to make use of them but maybe someone else will.
I guess it depends on whether this post found anything at all that can be called questionable security practice. Maybe it didn’t but the author was also no cybersecurity expert. Upon reflection, my earlier judgement was premature and the phrasing overconfident.
In general, I assume that OpenAI would view a serious hack as quite catastrophic, as it might e.g. leak their model (not an issue in this case), severely damage their reputation and undermine their ongoing attempt at regulatory capture. However, similar stakes didn’t prevent shoddy security practices in countless cybersecurity disasters.
I guess for this feature even the most serious vulnerabilities “just” lead to some Azure VMs being hacked, which has no relevance for AI safety. It might still be indicative of OpenAI’s approach to security, which usually isn’t so nuanced within organizations as to differ wildly between applications where stakes are different. So it’s interesting how secure the system really is, which we won’t know until someone hacks it or some whistleblower emerges.
Some of my original reasoning was this:
You might argue that the “inner sandbox” is only used to limit resource use (for users who do not bother jailbreaking it) and to examine how users will act, as well as how badly exactly the LLM itself will fare against jailbreaking. In this case studying how people jailbreak it may be an integral part of the whole feature.
However, even if that is the case, to count as “security mindset”, the “outer sandbox” has to be extremely good and OpenAI needs to be very sure that it is. To my (very limited) knowledge in cybersecurity, it’s an unusual idea that you can reconcile very strong security requirements with purposely not using every opportunity to make it more secure. Maybe the idea that comes closest would be a “honeypot”, which this definitely isn’t.
So that suggests they purposely took a calculated security risk for some mixture of research and commercial reasons, which they weren’t compelled to do. Depending on how dangerous they really think such AI models are or may soon become, how much what they learn from the experiment benefits future security, and how confident they are in the outer sandbox, the calculated risk might make sense. Assuming by default that the outer sandbox is “normal industry standard”, it’s incompatible with the level of worry they claim when pursuing regulatory capture.
I agree. To me, the most interesting aspects of this (quite interesting and well-executed) exercise are getting a glimpse into OpenAI’s approach to cybersecurity, as well as the potentially worrying fact that GPT3 made meaningful contributions to finding the “exploits”.
Given what was found out here, OpenAI’s security approach seems to be “not terrible” but also not significantly better than what you’d expect from an average software company, which isn’t necessarily encouraging because those get hacked all the time. It’s definitely not what people here call “security mindset”, which casts doubt on OpenAI’s claim to be “taking the dangers very seriously”. I’d expect to hear about something illegal being done with one of these VMs before too long, assuming they continue and expand the service, which I expect they will.
I’m sure there are also security experts (both at OpenAI and elsewhere) looking into this. Given OpenAI’s PR strategy, they might be able to shut down such services “due to emerging security concerns” without much reputational damage. (Many companies are economically compelled to keep services running that they know are compromised or that have known vulnerabilities, and instead pretend not to know about them or at least not inform customers as long as possible.) Not sure how much e.g. Microsoft would push back on that. All in all, security experts finding something might be taken seriously.
I’m increasingly worried (while ascribing a decent chance, mind you, that “AI might well go about as bad for us as most of history but not worse”) about what happens when GPT-X has hacking skills that are, say, on par with the median hacker. Being able to hack easy-ish targets at scale might not be something the internet can handle, potentially resulting in, e.g., an evolutionary competition between AIs to build a super-botnet.
People you’re likely to trade with (either because they offer you a trade or because they are around when you want to trade) are on average more experienced than you at trading. So trades available to you are disproportionately unfavorable, and you cannot figure out which ones “are likely to lead to favorable trades in the future”, by the assumption that they are incomparable.
This is what you mean by “trades are often adversarially chosen” in (1.), right? I don’t understand why or in what situation you’re dismissing that argument in (1.).
There can be a lot of other reasons to avoid incomparable trades. In accepting a trade where you don’t clearly gain anything, you’re taking a risk to be cheated and reveal information to others about your preferences, which can risk social embarrassment and might enable others to cheat you in the future. You’re investing the mental effort to evaluate these things despite already having decided that you don’t stand to gain anything.
An interesting counterexample is social contexts where trading is an established and central activity, for example, people who exchange certain collectibles. In such a context, people feel that the act of trading itself has positive value and thus will make incomparable trades.
I think this situation is somewhat analogous to betting. Most people (cultures?) are averse to betting in general. Risk aversion and the known danger of gambling addiction explains aversion to betting for money/valuables. However, many people also strongly dislike betting without stakes. In some social contexts (horse racing, LW) betting is encouraged, even between “incomparable options”, where the odds correctly reflect your credence.
In such cases, most people seem to consider it impolite to answer an off-hand probability estimate by offering a bet. It is understood perhaps as questioning their credibility/competence/sincerity, or as an attempt to cheat them when betting for money. People will decline the bet but maintain their off-hand estimate. This might very well make sense, especially if they don’t explicitly value “training to make good probability estimates”, and perhaps for some of the same reasons as apply to trades?
Who is the target audience for this?
I doubt anyone has been calling themselves a “doomer”. There are people on this site who would never call themselves that, but I haven’t seen anyone else here label anyone a “doomer” yet either. So it seems that you’re left with people who don’t frequent this site and would probably dismiss your arguments as “a doomer complaining about being called a doomer”?
Did I miss people call each other “doomer” on LW? Did you also post something like this on Twitter?
To me, the arguments from both sides, both arguing for and against worrying about existential risk from AI, make sense. People have different priors and biased access to information. However, even if everyone agreed on all matters of fact that can be currently established, the disagreement would persist. The issue is that predicting the future is very hard and we can’t expect to be in any way certain what will happen. I think the interesting difference between how people “pro” and “contra” AI-x-risk think about this is in dealing with this uncertainty.
Imagine you have a model of the world, which is the best model you have been able to come up with after trying very hard. This model is about the future and predicts catastrophe unless something is done about it now. It’s impossible to check if the model holds up, other than by waiting until it’s too late. Crucially, your model seems unlikely to make true predictions: it’s about the future and rests on a lot of unverifiable assumptions. What do you do?
People “pro-x-risk” might say: “we made the best model we could make, it says we should not build AI. So let’s not do that, at least until our models are improved and say it’s safe enough to try. The default option is not to do something that seems very risky.”.
The opponents might say: “this model is almost certainly wrong, we should ignore what it says. Building risky stuff has kinda worked so far, let’s just see what happens. Besides, somebody will do it anyway.”
My feeling when listening to elaborate and abstract discussions is that people mainly disagree on this point. “What’s the default action?” or, in other words, “who has the burden of proof?”. That proof is basically impossible to give for either side.
It’s obviously great that people are trying to improve their models. That might get harder to do the more politicized the issue becomes.
I really have no idea, probably a lot?
I don’t quite see what you’re trying to tell me. That one (which?) of my two analogies (weather or RTS) is bad? That you agree or disagree with my main claim that “evaluating the relative value of an intelligence advantage is probably hard in real life”?
Your analogy doesn’t really speak to me because I’ve never tried to start a company and have no idea what leads to success, or what resources/time/information/intelligence helps how much.
What point are you trying to make? I’m not sure how that relates to what I was trying to illustrate with the weather example. Assuming for the moment that you didn’t understand my point.
The “game” I was referring to was one where it’s literally all-or-nothing “predict the weather a year from now”, you get no extra points for tomorrow’s weather. This might be artificial but I chose it because it’s a common example of the interesting fact that chaos can be easier to control than simulate.
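A toy illustration of that fact (my own sketch, not from the original exchange): for the chaotic logistic map with r = 4, a forecast from almost-identical initial conditions becomes useless after a few dozen steps, yet small corrective nudges easily pin the state to the unstable fixed point x* = 0.75. The gain of 0.6 and the capture window of 0.1 are arbitrary illustrative choices.

```python
# Chaotic logistic map: x -> r*x*(1-x), chaotic at r = 4.
def step(x, r=4.0):
    return r * x * (1 - x)

# Prediction: two trajectories differing by 1e-12 decorrelate completely.
a, b = 0.3, 0.3 + 1e-12
max_gap = 0.0
for _ in range(60):
    a, b = step(a), step(b)
    max_gap = max(max_gap, abs(a - b))
print(f"max prediction gap: {max_gap:.3f}")  # grows to order one

# Control: whenever the orbit passes near the fixed point x* = 0.75
# (which the chaotic orbit does on its own), apply a small nudge toward
# it; the orbit gets captured and stays, despite the chaos.
x = 0.3
for _ in range(200):
    x = step(x)
    if abs(x - 0.75) < 0.1:
        x += 0.6 * (0.75 - x)  # nudge of magnitude at most 0.06
print(f"controlled state: {x:.6f}")  # pinned near 0.75
```

Simulating the system a year ahead is hopeless; steering it with tiny, frequent interventions is easy. That's the asymmetry the weather example was meant to point at.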
Another example. You’re trying to win an election and “plan long-term to make the best use of your intelligence advantage”, you need to plan and predict a year ahead. Intelligence doesn’t give you a big advantage in predicting tomorrow’s polls given today’s polls. I can do that reasonably well, too. In this contest, resources and information might matter a lot more than intelligence. Of course, you can use intelligence to obtain information and resources. But this bootstrapping takes time and it’s hard to tell how much depending where you start off.
Formally proving that some X you could realistically build has property Y is way harder than building an X with property Y. I know of no exceptions (formal proof only applies to programs and other mathematical objects). Do you disagree?
I don’t understand why you expect the existence of a “formal math bot” to lead to anything particularly dangerous, other than by being another advance in AI capabilities which goes along other advances (which is fair I guess).
Human-long chains of reasoning (as used for taking action in the real world) neither require nor imply the ability to write formal proofs. Formal proofs are about math, and making use of math in the real world requires modeling, which is crucial, hard and usually very informal. You make assumptions that are obviously wrong, derive something from these assumptions, and make an educated guess that the conclusions still won’t be too far from the truth in the ways you care about. In the real world, this only works when your chain of reasoning is fairly short (human-length), just as arbitrarily complex and long-term planning doesn’t work, while math uses very long chains of reasoning. The only practically relevant application so far seems to be cryptography, because computers are extremely reliable and thus modeling is comparatively easy. However, plausibly it’s still easier to break some encryption scheme than to formally prove that your practically relevant algorithm could break it.
LLMs that can do formal proof would greatly improve cybersecurity across the board (good for delaying some scenarios of AI takeover!). I don’t think they would advance AI capabilities beyond the technological advances used to build them and increasing AI hype. However, I also don’t expect to see useful formal proofs about useful LLMs in my lifetime (you could call this “formal interpretability”? We would first get “informal interpretability” that says useful things about useful models.) Maybe some other AI approach will be more interpretable.
Fundamentally, the objection stands that you can’t prove anything about the real world without modeling, and modeling always yields a leaky abstraction. So we would have to figure out “assumptions that allow us to prove that AI won’t kill us all while being only slightly false, and in the right ways”. This doesn’t really solve the “you only get one try” problem. Maybe it could help a bit anyway?
I expect a first step might be an AI test lab with many layers of improving cybersecurity, ending at formally verified, air-gapped, no interaction to humans. However, it doesn’t look like people are currently worried enough to bother building something like this. I also don’t see such an “AI lab leak” as the main path towards AI takeover. Rather, I expect we will deploy the systems ourselves and on purpose, finding us at the mercy of competing intelligences that operate at faster timescales than us, and losing control.