Everybody with a credit card has access to supercomputers. There is zero effective restriction on what you do with that access, and it’s probably infeasible to put such restrictions into place at all, let alone soon enough to matter. And that doesn’t even get into the question of stolen access. Or of people or institutions who have really significant amounts of money.
(a) There are some people in large companies and governments who understand the risks… along with plenty of people who don’t. In an institution with N members, there are probably about 1.5 times N views of what “the risks” are. (b) Even if there were broad agreement on some important points, that wouldn’t imply that the institution as a whole would respond either rationally or quickly enough. The “alignment” problem isn’t solved for organizations (cf “Moloch”). (c) It’s not obvious that even a minority of institutions getting it wrong wouldn’t be catastrophic.
(a) They don’t have to “release” it, and definitely not on purpose. There’s probably a huge amount of crazy dangerous stuff going on already outside the public eye[1]. (b) A backlash isn’t necessarily going to be fast enough to do any good. (c) One extremely common human and institutional behavior, upon seeing that somebody else has a dangerous capability, is to seek to get your hands on something more dangerous for “defense”. Often in secret. Where it’s hard for any further “backlash” to reach you. And people still do it even when the “defense” won’t actually defend them. (d) If you’re a truly over the top evil sci-fi superintelligence, there’s no reason you wouldn’t solve a bunch of problems to gain trust and access to more power, then turn around and defect.
(a) WHA? Getting ChatGPT to do “unaligned” things seems to be basically the world’s favorite pastime right now. New ones are demonstrated daily. RLHF hasn’t even been a speed bump. (b) The definition of “alignment” being used for the current models is frankly ridiculous. (c) If you’re training your own model, nothing forces you to take any steps to align it with anything under any definition. For the purpose of constraining how humans use AI, “solving alignment” would mean that you were able to require everybody to actually use the solution. (d) If you manage to align something with your own values, that does not exclude the possibility that everybody else sees your values as bad. If I actively want to destroy the world, then an AGI perfectly aligned with me will… try to destroy the world. (e) Even if you don’t train your own model, you can still use (or pirate) whichever one is the most “willing” to do what you want to do. ChatGPT isn’t a monopoly. (f) Eventual convergence theorems aren’t interesting unless you think you’ll actually get to the limit. Highly architecture-specific theorems aren’t interesting at all.
(a) If you’re a normal individual, that’s why you have a credit card. But, yes, total havoc is probably beyond normal individuals anyway. (b) If you’re an organization, you have more resources. And, again, your actions as an organization are unlikely to perfectly reflect the values or judgment of the people who make you up. (c) If you’re a very rich maniac, you have organizational-level resources, including assistance from humans, but not much more than normal-individual-level internal constraints. We seem to have an abundance of rich maniacs right now, many of them with actual technical skills of their own.
To get really insane outcomes, you do not have to democratize the capability to 8 billion people. 100 thousand should be plenty. Even 10 thousand.
(a) Sure, North Korea is building the killer robots. Not, say, the USA. That’s a convenient hope, but relying on it makes no sense. (b) Even North Korea has gotten pretty good at stealing access to other people’s computing resources nowadays. (c) The special feature of AGI is that it can, at least in principle, build more, better AGI. Including designing and building any necessary computers. For the purposes of this kind of risk analysis, near-worst-case assumptions are usually the conservative ones, so the conservative assumption is that it can make 100 years of technical progress in a year, and 1000 in two years. And military people everywhere are well aware that overall industrial capacity, not just having the flashiest guns, is what wins wars. (d) Some people choosing to build military robots does not exclude other people from choosing to build grey goo[2].
(a) People are shooting each other just for the lulz. They always have, and there seems to be a bit of a special vogue for it nowadays. Nobody suggested that everybody would do crazy stuff. It only takes a small minority if the per capita damage is big enough. (b) If you arrest somebody for driving over others, that does not resurrect the people they hit. And you won’t be ABLE to arrest somebody for taking over or destroying the world. (c) Nukes, cars, and guns don’t improve themselves (nor does current ML, but give it a few years...).
For example, I would be shocked if there aren’t multiple serious groups working, in various levels of secrecy, on automated penetration of computer networks using all kinds of means, including but NOT limited to self-found zero-days. Building, and especially deploying, an attack agent is much easier than building or deploying the corresponding defensive systems. Not only will such capabilities probably be abused by those who develop them, but they could easily leak to others, even to the general public. Apocalypse? I don’t think so. A lot of Very Bad Days for a lot of people? Very, very likely. And that’s just one thing people are probably working on.
I’m not arguing that grey goo is feasible, just pointing out that it’s not like one actor choosing to build military robots keeps another actor from doing anything else.
Before the detailed response: you appear to be consistently dismissing my reasoning without presenting a valid counterargument or making an attempt to understand my perspective. Even if you were to develop an AGI that aligns with your values, it would still be weaker than the AGIs possessed by larger groups like governments. How do you rebut this claim? And why are you afraid of even a single AGI in the wrong hands?
To train GPT-4, one needs several million dollars. At present, no startup offers a viable alternative; some are attempting to, but they are still quite far from achieving it. Likewise, it is unlikely that any millionaire has trained a GPT-4 to their personal requirements and values. Even terrorist organizations, which possess millions, are unlikely to have used Colab to train LLaMA. When you have such vast resources, it is much simpler to use the ChatGPT API, which is widely accepted as safe, built by some of the best minds with safety in mind, and a standard solution. It is comparable to how millionaires do not typically build their own “unsafe” cars in their garages, but instead purchase a more expensive and reliable car. Individuals with considerable financial resources therefore usually do not waste money attempting to train a GPT-4 on their own, but prefer to invest in an existing, reliable, standardized solution. Training a model the size of GPT-4 also takes a great deal of effort and know-how that very few people actually have.
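As a rough illustration of the scale involved, here is a back-of-envelope compute-cost sketch. Every input is an assumed round figure (parameter count, token count, sustained GPU throughput, price per GPU-hour), not a published GPT-4 statistic; the only borrowed piece is the standard ~6·N·D FLOPs approximation for transformer training.

```python
# Back-of-envelope training-cost sketch. All inputs are assumptions chosen
# as round illustrative figures, not real GPT-4 numbers.

params = 100e9            # assumed parameter count
tokens = 2e12             # assumed training tokens
train_flops = 6 * params * tokens   # ~6*N*D FLOPs approximation

gpu_flops = 150e12        # assumed sustained FLOP/s per GPU
usd_per_gpu_hour = 2.0    # assumed cloud rental price

gpu_hours = train_flops / gpu_flops / 3600
print(f"GPU-hours: {gpu_hours:,.0f}")                       # ~2.2 million
print(f"Rough cost: ${gpu_hours * usd_per_gpu_hour:,.0f}")  # several million USD
```

Even with generous assumptions the bill lands in the millions before counting engineering staff and failed runs, which is the point being made above.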
If someone were to possess a weaker AGI, it would not be a catastrophic threat to those with a stronger AGI, which would likely be owned by larger entities such as governments and corporations like Google, Meta, or OpenAI. These larger groups would train their models to be reasonably aligned and not to want to harm humanity. Weaker AGIs that might pose a threat would not be of much concern, similar to how terrorists with guns can cause harm, but their impact remains localized and unable to hurt the larger community. For every terrorist, law enforcement deploys ten officers to apprehend them, making it difficult for them to cause significant damage. The same mechanism would also let stronger and more advanced AGIs limit weaker and more malicious ones. Machines can be expected to follow human power dynamics, and a single AGI in the hands of a terrorist group would not change this; just as today, such groups would remain a marginal, aggressive minority.
Today it is the weaker models that might pose a threat, trained by some rich guy, whereas the stronger ones are relatively secure, in the hands of larger communities that treat them more responsibly. This trend can be expected to extend to more advanced models. Whether or not they possess superhuman abilities, they will adhere to the values of the society that developed them. One human is also a society of one, and he can build a robot that reflects his values; perhaps when he is in his house, on his private territory, he might want to use his own AGI. I don’t see a problem with that, as long as it is limited to its owner’s territory. This requirement can be imposed and checked by regulation, just like seat belts.
(a) Neglecting the math related to the subject gives the impression that no argument is being made. (b) Like the phrase “it’s absurd!”, this assertion is insufficient to form a proper argument and cannot qualify as a discussion. (c) The process of alignment does not entail imbuing a model with a universally ethical set of values, as no such set exists. Rather, it involves ensuring that the model’s values align with those of the group creating it, which contradicts claims that a superhuman AI would seek to acquire more resources or plot to overthrow humanity and initiate a robot uprising. Instead, its objective would only be to satisfy the reward given to it by its trainers, and this holds even for the largest superhuman models (see the toy sketch at the end of this comment). There is no one definitive group or value system for constructing such machines, but it has been mathematically demonstrated that the machines will reflect the value system they were trained on. Furthermore, even if one were to construct a hypothetical robot with the intention of annihilating humanity, it would be unable to overcome a more formidable army of robots built by a larger group, such as the US government. It is highly improbable that an individual working alone with a weak AGI in his garage could take over the world. (d) Even if you were to develop an AGI that aligns with your values, it would still be weaker than the AGI possessed by the American people. Consequently, it would have limited access to resources and would not be capable of causing significant harm compared with more powerful AGIs. Additionally, you would likely face arrest and penalties, similar to driving an unsafe stolen car. Merely creating a self-improving AGI does not entitle you to the same resources and technology as larger groups. Despite having significant resources, terrorists have not been able to construct atomic bombs, which suggests that those with substantial resources are not interested in destroying humanity, while those who are interested are collectives lacking the necessary resources to build an atomic weapon. Furthermore, a more robust AGI, aligned with a larger group, would be capable of predicting and preventing such an occurrence. (e1) Theoretical limits hold significant importance, particularly if models can approach them. It has been mathematically shown that it is feasible to train a model that does not develop a self-interest in destroying humanity unless explicitly programmed to. Although smaller and weaker models may be malevolent, they will not have greater access to resources than their creators. The only plausible way I can see for AI to end humanity is if the vast majority of humanity wants to end itself. (e2) Theorems about a specific training procedure, which ensure the current safety level of most existing LLMs, are relevant to the present discussion.
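A toy sketch of point (c), that a trained policy ends up optimizing exactly the reward its trainers supply and nothing else. It is a deliberately tiny two-action bandit with a REINFORCE-style update, purely illustrative; it is not a description of how any real LLM is actually aligned.

```python
import math, random

# Toy two-action "policy" trained purely from a reward signal supplied by
# its trainer. Illustrates the claim that the learned behaviour reflects
# whatever reward the training group provides, and nothing else.

def trainer_reward(action: int) -> float:
    # The trainer's values, encoded as a reward function. This is an
    # assumption of the sketch, not any real lab's training setup.
    return 1.0 if action == 0 else 0.0   # this trainer prefers action 0

logit = 0.0   # single parameter: preference for action 0 over action 1
lr = 0.1

for _ in range(2000):
    p0 = 1.0 / (1.0 + math.exp(-logit))      # probability of action 0
    action = 0 if random.random() < p0 else 1
    r = trainer_reward(action)
    # REINFORCE-style update: raise the log-probability of rewarded actions
    grad_log_prob = (1.0 - p0) if action == 0 else -p0
    logit += lr * r * grad_log_prob

print(f"P(action 0) after training: {1.0 / (1.0 + math.exp(-logit)):.3f}")
```

Change `trainer_reward` and the learned behaviour changes with it, which is the sense in which the model "reflects the programmed value system".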
Provide a plausible scenario of how a wealthy individual with an AGI in their garage could potentially bring about the end of humanity, given that larger groups would likely possess even more powerful AGIs. Please either refute the notion that AGIs held by larger groups are more powerful, or provide an explanation of how even a single AGI in the wrong hands could pose a threat if AGIs were widely available and larger groups had access to superior AGIs.
(c) Yes, it will try to build a better version of itself, exactly as humanity has been doing for the past 10K years, and as evolution has been doing for the past 3.5B years. I really don’t see a problem with self-improvement as such. The problem is that resources are limited. A wealthy individual might want to give the several millions he has to a wicked AGI just for the fun of it, but besides the fact that he will very probably be a criminal, he will not have the resources to win the AGI race against larger groups. Evolution was and always is a race; the fact that in principle, in let’s say 5 billion years, you could theoretically improve yourself is not interesting. The pace is what is interesting, and pace is a function of your resources. With limited resources and an AGI you will still not be able to do a lot of harm; more harm than without an AGI, but still very limited. Also, we as humans have all the control over it: we can decide not to release the next version of GPT-17 or whatever; it’s not that we are forced to improve… though yes, we are forced to stay ahead of the wicked man in the garage. And yes, if he is the first to discover AGI, and not, let’s say, Google or OpenAI or the thousands of their competitors, then I agree it is possible, although very improbable, that this guy could take over the world. Another point: even if someone in his garage develops the first AGI, he will need several good years to take over the world, and in that time there will be hundreds or thousands of competitors to his AGI, some of them probably better than his. But I really see no reason to fear AGI: humanity is a GI, and the fact that it’s artificial should not make it scarier. It’s just humanity accelerated, and we can hit the brakes. Anyway, I would say I have a better chance of finding myself inside some rich maniac’s fantasy (not that current politics is much better) than of seeing the end of humanity. This rich maniac needs not only to invent AGI first and build an army of robots to take over the world without anyone noticing, but also to want to end humanity, rather than, for example, enslave humanity to his fantasies, or just open-source his AGI and promote the research further. Most of the people who can train a model today are normative geeks.
(a) I don’t see how the damage is big enough. Why would the weaker AGIs beat the stronger ones? They will not, unless someone like that is the first to invent AGI. As I said, that is very improbable; many people today are trying to reproduce GPT-4 or even GPT-3, without much success. It’s hard to train large models: it takes a lot of know-how and a lot of money, and very few people have managed to reproduce the published results on their own. You may know of Stable Diffusion, and Google helped them. I don’t see why you are afraid of a single AGI in the wrong hands, which sounds irrational, nor why you think the first one is likely to be developed by someone wicked who also has enough time to take over. Imagine a single AGI in someone’s hands that could only improve itself over a million years. Would you be afraid of such an AGI? I would guess not. You are afraid they are accelerating, but the acceleration stops the moment you hit limited resources. Then you can only optimize the existing resources; you can’t keep inventing new algorithms to use the same resources infinitely better (a toy sketch of this saturation effect appears after point (c) below). (b) The damage is local. There are a lot of problems with humanity; they could increase with robots, but they might also decrease, as medicine becomes so advanced that you are healed very quickly after a wound, for example. This is not a weapon we are talking about, but a technology that promises to make all our lives much better, for at least 99.99% of us. You need to consider the consequences of stopping it as well. (c) Agreed. Yet we can either draw examples from the past or try to imagine the probable future; I attempt to do both, applied in the right context.
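A toy sketch of the saturation effect mentioned in (a): growth that is roughly exponential at first flattens out once it approaches a fixed resource ceiling. The numbers are arbitrary and the model is just discrete logistic growth, an illustration of the argument rather than a model of AI capability.

```python
# Toy sketch of the "growth stops at the resource ceiling" intuition:
# capability grows roughly exponentially at first, then saturates as it
# approaches a fixed resource limit (discrete logistic growth). All numbers
# are arbitrary; this illustrates the argument, not real AI capability.

capability = 1.0
resource_cap = 1000.0   # assumed hard ceiling set by available resources
growth_rate = 0.5

for year in range(1, 31):
    capability += growth_rate * capability * (1 - capability / resource_cap)
    if year % 5 == 0:
        print(f"year {year:2d}: capability = {capability:7.1f}")
```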
Regarding grey goo: I agree it might be a threat, and if you frame the AGI problem as essentially the grey goo problem (someone builds a tiny robot with AGI, this tiny robot builds an army of tiny robots, and that army builds a larger army of even smaller AGI robots, until they all become grey goo), then yes, this is an interesting possibility. I would guess aligned grey goo would look more like a natural organism than like something that consumes humans: its alignment algorithm will probably propagate, and it is designed to protect humans and nature, but on the other hand it needs material to survive, so it will balance the two. In any case, superhuman grey goo that is aligned, and that propagates its alignment to newer versions of itself, will not act against its previous alignment even though it works faster. I would say that if the first grey goo robot was aligned, then the whole grey goo will be aligned. But I believe they will stop somewhere and be more like small ants trying to find resources in a very competitive environment, competing with other colonies and with a target function of helping humans, rather than a goo.
And yes, we have had a GI for a long time now: humanity is a GI. We have seen the progress of technology, and how fast it accelerates, faster than any individual can conceive. The acceleration will very probably not reach infinity and will stop at some physical boundary, when most of the resources are used up. And humans could upload their minds and do other sci-fi stuff to be part of this new reality; the possibilities are endless in general. But we can also decide to limit it, and keep it smarter than us for everything we need, but not so smart that we cannot understand it at all. I don’t think we are there yet on that specific decision, and for now we can surely benefit from the current LLMs and those to come for developing new technologies in many fields: medicine, software development, education, traffic safety, pollution, political decision making, courts, and much more.