Yes, that's why it's a compromise: nobody will totally like it. But if Earth is going to exist for trillions of years, it will change radically too.
My honest opinion is that WMD evaluations of LLMs are not meaningfully related to X-risk in the sense of "kill literally everyone." I guess current or next-generation models may be able to help a terrorist in a basement brew some amount of anthrax, spray it in a public place, and kill tens to hundreds of people. But to actually be capable of killing everyone from a basement, you would need to bypass all the reasons industrial production is necessary at the current level of technology. A system capable of bypassing the need for industrial production from a basement is called a "superintelligence," and if you have a superintelligent model on the loose, you have far bigger problems than schizos in basements brewing bioweapons.
I think "creeping WMD relevance," outside of cyberweapons, is mostly bad, because it concentrates on a mostly fake problem, which is very bad for public epistemics, even before we count the lost benefits from more competent models.
Well, I have a bioengineering degree, but my point is that "direct lab experience" doesn't matter, because WMDs of the quality and in the quantity needed to kill large numbers of enemy manpower are not produced in labs. They are produced in large industrial facilities, and setting up a large industrial facility for basically anything is at the "hard" level of difficulty. There is a difference between large-scale textile industry and large-scale semiconductor industry, but if you are not a government or a rich corporation, both lie in the "hard" zone.
Take, for example, Saddam's chemical weapons program. First, industrial yields: everything is counted in tons. Second, for actual success Saddam needed a lot of existing expertise and machinery from West Germany.
Or look at the Soviet bioweapons program. First, again, tons of yield (one may ask: if it's easier to kill with bioweapons than with conventional weaponry, why would anyone need to produce tons of them?). Second, the USSR built an entire civilian biotech industry around it (many Biopreparat facilities are still active today as civilian sites!) to create the necessary expertise.
The difference with high explosives is that high explosives are not banned by international law, so there is a lot of existing production, and therefore you can just buy them on the black market or receive them from countries that don't consider you a terrorist. If you really need to produce explosives locally, again, the precursors, machinery, and necessary expertise are legal and sufficiently widespread that they can be bought.
There is a list of technical challenges in bioweaponry where you will predictably fuck up if you have a biology degree and think you know what you are doing but in reality do not. I won't write out lists of technical challenges on the way to dangerous capabilities, because such a list could inspire someone. You can get an impression of the easier, lower-stakes challenges from here.
The trick is that chem/bio weapons actually can't "be produced simply with easily available materials," if we are talking about military-grade stuff rather than "kill several civilians to create a scary picture on TV."
It's very funny that Rorschach's linguistic ability is totally unremarkable compared to modern LLMs.
The real question is why NATO has our logo.
This is the LGBTESCREAL agenda.
I think there is an abstraction between "human" and "agent": "animal." Or maybe "organic life." Biological systematization (meaning all the ways to systematize: phylogenetic, morphological, functional, ecological) is a useful case study of abstraction "in the wild."
EY wrote in planecrash that the greatest fictional conflicts between characters with different levels of intelligence happen between different cultures/species, not between individuals of the same culture.
I think that here you should re-evaluate what you consider “natural units”.
Like, it's clear from Olbers's paradox and relativity that we live in a causally isolated pocket where the stuff we can interact with is certainly finite. If the universe is a set of causally isolated bubbles, all you have is anthropics over such bubbles.
I think it’s perfect ground for meme cross-pollination:
“After all this time?”
“Always.”
I'll repeat myself: I don't believe in St. Petersburg lotteries:
my honest position on St. Petersburg lotteries is that they do not exist in "natural units," i.e., counts of objects in the physical world.
Reasoning: if you predict with probability p that you will encounter a St. Petersburg lottery that creates an infinite number of happy people in expectation (the version of the St. Petersburg lottery for total utilitarians), then you should put your expectation of the number of happy people at infinity now, because E[number of happy people] = p · E[number of happy people | lottery] + (1 − p) · E[number of happy people | no lottery] = p · ∞ + (1 − p) · (something finite) = ∞.
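The same split in display form (the conditional notation is mine; N is the number of future happy people and L is "a St. Petersburg lottery occurs"):

```latex
% Expectation decomposition over whether the lottery occurs (notation mine).
\mathbb{E}[N]
  = p\,\mathbb{E}[N \mid L] + (1-p)\,\mathbb{E}[N \mid \neg L]
  = p \cdot \infty + (1-p)\cdot(\text{finite})
  = \infty \qquad \text{for any } p > 0.
```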
Therefore, if you don't think right now that the expected number of future happy people is infinite, you shouldn't expect a St. Petersburg lottery to happen at any point in the future.
Therefore, you should define your utility either over "natural units" or over some "nice" function of "natural units."
I think there is a reducibility from one to the other via different UTMs? I.e., for example, causal networks are Turing-complete; therefore, you can write a UTM that explicitly takes a description of initial conditions and a causal time-evolution law, and every SI-simple hypothesis here will correspond to a simple causal-network hypothesis. And you can find the same correspondence for arbitrary ontologies that allow Turing-complete computation.
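For reference, the standard invariance theorem is what underwrites this kind of translation (my formalization, not anything specific to causal networks): a fixed-length interpreter shifts description lengths, and hence Solomonoff weights, by at most a constant.

```latex
% For any two universal machines U and V there is a translation constant c_{U,V}:
\exists\, c_{U,V}\ \forall x:\quad K_U(x) \;\le\; K_V(x) + c_{U,V},
\qquad\text{hence}\qquad
M_U(x) \;\ge\; 2^{-c_{U,V}}\, M_V(x).
```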
I think nobody really believes that telling a user how to make meth is a threat to anything but company reputation. I would guess this is a nice toy task that recreates some of the obstacles to aligning superintelligence (i.e., a superintelligence will probably know how to kill you anyway). The primary value of censoring the dataset is to check whether the model can rederive doom scenarios without them in the training data.
I once again maintain that the "training set" is not some mysterious holistic thing; it gets assembled by AI corps. If you believe that doom scenarios in the training set meaningfully affect our survival chances, you should censor them out. Current LLMs can do that.
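A minimal sketch of what that filtering step could look like, assuming some classifier exists (e.g., an LLM prompted as a judge); the names here are hypothetical placeholders, not a real API:

```python
# Hypothetical sketch: censoring a pretraining corpus with a doom-scenario
# classifier. `looks_like_doom_scenario` stands in for an LLM judge.
from typing import Callable, Iterable, List


def censor_corpus(
    documents: Iterable[str],
    looks_like_doom_scenario: Callable[[str], bool],
) -> List[str]:
    """Keep every document the classifier does not flag."""
    return [doc for doc in documents if not looks_like_doom_scenario(doc)]


# Usage (with a made-up judge): clean = censor_corpus(raw_docs, llm_judge)
```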
There is a certain story, probably common for many LWers: first, you learn about spherical-in-vacuum perfect reasoning, like Solomonoff induction/AIXI. AIXI takes all possible hypotheses, predicts all possible consequences of all possible actions, weights the hypotheses by probability, and computes the optimal action by choosing the one with maximal expected value. Then, usually without it even being said out loud, it is implied very loudly that this way of thinking is computationally intractable at best and uncomputable at worst, and that you need clever shortcuts. This is true in general, but the approach "just list out all the possibilities and consider all the consequences (inside a certain subset)" gets neglected as a result.
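Schematically, the rule described above looks like this (a simplification in my own notation, not the full AIXI definition):

```latex
% Weigh every hypothesis h by its weight w_h, score every action a by
% expected utility under h, and pick the argmax.
a^{*} \;=\; \arg\max_{a} \sum_{h \in \mathcal{H}} w_h \,\mathbb{E}_{h}\!\left[\,U \mid a\,\right].
```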
For example, when I try to solve a puzzle from "Baba Is You" and then analyze how I could have solved it faster, I usually arrive at "I should have just written down all the pairwise interactions between the objects and noticed which one leads to the solution."
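A sketch of that brute "write down every pair" tactic, with hypothetical object names and a stand-in `interact` function (nothing here is specific to Baba Is You):

```python
# Exhaustively enumerate pairwise interactions so none gets overlooked.
from itertools import combinations
from typing import Callable, Iterable, List, Tuple


def pairwise_interactions(
    objects: Iterable[str],
    interact: Callable[[str, str], str],
) -> List[Tuple[str, str, str]]:
    """Return (a, b, interact(a, b)) for every unordered pair of objects."""
    return [(a, b, interact(a, b)) for a, b in combinations(list(objects), 2)]


# Usage with made-up puzzle elements and a made-up rule lookup:
# table = pairwise_interactions(["BABA", "WALL", "FLAG", "WIN"], rules_if_adjacent)
```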
I'd say that the true name for fake vs. real thinking is syntactic thinking vs. semantic thinking.
Syntactic thinking: you have a bunch of statement-strings and operate on them according to rules.
Semantic thinking: you need to actually build a model of what these strings mean, sanity-check it, capture things that are true in the model but can't be expressed by the given syntactic rules, etc.
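A toy contrast under my own framing (not the author's formalism): the "semantic" side builds the models and checks what actually holds in them, while a "syntactic" system can only shuffle strings by its given rules.

```python
# Toy illustration: semantic checking = enumerate all models (truth assignments)
# and see what actually holds. A syntactic system limited to, say, the single
# rewrite rule "X and Y -> Y and X" can permute strings forever without ever
# deriving the entailment below, even though it is true in every model.
from itertools import product
from typing import Callable, Dict, Sequence

Formula = Callable[[Dict[str, bool]], bool]


def semantically_entails(premise: Formula, conclusion: Formula,
                         variables: Sequence[str]) -> bool:
    """Check premise |= conclusion by brute-force enumeration of assignments."""
    for values in product([False, True], repeat=len(variables)):
        env = dict(zip(variables, values))
        if premise(env) and not conclusion(env):
            return False
    return True


# (A and B) |= A holds in every model:
print(semantically_entails(lambda e: e["A"] and e["B"], lambda e: e["A"], ["A", "B"]))
```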
I'm more worried about counterfactual mugging and transparent Newcomb. Am I right that you are saying "in the first iteration of transparent Newcomb, austere decision theory gets no more than $1000, but then learns that if it modifies its decision theory into something more UDT-like it will get more money in similar situations," turning it into something like son-of-CDT?
First of all, "the most likely outcome at a given level of specificity" is not the same as "the outcome with the most probability mass." I.e., if one outcome has probability 2% and each of the rest has 1%, there is still a 98% chance of "an outcome other than the most likely one."
The second point is that, no, this is not what evolutionary theory predicts. Most traits are not adaptive but randomly fixed, because if all traits were adaptive, then ~all mutations would be detrimental. Detrimental mutations have to be removed from the gene pool by preventing their carriers from reproducing, and because most detrimental mutations do not kill the carrier immediately, they get a chance to spread randomly through the population. Combining "almost all mutations would be detrimental" with "everybody's offspring carry new mutations," for anything like the human genome and the human procreation pattern you get a hard ceiling on how much of the genome can be adaptive (something like 20%).
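A rough back-of-envelope version of that ceiling, in the style of the standard mutational-load argument; all parameter values are round numbers I'm assuming for illustration, not figures from the comment, and they land in the same ballpark as the ~20% above:

```python
# Mutational-load sketch: how much of the genome can be under selection before
# the required reproductive excess becomes implausible. All numbers are rough
# assumptions for illustration.
import math

GENOME_SIZE = 3.2e9        # haploid human genome, base pairs
MUTATION_RATE = 1.25e-8    # de novo mutations per bp per generation
new_mutations = GENOME_SIZE * MUTATION_RATE         # ~40 per offspring

MAX_OFFSPRING_PER_COUPLE = 20   # generous historical fertility ceiling
# At mutation-selection balance a couple needs ~2*exp(U_del) offspring for 2
# to survive selection, so exp(U_del) <= MAX_OFFSPRING_PER_COUPLE / 2.
u_del_max = math.log(MAX_OFFSPRING_PER_COUPLE / 2)  # ~2.3 tolerable hits

P_DELETERIOUS = 0.5   # assumed share of mutations in functional DNA that hurt
max_functional_fraction = u_del_max / (new_mutations * P_DELETERIOUS)
print(f"{new_mutations:.0f} new mutations/offspring, "
      f"functional fraction ceiling ~{max_functional_fraction:.0%}")  # ~12%
```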
The real prediction of evolutionary theory is more like "some random trait gets fixed in the species with the most ecological power (i.e., the ASI), and that trait is amortized across all the galaxies."
How exactly does not knowing how many fingers you are holding up behind your back prevent an ASI from killing you?
I think the general problem with your metaphor is that we don't know the "relevant physics" of self-improvement. We can't plot a "physically realistic" trajectory for landing in "good values" land and say "well, we need to keep ourselves on this trajectory." BTW, MIRI has a dialogue with this metaphor.
And most of your suggestions amount to "let's learn the physics of alignment"? I have nothing against that, but it is the hard part, and control theory doesn't seem to provide much insight here. It's a framework at best.