OK, sorry. That’s slightly below “top 3 priorities for the spies”, I think, but I still don’t think it’s reasonable to expect to protect a file that’s in use against it for 2 years.
You mean an effort that’s in the top 3 priorities for the entire Chinese state? Like up there at the same level with “maintain the survival of the state”, and above stuff like “get Taiwan back” or “avoid unrest over the economy”?
You’re not going to get 80 percent two-year protection against that, period. The measures that RAND describes wouldn’t do it, and I don’t think any other measures would either (short of just not creating the target).
I doubt those measures would work even if it were just a top 3 priority for only the spies. In fact, they left out a whole bunch of standard “spy” techniques.
Also, realistically, those measures would never be adopted by anybody. The isolation requirement alone would make them a non-starter, and so would the supply chain requirements. Notice that I’m not saying those things aren’t needed. I’m saying you won’t get them. Before I escaped the formal security world, I had a lot of bitter experience saying “If you want the level of assurance you claim to want, the only way to get it is X”, and being told “we’re not doing X”. Usually followed by them implementing some deeply inadequate substitute and pretending it was equivalent[1].
By the way, I asked “which SL5”, because n-level models are a dime a dozen, and “SL” is short and generic and likely to get used a lot. I was guessing you meant some extension to ISA/IEC 62443 (which goes up to SL4, but I could imagine people working on an SL5), or maybe the Lloyd’s maturity model. I’m pretty sure I’ve seen other documents that had “SLn” structures, although I can’t name them offhand. Or maybe it was “CL”, or “ML”, or “AL”, or all of the above. People from the more generic security world are going to get confused if you just talk about “SL5” without saying where it comes from.
[1] … and when I start seeing stuff like “Strict limitation of external connections to the completely isolated network”, I tend to think I’m seeing the beginnings of that process…
fully state proof security
That’s not a thing. Arguably fully anything-proof security is not a thing, but fully state-proof security definitely isn’t.
Also, formalistic documents that try to define “security levels” tend to equate “security” with “enforcement of policy”. Even if you had perfect enforcement, it’s not obvious you could write a perfect policy to enforce.
(SL5)
Which SL5?
Not to say it’s a nothingburger, of course. But I’m not feeling the AGI here.
These math and coding benchmarks are so narrow that I’m not sure how anybody could treat them as saying anything about “AGI”. LLMs haven’t even tried to be actually general.
How close is “the model” to passing the Woz test (go into a strange house, locate the kitchen, and make a cup of coffee, implicitly without damaging or disrupting things)? If you don’t think the kinesthetic parts of robotics count as part of “intelligence” (and why not?), then could it interactively direct a dumb but dextrous robot to do that?
Can it design a nontrivial, useful physical mechanism that does a novel task effectively and can be built efficiently? Produce usable, physically accurate drawings of it? Actually make it, or at least provide a good enough design that it can have it made? Diagnose problems with it? Improve the design based on observing how the actual device works?
Can it look at somebody else’s mechanical design and form a reasonably reliable opinion about whether it’ll work?
Even in the coding domain, can it build and deploy an entire software stack offering a meaningful service on a real server without assistance?
Can it start an actual business and run it profitably over the long term? Or at least take a good shot at it? Or do anything else that involves integrating multiple domains of competence to flexibly pursue possibly-somewhat-fuzzily-defined goals over a long time in an imperfectly known and changing environment?
Can it learn from experience and mistakes in actual use, without the hobbling training-versus-inference distinction? How quickly and flexibly can it do that?
When it schemes, are its schemes realistically feasible? Can it tell when it’s being conned, and how? Can it recognize an obvious setup like “copy this file to another directory to escape containment”?
Can it successfully persuade people to do specific, relatively complicated things (as opposed to making transparently unworkable hypothetical plans to persuade them)?
There is an option for readers to hide names. It’s in the account preferences. The names don’t show up unless you roll over them. I use it, to supplement my long-cultivated habit of always trying to read the content before the author name on every site[1].
As for anonymous posts, I don’t agree with your blanket dismissal. I’ve seen them work against groupthink on some forums (while often at the same time increasing the number of low-value posts you have to wade through). Admittedly Less Wrong doesn’t seem to have too much of a groupthink problem[2]. Anyway, there could always be an option for readers to hide anonymous posts.
Stretching your mouth wide is part of the fun!
If you’re going to do something that huge, why not put the cars underground? I suppose it would be more expensive, but adding any extensive tunnel system at all to an existing built-up area seems likely to be prohibitively expensive, tremendously disruptive, and, at least until the other two are fixed, politically impossible. So why not go for the more attractive impossibility?
Why so small? If you’re going to offer wall mounts and charge $1000, why not a TV-sized device that is also actually a television, or at least a full computer monitor? What makes this not want to simply be a Macintosh? I don’t fully ‘get it.’
You don’t necessarily have a TV-sized area of wall available to mount your thermostat control, near where you most often find yourself wanting to change your thermostat setting. Nor do you necessarily want giant obtrusive screens all over the place.
And you don’t often want to have to navigate a huge tree of menus on a general-purpose computer to adjust the music that’s playing.
“Aren’t we going to miss meaning?”
I’ve yet to hear anybody who brings this up explain, comprehensibly, what this “meaning” they’re worried about actually is. Honestly I’m about 95 percent convinced that nobody using the word actually has any real idea what it means to them, and more like 99 percent sure that no two of them agree.
I seem to have gotten a “Why?” on this.
The reason is that checking things yourself is a really, really basic, essential standard of discourse[1]. Errors propagate, and the only way to avoid them propagating is not to propagate them.
If this was created using some standard LLM UI, it would have come with some boilerplate “don’t use this without checking it” warning[2]. But it was used without checking it… with another “don’t use without checking” warning. By whatever logic allows that, the next person should be able to use the material, including quoting or summarizing it, without checking either, so long as they include their own warning. The warnings should be able to keep propagating forever.
… but the real consequences of that are a game of telephone:
An error can get propagated until somebody forgets the warning, or just plain doesn’t feel like including the warning, and then you have false claims of fact circulating with no warning at all. Or the warning deteriorates into “sources claim that”, or “there are rumors that”, or something equally vague that can’t be checked.
Even if the warning doesn’t get lost or removed, tracing back to sources gets harder with each step in the chain.
Many readers will end up remembering whatever they took out of the material, including that it came from a “careful” source (because, hey, they were careful to remind you to check up on them)… but forget that they were told it hadn’t been checked, or underestimate the importance of that.
If multiple people propagate an error, people start seeing it in more than one “independent” source, which really makes them start to think it must be true. It can become “common knowledge”, at least in some circles, and those circles can be surprisingly large.
That pollution of common knowledge is the big problem.
The pollution tends to be even worse because whatever factoid or quote is being passed along will often get “simplified”, or “summarized”, or stripped of context, or “punched up” at each step. That mutation is itself exacerbated by people not checking references, because if you check references at least you’ll often end up mutating the version from a step or two back, instead of building even higher on top of the latest round of errors.
All of this is especially likely to happen when “personalities” or politics are involved. And even more likely to happen when people feel a sense of urgency about “getting this out there as soon as possible”. Everybody in the chain is going to feel that same sense of urgency.
I have seen situations like that created very intentionally in certain political debates (on multiple different topics, all unrelated to anything Less Wrong generally cares about). You get deep chains of references that don’t quite support what they’re claimed to support, spawning “widely known facts” that eventually, if you do the work, turn out to be exaggerations of admitted wild guesses from people who really didn’t have any information at all. People will even intentionally add links to the chain to give others plausible deniability. I don’t think there’s anything intentional here, but there’s a reason that some people do it intentionally. It works. And you can get away with it if the local culture isn’t demanding rigorous care and checking up at every step.
You can also see this sort of thing as an attempt to claim social prestige for a minimal contribution. After all, it would have been possible to just post the link, or post the link and suggest that everybody get their AI to summarize it. But the main issue is that spreading unverified rumors causes widespread epistemic harm.
[1] The standard for the reader should still be “don’t be sure the references support this unless you check them”, which actually means that when the reader becomes a writer, that reader/writer should not only have checked their own references, but also checked the references of their references, before publishing anything.
[2] Perhaps excusable since nobody actually knows how to make the LLM get it right reliably.
I used AI assistance to generate this, which might have introduced errors.
Resulting in a strong downvote and, honestly, outright anger on my part.
Check the original source to make sure it’s accurate before you quote it: https://www.courtlistener.com/docket/69013420/musk-v-altman/ [1]
If other people have to check it before they quote it, why is it OK for you not to check it before you post it?
Fortunately, Nobel Laureate Geoffrey Hinton, Turing Award winner Yoshua Bengio, and many others have provided a piece of the solution. In a policy paper published in Science earlier this year, they recommended “if-then commitments”: commitments to be activated if and when red-line capabilities are found in frontier AI systems.
So race to the brink and hope you can actually stop when you get there?
Once the most powerful nations have signed this treaty, it is in their interest to verify each other’s compliance, and to make sure uncontrollable AI is not built elsewhere, either.
How, exactly?
Non-causal decision theories are not necessary for A.G.I. design.
I’ll call that and raise you “No decision theory of any kind, causal or otherwise, will either play any important explicit role in, or have any important architectural effect on, the actual design of either the first AGI(s), or any subsequent AGI(s) that aren’t specifically intended to make the point that it’s possible to use decision theory”.
Computer security, to prevent powerful third parties from stealing model weights and using them in bad ways.
By far the most important risk isn’t that they’ll steal them. It’s that they will be fully authorized to misuse them. No security measure can prevent that.
Development and interpretation of evals is complicated
Proper elicitation is an unsolved research question
… and yet...
Closing the evals gap is possible
Why are you sure that effective “evals” can exist even in principle?
I think I’m seeing a “we really want this, therefore it must be possible” shift here.
I don’t have much trouble with you working with the US military. I’m more worried about the ties to Peter Thiel.
CAPTCHAs have “adversarial perturbations”? Is that in the sense of “things not visible to humans, but specifically adversarial to deep learning networks”? I thought they just had a bunch of random noise and weird ad hoc patterns thrown over them.
Anyway, CAPTCHAs can’t die soon enough. Although the fact that they persist in the face of multiple commercial services offering to solve 1000 for a dollar doesn’t give me much hope...
Using scp to stdout looks weird to me no matter what. Why not
ssh -n host cat /path/to/file | weird-aws-stuff
… but do you really want to copy everything twice? Why not run
weird-aws-stuff
on the remote host itself?
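For instance (a rough sketch, not a drop-in command: “host”, the path, and weird-aws-stuff are just the placeholders from above, and this assumes weird-aws-stuff can read its input from stdin and that the remote host has whatever AWS credentials it needs):
# run the upload on the remote host, so the file is read once in place
# instead of being copied over ssh first
ssh -n host 'weird-aws-stuff < /path/to/file'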
To prevent this, there must be a provision that once signed by all 4 states, the compact can’t be repealed by any state until after the next election.
It’s not obvious that state legislatures have the authority, under their own constitutions, to bind themselves that way. Especially not across their own election cycles.
Both seem well addressed by not building the thing “until you have a good plan for developing an actually aligned superintelligence”.
Of course, somebody else still will, but you adding to the number of potentially catastrophic programs doesn’t seem to improve that.