Safety-wise, they claim to have run it through their Preparedness Framework and red-teaming with external experts, but have published no reports on either. “For now”, audio output is limited to a selection of preset voices (addressing audio impersonation).
“Safety”-wise, they obviously haven’t considered the implications of (a) trying to make it sound human and (b) having it try to get the user to like it.
It’s extremely sycophantic, and the voice intensifies the effect. They even had their demonstrator show it a sign saying “I ❤️ ChatGPT”, and instead of flatly saying “I am a machine. Get counseling.”, it acted flattered.
At the moment, it’s really creepy, and most people seem to dislike it pretty intensely. But I’m sure they’ll tune that out if they can.
There’s a massive backlash against social media selecting for engagement. There’s a lot of worry about AI manipulation. There’s a lot of talk from many places about how “we should have seen the bad impacts of this or that, and we’ll do better in the future”. There’s a lot of high-sounding public interest blather all around. But apparently none of that actually translates into OpenAI, you know, not intentionally training a model to emotionally manipulate humans for commercial purposes.
Still not an X-risk, but definitely on track to build up all the right habits for ignoring one when it pops up...
I’m guessing that measuring performance on those demographic categories will tend to underestimate the models’ potential effectiveness, because the models have been intentionally tuned to “debias” their outputs on those categories, or on things closely related to them.