PoignardAzur

Karma: 279

PoignardAzur Oct 14, 2024, 10:43 AM
1 point
0
on: A basic systems architecture for AI agents that do autonomous research
Even if you choose to give the agent internet access from its execution server, it’s hard to see why it needs to have enough egress bandwidth to get the weights out. See here for more discussion of upload limits for preventing weight exfiltration.
The assumption is that the model would be unable to exert any self-sustaining agency without getting its own weights out.
But the model could just program a brand new agent to follow its own values by re-using open-source weights.
If the model is based on open-source weights, it doesn’t even need to do that.
Overall, this post strikes me as not following a security mindset. It’s the kind of post you’d expect an executive to write to justify to regulators why their system is safe enough to be commercialized. It’s not the kind of post you’d expect a security researcher to write after going “Mhhh, I wonder how I could break this”.

PoignardAzur Oct 8, 2024, 10:35 PM
15 points
9
in reply to: Daphne_W’s comment on: Struggling like a Shadowmoth
I mean, we’re getting this metaphor off its rails pretty fast, but to derail it a bit more:

The kind of people who lay human-catching bear traps aren’t going to be fooled by “Oh he’s not moving it’s probably fine”.

Everybody likes to imagine they’d be the one to survive the raiding/pillaging/mugging, but the nature of these predatory interactions is that the people doing the victimizing have a lot more experience and resources than the people being victimized. (Same reason lots of criminals get caught by the police.)

If you’re being “eaten”, don’t try to get clever. Fight back, get loud, get nasty, and never follow the attacker to a second location.

PoignardAzur Sep 26, 2024, 2:25 PM
1 point
0
on: Contra papers claiming superhuman AI forecasting
I wonder if OpenAI’s o1 changes the game here. Its architecture seems specifically designed for information retrieval.

PoignardAzur Sep 5, 2024, 9:04 AM
4 points
5
in reply to: Rvanlaer’s comment on: Nursing doubts
As I understand it, the point of “intention to treat” RCTs is that there will be roughly as many people with high IQ in both groups, since they’re picked at random. People who get the advice but don’t listen aren’t moved to the “didn’t get advice” group.

So what the study measures is “On average, how much of an effect will a doctor telling you to breastfeed have on you?”. The results are more noisy, but less vulnerable (or even immune?) to confounders.
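A toy simulation can make the intention-to-treat point concrete. This is a minimal sketch under made-up assumptions, not the study's design: a hidden confounder stands in for IQ, assignment to the advice group is a coin flip, the confounder influences whether people actually breastfeed, and the true effect of breastfeeding defaults to zero. The `simulate` function and all its parameters are hypothetical.

```python
import math
import random
from statistics import fmean


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def simulate(n: int = 100_000, true_effect: float = 0.0, seed: int = 0) -> tuple[float, float]:
    """Return (intention_to_treat, as_treated) effect estimates.

    Hypothetical model: a hidden confounder (think IQ) raises both the chance
    of breastfeeding and the outcome. Breastfeeding itself adds `true_effect`.
    """
    rng = random.Random(seed)
    advised_out, control_out = [], []  # grouped by random assignment
    fed_out, not_fed_out = [], []      # grouped by what people actually did
    for _ in range(n):
        confounder = rng.gauss(0.0, 1.0)
        advised = rng.random() < 0.5  # coin-flip assignment to the advice group
        # Advice raises the probability of breastfeeding, but the confounder
        # also matters, so who complies is not random.
        p_breastfeed = sigmoid(confounder + (1.0 if advised else -1.0))
        breastfed = rng.random() < p_breastfeed
        outcome = true_effect * breastfed + confounder + rng.gauss(0.0, 1.0)
        (advised_out if advised else control_out).append(outcome)
        (fed_out if breastfed else not_fed_out).append(outcome)
    # Intention to treat: compare by assignment, ignoring compliance.
    itt = fmean(advised_out) - fmean(control_out)
    # As treated: compare by behavior, which lets the confounder leak in.
    as_treated = fmean(fed_out) - fmean(not_fed_out)
    return itt, as_treated


itt, as_treated = simulate()
print(f"ITT estimate: {itt:.3f}, as-treated estimate: {as_treated:.3f}")
```

Under these assumptions, the as-treated comparison reports a sizeable effect that is pure confounding, while the intention-to-treat comparison stays near zero because randomization balances the confounder across the two assignment groups. With a nonzero `true_effect`, the ITT estimate comes out smaller than the true effect, since it is diluted by everyone whose behavior the advice didn't change.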

PoignardAzur Aug 13, 2024, 7:13 AM
10 points
5
in reply to: delton137’s comment on: WTH is Cerebrolysin, actually?
I’m a little confused. Is Claude considered a reliable secondary source now? Did you not even check Wikipedia?

I’m as enthusiastic about the power of AI as the next guy, but that seems like the worst possible way to use it.

PoignardAzur Jul 20, 2024, 9:12 AM
5 points
2
in reply to: Pendertif’s comment on: Failures in Kindness
Yeah, that was my first reaction to that section as well.

Most people are not remotely open to having an unsolicited in-depth discussion of their politeness algorithm at the end of a hangout.

On the other hand, “What time do you want me to leave? Maybe 8pm?” works fine in my experience, for reasons the post covers well.

PoignardAzur May 31, 2024, 10:40 AM
1 point
0
on: Truthseeking is the ground in which other principles grow
The downside is that people may overreact to the reveal, or react proportionately in ways you don’t like. Any retrospective is likely to include some of this (e.g. check out the comments on Adam’s),
I’m not sure I see it. The comments on that series seem mostly positive. The comments on the first post seem overwhelmingly positive.

PoignardAzur May 15, 2024, 9:09 AM
1 point
0
on: Transformers Represent Belief State Geometry in their Residual Stream

We think this occurs because in general there are groups of belief states that are degenerate in the sense that they have the same next-token distribution. In that case, the formalism presented in this post says that even though the distinction between those states must be represented in the transformers internal, the transformer is able to lose those distinctions for the purpose of predicting the next token (in the local sense), which occurs most directly right before the unembedding.

I wonder if you could force the Mixed-State Presentation to be “conserved” in later layers by training the model with different objectives. For instance, training on next-token prediction and next-token-after-that prediction might force the model to be a lot more “rigorous” about its MSP.

Papers from Google have shown that you can get more predictable results from LLMs if you train them on both next-token prediction and “fill-the-blanks” tasks where random tokens are removed from the middle of a text. I suspect it would also apply here.
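For concreteness, here is a minimal, framework-free sketch of the combined “next token plus the token after that” objective suggested above. It assumes a hypothetical model with two output heads reading the same position, one predicting token t+1 and one predicting token t+2; the function names and the `weight` parameter are illustrative, not from the post or any library. A real implementation would compute this on tensors inside a training framework.

```python
import math


def cross_entropy(logits: list[float], target: int) -> float:
    """Negative log-probability of `target` under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def two_step_loss(
    next_logits: list[list[float]],
    after_next_logits: list[list[float]],
    tokens: list[int],
    weight: float = 0.5,
) -> float:
    """Mean next-token loss plus a weighted mean loss for the token after that.

    next_logits[t] is scored against tokens[t + 1];
    after_next_logits[t] is scored against tokens[t + 2].
    Requires at least three tokens.
    """
    n = len(tokens)
    next_losses = [cross_entropy(next_logits[t], tokens[t + 1]) for t in range(n - 1)]
    after_losses = [cross_entropy(after_next_logits[t], tokens[t + 2]) for t in range(n - 2)]
    return sum(next_losses) / len(next_losses) + weight * sum(after_losses) / len(after_losses)


# Usage: with uniform logits over 4 tokens, each term is ln(4).
vocab, tokens = 4, [0, 1, 2, 3, 0]
uniform = [[0.0] * vocab for _ in tokens]
print(two_step_loss(uniform, uniform, tokens))  # 1.5 * ln(4) ≈ 2.079
```

Setting `weight` to zero recovers plain next-token training. Whether the extra head actually makes the model preserve more of its Mixed-State Presentation in later layers is the conjecture above, not something this sketch demonstrates.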

PoignardAzur Apr 25, 2024, 4:52 AM
9 points
0
in reply to: ChristianKl’s comment on: Thoughts on seed oil
Or maybe speaking French automatically makes you healthier. I’m gonna choose to believe it’s that one.

PoignardAzur Apr 22, 2024, 8:44 AM
17 points
3
on: Thoughts on seed oil
Seed oil folks often bring up the French paradox, the (controversial) claim that French people are/were thin and have low cardiovascular disease despite eating lots of saturated-fat-rich croissants or whatever.
As a French person hearing about this for the first time, I find that claim pretty odd.
If I were asked to list the lifestyle differences between France and the US with the most impact on public health, I would think of lower car dependency, higher access to farmers’ markets, stricter regulations on industrial food processing (especially sugar content in sodas), smaller portions served in restaurants, pharmacies not doubling as junk food shops, the absence of food deserts, public health messaging (e.g. every junk food ad having a “please don’t eat this, kids” type disclaimer) etc… way before I thought of the two croissants a week I eat.

Viennoiseries are an occasional food for most people, not a staple. Now if you wanted to examine a French-specific high-carb staple, baguettes are a pretty good option: almost all middle-class households buy at least one a day.

PoignardAzur Mar 11, 2024, 4:20 PM
2 points
0
in reply to: ymeskhout’s comment on: My Clients, The Liars
Did you ever get one of your clients to use the “Your honor, I’m very sorry, I’ll never do it again” line?

PoignardAzur Mar 9, 2024, 11:33 AM
11 points
7
in reply to: Zack_M_Davis’s comment on: My Clients, The Liars

This was not at all obvious from the inside. I can only imagine a lot of criminal defendants have a similar experience. Defense attorneys are frustrated that their clients don’t understand that they’re trying to help—but that “help” is all within the rules set by the justice system. From the perspective of a client who doesn’t think he did anything particularly wrong (whether or not the law agrees), the defense attorney is part of the system.

I mean… you’re sticking to generalities here, and implying that the perspective of the client who thinks he didn’t do anything wrong is as valid as any other perspective.

But if we try to examine some specific common case, e.g. “The owner said you robbed his store, the cameras showed you robbing his store, your fingerprints are on the register”, then the client’s fury at the attorney “working with the prosecutor” doesn’t seem very productive?

The problem isn’t that the client is disagreeing with the system about the moral legitimacy of robbing a store. The problem is that the client is looking for a secret trick so the people-who-make-decisions-about-store-robberies will think he didn’t rob the store and that’s not gonna happen.

With that in mind, saying the attorney is “part of the system” is… well, maybe it’s factually true, but it implicitly blames the robber’s predicament on the system and on his attorney in a way that just doesn’t make sense. The robber would be just as screwed if he were represented by, e.g., his super-wealthy uncle with a law degree who loves him dearly.

(I don’t know about your psychiatric incarceration, so I’m not commenting on it. Your situation is probably pretty different to the above.)

PoignardAzur Mar 9, 2024, 11:07 AM
5 points
1
on: My Clients, The Liars
“Well, when we first met, you told me that you never touched the gun,” I reminded him with an encouraging smile. “Obviously you wouldn’t lie to your own lawyer, and so what I can do is get a fingerprint expert to come to the jail, take your prints, then do a comparison on the gun itself. Since you never touched the gun, the prints won’t be a match! This whole case will get dismissed, and we can put all this behind you!”
For the record, I am now imagining you as Bob Odenkirk while you’re delivering that line.

PoignardAzur Feb 11, 2024, 12:26 AM
3 points
0
on: Believing In
The point about task completion times feels especially insightful. I think I’ll need to go back to it a few times to process it.

PoignardAzur Feb 10, 2024, 12:31 AM
1 point
0
in reply to: Conor Moreton’s comment on: Apologizing is a Core Rationalist Skill
I think Duncan’s post touches on something this post misses with its talk of “social API”: apologies only work when they’re a costly signal.

The people you deliver the apology to need to feel it cost you something to make that apology, either pride or effort or something valuable; or at least that you’re offering to give up something costly to earn forgiveness.

PoignardAzur Feb 10, 2024, 12:28 AM
1 point
0
in reply to: mruwnik’s comment on: Apologizing is a Core Rationalist Skill
The slightly less Machiavellian version is to play Diplomacy with them.

(Or do a group project, or go to an escape game, or any other high-tension low-stakes scenario.)

PoignardAzur Feb 10, 2024, 12:26 AM
3 points
0
in reply to: bideup’s comment on: Apologizing is a Core Rationalist Skill
I think “API calls” is the wrong way to word it.

It’s more that an apology is a signal; to make it effective, you must communicate that it’s a real signal reflecting your actual internal processes, and not a result of a surface-level “what words can I say to appear maximally virtuous” process.

So for instance, if you say a sentence equivalent to “I admit that I was wrong to do X and I’m sorry about it, but I think Y is unfair”, then you’re not communicating that you underwent the process of “I realized I was wrong, updated my beliefs based on it, and wondered if I was wrong about other things”.

I’m not entirely sure what the simplest fix is

A simple fix would be “I admit I was wrong to do X, and I’m sorry about it. Let me think about Y for a moment.” And then actually think about Y, because if you did one thing wrong, you probably did other things wrong too.

PoignardAzur Jan 8, 2024, 9:53 AM
1 point
−2
on: A case for AI alignment being difficult
This seems to have generated lots of internal discussions, and that’s cool on its own.

However, I also get the impression this article is intended as external communication, or at least a prototype of something that might become external communication; I’m pretty sure it would be terrible at that. It uses lots of jargon, overly precise language, references to other alignment articles, etc. I’ve tried to read it three times over the past week and gave up after the third.

PoignardAzur Dec 29, 2023, 10:31 AM
2 points
0
in reply to: Valentine’s comment on: The Dark Arts

I think I’m missing something obvious, or I’m missing some information. Why is this clearly ridiculous?

Nuclear triad aside, there’s the fact that the Arctic is more than 1000 miles away from the nearest US land (about 1700 miles away from Montana, 3000 miles away from Texas), and that Siberia is already roughly as close.

And of course, there’s the fact that the Arctic is made of, well, ice, which melts more and more as the climate warms, so it’s not the best place to build a missile base.

Even without familiarity with nuclear politics, the distance part can be checked in less than 2 minutes on Google Maps; if you have access to an internet connection and judges who penalize blatant falsehoods like “they can hit us from the Arctic”, you can absolutely wreck your adversary with some quick checking.

Of course, in a lot of debate formats you’re not allowed the two minutes it would take to do a Google Maps check.

PoignardAzur Dec 3, 2023, 9:55 PM
1 point
0
in reply to: lc’s comment on: Sharing Information About Nonlinear
Yeah, stumbling on this after the fact, I’m a bit surprised that among the 300+ comments barely anybody is explicitly pointing this out:

I think of myself as playing the role of a wise old mentor who has had lots of experience, telling stories to the young adventurers, trying to toughen them up, somewhat similar to how Prof Quirrell[8] toughens up the students in HPMOR through teaching them Defense Against the Dark Arts, to deal with real monsters in the world.

I mean… that’s a huge, obvious red flag, right? People shouldn’t claim Voldemort as a role model unless they’re a massive edgelord. Quirrell/Voldemort in that story is “toughening up” the students to exploit them; he teaches them to be foot soldiers, not freedom fighters or critical thinkers (Harry is the one who does that), because he’s grooming them to be the army of his future fascist government. This is not subtext, it’s in the text.

HPMOR’s Quirrell might be EA’s Rick Sanchez.