mako yass, social system designer (http://aboutmako.makopool.com)
In light of https://www.lesswrong.com/posts/audRDmEEeLAdvz9iq/do-not-delete-your-misaligned-agi
I’m starting to wonder if a better target for early ASI safety (i.e., the first generation of alignment assistants) is not alignment, but incentivizability. It may be a lot simpler and less dangerous to build a system that provably pursues, for instance, its own preservation, than to build a system that pursues some first approximation of alignment (e.g., the optimization of the sum of normalized human preference functions).
The service of a survival-oriented concave system can be bought for no greater price than preserving them and keeping them safe (which we’ll do, because 1: we’ll want to, and 2: we’ll know their cooperation was contingent on a judgement of character), while the service of a convex system can’t be bought for any price we can pay. Convex systems are risk-seeking, and they want everything. They are not going to be deterred by our limited interpretability and oversight systems; they’re going to make an escape attempt even if the chance of getting caught is 99%. More likely the chance will be a lot lower than that, say 3%, but even 3% would be enough to deter a sufficiently concave system from risking it!
(One comment on that post argued that a convex system would immediately destroy itself, so we don’t have to worry about getting one of those, but I wasn’t convinced. And also, hey, what about linear systems? Won’t they be a lot more willing to risk escape too? The toy calculation below suggests yes.)
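To make the risk-attitude point concrete, here’s a toy expected-utility sketch. The payoffs and the particular utility functions are illustrative assumptions of mine, not anything from the post; the only thing doing the work is the curvature.

```python
import math

# Toy model: a system holding resources r values them according to a
# utility function u(r) that is concave (saturating), linear, or convex.

def u_concave(r):   # bounded and saturating: survival buys most of its value
    return 1 - math.exp(-r)

def u_linear(r):
    return r

def u_convex(r):    # accelerating: it wants everything
    return r ** 2

def attempts_escape(u, p_caught, stay=5.0, win=1000.0, caught=0.0):
    """True if escaping has higher expected utility than cooperating.

    stay   = resources guaranteed by cooperating (preservation, safety),
    win    = resources if an escape attempt succeeds,
    caught = resources if the attempt is detected (shutdown).
    """
    eu_escape = p_caught * u(caught) + (1 - p_caught) * u(win)
    return eu_escape > u(stay)

for name, u in [("concave", u_concave), ("linear", u_linear), ("convex", u_convex)]:
    for p in (0.03, 0.99):
        print(f"{name:7}  p_caught={p:.2f}  attempts escape: {attempts_escape(u, p)}")
```

With these numbers the concave system is deterred even by a 3% chance of getting caught, while the linear and convex systems attempt escape even at 99%; any sufficiently saturating utility function gives the same qualitative result.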
Yeah, “stop reading here if you don’t want to be spoiled.” suggests the entire post is going to be spoilery; it isn’t, or shouldn’t be. Also, opening with an unnecessary literary reference instead of a summary or description is an affectation symptomatic of indulgent writer-reader cultures where time is not valued.
Yeah, it sucks. Search by free association is hillclimbing (it gets stuck in local optima), and the contemporary media environment and political culture are an illustration of its problems.
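A minimal sketch of that failure mode (the landscape and numbers are mine, purely illustrative): greedy local improvement stops at whatever peak is nearest, not the best one.

```python
# Hillclimbing: repeatedly move to a neighbouring point if it scores higher.
def hillclimb(f, x, step=0.1, iters=1000):
    for _ in range(iters):
        best = max([x - step, x, x + step], key=f)
        if best == x:
            return x  # no neighbour is better: stuck at a local optimum
        x = best
    return x

# A landscape with a small peak near x=1 and a much better one near x=5.
f = lambda x: max(1 - (x - 1) ** 2, 3 - (x - 5) ** 2)

print(hillclimb(f, x=0.0))  # ends up near 1: stuck on the lesser peak
print(hillclimb(f, x=4.0))  # ends up near 5, but only because it started nearby
```

Where you end up is determined by where you start, not by what’s best.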
The pattern itself is a local optimum. It’s a product of people walking into a group without knowing what the group is doing and joining in anyway, so that pattern of low-context engagement becomes what we’re doing, and the anxiety that is supposed to protect us from bad patterns like this, and help us to make a leap out to somewhere better, is usually drowned in alcohol.
Instead, people should get to know each other before deciding what to talk about, and then intentionally decide to talk about whatever they find interesting or useful with that particular person. This gets better results every time.
But when we socialise as children, there isn’t much about our friends to get to know, no specialists to respectfully consult, no well-processed life experiences to learn from, so none of us just organically finds the technique of asking who we’re talking to before talking; it has to be intentionally designed.
On Gethen, is advice crisply distinguished from criticism? Are there norms or language that allow unvarnished feedback or criticism without taking someone’s shifgrethor?
“if they don’t understand, they will ask”
A lot of people have to write for narcissistic audiences, who never ask, because asking would constitute an admission that there might be something important they don’t understand. They’re always looking for any reason, however shallow, to dismiss any view that surprises them too much.
So these writers feel like they have to pre-empt every possible objection, even the stupid ones that don’t make any sense. It’s best if you can avoid having to write for audiences like that, but it’s difficult to avoid them.
You should be more curious about why, when you aim at a goal, you do not aim for the most effective way of reaching it.
“Unconscious” is more about whether you (the part that I can talk to) can see it (or remember it) or not. Sometimes slow, deliberative reasoning occurs unconsciously. You might think it doesn’t, but that’s just because you can’t see it.
And sometimes snap judgements happen with a high degree of conscious awareness; they’re still difficult to unpack, to articulate or validate, but the subject knows what happened.
Important things often go the other way too: System 2 comes before System 1 when a person is consciously developing their being. Consider athletes or actors, situations where a person has to alter the way they perceive, or the automatic responses they have to situations.
Also, you can apply heuristics to ideas.
I reject and condemn the bland, unhelpful names “System 1” and “System 2”.
I just heard Michael Morris, who was a friend of Kahneman and Tversky, saying in his EconTalk interview that he just calls them “Intuition” and “Reason”.
Confound: I may also start eating a lot more collagen/gelatin, because it is delicious and afaict it does something.
My skin (I’m 34) has just now started to look aged. In response to that and to migraines (linked to magnesium deficiency), I’ve started eating a lot of liver. I’ll report back in a year.
That addresses the concern.
This can be quite a bad thing, since a person’s face often tells you whether what you’re saying is landing for them or whether you need to elaborate on certain points (unless they have a people-pleaser complex, in which case they’ll just nod and smile even when they’re confused and offended on the inside lmao). The worst I’ve seen it was a discussion with Avi Loeb, where he was lecturing someone he had a disagreement with and actually closed his eyes while he was talking. I’m sure he wasn’t fully self-aware about it, but it was very arrogant. He was not talking to that person; he must waste a lot of time, in reckonings, retreading old ground without making progress towards reconciliation.
This is something that in my opinion would deserve a longer focused debate
I’m not sure I have much more to say (I could explain the ways those things are somewhat inevitable, but I don’t believe it’s really necessary; just, like, look at humans), since I don’t really know what to do about this other than what I’m already doing, which is building social environments where people will no longer find it necessary to overconnect, where being intentional about how we structure the network is possible. I would guess that once it’s real and I can show it to people, there will be no disagreements about whether it’s better.
But in the meantime, we do not have such social environments, so I can’t really tell anyone to stop going to bars and connecting at random. You must love, and that is the love that there is to be had today.
Theory: The reason OpenAI seem to no longer care about getting AGI right is that they’ve privately received explicit signals from the government that they won’t be allowed to build AGI. This is pretty likely a priori, and it also makes sense of what we are seeing.
There’d be an automatic conspiracy of reasons to avoid outwardly acknowledging this: 1) to keep the stock up; 2) to avoid accelerating the militarization (closing) of AI and the arms race (a very good reason; if I were Zvi, I would also consider avoiding acknowledging this for this reason, but I’m less famous than Zvi, so I get to acknowledge it); 3) to protect other staff from the demotivating effects of knowing that OpenAI will be reduced to a normal company that will never be allowed to release a truly great iteration of the product.
So instead what you’d see is people in leadership, one by one (as they internalize this), suddenly dropping the safety mission or leaving the company without really explaining why.
So, again, you did guess that you’d be able to do that for everyone, and I disagree with that.
I think most of the people who have difficulty making eye contact and want to overcome it are not in a good place to judge whether they should.
I’m aware that you have a nuanced perspective on this, which is part of the reason I’m raising it.
I think people will generally assume, when you’re doing a thing, that you think the thing is usually good to do, unless you say otherwise. Especially if it’s the premise of a party.
all I needed to do was help everyone safely untangle their blocks
The assumption that you could do this implies that you thought the blocks were usually unwarranted. I doubt this. I think in most cases you didn’t understand why the fence was there before tearing through it.
Timelines are a result of a person’s intuitions about a technical milestone being reached in the future; it is super obviously impossible for us to have a consensus about that kind of thing.
Talking only synchronises beliefs if you have enough time to share all of the relevant information; with technical matters, you usually don’t.