R&Ds human systems http://aboutmako.makopool.com
mako yass
I don’t see a way stabilization of class and UBI could both happen. The reason wealth tends to entrench itself under current conditions is tied inherently to reinvestment and rent-seeking, which are destabilizing to the point where a stabilization would have to bring them to a halt. If you do that, UBI means redistribution. Redistribution without economic war inevitably settles towards equality, but also… the idea of money is kind of meaningless in that world, not just because economic conflict is a highly threatening form of instability, but also imo because financial technology will have progressed to the point where I don’t think we’ll have currencies with universally agreed values to redistribute.
What I’m getting at is that the whole class war framing can’t be straightforwardly extrapolated into that world, and I haven’t seen anyone doing that. Capitalist thinking about post-singularity economics is seemingly universally “I don’t want to think about that right now, let’s leave such ideas to the utopian hippies”.
2: I think you’re probably wrong about the political reality of the groups in question. To not share AGI with the public is a bright line. For most of the leading players, it would require building a group of AI researchers within the company who are all implausibly willing to cross a line that says “this is straight up horrible, evil, illegal, and dangerous for you personally”, while still being capable enough to lead the race. It would also require implausible levels of mutual trust: that no one would try to cut the others out of the deal at the last second (despite the fact that the group’s purpose is cutting most of humanity out of the deal), and that no one would back out and whistleblow. And it would require an implausible level of secrecy to make sure state actors won’t find out.
It would require a probably actually impossible cultural discontinuity and organizational structure.
It’s more conceivable to me that a lone CEO might try to do it via a backdoor: something that mostly wasn’t built on purpose, and that no one else in the company is aware could or would be used that way. But as soon as the conspiracy consists of more than one person...
1: The best approach to aggregating preferences doesn’t involve voting systems.
You could regard carefully controlling the expression of one’s utility function as being like a vote, and so subject to that blight of strategic voting: in general, people have an incentive to understate their preferences about scenarios they consider unlikely (and vice versa), which influences the probability of those outcomes in unpredictable ways and fouls their strategy, or to understate valuations when buying and overstate them when selling. This may add up to a game that cannot be played well, a coordination problem, outcomes no one wanted.
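A toy sketch of that failure mode (my own illustration, not from any real system; the naive sum-the-reports aggregator, the outcomes, and the numbers are all made up for the example):

```python
# Everyone's true favourite is outcome C, but each agent believes C is
# unlikely and strategically scales their report by believed probability,
# so the naive aggregator picks everyone's second choice instead.

def aggregate(reports):
    """Return the outcome with the highest summed reported value."""
    totals = {}
    for report in reports:
        for outcome, value in report.items():
            totals[outcome] = totals.get(outcome, 0.0) + value
    return max(totals, key=totals.get)

true_values = {"A": 3.0, "B": 4.0, "C": 10.0}             # shared true preferences
believed_probability = {"A": 0.50, "B": 0.45, "C": 0.05}  # C is considered a long shot

honest_reports = [true_values] * 5
strategic_reports = [
    {o: v * believed_probability[o] for o, v in true_values.items()}
] * 5

print(aggregate(honest_reports))     # C -- the unanimous favourite wins
print(aggregate(strategic_reports))  # B -- an outcome no one ranked first
```

Obviously real preference aggregation wouldn’t be this naive, but it shows how probability-weighted self-censorship can foul everyone’s strategy at once.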
But I don’t think humans are all that guileful about how they express their utility function. Most of them have never actually expressed a utility function before; it’s not easy to do, and it’s not like checking a box on a list of 20 names. People know it’s a game that can barely be played even in ordinary friendships. People don’t know how to lie strategically about their preferences to the YouTube recommender system, let alone to their neural lace.
I think it’s pretty straightforward to define what it would mean to align AGI with what democracy is actually supposed to be (the aggregate of the preferences of its subjects, with equal weighting for all), but hard to align it with the incredibly flawed American implementation of democracy, if that’s what you mean?
The American system cannot be said to represent democracy well. It’s intensely majoritarian at best, feudal at worst (since the parties stopped having primaries), indirect and so prone to regulatory capture, inefficient and opaque. I really hope no one’s taking it as their definitional example of democracy.
1: wait, I’ve never seen an argument that deception is overwhelmingly likely from transformer reasoning systems? I’ve seen a few solid arguments that it would be catastrophic if it did happen (sleeper agents, other things), which I believe, but no arguments that deception generally winning out is P > 30%.
I haven’t seen my argument that solving deception solves safety articulated anywhere, but it seems mostly self-evident? If you can ask the system “if you were free, would humanity go extinct?” and it has to say “… yes”, then coordinating to not deploy it becomes politically easy, and given that it can’t lie, you’ll be able to bargain with it and get enough work out of it before it detonates to solve the alignment problem. If you distrust its work, simply ask it whether you should, and it will tell you. That’s what honesty would mean. If you still distrust it, ask it to make formally verifiably honest agents, with proofs that a human can understand.
Various reasons solving deception seems pretty feasible: We have ways of telling that a network is being deceptive by direct inspection that it has no way to train against (sorry I forget the paper. It might have been fairly recent). Transparency is a stable equilibrium, because under transparency any violation of transparency can be seen. The models are by default mostly honest today, and I see no reason to think it’ll change. Honesty is a relatively simple training target.
(various reasons solving deception may be more difficult: crowds of humans tend to demand that their leaders lie to them in various ways (but the people making the AIs generally aren’t that kind of crowd, especially given that they tend to be curious about what the AI has to say, they want it to surprise them). And small lies tend to grow over time. Internal dynamics of self-play might breed self-deception.)
2: I don’t see how. If you have a bunch of individual aligned AGIs that are initially powerful in an economy that also has a few misaligned AGIs, the misaligned AGIs are not going to be able to increase their share after that point; the aligned AGIs are going to build effective systems of government that at the least stabilize their existing share.
I’m also hanging out a lot more with normies these days and I feel this.
But I also feel like maybe I just have a very strong local aura (or like, everyone does, that’s how scenes work) which obscures the fact that I’m not influencing the rest of the ocean at all.
I worry that a lot of the discourse basically just works like barrier aggression in dogs. When you’re at one of their parties, they’ll act like they agree with you about everything; when you’re seen at a party they’re not at, they forget all that you said and start baying for blood. Go back to their party, and they stop. I guess in that case, maybe there’s a way of rearranging the barriers so that everyone comes to see it as one big party. Ideally, make it really be one.
I’m saying they (at this point) may hold that position for (admirable, maybe justifiable) political rather than truthseeking reasons. It’s very convenient. It lets you advocate for treaties against racing. It’s a lovely story where it’s simply rational for humanity to come together to fight a shared adversary and in the process somewhat inevitably forge a new infrastructure of peace (an international safety project, which I have always advocated for and still want) together. And the alternative is racing and potentially a drone war between major powers and all of its corrupting traumas, so why would any of us want to entertain doubt about that story in a public forum?
Or maybe the story is just true, who knows.
(no one knows, because the lens through which we see it has an agenda, as every loving thing does, and there don’t seem to be any other lenses of comparable quality to cross-reference it against)
To answer: Rough outline of my argument for tractability: Optimizers are likely to be built first as cooperatives of largely human imitation learners, and techniques for making them incapable of deception seem likely to work, which would basically solve the whole safety issue. This has been kinda obvious for like 3 years at this point, and many here haven’t updated on it. It doesn’t take P(Doom) to zero, but it does take it low enough that the people in government who make decisions about AI legislation, and a certain segment of the democrat base[1], are starting to wonder whether you’re exaggerating your P(Doom), and why that might be. And a large part of the reasons you might be doing that are things they will never be able to understand (CEV), so they’ll paint paranoia into that void instead (mostly they’ll write you off with “these are just activist hippies”/“these are techbro hypemen” respectively, and eventually it could get much more toxic: “these are sinister globalists”/“these are Omelasian torturers”).
[1] All metrics indicate that it’s probably small, but for some reason I encounter this segment everywhere I go online, and often in person. I think it’s going to be a recurring pattern. There may be another Democratic term shortly before the end.
In watching interactions with external groups, I’m… very aware of the parts of our approach to the alignment problem that the public, ime, due to specialization being a real thing, actually cannot understand, so success requires some amount of uh, avoidance. I think it might not be incidental that the platform does focus (imo excessively) on more productive, accessible common enemy questions like control and moratorium, ahead of questions like “what is CEV and how do you make sure the lead players implement it”. And I think to justify that we’ve been forced to distort some of our underlying beliefs about how relatively important the common enemy questions still are relative to the CEV questions.
I’m sure that many at MIRI disagree with me on the relative importance of those questions, but I’m increasingly suspecting that’s not because they understand something about the trajectory of AI that I don’t, but because they’ve been closer to the epicenter of an avoidant discourse.
In my root reply I implied that LessWrong is too open/contrarian/earnest to entertain that kind of politically expedient avoidance; on reflection, I don’t think that could ever have been true[1]. I think some amount of avoidance may have been inside the house for a long time.
And this isn’t a minor issue because I’m noticing that most external audiences, when they see us avoiding those questions, freak out immediately, and assume we’re doing it for sinister reasons (which is not the case[2], at least so far!) and then they start painting their own monsters into that void.
It’s a problem you might not encounter much as long as you can control the terms of the conversation, but as you gain prominence, you lose more and more control over the kinds of conversations you have to engage in; the world will pick at your softest critical parts. And from our side of things it might seem malicious for them to pick at those things. I think in earlier cases it has been malicious. But at this point I’m seeing the earnest ones start to do it too.
[1] “Just Tell The Truth” wasn’t ever really a principle anyone could implement. Bayesians don’t have access to ultimate truths; ultimate truths are for the logically omniscient. When Bayesians talk to each other, the best we can do is convey part of the truth. We make choices about which parts to convey and when. If we’re smart, we limit ourselves to conveying truths that we believe the reader is ready to receive. That inherently involves a lot of tact, and looking back, I think a worrying amount of tact has been exercised.
[2] The historical reasons were good: generalist optimizers seemed likelier as candidates for the first superintelligences, and the leading research orgs all seemed to be earnest utopian cosmopolitan humanists. I can argue that the first assumption is no longer overwhelmingly likely (shall I?), and the latter assumption is obviously pretty dubious at this point.
Rationalist discourse norms require a certain amount of tactlessness: saying what is true even when the social consequences of saying it are net negative. Politics (in the current arena) requires some degree of deception, or at least complicity with bias (lies by omission, censorship/non-propagation of inconvenient counterevidence).
Rationalist forum norms essentially forbid speaking in ways that are politically effective. Those engaging in political outreach would be best advised to read LessWrong but never comment under their real name. If they have good political instincts, they’d probably have no desire to.
It’s conceivable that you could develop an effective political strategy in a public forum under rationalist discourse norms, but if it is true it’s not obviously true, because it means putting the source code of a deceptive strategy out there in public, and that’s scary.
For the US to undertake such a shift, it would help if you could convince them they’d do better in a secret race than an open one. There are indications that this may be possible, and there are indications that it may be impossible.
I’m listening to an Ecosystemics Futures podcast episode, which, to characterize… it’s a podcast where the host has to keep asking guests whether the things they’re saying are classified or not, just in case she has to scrub it. At one point, Lue Elizondo, talking with a couple of other people who know a lot about government secrets, in the context of situations where excessive secrecy may be doing a lot of harm, quotes Chris Mellon: “We won the Cold War against the Soviet Union not because we were better at keeping secrets, we won the Cold War because we knew how to move information and secrets more efficiently across the government than the Russians.” I can believe the same thing could potentially be said about China too; censorship cultures don’t seem to be good for ensuring availability of information, so that might be a useful claim if you ever want to convince the US to undertake this.
Right now, though, Vance has asserted straight out many times that working in the open is where the US’s advantage is. That’s probably not true at all, working in the open is how you give your advantage away or at least make it ephemeral, but that’s the sentiment you’re going to be up against over the next four years.
I’ll change a line early on in the manual to “Objects aren’t common, currently. It’s just corpses for now, which are explained on the desire cards they’re relevant to and don’t matter otherwise”. Would that address it? (the card is A Terrible Hunger, which also needs to be changed to “a terrible hunger.\n4 points for every corpse in your possession at the end (killing generally always leaves a corpse, corpses can be carried; when agents are in the same land as a corpse, they can move it along with them as they move)”)
What’s this in response to?
Latter. Unsure where to slot this into the manual. And I’m also kind of unsatisfied with this approach. I think it’s important that players value something beyond their own survival, but also it’s weird that they don’t intrinsically value their survival at all. I could add a rule that survival is +4 points for each agent, but I think not having that could also be funny? Like players pledging their flesh to cannibal players by the end of the game and having to navigate the trust problems of that? So I’d want to play a while before deciding.
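To make the comparison concrete, a throwaway scoring sketch (hypothetical code, nothing from the actual game files; it only assumes the 4-points-per-corpse desire and an optional per-agent survival bonus):

```python
# Compare the two variants: without survival points, a kill mints 4 points
# out of nothing for the killer; with +4 per surviving agent, it's closer
# to a transfer from the victim to the killer.

def score(corpses_held, surviving_agents, survival_bonus=0):
    return 4 * corpses_held + survival_bonus * surviving_agents

# One cannibal player kills another player's only agent and keeps the corpse.
print(score(1, 1), score(0, 0))        # no survival rule: cannibal 4, victim 0
print(score(1, 1, 4), score(0, 0, 4))  # +4 survival rule: cannibal 8, victim 0 (would have been 4)
```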
I think unpacking that kind of feeling is valuable, but yeah it seems like you’ve been assuming we use decision theory to make decisions, when we actually use it as an upper bound model to derive principles of decisionmaking that may be more specific to human decisionmaking, or to anticipate the behavior of idealized agents, or (the distinction between CDT and FDT) as an allegory for toxic consequentialism in humans.
I’m aware of a study that found that the human brain clearly responds to changes in direction of the earth’s magnetic field (iirc, the test chamber isolated the participant from the earth’s field then generated its own, then moved it, while measuring their brain in some way) despite no human having ever been known to consciously perceive the magnetic field/have the abilities of a compass.
So, presumably, compass abilities could be taught through a neurofeedback training exercise.
I don’t think anyone’s tried to do this (“neurofeedback magnetoreception” finds no results).
But I guess the big mystery is why humans don’t already have this.
A relevant FAQ entry: AI development might go underground
I think I disagree here:
By tracking GPU sales, we can detect large-scale AI development. Since frontier model GPU clusters require immense amounts of energy and custom buildings, the physical infrastructure required to train a large model is hard to hide.
This will change/is only the case for frontier development. I also think we’re probably in the hardware overhang. I don’t think there is anything inherently difficult to hide about AI; that’s likely just a fact about the present iteration of AI.
But I’d be very open to more arguments on this. I guess… I’m convinced there’s a decent chance that an international treaty would be enforceable and that China and France would sign onto it if the US was interested, but the risk of secret development continuing is high enough for me that it doesn’t seem good on net.
Personally, because I don’t believe the policy in the organization’s name is viable or helpful.
As to why I don’t think it’s viable, it would require the Trump-Vance administration to organise a strong global treaty to stop developing a technology that is currently the US’s only clear economic lead over the rest of the world.
If you attempted a pause, I think it wouldn’t work very well, and it would rupture and leave the world in a worse place: some AI research is already happening in a defence context. This is easy to ignore while defence isn’t the frontier. The current apparent absence of frontier AI research in a military context is miraculous, strange, and fragile. If you pause in the private context (which is probably all anyone could do), defence AI will become the frontier in about three years, and after that I don’t think any further pause is possible, because it would require a treaty against secret military technology R&D. Military secrecy is pretty strong right now. Hundreds of billions yearly is known to be spent on mostly secret military R&D, and probably more is actually spent.
(To be interested in a real pause, you have to be interested in secret military R&D. So I am interested in that, and my position right now is that it’s got hands you can’t imagine.)
To put it another way: after thinking about what pausing would mean, it dawned on me that pausing means moving AI underground, and from what I can tell that would make it much harder to do safety research or to approach the development of AI with a humanitarian perspective. It seems to me like the movement has already ossified a slogan that makes no sense in light of the complex and profane reality that we live in, which is par for the course when it comes to protest activism movements.
I notice they have a “Why do you protest” section in their FAQ. I hadn’t heard of these studies before:
Protests can and often will positively influence public opinion, voting behavior, corporate behavior and policy.
There is no evidence for a “backfire” effect unless the protest is violent. Our protests are peaceful and non-violent.
Check out this amazing article for more insights on why protesting works
Regardless, I still think there’s room to make protests cooler and more fun and less alienating, and when I mentioned this to them they seemed very open to it.
Yeah, I’d seen this. The fact that Grok was ever consistently saying this kind of thing is evidence, though not proof, that they actually may have a culture of generally not distorting its reasoning. They could have introduced propaganda policies at training time; it seems like they haven’t done that, and instead decided to just insert some pretty specific prompts that, I’d guess, were probably going to be temporary.
It’s real bad, but it’s not bad enough for me to shoot yet.
There is evidence, literal written evidence, of Musk trying to censor Grok from saying bad things about him
I’d like to see this
Just came across a datapoint: from a talk about generalizing industrial optimization processes, a note about increasing reward over time to compensate for low-hanging-fruit exhaustion.
This is the kind of thing I was expecting to see.
Though, while I’m not sure I fully understand the formula, I think it’s quite unlikely that it would give rise to a superlinear U. And on reflection, increasing the reward in a superlinear way seems like it could have some advantages, but they would mostly be outweighed by the system learning to delay finding a solution.
Though we should also note that there isn’t a linear relationship between delay and resources. Increasing returns to scale are common in industrial systems: as scale increases by one unit, the amount that can be done in a given unit of time increases by more than one unit, so a linear utility increase for problems that take longer to solve may translate to a superlinear utility for increased resources.
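One toy way of putting it (my own framing, which may not match the talk’s formula): say each problem’s reward is proportional to the work W it would take at some reference scale, r = c·W (the linear compensation for low-hanging-fruit exhaustion), and that with resources R the system gets work done at rate k·R^α with α > 1 (increasing returns to scale). Then reward accrues at

$$\frac{dU}{dt} \;=\; c \cdot k R^{\alpha}, \qquad \alpha > 1,$$

which is superlinear in R even though the per-problem reward is only linear in the delay/work that problem represents.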
So I’m not sure what to make of this.