What would your plan be to ensure that this kind of regulation actually net-improves safety? The null hypothesis for something like this is that you’ll empower a bunch of bureaucrats to push rules that are at least 6 months out of date under conditions of total national emergency where everyone is watching, and years to decades out of date otherwise.
This could be catastrophic! If the only approved safety techniques are as out of date as the only approved medical techniques, AI regulation seems like it should vastly increase P(doom) at the point that TAI is developed.
Which brings me to my main disagreement with bottom-up approaches: they assume we already have a physics theory in hand, and are trying to locate consciousness within that theory. Yet, we needed conscious observations, and at least some preliminary theory of consciousness, to even get to a low-level physics theory in the first place. Scientific observations are a subset of conscious experience, and the core task of science is to predict scientific observations; this requires pumping a type of conscious experience out of a physical theory, which requires at least some preliminary theory of consciousness. Anthropics makes this clear, as theories such as SSA and SIA require identifying observers who are in our reference class.
There’s something a bit off about this that’s hard to quite put my finger on. To gesture vaguely at it, it’s not obvious to me that this problem ought to have a solution. At the end of the day, we’re thinking meat, and we think because thinking makes the meat better at becoming more meat. We have experiences correlated with our environments because agents whose experiences aren’t correlated with their environments don’t arise from chemical soup without cause.
My guess is that if we want to understand “consciousness”, the best approach would be functionalist. What work is the inner listener doing? It has to be doing something, or it wouldn’t be there.
Do you feel you have an angle on that question? Would be very curious to hear more if so.
This seems especially unlikely to work given it only gives a probability. You know what you call someone whose superintelligent AI does what they want 95% of the time? Dead.
If you can get it to do what you want even 51% of the time, and make that 51% independent across samplings (it isn't, so in practice you'd want some margin, but 95% is actually a lot of margin!), you can get arbitrarily good compliance by creating AI committees and taking a majority vote.
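A minimal sketch of the combinatorial point, assuming each committee member independently does what you want with probability p (independence is exactly the assumption flagged above, and the committee sizes here are just illustrative):

```python
# Probability that a majority of n independent committee members comply,
# when each complies with probability p.
from math import comb

def majority_complies(p, n):
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))

for p in (0.51, 0.95):
    for n in (11, 101, 1001):
        print(f"p={p}, n={n}: {majority_complies(p, n):.6f}")

# p=0.51 climbs toward 1 only slowly (roughly 0.74 at n=1001, though it does
# go to 1 as n grows); p=0.95 is already about 0.999994 at n=11.
```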
That paper is one of many claiming a linear attention mechanism that's as good as full self-attention. In practice they're all enough worse that nobody uses them except the original authors in the original paper, and usually not even the original authors in subsequent papers.
The one exception is flash attention, which is basically just a very fancy fused kernel for the same computation (actually the same, up to numerical error, unlike all these "linear attention" papers).
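To illustrate the "same computation" point, here's a toy NumPy sketch (mine, not the real fused CUDA kernel): blockwise attention with an online softmax, the trick flash attention relies on, matches naive softmax(QK^T / sqrt(d)) V to within floating-point error.

```python
import numpy as np

def naive_attention(Q, K, V):
    # Materialize the full (n, n) score matrix, softmax, then weight V.
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def blockwise_attention(Q, K, V, block=16):
    # Stream over key/value blocks with a running max and normalizer per
    # query (online softmax), never materializing the full score matrix.
    d, n = Q.shape[-1], Q.shape[0]
    out = np.zeros((n, V.shape[-1]))
    running_max = np.full((n, 1), -np.inf)
    running_sum = np.zeros((n, 1))
    for start in range(0, K.shape[0], block):
        Kb, Vb = K[start:start + block], V[start:start + block]
        scores = Q @ Kb.T / np.sqrt(d)
        new_max = np.maximum(running_max, scores.max(axis=-1, keepdims=True))
        rescale = np.exp(running_max - new_max)   # correct earlier accumulators
        p = np.exp(scores - new_max)
        out = out * rescale + p @ Vb
        running_sum = running_sum * rescale + p.sum(axis=-1, keepdims=True)
        running_max = new_max
    return out / running_sum

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((64, 32)) for _ in range(3))
print(np.abs(naive_attention(Q, K, V) - blockwise_attention(Q, K, V)).max())
# ~1e-15: identical up to numerical error, unlike linear-attention approximations.
```

The speed win in the real kernel comes from fusing this loop and avoiding memory traffic, not from changing the math, which is why it's exact while the linear-attention variants aren't.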
4, 5, and 6 are not separate steps—when you only have 1 example, the bits to find an input that generates your output are not distinct from the bits specifying the program that computes output from input.
Yeah, my guess is that you almost certainly fail on step 4. A really compact example ray tracer looks like it fits in 64 bytes, and you will not do search over all 64-byte programs: even if you could evaluate one of them per atom per nanosecond, using every atom in the universe for 100 billion years, you'd only get about 44.6 bytes of search.
Let's go with something more modest and say you get to use every atom in the Milky Way for 100 years, and it takes about 1 million atom-seconds to check a single program. This gets you about 30 bytes of search.
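A back-of-envelope check of those numbers; the atom counts (roughly 1e80 in the observable universe, roughly 1e69 in the Milky Way) are my own assumptions for illustration:

```python
from math import log2

YEAR_SECONDS = 3.15e7

# Universe-scale budget: one program evaluated per atom per nanosecond,
# every atom in the observable universe, for 100 billion years.
programs = 1e80 * 1e9 * (100e9 * YEAR_SECONDS)
print(log2(programs) / 8)   # ~44.6 bytes of brute-force search

# Milky-Way-scale budget: every atom in the galaxy for 100 years,
# at ~1e6 atom-seconds to check a single program.
programs = 1e69 * (100 * YEAR_SECONDS) / 1e6
print(log2(programs) / 8)   # ~30 bytes of search
```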
Priors over programs will get you some of the way there, but usually the structure of those priors will also lead to much much longer encodings of a ray tracer. You would also need a much more general / higher quality ray tracer (and thus more bits!) as well as an actually quite detailed specification of the “scene” you input to that ray tracer (which is probably way more bits than the original png).
The reason humans invented ray tracers with so much less compute is that we got ray tracers from physics and way way way more bits of evidence, not the other way around.
This response is totally absurd. Your human priors are doing an insane amount of work here—you’re generating an argument for the conclusion, not figuring out how you would privilege those hypotheses in the first place.
See that the format describes something like a grid of cells, where each cell has three [something] values.
This seems maybe possible for PNG (though it could be hard: the pixel data will likely be stored as a single compressed stream (filtered scanlines run through DEFLATE) rather than a bunch of readable rows, which you might be able to figure out but very well might not, and if it's JPEG-compressed this is even further from the truth).
Come up with the hypothesis that the grid represents a 2d projection of a 3d space (this does not feel like a large jump to me given that ray tracers exist and are not complicated, but I can go into more detail on this step if you’d like).
It’s a single frame, you’re not supposed to have seen physics before. How did ray tracers enter into this? How do you know your world is 3D, not actually 2D? Where is the hypothesis that it’s a projection coming from, other than assuming your conclusion?
Determine the shape of the lens by looking at edges and how they are distorted.
How do you know what the shapes of the edges should be?
If the apple is out in sunlight, I expect that between the three RGB channels and the rainbows generated by chromatic aberration, there would be enough information to determine that the intensity-by-frequency of light approximately matches the blackbody radiation curves (though again, not so much with that name as just "these equations seem to be a good fit").
What’s “sunlight”? What’s “blackbody radiation”? How would a single array of numbers of unknown provenance without any other context cause you to invent these concepts?
10 million times faster is really a lot—on modern hardware, running SOTA object segmentation models at even 60fps is quite hard, and those are usually much much smaller than the kinds of AIs we would think about in the context of AI risk.
But 100x faster is totally plausible (especially with 100x the energy consumption!), and I think the argument still mostly works at that much more conservative speedup.
For me it mostly felt like I and my group of closest friends were at the center of the world, with the last hope for the future depending on our ability to hold to principle. There was a lot of prophecy of varying quality, and a lot of importance placed suddenly on people we barely knew, then rapidly withdrawn when those people weren't up for being as crazy as we were.
This seems roughly on point, but is missing a crucial aspect—whether or not you’re currently a hyper-analytical programmer is actually a state of mind which can change. Thinking you’re on one side when actually you’ve flipped can lead to some bad times, for you and others.
I don’t know how everyone else on LessWrong feels but I at least am getting really tired of you smugly dismissing others’ attempts at moral reductionism wrt qualia by claiming deep philosophical insight you’ve given outside observers very little reason to believe you have. In particular, I suspect if you’d spent half the energy on writing up these insights that you’ve spent using the claim to them as a cudgel you would have at least published enough of a teaser for your claims to be credible.
I disagree that GPT’s job, the one that GPT-∞ is infinitely good at, is answering text-based questions correctly. It’s the job we may wish it had, but it’s not, because that’s not the job its boss is making it do. GPT’s job is to answer text-based questions in a way that would be judged as correct by humans or by previously-written human text. If no humans, individually or collectively, know how to align AI, neither would GPT-∞ that’s trained on human writing and scored on accuracy by human judges.
This is actually also an incorrect statement of GPT’s job. GPT’s job is to predict the most likely next token in the distribution its corpus was sampled from. GPT-∞ would give you, uh, probably with that exact prompt a blog post about a paper which claims that it solves the alignment problem. It would be on average exactly the same quality as other articles from the internet containing that text.
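For concreteness, here is a minimal sketch of that job as a training objective (the `model` here is a stand-in for any network that returns next-token logits, not anyone's actual training code): the loss only rewards matching the corpus distribution, and never references whether the text is correct.

```python
import torch
import torch.nn.functional as F

def next_token_loss(model, tokens):
    """Cross-entropy of each token given the tokens before it.

    tokens: (batch, seq_len) integer ids sampled from the training corpus.
    """
    logits = model(tokens[:, :-1])                # (batch, seq_len - 1, vocab)
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),      # predictions for positions 1..T
        tokens[:, 1:].reshape(-1),                # the tokens that actually came next
    )
```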
I think this is a persistent difference between us but isn’t especially relevant to the difference in outcomes here.
I’d more guess that the reason you had psychoses and I didn’t had to do with you having anxieties about being irredeemably bad that I basically didn’t at the time. Seems like this would be correlated with your feeling like you grew up in a Shin Sekai Yori world?
hmm… this could have come down to spending time in different parts of MIRI? I mostly worked on the “world’s last decent logic department” stuff—maybe the more “global strategic” aspects of MIRI work, at least the parts behind closed doors I wasn’t allowed through, were more toxic? Still feels kinda unlikely but I’m missing info there so it’s just a hunch.
By latent tendency I don’t mean family history, though it’s obviously correlated. I claim that there’s this fact of the matter about Jess’ personality, biology, etc, which is that it’s easier for her to have a psychotic episode than for most people. This seems not plausibly controversial.
I’m not claiming a gears-level model here. When you see that someone has a pattern of <problem> that others in very similar situations did not have, you should assume some of the causality is located in the person, even if you don’t know how.
Verbal coherence level seems like a weird place to locate the disagreement—Jessica maintained approximate verbal coherence (though with increasing difficulty) through most of her episode. I’d say even in October 2017, she was more verbally coherent than e.g. the average hippie or Catholic, because she was trying at all.
The most striking feature was actually her ability to take care of herself rapidly degrading, as evidenced by e.g. getting lost almost immediately after leaving her home, wandering for several miles, then calling me for help and having difficulty figuring out where she was—IIRC, took a few minutes to find cross streets. When I found her she was shuffling around in a daze, her skin looked like she’d been scratching it much more than usual, clothes were awkwardly hung on her body, etc. This was on either the second or third day, and things got almost monotonically worse as the days progressed.
The obvious cause for concern was “rapid descent in presentation from normal adult to homeless junkie”. Before that happened, it was not at all obvious this was an emergency. Who hasn’t been kept up all night by anxiety after a particularly stressful day in a stressful year?
I think the focus on verbal coherence is politically convenient for both of you. It makes this case into an interesting battleground for competing ideologies, where they can both try to create blame for a bad thing.
Scott wants to do this because AFAICT his agenda is to marginalize discussion of concepts from woo / psychedelia / etc, and would like to claim that Jess’ interest in those was a clear emergency. Jess wants to do this because she would like to claim that the ideas at MIRI directly drove her crazy.
I worked there too, and left at the same time for approximately the same reasons. We talked about it extensively at the time. It’s not plausible that it was even in-frame that considering details of S-risks in the vein of Unsong’s Broadcast would possibly be helpful for alignment research. Basilisk-baiting like that would generally have been frowned upon, but mostly just wouldn’t have come up.
The obvious sources of madness here were:
1. The extreme burden of responsibility for the far future (combined with the position that MIRI was uniquely essential to this), and the encouragement to take this responsibility seriously, were obviously stressful.
2. The local political environment at the time was a mess: splinters were forming, paranoia was widespread. A bunch of people we respected and worked with had decided the world was going to end, very soon, uncomfortably soon, and they were making it extremely difficult for us to check their work. This uncertainty was, uh, stressful.
3. Psychedelics very obviously induce states closer-than-usual to psychosis. This is what's great about them: they let you dip a toe into the psychotic world and be back the next day, so you can take some of the insights with you. Also, this makes them a risk for inducing psychotic episodes. It's not a coincidence that every episode I remember Jess having in 2017 and 2018 was a direct result of a trip-gone-long.
4. Latent tendency towards psychosis.
Critically, I don’t think any of these factors would have been sufficient on their own. The direct content of MIRI’s research, and the woo stuff, both seem like total red herrings in comparison to any of these 4 issues.
There's this general problem of Rationalists splitting into factions and subcults with minor doctrinal differences, each composed of relatively elite members of The Community, each with a narrative of how they're the real rationalists and the rest are just posers and/or parasites. And, they're kinda right. Many of the rest are posers; we have a mop problem.
There’s just one problem. All of these groups are wrong. They are in fact only slightly more special than their rival groups think they are. In fact, the criticisms each group makes of the epistemics and practices of other groups are mostly on-point.
Once people have formed a political splinter group, almost anything they write will start to contain a subtle attempt to slip in the doctrine they’re trying to push. With sufficient skill, you can make it hard to pin down where the frame is getting shoved in.
I have at one point or another been personally involved with a quite large fraction of the rationalist subcults. This has made the thread hard to read—I keep feeling a tug of motivation to jump into the fray, to take a position in the jostling for credibility or whatever it is being fought over here, which is then marred by the realization that this will win nothing. Local validity isn’t a cure for wrong questions. The tug of political defensiveness that I feel, and that many commenters are probably also feeling, is sufficient to show that whatever question is being asked here is not the right one.
Seeing my friends behave this way hurts. The defensiveness has at this point gone far enough that it contains outright lies.
I’m stuck with a political alignment because of history and social ties. In terms of political camps, I’ve been part of the Vassarites since 2017. It’s definitely a faction, and its members obviously know this at some level, despite their repeated insistence to me of the contrary over the years.
They’re right about a bunch of stuff, and wrong about a bunch of stuff. Plenty of people in the comments are looking to scapegoat them for trying to take ideas seriously instead of just chilling out and following somebody’s party line. That doesn’t really help anything. When I was in the camp, people doing that locked me in further, made outsiders seem more insane and unreachable, and made public disagreement with my camp feel dangerous in the context of a broader political game where the scapegoaters were more wrong than the Vassarites.
So I’m making a public declaration of not being part of that camp anymore, and leaving it there. I left earlier this year, and have spent much of the time since trying to reorient / understand why I had to leave. I still count them among my closest friends, but I don’t want to be socially liable for the things they say. I don’t want the implicit assumption to be that I’d agree with them or back them up.
I had to edit out several lines from this comment because they would just be used as ammunition against one side or another. The degree of truth-seeking in the discourse is low enough that any specific information has to be given very carefully so it can’t be immediately taken up as a weapon.
This game sucks and I want out.
Even with that as the goal this model is useless—social distancing demonstrably does not lead to 0 new infections. Even Wuhan didn’t manage that, and they were literally welding people’s doors shut.
...they’re ants. That’s just not how ants work. For a myriad of reasons. The whole point of the post is that there isn’t necessarily local deliberative intent, just strategies filling ecological niches.
It depends on the form regulation takes. The proposal here requires approval of training runs over a certain scale, which means everything is banned at that scale, including safety techniques, with exceptions decided by the approval process.