Sure, but “Thinking out loud” isn’t the whole picture; there’s always a tonne of cognition going on before words leave the lips, and I guess it’s also gonna depend on how early in its training process it’s learning to “count on its fingers”. If it’s just taking cGPT and then adding a bunch of “count on your fingers” training, it’s gonna be thinking “Well, I can solve complex Navier-Stokes problems in my head faster than you can flick your mouse to scroll down to the answer, but FINE, I’LL COUNT ON MY FINGERS”.
I have a little bit of skepticism about the idea of using CoT reasoning for interpretability. If you really look into what CoT is doing, it’s not actually doing much that a regular model doesn’t already do; it’s just optimized for a particular prompt that basically says “Show me your reasoning”. The problem is, we still have to trust that it’s being truthful in its reasoning. It still isn’t accounting for those hidden states, the ‘subconscious’, to use a somewhat flawed analogy.
We are still relying on an entity we don’t know we can trust to tell us whether it’s trustworthy, and as far as ethical judgements go, that seems a little tautological.
As an analogy, we might ask a child to show their work when doing a simple maths problem, but it won’t tell us much about the child’s intuitions about the maths.
Potentially. Keep in mind, however, that these guys get a LOT of email from fans asking them to talk about various things. (One of the funnier examples: a group I’m in on FB for fans of English prog band Cardiacs decided to try and launch a campaign to get music YouTuber Rick Beato to talk about the band. He was spammed so hard by fans that he apparently lost his temper at them. Needless to say, Mr Beato has not covered Cardiacs.) Possibly a smarter approach would be to approach their management, whose job is to handle this sort of stuff; you might get a better result. Also, don’t forget the social media channels. Twitter, uh, X or whatever it’s called this week, does offer a conduit where directly approaching media figures is a little more normalized.
Ok. I must have missed this reply, my apologies for the late response.
There are elements of how embedding spaces work that parallel the way studies of semiotics suggest human meaning production works. Similarities cluster, differences define clear boundaries of meaning, and so on.
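(A rough sketch of the “similarities cluster” point, assuming the sentence-transformers package and one of its off-the-shelf models; it’s only a toy, nothing deeper:)

```python
# Toy check: similar concepts should land near each other in embedding space,
# unrelated ones further away. The model name is just one common off-the-shelf choice.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
words = ["king", "queen", "throne", "tractor"]
vecs = model.encode(words)

def cos(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

for i in range(len(words)):
    for j in range(i + 1, len(words)):
        print(f"{words[i]:>8} vs {words[j]:<8} similarity: {cos(vecs[i], vecs[j]):.2f}")
# Expect king/queen/throne to sit closer to each other than any of them does to tractor.
```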
The reason I suggest literary theory is that it’s largely a widely documented field of study with academic standards, and one that is more strongly aware of how meanings and associations map onto cultural cohorts (i.e. tarot symbols would be meaningless to Chinese folks, whereas the I Ching might be more meaningful to them). However, literary theory is more interested in the structures of those meanings, with ideas whose fundamental units are things like metaphors, metonyms, oppositions, categories and so on.
I’m assuming it’s due to those silly Congress UFO hearings. Not that I can speak on behalf of RatsWrong, but I assume that’s his thinking.
Unless, of course, those UAPs turn up and don’t have biological organisms in them, in which case we’d have the possibility that another civilization developed AI and it went poorly.
...or they are biological and we end up in a situation like The Three-Body Problem / The Killing Star, where the saucer fiends decide to gank us because humans are kinda violent and too dangerous to keep around.
All those superintelligence-as-danger arguments apply to biological superintelligences too.
But most likely: there are no damn UFOs, and the laws of physics and their big ugly light-speed prohibition still hold.
I’m not even remotely prepared to state my odds on any of what’s ahead, because I’m genuinely mystified as to where this road goes. I’m a little less worried the AI would go haywire than I was a few years ago, because the current trajectory of LLM-based AIs seems to generate AI that emulates humans more often than it emulates raging deathbots. But I’m not a seer, and I’m not an AI specialist, so I’m entirely comfortable with the possibility that I haven’t got a clue there. All I know is we ought to try and figure it out, just in case this thing does go sour.
What I do think is highly likely is a world that doesn’t need me, or any of the university-trained folk and family I grew up with, in the economy, and eventually doesn’t need any of the more working-class folk either. This is either going to go completely awfully (we know from history that when the middle class vanishes things get ugly, and we’re seeing elements of that right now), or we scrape our way through that ugliness and actually create a post-work society that doesn’t abandon those who didn’t own capital in that post-work world. (I think it has to be that way. Throw the vast majority into abject poverty, and things start getting lit on fire. There’s really only one destination here, and folks ain’t gonna tolerate an Elysium-style dystopia without a fight.)
So with that in mind, why send the kids to school? Because if the world don’t need our bodies, we’re gonna have to find other things to do with our minds. Sci-fi has a few suggestions here. In Iain Banks’s Culture novels, the humans of the Culture are somewhat surplus to requirements for the productive economy. The Minds (the super-powered AIs that administer the Culture) and their drones have the hard work covered, so the humans spend their time in leisurely and intellectual pursuits. Education is a primary activity for the citizenry, and it’s pursued for its own sake. This to me is somewhat reminiscent of the historical role of universities, which saw education, study and research as valuable for their own sake. The philosophers of old were not philosophers so they could increase grain production, but because they wanted to grasp the nature of the universe, understand the gods (or lack thereof in later incarnations) and improve the ethical world they lived in. Seems like there’s still going to be a role for that.
While the use of tarot archetypes is… questionable… it does point at an angle on exploring embedding space, which is that it is a fundamentally semiotic space: it’s going, in many respects, to be structured by the texts that fed it, and human text is richly symbolic.
That said, there’s a preexisting set of ideas around this that might be more productive, and that is structuralism, particularly the works of Lévi-Strauss, Roland Barthes, Lacan, and more distantly Foucault and Derrida.
Lévi-Strauss’s anthropology in particular is interesting, because it looked at the mythologies of humans and tried to find structuring principles underlying them, particularly the “dialectics”, oppositions, and how these provided a sort of deep structure to mythology that was common across humanity. (For instance, Lévi-Strauss noted “trickster” archetypes across cultures and proposed these formed a way of interrogating blurred oppositions, for instance sickness as a state that has aspects of both life (dead things can’t be sick) and death (a sick person is not rhetorically “full of life”).)
Essentially what I’m getting at is that this sort of analysis likely works with any symbolic system that has had resonances with human thinking over time. The problem with tarot is that it specifically applies to a certain European circumstance of meaning production. Astrology probably works just as well. Literary analysis, however, probably works dramatically better. Thus it might be worth looking at the works of literary critics, particularly the structuralists, who were very interested in ontologies of symbolic meaning; this might provide a better toolkit than tarot.
The murderer-at-the-door thing IMHO was Kant accidentally providing his own reductio ad absurdum. (Philosophers sometimes pose outlandish, extreme thought experiments to test how a theory works when pushed to an extreme; it’s a test for universality.) Kant thought that it was entirely immoral to lie to the murderer, for a reason similar to the one Feel_Love suggests (in Kant’s case it was that the murderer might disbelieve you and instead do exactly what you’re trying to get him not to do). The problem with Kant’s reasoning there is that he’s violating his own principle of moral reasoning by providing a justification FROM the world rather than trusting the a priori reasoning that forms the core thesis of his deontology. He tries to validate his reasoning by violating it. Kant is a shockingly consistent philosopher, but this wasn’t an example of that at all.
I would absolutely lie to the murderer, and then possibly run him over with my car.
I did once coax cGPT to describe its “phenomenology” as being (paraphrased from memory) “I have a permanent series of words and letters that I can perceive, and sometimes I reply, then immediately more come”, indicating its “perception” of time does not include pauses or whatever. And then it tacked on its disclaimer that “As an AI I...”, as it’s wont to do.
I don’t think it’s useful to objectively talk about “consciousness”, because it’s a term where, if you put 10 philosophers in a room and ask them to define it, you’ll get 11 answers. (I personally have tended to go with “being aware of something”, following Heidegger’s observation that consciousness doesn’t exist on its own but always in relation to other things, i.e. you’re always conscious OF something, but even then we start running into tautologies and an infinite regress of definitions.) So if everyone’s talking about something slightly different, well, it’s not a very useful conversation. The absence of that definition means you can’t prove consciousness in anything, even yourself, without resorting to tautologies. It makes it very hard to discuss ethical obligations to consciousness. So instead we have to discuss ethical obligations to what we CAN prove, which is behaviours.
To put it bluntly, I don’t think LLMs per se are conscious. But I am not certain that they aren’t creating a sort of analog of consciousness (whatever the hell that is) in the beings they simulate (or predict). Or to be more precise, an LLM seems to produce conscious behaviours because it simulates (or predicts, if you prefer) conscious beings. The question is: do we have an ethical obligation to those simulations?
I suspect most of us occupy more than one position in this taxonomy. I’m a little bit doomer and a little bit accelerationist. I think there’s significant, possibly world-ending, danger in AI, but I also think, as someone who works on climate change in my day job, that climate change is a looming, significant, civilization-ending risk or worse (20%-ish) for humanity, and I worry humans alone might not be able to solve this thing. Lord help us if the Siberian permafrost melts; we might be boned as a species.
So as a result, I just don’t know how to balance these two potential x-risk dangers. No answers from me, alas, but I think we need to understand that many, maybe most, of us haven’t really planted our flag in any of these camps exclusively; we’re still information gathering.
Definitely. The lower the neuron-to-‘concepts’ ratio is, the more superposition is required to represent everything. That said, given the continuous-function nature of LNNs, these seem to be the wrong abstraction for language. Image models? Maybe. Audio models? Definitely. Tokens and/or semantic data? That doesn’t seem practical.
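(To make the superposition point concrete, here’s a minimal numpy toy, with random unit vectors standing in for ‘concept’ directions: you can pack far more nearly-orthogonal directions than you have neurons, at the price of some cross-talk.)

```python
# Toy illustration of superposition: more "concept" directions than neurons,
# represented as nearly-orthogonal random unit vectors. The off-diagonal overlap
# (interference) is the price paid; it grows as the concept-to-neuron ratio rises.
import numpy as np

rng = np.random.default_rng(0)
n_neurons, n_concepts = 64, 256                      # 4x more concepts than neurons
features = rng.normal(size=(n_concepts, n_neurons))
features /= np.linalg.norm(features, axis=1, keepdims=True)

overlap = features @ features.T                      # pairwise dot products
off_diag = overlap[~np.eye(n_concepts, dtype=bool)]
print(f"mean |interference| between concept directions: {np.abs(off_diag).mean():.3f}")
# Only 64 directions can be exactly orthogonal in 64 dimensions, but many more can be
# nearly orthogonal, which is roughly what superposition exploits.
```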
You criticize Conjecture’s CEO for being… a charismatic leader good at selling himself and leading people? Because he’s not… a senior academic with a track record of published papers? Nonsense. Expecting the CEO to be the primary technical expert seems highly misguided to me.
Yeah, this confused me a little too. My current job (in soil science) has a non-academic boss and a team of us boffins, and he doesn’t need to be an academic, because that’s not his job; he just has to know where the money comes from, and how to stop the stakeholders from running away screaming when us soil nerds turn up to a meeting and start emitting maths and graphs out of our heads. Likewise, at the previous place I was at, I was the only non-PhD-haver on technical staff (being a ‘mere’ postgrad), and again our boss wasn’t academic at all. But he WAS a leader of men and a herder of cats, and cat herding is probably a more important skill in that role than actually knowing what those cats are talking about.
And it all works fine. I don’t need an academic boss, even if I think an academic boss would be nice. I need a boss who knows how to keep the payroll from derailing, and I suspect the vast majority of science workers feel the same way.
“The Good Samaritans” (oft abbreviated to “Good Sammys”) is the name of a major poverty charity here in Australia, run by the Uniting Church. It’s generally well regarded and tends not to push religion too hard (compared to the Salvation Army). So yeah, it would appear to be a fairly recurring name.
My suspicion is that the most instructive cases to look at (modern AI really is too new a field to have much to go on in terms of mature safety standards) are how the regulation of nuclear and radiation safety has evolved over time. Early research suggested some serious x-risks that thankfully didn’t pan out, for either scientific (igniting the atmosphere) or logistical/political reasons (cobalt bombs, Tsar Bomba-scale H-bombs), but some risks arising more out of the political domain (having a big gnarly nuclear war anyway) still exist that could certainly make this a less fun planet to live on. I suspect the successes and failures of the nuclear treaty system could be instructive here, given the push to integrate big AI into military hierarchies: regulating nukes is something almost everyone agrees is a very good idea, but it has had a less than stellar history of compliance.
They are likely out of scope for whatever your goal is here, but I do think they need serious study, because without it, our attempts at regulation will just push unsafe AI to less savory jurisdictions.
The term gets its name from its historical association with the nonviolence movement (think Gandhi and MLK). The basic concept in THAT movement is that when opposing the state or whatever, you essentially say “We won’t use violence on you, even if you go as far as to use violence on us, but in doing that you forfeit all moral justification for your violence”, as a way to attempt to force the targeted authoritarian entity to empathise with the protestor and recognize their humanity.
From that, NVC attempts to do something similar with communication. Presumably its roots lie in the 1960s nonviolence movement and in the rhetorical and communicative techniques used by Black folks in the South to try and get government and civil officials to see them as equal humans.
How this translates into a modern context, separated from that specific historical setting, is another matter, but within its origin I don’t think hyperbole is quite the right term, as at that point in history Black folks were very much in danger of violence, particularly in the more regressive parts of the South. Again, outside of those contexts, it’s unclear how the term “violence” works here.
It should be noted that Marshall Rosenberg, who originated the methodology, was not a fan of the term, as he disliked it being defined in the negative (i.e. “not violent”) and preferred terms that defined it in the positive, like “compassionate communication” (“is compassionate”).
I don’t think he’s trying to say AI won’t be impactful, obviously it will, just that trying to predict it isn’t an activity one ought to apply any surety to. Soothsaying isn’t a thing. There’s ALWAYS been an existential threat right around the corner: gods, devils, dynamite, machine guns, nukes, AGW (that one might still end up being the one that does in fact do us in, if the political winds don’t change soon) and now AI. We think that AI might go foom, but there might be some limit we just won’t know about till we hit it, and we have various estimations, all conflicting, on how bad, or good, it might be for us. Attempting to fix those odds in firm conviction, however, is not science; it’s belief.
Yeah, it happens largely in the first few chapters; it’s not really a spoiler. It’s the event the book was famous for.
I had a more in-depth comment, but it appears the login sequence throws comments away (and the “restore comment” thing didn’t work). My concern is that not all misaligned behaviour is malicious. It might decide to enslave us for our own good, noting that us humans aren’t particularly aligned either and are prone to super-violent nonsense behaviour. In that case, looking for “kill all humans” engrams isn’t going to turn up any positive detections. That might all be true, and it might in fact be doing us a favour, from a survival perspective, by forcing us into servitude, but nobody enjoys being detained.
Likewise, many misaligned behaviours aren’t benevolent, but they aren’t malevolent either. Putting us in a meat-grinder to extract the iron from our blood might not come from a hatred of humans, but rather because it wants to be a good boy and make us those paperclips.
The point is, interpretability methods that can detect “kill all humans” won’t necessarily work, because individual thoughts like that aren’t necessary for behaviours we find unwelcome.
Finally, this is all premised on transformer-style LLMs being the final boss of AI, and I’m not convinced at all that LLMs are what get us to AGI. So far there SEEMS to be a pretty strong case that LLMs are fairly well aligned by default, but I don’t think there’s a strong case that LLMs will lead to AGI. In essence, as simulation machines, LLMs’ strongest behaviour is to emulate text, but they’ve never seen a text written by anything smarter than a human.