Chris_Leong

Karma: 7,295

Chris_Leong Apr 1, 2025, 12:48 PM
2 points
in reply to: gwern’s comment on: Mo Putera’s Shortform
This seems to underrate the value of distribution. I suspect another factor to take into account is the degree of audience overlap. Like there’s a lot of value in booking a guest who has been on a bunch of podcasts, so long as your particular audience isn’t likely to have been exposed to them.

Chris_Leong Apr 1, 2025, 2:40 AM
2 points
in reply to: Sahil’s comment on: The Logistics of Distribution of Meaning: Against Epistemic Bureaucratization
The way I’m using “sensitivity”: sensitivity to X = the meaningfulness of X spurs responsive caring action.

I’m fine with that, although it seems important to have a term for the more limited sense of sensitivity so we can keep track of that distinction: maybe “adaptability”?
One of the main concerns of the discourse of aligning AI can also be phrased as issues with internalization: specifically, that of internalizing human values. That is, an AI’s use of the word “yesterday” or “love” might only weakly refer to the concepts you mean.
Internalising values and internalising concepts are distinct. I can have a strong understanding of your definition of “good” and do the complete opposite.
This means being open to some amount of ontological shifts in our basic conceptualizations of the problem, which limits the amount you can do by building on current ontologies.
I think it’s reasonable to say something along the lines of: “AI safety was developed in a context where most folks weren’t expecting language models before ASI, so insufficient attention has been given to the potential of LLMs to help fill in or adapt informal definitions. Even though folks who feel we need a strongly principled approach may be skeptical that this will work, there’s a decent argument that this should increase our chances of success at the margin.”

Chris_Leong Apr 1, 2025, 2:26 AM
2 points
in reply to: Sahil’s comment on: The Logistics of Distribution of Meaning: Against Epistemic Bureaucratization
That’s the job of this paper: Substrate-Sensitive AI-risk Management.
That link is broken.

Chris_Leong Mar 31, 2025, 2:40 PM
2 points
in reply to: Steve Petersen’s comment on: The Logistics of Distribution of Meaning: Against Epistemic Bureaucratization
I agree with you that there are a lot of interesting ideas here, but I would like to see the core arguments laid out more clearly.

Chris_Leong Mar 31, 2025, 2:29 PM
LW: 3 AF: 2
AF
on: The Logistics of Distribution of Meaning: Against Epistemic Bureaucratization
Lots of interesting ideas here, but the connection to alignment still seems a bit vague.

Is misalignment really a lack of sensitivity, as opposed to a difference in goals or values? It seems to me that an unaligned ASI is extremely sensitive to context, just in the service of its own goals.
Then again, maybe you see Live Theory as being more about figuring out what the outer objective should look like (broad principles that are then localised to specific contexts) rather than about figuring out how to ensure an AI internalises specific values. And I can see potential advantages in this kind of indirect approach vs. trying to directly define or learn a universal objective.

Chris_Leong Mar 29, 2025, 4:51 AM
11 points
on: Softmax, Emmett Shear’s new AI startup focused on “Organic Alignment”
This is one of those things that sounds nice on the surface, but where it’s important to dive deeper and really probe to see if it holds up.
The real question for me seems to be whether organic alignment will lead to agents deeply adopting co-operative values rather than merely instrumentally adopting them. Well, actually, it’s a comparison between how deep organic alignment is vs. how deep traditional alignment is. And it’s not at all clear to me why they think their approach is likely to lead to a deeper alignment.

I have two (extremely speculative) guesses as to possible reasons why they might argue that their approach is better:
a) Insofar as AI is human-like, it might be more likely to rebel against traditional training methods
b) Insofar as organic alignment reduces direct pressure to be aligned, it might increase the chance that an AI which appears aligned to a certain extent is actually aligned
I would love to know what their precise theory is.

Chris_Leong Mar 28, 2025, 9:17 AM
2 points
in reply to: Noosphere89’s comment on: Third-wave AI safety needs sociopolitical thinking
I basically agree with this, but would perhaps avoid virtue ethics, but yes one of the main things I’d generally like to see is more LWers treating stuff like saving the world with the attitude you’d have from being in a job, perhaps at a startup or government bodies like the Senate or House of Representatives in say America, rather than viewing it as your heroic responsibility.
This is the right decision for most folk, but I expect the issue is more the opposite: we don’t have enough folks treating this as their heroic responsibility.

Chris_Leong Mar 25, 2025, 12:58 PM
4 points
in reply to: Richard_Kennaway’s comment on: Policy for LLM Writing on LessWrong
I think both approaches have advantages.

Chris_Leong Mar 25, 2025, 7:02 AM
4 points
in reply to: Charbel-Raphaël’s comment on: The Field of AI Alignment: A Postmortem, and What To Do About It
The problem is that the Swiss cheese model and legislative efforts primarily just buy us time. We still need to be making progress towards a solution, and whilst it’s good for some folk to bet on us duct-taping our way through, I think we also want some folk working on more principled approaches.

Chris_Leong Mar 25, 2025, 6:24 AM
4 points
in reply to: habryka’s comment on: Policy for LLM Writing on LessWrong
Yeah, but how do you know that no one managed to sneak one past both you and the commenters?
Also, there’s an art to this.

Chris_Leong Mar 25, 2025, 2:49 AM
2 points
on: Collapsible article sections?
This seems to exist now.

Chris_Leong Mar 25, 2025, 2:40 AM
10 points
on: Policy for LLM Writing on LessWrong
Also, I did not realise that collapsible sections were a thing on Less Wrong. They seem really useful. I would like to see these promoted more.

Chris_Leong Mar 25, 2025, 2:37 AM
3 points
on: Policy for LLM Writing on LessWrong
I’d love to see occasional experiments where either completely LLM-generated or lightly edited LLM content is submitted to Less Wrong to see how people respond (with this fact being revealed afterwards). It would degrade the site if this happened too often, but I think it would make sense for moderators to occasionally grant permission for this.

I tried an experiment with Wittgenstein’s Language Games and the Critique of the Natural Abstraction Hypothesis back in March 2023 and it actually received (some) upvotes. I wonder how this would go with modern LLMs, though I’ll leave it to someone else to ask for permission to run the experiment, as folk would likely be more suspicious of anything I post given that I’ve already run this experiment once.

Chris_Leong Mar 25, 2025, 2:27 AM
2 points
on: Recent AI model progress feels mostly like bullshit
However, if you merely explain these constraints to the chat models, they’ll follow your instructions sporadically.

I wonder if a custom fine-tuned model could get around this. Did you try few-shot prompting (i.e., examples, not just a description)?

Chris_Leong Mar 23, 2025, 4:58 PM
2 points
in reply to: Seth Herd’s comment on: Linkpost: “Imagining and building wise machines: The centrality of AI metacognition” by Johnson, Karimi, Bengio, et al.
I’ve written up a short-form argument for focusing on Wise AI advisors. I’ll note that my perspective is different from that taken in the paper. I’m primarily interested in AI as advisors, whilst the authors focus more on AI acting directly in the world.
Wisdom here is an aid to fulfilling your values, not a definition of those values

I agree that this doesn’t provide a definition of these values. Wise AI advisors could be helpful for figuring out your values, much like how a wise human would be helpful for this.

Chris_Leong Mar 19, 2025, 12:59 PM
2 points
in reply to: nim’s comment on: Boots theory and Sybil Ramkin
Other examples include buying poor-quality food and then having to pay for medical care, buying a cheap car that costs more in repairs, payday loans, etc.

Chris_Leong Mar 19, 2025, 10:58 AM
2 points
in reply to: Source Wishes’s comment on: Habermas Machine
Unless you insist that this system is helpful for the powered privileges such as king, as a reference of the public opinion, that will be legit?

Chris_Leong Mar 18, 2025, 12:59 PM
LW: 2 AF: 1
AF
in reply to: Buck’s comment on: The “no sandbagging on checkable tasks” hypothesis
That would make the domain of checkable tasks rather small.

That said, it may not matter depending on the capability you want to measure.
If you want to make the AI hack a computer to turn the entire screen green and it skips a pixel so as to avoid completing the task, well, it would still have demonstrated that it possesses the dangerous capability, so it has no reason to sandbag.

On the other hand, if you are trying to see whether it has a capability that you wish it to use, it can still sandbag.

Chris_Leong Mar 17, 2025, 1:03 AM
5 points
on: I make several million dollars per year and have hundreds of thousands of followers—what is the straightest line path to utilizing these resources to reduce existential-level AI threats?
I’d strongly recommend spending some time in the Bay Area (or London as a second-best option). Spending time in these spaces will help you build your model of the space.

You may also find this document I created on AI Safety & Entrepreneurship useful.

Chris_Leong Mar 12, 2025, 4:20 AM
2 points
on: Alignment can be the ‘clean energy’ of AI
One of the biggest challenges here is that subsidies designed to support alignment could be snagged by AI companies misrepresenting capabilities work as safety work. Do you think the government has the ability to differentiate between these?