My intuition also tells me that the distinction might just lack the necessary robustness. I do wonder if Buck’s intuition is different. In any case, it would be very interesting to know his opinion.
Satron
I also wonder what Buck thinks about CoTs becoming less transparent (especially in light of recent o1 developments).
Great post, very clearly written. Going to share it in my spaces.
Sure, it sounds like a good idea! Below I will write my thoughts on your overall summarized position.
———
“I primarily wish to argue that, given the general lack of accountability for developing machine learning systems in worlds where indeed the default outcome is doom, it should not be surprising to find out that there is a large corporation (or multiple) doing so.”
I do think that I could maybe agree with this if it were one small corporation. In your previous comment you suggested that you are describing not the intentional contribution to the omnicide, but the bit of rationalization. I don’t think I would agree that that many people working on AI are successfully engaged in that bit of rationalization, or that it would be enough to keep them doing it. The big factor is that in the case of their failure, they personally (and all of their loved ones) will suffer the consequences.
“It is also not surprising that glory-seeking companies have large departments focused on ‘ethics’ and ‘safety’ in order to look respectable to such people.”
I don’t disagree with this, because it seems plausible that one of the reasons for creating safety departments is ulterior. However, I believe that this reason is probably not the main one and that AI safety labs are producing genuinely good research papers. To take the example of Anthropic, I’ve seen safety papers that got the LessWrong community excited (at least judging by upvotes). Like this one.
“I believe that there is no good plan and that these companies would exist regardless of whether a good plan existed or not… I believe that the people involved are getting rich risking all of our lives and there is (currently) no justice here”
For the reasons that I mentioned in my first paragraph, I would probably disagree with this. Relatedly, while I do think wealth in general can be somewhat motivating, I also think that AI developers are aware that all their wealth would mean nothing if AI kills everyone.
———
Overall, I am really happy with this discussion. Our disagreements came down to a few points, and we agree on quite a few issues. I am similarly happy to conclude this big comment thread.
I don’t think it would necessarily be easy to rationalize that a vaccine that negatively affects 10% of the population and has no effect on the other 90% is good. It seems possible to rationalize that it is good for you (if you don’t care about other people), but quite hard to rationalize that it is good in general.
Given how few politicians die as a result of their policies (at least in the Western world), the increase in their chance of death does seem negligible (compared to something that presumably increases this risk by a lot). Most bad policies that I have in mind don’t happen during human catastrophes (like pandemics).
However, I suspect that the main point of your last comment was that potential glory can, in principle, be one of the reasons for why people rationalize stuff, and if that is the case, then I can broadly agree with you!
Looking at the two examples that you gave me, I can see a few issues. I wouldn’t really say that saying “I don’t know” once is necessarily a lie. If anything, I could find such an answer somewhat more honest in some contexts. Other than that, there is also the issue of both examples being of a very different scope and scale. Saying IDK to the committee and trying to take down someone’s comment on an internet forum are definitely not on the same scale as the elaborate scheme of tricking or bribing/silencing multiple government employees who have access to your model. But even with all that aside, these two examples are only tangential to the topic of governmental oversight over OpenAI or Anthropic and don’t necessarily provide direct evidence.
I can believe that you genuinely have information from private sources, but without any way for me to verify it, I am fine leaving this one at that.
To modify my example to include an accountability mechanism that is also similar to real life: the King takes exactly the same vaccines as everyone else. So if he messed up with the chemicals, he also dies.
I believe a similar accountability mechanism works in our real-world case. If CEOs build unsafe AI, they and everyone they value in this life die. This seems like a really good incentive for them not to build unsafe AI.
At the end of the day, voluntary commitments such as debating with critics are not as strong, in my opinion. Imagine that they agree with you and go to the debate. Without the incentive of “if I mess up, everyone dies”, the CEOs could just go right back to doing what they were doing. As far as I know, voluntary debates have few (if any) actual legal mechanisms to hold CEOs accountable.
[Question] What (if anything) made your p(doom) go down in 2024?
“If a model trained on synthetic data is expected to have good performance out of distribution (on real-world problems) then I think that it would also be expected to have high performance at assessing whether it’s in a simulation.”
Noosphere89, you have marked this sentence with a “disagree” emoji. Would you mind expanding on that? I think it is a pretty important point and I’d love to see why you disagree with Ben.
What about the following red team strategy:
Give an honest assessment of the suspicion level, unless there is a very well hidden backdoor, in which case give a low score. Also, only create backdoors if it is possible to hide them well.
Wouldn’t this defeat the self-checking strategy?
I haven’t heard of any such corrupt deals with OpenAI or Anthropic concerning governmental oversight over AI technology on the scale that would make me worried. Do you have any links to articles about government employees (who are responsible for oversight) recently signing secret contracts with OpenAI or Anthropic that would prohibit them from giving real feedback on a big enough scale to make it concerning?
I will try to provide another similar analogy. Let’s say that a King got tired of his people dying from diseases, so he decided to try a novel method of vaccination.
However, some people were really concerned about that. As far as they were concerned, the default outcome of injecting viruses into people’s bodies is death. And the King wants to vaccinate everyone, so these people create a council of great and worthy thinkers who, after thinking for a while, come up with a list of reasons why vaccines are going to doom everyone.
However, some other great and worthy thinkers (let’s call them “hopefuls”) come to the council and give reasons to think that the aforementioned reasons are mistaken. Maybe they have done their own research, which seems to vindicate the King’s plan or at least undermine the council’s arguments.
And now imagine that the King comes down from the castle, points his finger at the hopefuls giving arguments for why the arguments proposed by the council are wrong, says “yeah, basically this”, and then turns around and goes back to the castle. To me it seems like the King’s official endorsement of the arguments proposed by the hopefuls doesn’t really change the ethicality of the situation, as long as the King is acting according to the hopefuls’ plan.
Furthermore, imagine if one of the hopefuls who came to argue with the council was actually the King undercover, and he gave exactly the same arguments as the people before him. This still, IMO, doesn’t change the ethicality of the situation.
Then we are actually broadly in agreement. I just think that instead of CEOs responding to the public, having anyone on their side (the side of AI alignment being possible) respond is enough. Just as an example that I came up with: if a critic says that some detail is a reason why AI will be dangerous, I do agree that someone needs to respond to the argument. But I would be fine with it being someone other than the CEO.
That’s why I am relatively optimistic about Anthropic hiring the guy who has been engaged with critics’ arguments for years.
I similarly don’t see the need for any official endorsement of the arguments. For example, if a critic says that such-and-such technicality will prevent us from building safe AI, and someone responds with reasons why it will not (maybe this particular one is unlikely by default, for various reasons), then that technicality will just not prevent us from building safe AI. I don’t see a need for a CEO to officially endorse the response.
There is a different type of technicality which you actively need to work against. But even in this case, as long as someone has found a way to combat it, and the relevant people in your company are aware of the solution, it is fine by me.
Even if there are technicalities that can’t be solved in principle, they should be evaluated and discussed by technical people (like they are on LessWrong, for example).
I am definitely not saying that I can pinpoint an exact solution to AI alignment, but there have been attempts promising enough that leading skeptics (like Yudkowsky) have said “Not obviously stupid on a very quick skim. I will have to actually read it to figure out where it’s stupid. (I rarely give any review this positive on a first skim. Congrats.)”
Whether companies actually follow promising alignment techniques is an entirely different question. But having CEOs officially endorse such solutions as opposed to relevant specialists evaluating and implementing them doesn’t seem strictly necessary to me.
Having had some shady deals in the past isn’t evidence that shady deals on the scale we are talking about are currently going on between government committees and AI companies.
If there is no evidence for that happening in our particular case (on the necessary scale), then I don’t see why I can’t make a similar claim about other auditors who similarly had less than ideal history.
I think we then just fundamentally disagree about the ethical role of a CEO in the company. I believe that it is to find and gather people who are engaged with the critics’ arguments (like that guy from this forum who was hired by Anthropic). If you have people on your side who are able to engage with the arguments, then this is good enough for me. I don’t see the role of the CEO as publicly engaging with critics’ arguments, even in the moral sense. In the moral sense, my requirements would actually be even weaker: IMO, it would be enough just to have people broadly on your side (optimists, for example) engage with the critics.
I do think that in order for a government department to blatantly approve an unsafe model, it would require secret agreements with a lot of people. I currently haven’t seen any evidence of widespread corruption in that particular department.
And it’s not like I am being unfair. You are basically saying that because some people might’ve had some secret agreements we should take their supposed trustworthiness as external auditors with a grain of salt. I can make a similar argument about a lot of external auditors.
I do disagree with your proposed standard. It is good enough for me that the critics’ arguments are engaged with by someone on your side. Going there personally seems unnecessary. After all, if the goal is to build safe AI, you personally knowing a niche technical solution isn’t necessary, as long as you have people on your team who are aware of publicly produced solutions as well as internal ones.
And there is engagement with people like Yudkowsky from the optimistic side. There are at least proposed solutions to the problems he is presenting. Just because Sam Altman personally didn’t invent or voice them doesn’t really make him unethical (I accept that he may be personally unethical for a different reason).
The founders of the companies accept money for the same reason any other business accepts money. You can build something genuinely good for humanity while making yourself richer at the same time. This has happened many times already (Apple phones for example).
I concede that the founders of the companies didn’t personally and publicly engage with the arguments of people like Yudkowsky, but that’s a really high bar. Historically, CEOs aren’t usually known for getting into technical debates. For that reason, they create security councils that monitor the perceived threats.
And it’s not like there was no response whatsoever from people who are optimistic about AI. There were plenty of responses. I believe that arguments matter, not the people who make them. If a successful argument has been made, then I don’t see a need for CEOs to repeat it. And I certainly don’t think that just because CEOs don’t go to debates, that makes them unethical.
Given the lack of response, should I assume the answer is “no”?