Kerrigan

Karma: 49

Kerrigan Apr 14, 2025, 6:26 AM
1 point
0
on: Coherent decisions imply consistent utilities
Since humans are not EU maximizers and are exploitable, can someone give an example of how they are exploitable?

Kerrigan Apr 14, 2025, 2:22 AM
3 points
0
on: What do coherence arguments actually prove about agentic behavior?
Is exploitability necessarily unstable? Could there be a tolerable level of exploitability, especially if it allows for tradeoffs with desirable characteristics that are only available to non-EU maximizers?”

Kerrigan Apr 8, 2025, 2:05 PM
1 point
0
on: Clarifying Power-Seeking and Instrumental Convergence
Why is this not true for most humans? Many religious people would not want to modify the lightcone as they think that it’s God’s territory to modify.

Kerrigan Apr 8, 2025, 7:12 AM
1 point
0
in reply to: Quintin Pope’s comment on: why assume AGIs will optimize for fixed goals?
The initial distribution of values need not be highly related to the resultant values after moral philosophy and philosophical self-reflection. Optimizing hedonistic utilitariansm, for example, looks very little like any values from the outer optimization loop of natural selection.

Kerrigan Apr 8, 2025, 5:56 AM
1 point
0
on: Coherent decisions imply consistent utilities
Although there would be pressure for an AI to not be exploitable, wouldn’t there also be pressure for adaptability and dynamism? The ability to alter preferences and goals given new environments?

Kerrigan Mar 27, 2025, 2:02 AM
1 point
0
on: Humans aren’t agents—what then for value learning?
Why can’t the true values live at the level of anatomy and chemistry?

Kerrigan Mar 17, 2025, 4:49 AM
1 point
0
on: The Anthropic Trilemma
Would this be solved if cresting a copy is creating someone functionally the same as you but who is someone else’s identity, and not you?

Kerrigan Jan 31, 2025, 7:08 AM
1 point
0
on: Stupid Questions—April 2023
Is there a page similar to this, but for alignment solutions?

Kerrigan Nov 4, 2024, 6:56 AM
1 point
0
on: The Assassination of Trump’s Ear is Evidence for Time-Travel
What about from a quantum immortality perspective?

Kerrigan Sep 25, 2024, 8:54 AM
1 point
−2
in reply to: Kerrigan’s comment on: Understanding and avoiding value drift
Could there not be AI value drift in our favor, from a paperclipper AI to a moral realist AI?

Kerrigan Sep 25, 2024, 6:59 AM
1 point
0
in reply to: Seth Herd’s comment on: The alignment stability problem
Both quotes are from your above post. Apologies for confusion.

Kerrigan Sep 19, 2024, 9:02 AM
3 points
0
on: The alignment stability problem
“A sufficiently intelligent agent will try to prevent its goals^[1] from changing, at least if it is consequentialist.”
It seems that in humans, smarter people are more able and likely to change their goals. A smart person may change his/her views about how the universe can best be arranged upon reading Nick Bostrom’s book Deep Utopia, for example.

’I think humans are stable, multi-objective systems, at least in the short term. Our goals and beliefs change, but we preserve our important values over most of those changes. Even when gaining or losing religion, most people seem to maintain their goal of helping other people (if they have such a goal); they just change their beliefs about how to best do that.“

A human may change from wanting to help people to not wanting to help people if he/she got 5 hours of sleep instead of 8.

Kerrigan Sep 19, 2024, 8:22 AM
12 points
0
on: Understanding and avoiding value drift
How do humans, for example, read a philosophy book and update their views about what they value about the world?

Kerrigan Sep 19, 2024, 6:18 AM
LW: 1 AF: 1
0
AF
on: Decision theory does not imply that we get to have nice things
“Similarly, it’s possible for LDT agents to acquiesce to your threats if you’re stupid enough to carry them out even though they won’t work. In particular, the AI will do this if nothing else the AI could ever plausibly meet would thereby be incentivized to lobotomize themselves and cover the traces in order to exploit the AI.
But in real life, other trading partners would lobotomize themselves and hide the traces if it lets them take a bunch of the AI’s lunch money. And so in real life, the LDT agent does not give you any lunch money, for all that you claim to be insensitive to the fact that your threats don’t work.”

Can someone please why trading partners would lobotomize themselves?

Kerrigan Dec 30, 2023, 6:52 AM
1 point
0
on: Stupid Questions—April 2023
How does inner misalignment lead to paperclips? I understand the comparison of paperclips to ice cream, and that after some threshold of intelligence is reached, then new possibilities can be created that satisfy desires better than anything in the training distribution, but humans want to eat ice cream, not spread the galaxies with it. So why would the AI spread the galaxies with paperclips, instead of create them and
”consume“ them? Please correct any misunderstandings of mine,

Kerrigan Dec 27, 2023, 1:28 AM
1 point
0
in reply to: ChristianKl’s comment on: Stupid Questions—April 2023
And a subset might value drift towards optimizing the internal experiences of all conscious minds?

Kerrigan Dec 26, 2023, 10:30 PM
1 point
0
on: Stupid Questions—April 2023
If an AGI achieves consciousness, why would its values not drift towards optimizing its own internal experience, and away from tiling the lightcone with something?

Kerrigan Dec 17, 2023, 11:03 PM
4 points
0
on: AGI Safety FAQ / all-dumb-questions-allowed thread
How can utility be a function of worlds, if an agent doesn‘t have access to the state of the world, but only the sense data?

Kerrigan Dec 17, 2023, 11:01 PM
1 point
0
on: AGI Safety FAQ / all-dumb-questions-allowed thread
How can utility be a function of worlds, if an agent doesn‘t have access to the state of the world, but only the sense data?

Kerrigan Dec 17, 2023, 11:00 PM
7 points
on: An Orthodox Case Against Utility Functions
How can utility be a function of worlds, if the agent doesn’t have access to the state of the world, but only the sense data?

Keyboard shortcuts

Keys shown in yellow (e.g., ]) are accesskeys, and require a browser-specific modifier key (or keys).

Keys shown in grey (e.g., ?) do not require any modifier keys.

General
? Show keyboard shortcuts
Esc Hide keyboard shortcuts

Site navigation
h Go to Home (a.k.a. “Frontpage”) view
f Go to Featured (a.k.a. “Curated”) view
a Go to All (a.k.a. “Community”) view
m Go to Meta view
v Go to Tags view
c Go to Recent Comments view
r Go to Archive view
q Go to Sequences view
t Go to About page
u Go to User or Login page
o Go to Inbox page

Page navigation
, Jump up to top of page
. Jump down to bottom of page
/ Jump to top of comments section
s Search

Page actions
n New post or comment
e Edit current post

Post/comment list views
. Focus next entry in list
, Focus previous entry in list
; Cycle between links in focused entry
Enter Go to currently focused entry
Esc Unfocus currently focused entry
] Go to next page
[ Go to previous page
\ Go to first page
e Edit currently focused post

Editor
k Bold text
i Italic text
l Insert hyperlink
q Blockquote text

Appearance
= Increase text size
- Decrease text size
0 Reset to default text size
′ Cycle through content width settings
1 Switch to default theme [A]
2 Switch to dark theme [B]
3 Switch to grey theme [C]
4 Switch to ultramodern theme [D]
5 Switch to simple theme [E]
6 Switch to brutalist theme [F]
7 Switch to ReadTheSequences theme [G]
8 Switch to classic Less Wrong theme [H]
9 Switch to modern Less Wrong theme [I]
; Open theme tweaker
Enter Save changes and close theme tweaker
Esc Close theme tweaker (without saving)

Slide shows
l Start/resume slideshow
Esc Exit slideshow
→↓ Next slide
←↑ Previous slide
Space Reset slide zoom

Miscellaneous
x Switch to next view on user page
z Switch to previous view on user page
` Toggle compact comment list view
g Toggle anti-kibitzer