Alvin Ånestrand

Karma: 62

Emergent Misalignment and Emergent Alignment

Alvin Ånestrand Apr 3, 2025, 8:04 AM

5 points

0 comments, 8 min read, LW link

Alvin Ånestrand Apr 1, 2025, 3:04 PM
1 point
0
on: Amazing Breakthrough Day: April 1st
Happy Amazing Breakthrough Day!

Forecasting AI Futures Resource Hub

Alvin Ånestrand Mar 19, 2025, 5:26 PM

2 points

0 comments, 2 min read, LW link

(forecastingaifutures.substack.com)

Musings on Scenario Forecasting and AI

Alvin Ånestrand Mar 6, 2025, 12:28 PM

10 points

0 comments, 11 min read, LW link

(forecastingaifutures.substack.com)

Forecasting Uncontrolled Spread of AI

Alvin Ånestrand Feb 22, 2025, 1:05 PM

2 points

0 comments, 10 min read, LW link

(forecastingaifutures.substack.com)

Alvin Ånestrand Feb 13, 2025, 10:19 AM
1 point
0
in reply to: Archimedes’s comment on: Probability of AI-Caused Disaster
Good observation. The only questions that don’t explicitly exclude it in the resolution criteria are “Will there be a massive catastrophe caused by AI before 2030?” and “Will an AI related disaster kill a million people or cause $1T of damage before 2070?”, but I think the question creators mean a catastrophic event that is more directly caused by the AI, rather than just a reaction to AI being released.

Manifold questions are sometimes somewhat subjective in nature, which is a bit problematic.

Probability of AI-Caused Disaster

Alvin Ånestrand Feb 12, 2025, 7:40 PM

2 points

2 comments, 10 min read, LW link

(forecastingaifutures.substack.com)

Forecasting AGI: Insights from Prediction Markets and Metaculus

Alvin Ånestrand Feb 4, 2025, 1:03 PM

13 points

0 comments, 4 min read, LW link

(forecastingaifutures.substack.com)

Alvin Ånestrand Feb 1, 2025, 9:51 AM
1 point
0
in reply to: johnswentworth’s comment on: The Case Against AI Control Research
I think my points argue more that control research might have higher expected value than some other approaches that either don’t address delegation at all or are much less tractable. But I agree: if slop is the major problem, then most current control research doesn’t address it, though it’s nice to see that this might change if Buck is right.

And my point about formal verification was to work around the slop problem by verifying the safety approach to a high degree of certainty. I don’t know if it’s feasible, but some seem to think so. Why do you think it’s a bad idea?

Alvin Ånestrand Jan 31, 2025, 11:51 AM
4 points
0
on: The Case Against AI Control Research
I can think of a few reasons someone might think AI Control research should receive very high priority, apart from what is mentioned in the post or in Buck’s comment:
- You hope/expect early transformative AI to be used for provable safety approaches, using formal verification methods.
- You think AI control research is more tractable than other research agendas, or will have useful results faster, before they are too late to apply.
- You think our only chance of aligning a superintelligence is to delegate the problem to AIs, either because it is too hard for humans, or because superintelligence will arrive sooner than the proper alignment techniques can feasibly be developed.
- You expect a significant fraction of total AI safety research over all time to be done by early transformative AI, so control research has high leverage value in improving the probability of successfully getting the AI to do valuable safety research, even if slop is quite likely.
I agree with basically everything in the post but put enough probability on these points to think that control research has really high expected value anyway.

The Alignment Mapping Program: Forging Independent Thinkers in AI Safety—A Pilot Retrospective

Alvin Ånestrand, Jonas Hallgren and Utilop

Jan 10, 2025, 4:22 PM

21 points

0 comments, 4 min read, LW link

Alvin Ånestrand Feb 20, 2024, 9:38 AM
1 point
0
on: ACI#6: A Non-Dualistic ACI Model
Interesting!
I thought of a couple of things and wonder whether you have considered them.
It seems to me that when examining mutual information between two objects, there might be a lot of mutual information that an agent cannot use. For example, there is a lot of mutual information between my present self and myself in 10 minutes, but most of it is information about myself that I am not aware of and cannot use for decision making.
Also, if you examine an object that is fairly constant, would you not get high mutual information for the object at different times, even though it is not very agentic? Can you differentiate between autonomy and mere stability?

Alvin Ånestrand Nov 28, 2023, 2:09 PM
22 points
15
on: Social Dark Matter
I think my default response when I learn about [trait X] is almost the opposite of how it is described in the post, at least if I learn that someone I know has it.

My mind reflexively tries to explain how [trait X] is not that bad, or is good in a certain context. I have had to force myself not to automatically defend it in my head. I might signal (consciously or unconsciously) dislike for the trait in general, but not when I am confronted with someone I know having it. There are probably exceptions to this though, maybe for more extreme traits. I hope I wouldn’t automatically try to internally defend rape, for example, even if it was reflexive and only for one or two seconds.

I just wanted to note that people like me exist too, and in certain cultures it might be fairly common (though I’m just speculating here).

Alvin Ånestrand Apr 18, 2023, 12:55 PM
1 point
0
in reply to: Jayson_Virissimo’s comment on: Efficient Learning: Memorization
My apologies. When I started on the post I searched for the word “memorization”, and there were not many results. I forgot to change the statement when I realised there were more posts than I first thought.
I still think there is too little discussion about memorization, though, perhaps with the exception of spaced repetition.
Thank you for pointing out the error.

Efficient Learning: Memorization

Alvin Ånestrand Apr 16, 2023, 5:58 PM

4 points

2 comments, 5 min read, LW link

(forum.effectivealtruism.org)
