
Martín Soto

Karma: 1,370

Doing AI Safety research for ethical reasons.

My webpage.

Leave me anonymous feedback.

I operate by Crocker’s Rules.

Tell me about yourself: LLMs are aware of their learned behaviors

Jan 22, 2025, 12:47 AM
129 points
5 comments · 6 min read · LW link

Near- and medium-term AI Control Safety Cases

Martín Soto · Dec 23, 2024, 5:37 PM
9 points
0 comments · 6 min read · LW link

The Information: OpenAI shows ‘Strawberry’ to feds, races to launch it

Martín Soto · Aug 27, 2024, 11:10 PM
145 points
15 comments · 3 min read · LW link

The need for multi-agent experiments

Martín Soto · Aug 1, 2024, 5:14 PM
43 points
3 comments · 9 min read · LW link

OpenAI releases GPT-4o, natively interfacing with text, voice and vision

Martín Soto · May 13, 2024, 6:50 PM
54 points
23 comments · 1 min read · LW link
(openai.com)

Conflict in Posthuman Literature

Martín Soto · Apr 6, 2024, 10:26 PM
40 points
1 comment · 2 min read · LW link
(twitter.com)

Comparing Alignment to other AGI interventions: Extensions and analysis

Martín Soto · Mar 21, 2024, 5:30 PM
7 points
0 comments · 4 min read · LW link

Comparing Alignment to other AGI interventions: Basic model

Martín Soto · Mar 20, 2024, 6:17 PM
12 points
4 comments · 7 min read · LW link

How disagreements about Evidential Correlations could be settled

Martín Soto · Mar 11, 2024, 6:28 PM
11 points
3 comments · 4 min read · LW link

Evidential Correlations are Subjective, and it might be a problem

Martín Soto · Mar 7, 2024, 6:37 PM
26 points
6 comments · 14 min read · LW link

Why does generalization work?

Martín Soto · Feb 20, 2024, 5:51 PM
43 points
16 comments · 4 min read · LW link

Natural abstractions are observer-dependent: a conversation with John Wentworth

Martín Soto · Feb 12, 2024, 5:28 PM
39 points
13 comments · 7 min read · LW link

The lattice of partial updatelessness

Martín Soto · Feb 10, 2024, 5:34 PM
23 points
5 comments · 5 min read · LW link

Updatelessness doesn’t solve most problems

Martín Soto · Feb 8, 2024, 5:30 PM
135 points
45 comments · 12 min read · LW link

Sources of evidence in Alignment

Martín Soto · Jul 2, 2023, 8:38 PM
20 points
0 comments · 11 min read · LW link

Quantitative cruxes in Alignment

Martín Soto · Jul 2, 2023, 8:38 PM
19 points
0 comments · 23 min read · LW link

Why are counterfactuals elusive?

Martín Soto · Mar 3, 2023, 8:13 PM
14 points
6 comments · 2 min read · LW link

Martín Soto’s Shortform

Martín Soto · Feb 11, 2023, 11:38 PM UTC
3 points
46 comments · 1 min read · LW link

The Alignment Problems

Martín Soto · Jan 12, 2023, 10:29 PM UTC
20 points
0 comments · 4 min read · LW link

Brute-forcing the universe: a non-standard shot at diamond alignment

Martín Soto · Nov 22, 2022, 10:36 PM UTC
9 points
2 comments · 20 min read · LW link