Olli Järviniemi

Karma: 1,301

Schelling game evaluations for AI control

Olli Järviniemi · 8 Oct 2024 12:01 UTC
65 points
4 comments · 11 min read · LW link

Distinguish worst-case analysis from instrumental training-gaming

5 Sep 2024 19:13 UTC
37 points
0 comments · 5 min read · LW link

Untrustworthy models: a frame for scheming evaluations

Olli Järviniemi · 19 Aug 2024 16:27 UTC
46 points
3 comments · 8 min read · LW link

Near-mode thinking on AI

Olli Järviniemi · 4 Aug 2024 20:47 UTC
127 points
8 comments · 5 min read · LW link

An experiment on hidden cognition

Olli Järviniemi · 22 Jul 2024 3:26 UTC
25 points
2 comments · 7 min read · LW link

Brief notes on the Wikipedia game

Olli Järviniemi · 14 Jul 2024 2:28 UTC
67 points
9 comments · 4 min read · LW link

Dialogue introduction to Singular Learning Theory

Olli Järviniemi · 8 Jul 2024 16:58 UTC
97 points
14 comments · 8 min read · LW link

A civilization ran by amateurs

Olli Järviniemi · 30 May 2024 17:57 UTC
61 points
7 comments · 6 min read · LW link

Testing for parallel reasoning in LLMs

19 May 2024 15:28 UTC
3 points
7 comments · 9 min read · LW link

Uncovering Deceptive Tendencies in Language Models: A Simulated Company AI Assistant

6 May 2024 7:07 UTC
95 points
13 comments · 1 min read · LW link
(arxiv.org)

On precise out-of-context steering

Olli Järviniemi · 3 May 2024 9:41 UTC
9 points
6 comments · 3 min read · LW link

Instrumental deception and manipulation in LLMs - a case study

Olli Järviniemi · 24 Feb 2024 2:07 UTC
39 points
13 comments · 12 min read · LW link

Urging an International AI Treaty: An Open Letter

Olli Järviniemi · 31 Oct 2023 11:26 UTC
48 points
2 comments · 1 min read · LW link
(aitreaty.org)

Olli Järviniemi’s Shortform

Olli Järviniemi · 23 Mar 2023 10:59 UTC
3 points
22 comments · 1 min read · LW link

Takeaways from calibration training

Olli Järviniemi · 29 Jan 2023 19:09 UTC
38 points
1 comment · 3 min read · LW link