Olli Järviniemi

Karma: 1,687

Schelling game evaluations for AI control

Olli Järviniemi
Oct 8, 2024, 12:01 PM
67 points
5 comments
11 min read

Distinguish worst-case analysis from instrumental training-gaming

Sep 5, 2024, 7:13 PM
37 points
0 comments
5 min read

Trustworthy and untrustworthy models

Olli Järviniemi
Aug 19, 2024, 4:27 PM
46 points
3 comments
8 min read

Near-mode thinking on AI

Olli Järviniemi
Aug 4, 2024, 8:47 PM
128 points
9 comments
5 min read

An experiment on hidden cognition

Olli Järviniemi
Jul 22, 2024, 3:26 AM
25 points
2 comments
7 min read

Brief notes on the Wikipedia game

Olli Järviniemi
Jul 14, 2024, 2:28 AM
68 points
9 comments
4 min read

Dialogue introduction to Singular Learning Theory

Olli Järviniemi
Jul 8, 2024, 4:58 PM
100 points
15 comments
8 min read

A civilization ran by amateurs

Olli Järviniemi
May 30, 2024, 5:57 PM
61 points
7 comments
6 min read

Testing for parallel reasoning in LLMs

May 19, 2024, 3:28 PM
9 points
7 comments
9 min read

Uncovering Deceptive Tendencies in Language Models: A Simulated Company AI Assistant

May 6, 2024, 7:07 AM
95 points
13 comments
1 min read
(arxiv.org)

On precise out-of-context steering

Olli Järviniemi
May 3, 2024, 9:41 AM
9 points
6 comments
3 min read

Instrumental deception and manipulation in LLMs—a case study

Olli Järviniemi
Feb 24, 2024, 2:07 AM
39 points
13 comments
12 min read

Urging an International AI Treaty: An Open Letter

Olli Järviniemi
Oct 31, 2023, 11:26 AM
48 points
2 comments
1 min read
(aitreaty.org)

Olli Järviniemi’s Shortform

Olli Järviniemi
Mar 23, 2023, 10:59 AM
3 points
26 comments
1 min read

Takeaways from calibration training

Olli Järviniemi
Jan 29, 2023, 7:09 PM
43 points
2 comments
3 min read
1 review