Tomáš Gavenčiak

Karma: 157

A researcher in CS theory, AI safety, and other topics.

Measuring Beliefs of Language Models During Chain-of-Thought Reasoning

Apr 18, 2025, 10:56 PM
1 point
0 comments · 13 min read · LW link

Announcing Human-aligned AI Summer School

May 22, 2024, 8:55 AM
50 points
0 comments · 1 min read · LW link
(humanaligned.ai)

InterLab – a toolkit for experiments with multi-agent interactions

Jan 22, 2024, 6:23 PM
69 points
0 comments · 8 min read · LW link
(acsresearch.org)

Sparsity and interpretability?

Jun 1, 2020, 1:25 PM
41 points
3 comments · 7 min read · LW link

How can Interpretability help Alignment?

May 23, 2020, 4:16 PM
37 points
3 comments · 9 min read · LW link