Walter Laurito

Karma: 111

Walter Laurito Apr 16, 2025, 8:39 AM
1 point
0
on: Try training token-level probes
I’m not aware of previous papers doing this but surely someone tried this before, I would welcome comments pointing to existing literature on this!
...
Context: I want a probe to tell me where in an LLM response the model might be lying, so that I can e.g. ask follow-up questions. Such “right there” probes^[1] would be awesome to assist LLM-based monitor
If I remember correctly, they are doing something like that in this paper here:
3.3 EXACT ANSWER TOKENS
Existing methods often overlook a critical nuance: the token selection for error detection, typically focusing on the last generated token or taking a mean. However, since LLMs typically generate long-form responses, this practice may miss crucial details (Brunner et al., 2020). Other approaches use the last token of the prompt (Slobodkin et al., 2023, inter alia), but this is inherently inaccurate due to LLMs’ unidirectional nature, failing to account for the generated response and missing cases where different sampled answers from the same model vary in correctness. We investigate a previously unexamined token location: the exact answer tokens, which represent the most meaningful parts of the generated response. We define exact answer tokens as those whose modification alters the answer’s correctness, disregarding subsequent generated content.
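To make the token-selection options in the quoted passage concrete, here is a minimal sketch of how the probe input could differ between them. It is not the paper's code: `select_probe_features`, `mean_vector`, and the toy activations are hypothetical names and data, the exact-answer span is assumed to be given already (the quote does not say how it is located), and averaging over that span is my own assumption.

```python
from statistics import fmean


def mean_vector(rows):
    """Average a list of equal-length vectors, dimension by dimension."""
    return [fmean(column) for column in zip(*rows)]


def select_probe_features(hidden_states, answer_span):
    """Build one candidate probe input per token-selection strategy.

    hidden_states: one activation vector per generated token (hypothetical input).
    answer_span: (start, end) token indices of the exact answer, end exclusive;
                 assumed to be known in advance.
    """
    start, end = answer_span
    if not (0 <= start < end <= len(hidden_states)):
        raise ValueError("answer_span must lie inside the generated response")
    return {
        # Strategy 1: only the last generated token.
        "last_token": list(hidden_states[-1]),
        # Strategy 2: mean over all generated tokens.
        "mean": mean_vector(hidden_states),
        # Strategy 3: only the exact answer tokens (averaged here by assumption).
        "exact_answer": mean_vector(hidden_states[start:end]),
    }


# Toy activations standing in for real model hidden states: 6 tokens, 2 dims.
hidden_states = [[float(t), float(2 * t)] for t in range(6)]
features = select_probe_features(hidden_states, answer_span=(1, 3))
```

Each of the three vectors could then be fed to the same probe, so any difference in error-detection accuracy would come from the token location alone.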

Finding the estimate of the value of a state in RL agents

Clément Dumas, Walter Laurito, KlaRo and Kaarel

Jun 3, 2024, 8:26 PM

8 points

4 comments · 4 min read · LW link

Walter Laurito Sep 25, 2023, 10:11 AM
10 points
7
on: There should be more AI safety orgs
and Kaarel’s work on DLK
@Kaarel is the research lead at Cadenza Labs (previously called NotodAI), our research group, which started during the first part of SERI MATS 3.0. (There will hopefully be more information about Cadenza Labs soon!)
Our team members broadly agree with the post!
Currently, we are looking for further funding to continue to work on our research agenda. Interested funders (or potential collaborators) can reach out to us at info@cadenzalabs.org.

Searching for a model’s concepts by their shape – a theoretical framework

Kaarel, gekaklam, Walter Laurito, Kay Kozaronek, AlexMennen and June Ku

Feb 23, 2023, 8:14 PM

51 points

0 comments · 19 min read · LW link

[RFC] Possible ways to expand on “Discovering Latent Knowledge in Language Models Without Supervision”.

gekaklam, Walter Laurito, Kaarel and Kay Kozaronek

Jan 25, 2023, 7:03 PM

48 points

6 comments · 12 min read · LW link

Walter Laurito Dec 16, 2021, 10:15 AM
1 point
in reply to: Alex_Altair’s comment on: AI Safety Needs Great Engineers
Should work again :)

Walter Laurito Dec 7, 2021, 10:26 AM
2 points
in reply to: Walter Laurito’s comment on: AI Safety Needs Great Engineers
I’ve created a Discord for people interested in organizing / collaborating / self-study (https://discord.gg/Ckj4BKUChr). People could start with the brief curriculum published in this document until a full curriculum is available :)

Walter Laurito Nov 25, 2021, 8:53 AM
5 points
in reply to: Walter Laurito’s comment on: AI Safety Needs Great Engineers
Maybe we could also send out an invitation to join a Slack channel to everyone who got rejected. (I could set that up if necessary. Since I don’t have the emails, though, someone else would need to send the invitations.) There, based on the curriculum, people could form self-study groups with others close by (or remotely) and talk about difficulties, bugs, etc. Maybe even the people who were not rejected could join the Slack and help answer questions (if they like and have time, of course)?

Walter Laurito Nov 24, 2021, 7:14 PM
3 points
in reply to: Randomized, Controlled’s comment on: AI Safety Needs Great Engineers
Same here (not sure yet if I’ll get accepted to AISC, though). But I would be happy to help with or co-organize something like what Richard_Ngo suggested (although I’ve never organized something like that before). Maybe a virtual version in (Continental?) Europe, if there are enough people.