Bogdan Ionut Cirstea

Karma: 1,441

Automated / strongly-augmented safety research.

A Little Depth Goes a Long Way: the Expressive Power of Log-Depth Transformers

Bogdan Ionut Cirstea, 20 Nov 2024 11:48 UTC
15 points
0 comments, 1 min read, LW link
(openreview.net)

The Computational Complexity of Circuit Discovery for Inner Interpretability

Bogdan Ionut Cirstea, 17 Oct 2024 13:18 UTC
11 points
2 comments, 1 min read, LW link
(arxiv.org)

Thinking LLMs: General Instruction Following with Thought Generation

Bogdan Ionut Cirstea, 15 Oct 2024 9:21 UTC
7 points
0 comments, 1 min read, LW link
(arxiv.org)

Instruction Following without Instruction Tuning

Bogdan Ionut Cirstea, 24 Sep 2024 13:49 UTC
17 points
0 comments, 1 min read, LW link
(arxiv.org)

Validating / finding alignment-relevant concepts using neural data

Bogdan Ionut Cirstea, 20 Sep 2024 21:12 UTC
7 points
0 comments, 1 min read, LW link
(docs.google.com)

To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning

Bogdan Ionut Cirstea, 19 Sep 2024 16:13 UTC
21 points
1 comment, 1 min read, LW link
(arxiv.org)

AlignedCut: Visual Concepts Discovery on Brain-Guided Universal Feature Space

Bogdan Ionut Cirstea, 14 Sep 2024 23:23 UTC
17 points
1 comment, 1 min read, LW link
(arxiv.org)

Universal dimensions of visual representation

Bogdan Ionut Cirstea, 28 Aug 2024 10:38 UTC
8 points
0 comments, 1 min read, LW link
(arxiv.org)

[Linkpost] Automated Design of Agentic Systems

Bogdan Ionut Cirstea, 19 Aug 2024 23:06 UTC
8 points
1 comment, 1 min read, LW link
(arxiv.org)

[Linkpost] ‘The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery’

Bogdan Ionut Cirstea, 15 Aug 2024 21:32 UTC
20 points
1 comment, 1 min read, LW link
(arxiv.org)

[Linkpost] Transcendence: Generative Models Can Outperform The Experts That Train Them

Bogdan Ionut Cirstea, 18 Jun 2024 11:00 UTC
19 points
3 comments, 1 min read, LW link
(arxiv.org)

[Linkpost] The Expressive Capacity of State Space Models: A Formal Language Perspective

Bogdan Ionut Cirstea, 28 May 2024 13:49 UTC
4 points
3 comments, 1 min read, LW link
(arxiv.org)

[Linkpost] Towards a Theoretical Understanding of the ‘Reversal Curse’ via Training Dynamics

Bogdan Ionut Cirstea, 11 May 2024 22:59 UTC
6 points
0 comments, 1 min read, LW link
(arxiv.org)

[Linkpost] MindEye2: Shared-Subject Models Enable fMRI-To-Image With 1 Hour of Data

Bogdan Ionut Cirstea, 10 Mar 2024 1:30 UTC
10 points
0 comments, 1 min read, LW link
(openreview.net)

Inducing human-like biases in moral reasoning LMs

20 Feb 2024 16:28 UTC
23 points
3 comments, 14 min read, LW link

AISC project: How promising is automating alignment research? (literature review)

Bogdan Ionut Cirstea, 28 Nov 2023 14:47 UTC
4 points
1 comment, 1 min read, LW link
(docs.google.com)

[Linkpost] OpenAI’s Interim CEO’s views on AI x-risk

Bogdan Ionut Cirstea, 20 Nov 2023 13:00 UTC
9 points
0 comments, 1 min read, LW link

[Linkpost] Concept Alignment as a Prerequisite for Value Alignment

Bogdan Ionut Cirstea, 4 Nov 2023 17:34 UTC
27 points
0 comments, 1 min read, LW link
(arxiv.org)

[Linkpost] Generalization in diffusion models arises from geometry-adaptive harmonic representation

Bogdan Ionut Cirstea, 11 Oct 2023 17:48 UTC
4 points
3 comments, 1 min read, LW link

[Linkpost] Large language models converge toward human-like concept organization

Bogdan Ionut Cirstea, 2 Sep 2023 6:00 UTC
22 points
1 comment, 1 min read, LW link