Bogdan Ionut Cirstea
Karma: 1,441
Automated / strongly-augmented safety research.
A Little Depth Goes a Long Way: the Expressive Power of Log-Depth Transformers
Bogdan Ionut Cirstea, 20 Nov 2024 11:48 UTC, 15 points, 0 comments, 1 min read, LW link (openreview.net)
The Computational Complexity of Circuit Discovery for Inner Interpretability
Bogdan Ionut Cirstea, 17 Oct 2024 13:18 UTC, 11 points, 2 comments, 1 min read, LW link (arxiv.org)
Thinking LLMs: General Instruction Following with Thought Generation
Bogdan Ionut Cirstea, 15 Oct 2024 9:21 UTC, 7 points, 0 comments, 1 min read, LW link (arxiv.org)
Instruction Following without Instruction Tuning
Bogdan Ionut Cirstea, 24 Sep 2024 13:49 UTC, 17 points, 0 comments, 1 min read, LW link (arxiv.org)
Validating / finding alignment-relevant concepts using neural data
Bogdan Ionut Cirstea, 20 Sep 2024 21:12 UTC, 7 points, 0 comments, 1 min read, LW link (docs.google.com)
To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning
Bogdan Ionut Cirstea, 19 Sep 2024 16:13 UTC, 21 points, 1 comment, 1 min read, LW link (arxiv.org)
AlignedCut: Visual Concepts Discovery on Brain-Guided Universal Feature Space
Bogdan Ionut Cirstea, 14 Sep 2024 23:23 UTC, 17 points, 1 comment, 1 min read, LW link (arxiv.org)
Universal dimensions of visual representation
Bogdan Ionut Cirstea, 28 Aug 2024 10:38 UTC, 8 points, 0 comments, 1 min read, LW link (arxiv.org)
[Linkpost] Automated Design of Agentic Systems
Bogdan Ionut Cirstea, 19 Aug 2024 23:06 UTC, 8 points, 1 comment, 1 min read, LW link (arxiv.org)
[Linkpost] ‘The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery’
Bogdan Ionut Cirstea, 15 Aug 2024 21:32 UTC, 20 points, 1 comment, 1 min read, LW link (arxiv.org)
[Linkpost] Transcendence: Generative Models Can Outperform The Experts That Train Them
Bogdan Ionut Cirstea, 18 Jun 2024 11:00 UTC, 19 points, 3 comments, 1 min read, LW link (arxiv.org)
[Linkpost] The Expressive Capacity of State Space Models: A Formal Language Perspective
Bogdan Ionut Cirstea, 28 May 2024 13:49 UTC, 4 points, 3 comments, 1 min read, LW link (arxiv.org)
[Linkpost] Towards a Theoretical Understanding of the ‘Reversal Curse’ via Training Dynamics
Bogdan Ionut Cirstea, 11 May 2024 22:59 UTC, 6 points, 0 comments, 1 min read, LW link (arxiv.org)
[Linkpost] MindEye2: Shared-Subject Models Enable fMRI-To-Image With 1 Hour of Data
Bogdan Ionut Cirstea, 10 Mar 2024 1:30 UTC, 10 points, 0 comments, 1 min read, LW link (openreview.net)
Inducing human-like biases in moral reasoning LMs
Artyom Karpov, Austin Meek, Bogdan Ionut Cirstea and SCho, 20 Feb 2024 16:28 UTC, 23 points, 3 comments, 14 min read, LW link
AISC project: How promising is automating alignment research? (literature review)
Bogdan Ionut Cirstea, 28 Nov 2023 14:47 UTC, 4 points, 1 comment, 1 min read, LW link (docs.google.com)
[Linkpost] OpenAI’s Interim CEO’s views on AI x-risk
Bogdan Ionut Cirstea, 20 Nov 2023 13:00 UTC, 9 points, 0 comments, 1 min read, LW link
[Linkpost] Concept Alignment as a Prerequisite for Value Alignment
Bogdan Ionut Cirstea, 4 Nov 2023 17:34 UTC, 27 points, 0 comments, 1 min read, LW link (arxiv.org)
[Linkpost] Generalization in diffusion models arises from geometry-adaptive harmonic representation
Bogdan Ionut Cirstea, 11 Oct 2023 17:48 UTC, 4 points, 3 comments, 1 min read, LW link
[Linkpost] Large language models converge toward human-like concept organization
Bogdan Ionut Cirstea, 2 Sep 2023 6:00 UTC, 22 points, 1 comment, 1 min read, LW link