lukemarks (Karma: 447)
lukemarks’s Shortform
lukemarks · 2 Jul 2024 6:56 UTC · 4 points · 19 comments · 1 min read · LW link

Beta Tester Request: Rallypoint Bounties
lukemarks · 25 May 2024 9:11 UTC · 25 points · 4 comments · 1 min read · LW link

[Question] Shouldn’t we ‘Just’ Superimitate Low-Res Uploads?
lukemarks · 3 Nov 2023 7:42 UTC · 15 points · 2 comments · 2 min read · LW link

Early Experiments in Reward Model Interpretation Using Sparse Autoencoders
lukemarks, Amirali Abdullah, Rauno Arike, Fazl and nothoughtsheadempty · 3 Oct 2023 7:45 UTC · 17 points · 0 comments · 5 min read · LW link

The Löbian Obstacle, And Why You Should Care
lukemarks · 7 Sep 2023 23:59 UTC · 18 points · 6 comments · 2 min read · LW link

[Question] What Does LessWrong/EA Think of Human Intelligence Augmentation as of mid-2023?
lukemarks · 8 Jul 2023 11:42 UTC · 84 points · 28 comments · 2 min read · LW link

Direct Preference Optimization in One Minute
lukemarks · 26 Jun 2023 11:52 UTC · 22 points · 3 comments · 2 min read · LW link

Partial Simulation Extrapolation: A Proposal for Building Safer Simulators
lukemarks · 17 Jun 2023 13:55 UTC · 16 points · 0 comments · 10 min read · LW link

Higher Dimension Cartesian Objects and Aligning ‘Tiling Simulators’
lukemarks · 11 Jun 2023 0:13 UTC · 22 points · 0 comments · 5 min read · LW link

The Security Mindset, S-Risk and Publishing Prosaic Alignment Research
lukemarks · 22 Apr 2023 14:36 UTC · 39 points · 7 comments · 5 min read · LW link

Select Agent Specifications as Natural Abstractions
lukemarks · 7 Apr 2023 23:16 UTC · 19 points · 3 comments · 5 min read · LW link