JoNeedsSleep

Karma: 41

JoNeedsSleep Apr 17, 2025, 6:51 PM
4 points
0
in reply to: zroe1’s comment on: College Advice For People Like Me
totally had Henry’s voice playing while reading your comment

Undergrad AI Safety Conference

JoNeedsSleep Feb 19, 2025, 3:43 AM

18 points

JoNeedsSleep Feb 18, 2025, 7:19 PM

9 points

JoNeedsSleep Jan 27, 2025, 11:05 PM
2 points
1
on: JoNeedsSleep’s Shortform
My best attempt at characterizing Kant’s Transcendental Idealism: Kant’s idealism says that essence, not existence, is dependent on us. That is to say, what it is to be depends on how we understand. For example, a schema of classification in biology, such as genetic proximity, depends on the purposes it serves for us. What it is for animals to be depends, in other words, on the biologist. To push the biology analogy ad absurdum, transcendental idealism says something like “the genetic composition is the condition of the possibility of our making sense of biological objects in the first place”. The existence of these classification schemas is dependent on our mind a priori.

JoNeedsSleep Oct 24, 2024, 4:50 AM
4 points
1
on: JoNeedsSleep’s Shortform
The distinction between inner and outer alignment is quite unnatural. For example, even the concept of reward hacking implies a twofold failure: a reward that is not robust enough against exploitation, and a model that develops instrumental capabilities sufficient to find a way to trick the reward. Indeed, in the case of reward hacking, it’s worth noting that, depending on the autonomy of the system in question, we could attribute the misalignment to either an inner or an outer failure. At its core, this distinction comes out of the policy <-> reward scheme of RL, though prediction <-> loss function in SL can be similarly characterized; I doubt that this framing generalizes well to other engineering choices.

JoNeedsSleep Oct 24, 2024, 4:50 AM

1 point

JoNeedsSleep Jul 3, 2024, 7:54 PM

9 points