That’s certainly an interesting position in the discussion about what people want!
Namely, that actions and preferences are conditionally activated, and those context-dependent activations are balanced against one another. That would mean a person’s preference system may be not merely incomplete but architecturally incoherent, and that moral systems and goals obtained via reflection are almost certainly not total (they will be undefined in some contexts), which creates a problem for RLHF.
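To make the RLHF problem concrete, here is a minimal toy sketch (my own illustration, not something from the original claim; the contexts, options, and function names are all made up): if each context activates a different local ranking, the pairwise choices an observer collects across contexts can form a cycle, and no single scalar utility or reward model can reproduce them.

```python
from itertools import permutations

# Hypothetical agent: each context activates a different local ranking.
CONTEXT_RANKINGS = {
    "hungry":   ["eat", "rest", "work"],   # food first
    "tired":    ["rest", "work", "eat"],   # sleep first
    "deadline": ["work", "eat", "rest"],   # work first
}

def prefers(a: str, b: str, context: str) -> bool:
    """True if option `a` is ranked above `b` in the given context."""
    ranking = CONTEXT_RANKINGS[context]
    return ranking.index(a) < ranking.index(b)

def revealed_pairwise_preferences() -> dict:
    """The pairwise choices an observer would collect, one per context."""
    return {
        ("eat", "rest"): prefers("eat", "rest", "hungry"),    # True
        ("rest", "work"): prefers("rest", "work", "tired"),   # True
        ("work", "eat"): prefers("work", "eat", "deadline"),  # True -> cycle
    }

def fits_some_utility(pairs: dict) -> bool:
    """Check whether any total order (i.e. any scalar utility) matches all pairs."""
    options = ["eat", "rest", "work"]
    for order in permutations(options):
        rank = {o: i for i, o in enumerate(order)}
        if all((rank[a] < rank[b]) == wanted for (a, b), wanted in pairs.items()):
            return True
    return False

if __name__ == "__main__":
    pairs = revealed_pairwise_preferences()
    print(pairs)                     # each preference holds in *some* context
    print(fits_some_utility(pairs))  # False: eat > rest > work > eat has no utility
```

A reward model trained on such cross-context comparisons has to average the cycle away, so it silently misrepresents the person in at least one of the contexts.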
The first assumption, that some fraction of neurons is essentially randomly initialized, is hard to test well: all humans are born into a similar gravity field, see similarly structured images in their first days (every “colorful patch” corresponds to an object that is continuous, mostly flat or uniformly round), and so on, and that shared environment leaves a generic imprint.