I share habryka’s concerns re: “unaligned with yourself”, but I think I was missing (or had forgotten) that part of the idea here was you’re using… an uploaded clone of yourself, so you’re at least more likely to be aligned with yourself, even if, when scaled up, you’re not aligned with anyone else.
Not sure if you were just being poetic, but FWIW I believe the idea (in HCH, for example) is to use an ML system trained to produce the same answers that a human would produce, which is not strictly speaking an upload (unless the only way to imitate is actually to simulate in detail, such that the ML system ends up growing an upload inside it?).
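To make the distinction concrete: the proposal is ordinary supervised imitation, not emulation of a brain. A toy sketch, with entirely hypothetical data and where “training” is reduced to memorising the most common human answer per question (a stand-in for fitting a real predictive model):

```python
# Toy sketch of the imitation idea: fit a model to reproduce
# human-provided (question, answer) pairs. All data here is made up;
# an HCH-style system would use a large learned model, not a lookup.
from collections import Counter

# Hypothetical dataset of answers a human actually gave.
human_answers = [
    ("capital of France?", "Paris"),
    ("capital of France?", "Paris"),
    ("2 + 2?", "4"),
]

def train_imitator(data):
    """Return a question -> answer map picking the most common human
    answer per question - a crude stand-in for supervised training."""
    by_question = {}
    for question, answer in data:
        by_question.setdefault(question, Counter())[answer] += 1
    return {q: counts.most_common(1)[0][0] for q, counts in by_question.items()}

imitator = train_imitator(human_answers)
print(imitator["capital of France?"])  # reproduces the human's answer
```

The point of the sketch is just that the system learns to predict the human’s outputs; nothing in it contains or simulates the human, which is why it isn’t strictly an upload.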
Is it “a human” or “you specifically”?
If it’s “a human”, I’m back to “humans are unfriendly by default” territory.
[Edit: But I had in fact also not been tracking that it’s not a strict upload; it’s trained on human actions. I think I recall reading that earlier but had forgotten. I did leave the “…” in my summary because I wasn’t quite sure whether upload was the right word, though. That all said, being merely trained on human actions, whether mine or someone else’s, I think makes it even more likely to be unfriendly than an upload.]
To get sufficient training data, it must surely be “a human” (in the generic, smushed-together, ‘modelling an ensemble of humans’ sense).
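A toy illustration of that “smushed together” point, again with made-up data: pooling answers from several people and imitating the majority yields a generic aggregate human, not any one particular person.

```python
# Hypothetical answers from three different people to the same question.
from collections import Counter

answers_by_person = {
    "alice": [("favourite colour?", "blue")],
    "bob":   [("favourite colour?", "green")],
    "carol": [("favourite colour?", "blue")],
}

# Pool everyone's answers together and take the majority, as a crude
# stand-in for a model trained on all of them at once.
pooled = Counter(a for answers in answers_by_person.values() for _, a in answers)
generic_answer = pooled.most_common(1)[0][0]
print(generic_answer)  # the pooled model overrides Bob's answer entirely
```

So the learned behaviour tracks the ensemble, which is why the “is it you specifically?” question matters: a model fit to pooled data is aligned with the aggregate, if anyone.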