(I think humans are unaligned, but I assume that objection has been brought up before. Though I can still imagine that 10-minute humans provide a better starting point than other competitive tools, and may be the least bad option.)
Unaligned with each other? Or… would you not consider yourself to be aligned with yourself?
(Btw, see my edit at the bottom of my comment above if you hadn’t noticed it.)
I think also unaligned with yourself?
Like, most humans, when given massive power over the universe, would probably accidentally destroy themselves, and possibly all of humanity with them (Eliezer talks a bit about this in HPMOR, in sections I don’t want to reference because of spoilers, and Wei Dai has talked about this in a bunch of comments related to “the human alignment problem”). I think that maybe I could avoid doing that, but only because I am really mindful of the risk, and I don’t think the me from 5 years ago would have been safe to drastically scale up, even with respect to just my own values.
I share habryka’s concerns re: “unaligned with yourself”, but I think I was missing (or had forgotten) that part of the idea here is that you’re using… an uploaded clone of yourself, so you’re at least more likely to be aligned with yourself, even if, when scaled up, you’re not aligned with anyone else.
Not sure if you were just being poetic, but FWIW I believe the idea (in HCH, for example) is to use an ML system trained to produce the same answers that a human would produce, which is not, strictly speaking, an upload (unless the only way to imitate is actually to simulate in detail, such that the ML system ends up growing an upload inside it?).
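To make the upload-vs-imitation distinction concrete, here is a minimal sketch of “trained to produce the same answers that a human would produce”, i.e. plain supervised imitation of question–answer pairs. Everything in it (the toy dataset, the bag-of-words encoding, the linear model) is an illustrative assumption, not a description of any actual HCH implementation:

```python
import torch
import torch.nn as nn

# Toy supervised "human imitation": learn a mapping from questions to the
# answers a human gave, with no attempt to model the human's internals.
# The dataset, vocabulary, and architecture below are made up for illustration.

human_qa_pairs = [
    ("is the sky blue", "yes"),
    ("is fire cold", "no"),
    ("is water wet", "yes"),
    ("is ice hot", "no"),
]

vocab = sorted({w for q, _ in human_qa_pairs for w in q.split()})
word_to_idx = {w: i for i, w in enumerate(vocab)}
answers = sorted({a for _, a in human_qa_pairs})
answer_to_idx = {a: i for i, a in enumerate(answers)}

def encode(question: str) -> torch.Tensor:
    # Bag-of-words encoding: crude, but enough to show that the training
    # signal is "match the human's output", not "replicate the human's mind".
    x = torch.zeros(len(vocab))
    for w in question.split():
        if w in word_to_idx:
            x[word_to_idx[w]] = 1.0
    return x

X = torch.stack([encode(q) for q, _ in human_qa_pairs])
y = torch.tensor([answer_to_idx[a] for _, a in human_qa_pairs])

model = nn.Linear(len(vocab), len(answers))
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

for _ in range(200):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)  # loss rewards output-matching only
    loss.backward()
    optimizer.step()

print(answers[model(encode("is the sky blue")).argmax().item()])  # -> "yes"
```

The point of the sketch: the loss only rewards matching the human’s outputs, so nothing requires the trained model to contain anything structurally human-like inside, which is why “upload” is arguably the wrong word.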
Is it “a human”, or “you specifically”?
If it’s “a human”, I’m back to “humans are unfriendly by default” territory.
[Edit: But I had in fact also not been tracking that it’s not a strict upload, it’s trained on human actions. I think I recall reading that earlier but had forgotten. I did leave the “…” in my summary because I wasn’t quite sure if “upload” was the right word, though. That all said, being merely trained on human actions, whether mine or someone else’s, I think makes it even more likely to be unfriendly than an upload.]
To get sufficient training data, it must surely be “a human” (in the generic, smushed-together, “modelling an ensemble of humans” sense).
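And a tiny illustration of the “smushed together” point: if the training data pools answers from many people (a hypothetical setup; the names and question below are made up), the best an output-matching imitator can converge to is the population’s answer distribution, i.e. “a human” in the generic sense rather than any particular person:

```python
from collections import Counter

# Hypothetical pooled dataset: the same question answered by different people.
# With data aggregated like this, an imitator trained to minimize prediction
# error ends up modelling the mixture of annotators, not any one of them.
pooled = [
    ("alice", "is a hotdog a sandwich", "yes"),
    ("bob",   "is a hotdog a sandwich", "no"),
    ("carol", "is a hotdog a sandwich", "no"),
]

by_question: dict[str, Counter] = {}
for _, question, answer in pooled:
    by_question.setdefault(question, Counter())[answer] += 1

for question, counts in by_question.items():
    total = sum(counts.values())
    dist = {a: c / total for a, c in counts.items()}
    print(question, dist)  # e.g. {'yes': 0.333..., 'no': 0.666...} — an ensemble, not a person
```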