RHollerith comments on Jeremy Gillen’s Shortform

RHollerith 3 Sep 2024 18:49 UTC
2 points
0
By “non-world-destroying”, I assume you mean, “non-humanity ending”.

Well, yeah, if there were a way to keep AI models to roughly human capabilities, that would be great, because they would be unlikely to end humanity and because we could use them to do useful work at less expense (particularly, less energy expense and fewer CO2 emissions) than employing people.

But do you know of a safe way of making sure that, e.g., OpenAI’s next major training run will result in a model that is at most roughly human-level in every capability that can be used to end humanity or to put and to keep humanity in a situation that humanity would not want? I sure don’t—even if OpenAI were completely honest and cooperative with us.

The qualifier “safe” is present in the above paragraph because giving the model access to the internet (or to gullible people or to a compute farm where it can run any program it wants) and then seeing what happens is only safe if we assume the thing to be proved, namely, that the model is not capable enough to impose its will on humanity.

But yeah, it is a source of hope (which I didn’t mention when I wrote, “what hope I have . . . comes mostly from the hope that someone will figure out how to make an ASI that genuinely wants the same things that we want”) that someone will develop a method to keep AI capabilities to roughly human level and all labs actually use the method and focus on making the human-level AIs more efficient in resource consumption even during a great-powers war or an arms race between great powers.

I’d be more hopeful if I had ever seen a paper or a blog post by a researcher trying to devise such a method.

For completeness’s sake, let’s also point out that we could ban large training runs worldwide now; the labs could then concentrate on running the models they have now more efficiently, which would be safe (not completely safe, but much, much safer than any future timeline we can realistically hope for) and would allow us to derive some of the benefits of the technology.
- Milan W 3 Sep 2024 19:16 UTC
  1 point
  0
  I do not know of such a way. I find it unlikely that OpenAI’s next training run will result in a model that could end humanity, but I can provide no guarantees about that.
  
  You seem to be assuming that all models above a certain threshold of capabilities will either exercise strong optimization pressure on the world in pursuit of goals, or will be useless. Put another way, you seem to be conflating capabilities with actually exerted world-optimization pressures.
  
  While I agree that given a wide enough deployment it is likely that a given model will end up exercising its capabilities pretty much to their fullest extent, I hold that it is in principle possible to construct a mind that desires to help and is able to do so, yet also deliberately refrains from applying too much pressure.