This is definitely a spot where the comparison breaks down a bit. Still, it holds in the human context to some extent, and maybe that generalizes.
I worked as a lifeguard for a number of years (even lower on the totem pole than EMTs, with a more limited scope of practice). I am, to put it bluntly, pretty damn smart, and could easily have found optimizations: areas where, just by reading the manuals for EMTs or paramedics or nurses, I could have exceeded my scope of practice with positive outcomes if I’d had access to the tools. For example, I learned how to intubate someone and how to do an emergency tracheotomy from a friend who had more training. I’m also only really inclined to follow rules to the extent that they make sense to me and the enforcing authorities can impose meaningful consequences on me. But I essentially never went outside SOP, certainly not at work. Why?
Well, one reason was legal risk, as mentioned. If something went wrong at work and someone was (further) injured under my care, my legal protection was entirely dependent on my having operated within that defined scope of practice. For a smart young adult without a lot of money for lawyers, that’s a fairly good incentive not to push boundaries too much, especially given the gravity of emergency situations and the consequences of guessing wrong, even if you are smart and confident in your ability to out-do SOP.
Second, the limits were soft-enforced through access to equipment and medicine. The tools I had access to at my workplace were the ones I could officially use, and I did not have easy access to any tools or medicines that would have been outside SOP to administer (or advise someone to administer). This was deliberate.
Third, emergency situations sharply limit your effective context window and your ability to deviate. Someone is dying in front of you, large chunks of you are likely trying to panic, especially if you haven’t been put in this sort of situation before, and you need to act immediately, calmly, and correctly. What comes to mind most readily? Well, the series of if-then statements that got drilled into you during training. It’s been most of a decade since my last recert, and I can still basically autopilot my way through a medical emergency based on that training. That autopilot saved a friend and coworker’s life when he had a heart attack and I was the only one in the immediate area.
So how do we apply that? I think the first two have obvious analogies: doing things that actually impose hardware or software limits on behaviour. Obviously, for a smart enough system the ability to enforce such restrictions is limited, and even existing LLMs can be pushed outside their training parameters by clever prompting, but it’s clear that such means can alter model behaviour up to a point. Modifying the training dataset is perhaps another analogous option, and arguably a more powerful one if it can be done well, because the pathways developed at that stage will always have an impact on the outputs, no matter the restrictions, RLHF, or other similar means of guiding a mostly-trained model. Not giving the model tools that let it easily go outside the set scope will again work up to a point.

The third, I think, might be the most useful. Outside of the hardest of hard takeoff scenarios, it will be difficult for any intelligence to do a great deal of damage if it is only given a short lifetime in which to do it, while also being asked to do the thing it was very carefully trained for. LLMs already effectively work this way, but this suggests that as things advance we should be more and more wary of allowing long-running agents with anything like run-to-run memory. This obviously greatly restricts what can be done with artificial intelligence (and has somewhat horrifying moral implications if we do instantiate sapient intelligences in that manner), but absent a more complete solution to the alignment problem it would go a long way toward reducing the scope of possible negative outcomes.
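To make the last two mechanisms concrete, here’s a minimal sketch of what scope and lifetime limits can look like at the harness level. This is plain Python, not any real agent framework; `BoundedSession`, `run_task`, and the `lookup` tool are names I’m inventing purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class BoundedSession:
    """One short-lived agent run: allowlisted tools, a hard step cap, no persistence."""
    tools: dict[str, Callable[[str], str]]          # the only capabilities exposed
    max_steps: int = 10                             # hard cap on the session's "lifetime"
    _log: list[str] = field(default_factory=list)   # discarded when the session ends

    def call_tool(self, name: str, arg: str) -> str:
        if len(self._log) >= self.max_steps:
            raise RuntimeError("step budget exhausted; this session is over")
        if name not in self.tools:
            # Out-of-scope capabilities simply aren't reachable, the way the
            # paramedic's kit wasn't sitting next to the lifeguard chair.
            raise PermissionError(f"tool {name!r} is outside this session's scope")
        result = self.tools[name](arg)
        self._log.append(f"{name}({arg!r}) -> {result!r}")
        return result


def run_task(task: str) -> str:
    # Every task gets a brand-new session; nothing from earlier runs survives.
    session = BoundedSession(tools={"lookup": lambda q: f"stubbed answer for {q!r}"})
    return session.call_tool("lookup", task)


if __name__ == "__main__":
    print(run_task("pool chlorine levels"))
    # The second call starts cold: no run-to-run memory, by construction.
    print(run_task("CPR compression rate"))
```

None of this stops a system clever enough to route around its harness, of course; the point is just that tool scope and session lifetime are cheap to cap from the outside, much the way the contents of the first-aid kit were chosen for me.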