Alex Lawsen comments on Prize for Alignment Research Tasks

Alex Lawsen 12 May 2022 21:11 UTC
9 points
Make it as easy as possible to generate alignment forum posts and comments.

The rough idea here is that it’s much easier to explain an idea out loud, especially to someone who occasionally asks for clarification or for you to repeat an idea, than it is to write a clear, concise post on it. Most of the design of this would be small bits of frontend engineering, but language model capability would be useful, and several of the capabilities are things that Ought is already working on. Ideally, interacting with the tool looks like:

Researcher talks through the thing they’re thinking about. Model transcribes ideas[1], suggests splits into paragraphs[2], suggests section headings [3], generates a high level summary/abstract [4]. If researcher says “[name of model] I’m stuck”, the response is “What are you stuck on?”, and simple replies/suggestions are generated by something like this[5].

Once the researcher has talked through the ideas, they are presented with a piece which contains an abstract at the top, then a series of headed sections, each with paragraphs which rather than containing what they said at that point verbatim, contain clear and concise summaries[6] of what was actually said. Clicking on any generated heading allows the user to select from a list of generated alternatives, or write their own[7], while clicking on any paragraph allows the user to see and select from a list of other generated summaries, the verbatim transcription, and to write their own version of this paragraph.

1 can probably be achieved by just buying an off the shelf transcription bot (though you could train one if you wanted), with the most important criterion being speed. 2-4 can have data trivially generated by scraping the entire alignment forum and removing headings/summaries/abstracts/paragraph breaks. 5 I’ve generated data for below. An MVP for generating data for 6 is using the transcription software from 1 to autotranscribe AXRP and then comparing to the human-edited summary, though I think suggesting clear rephrasings (which I’ll call 6.5) might require a seperate task. 7 is just frontend design, which I suspect is doable in-house by Ought.